Re: [psycopg] speed concerns with executemany() - Mailing list psycopg

From: mike bayer
Subject: Re: [psycopg] speed concerns with executemany()
Msg-id: 52fc9715-b357-d6fe-1003-472af95c3ad3@zzzcomputing.com
In response to: Re: [psycopg] speed concerns with executemany() (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses: Re: [psycopg] speed concerns with executemany() (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List: psycopg

On 01/05/2017 02:00 PM, Daniele Varrazzo wrote:
> On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog@dndg.it> wrote:
>> On 02/01/17 17:07, Daniele Varrazzo wrote:
>>>
>>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com>
>>> wrote:
>>>>
>>>> With NRECS=10000 and page size=100:
>>>>
>>>> aklaver@tito:~> python psycopg_executemany.py -p 100
>>>> classic: 427.618795156 sec
>>>> joined: 7.55754685402 sec
>>>
>>> Ugh! :D
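The "joined" variant benchmarked above folds many rows into a single multi-VALUES INSERT instead of sending one statement per row (the "classic" executemany behaviour). A minimal sketch of how such a statement can be built; the helper name and SQL are illustrative, not psycopg API:

```python
# Sketch of the "joined" approach: one statement carrying a VALUES
# template per row, executed with the flattened parameter list.
# join_inserts is a hypothetical helper, not part of psycopg.

def join_inserts(prefix, row_template, rows):
    """Build one SQL string with a VALUES template per row, plus the
    flattened parameter list to execute it with."""
    sql = prefix + ", ".join([row_template] * len(rows))
    params = [value for row in rows for value in row]
    return sql, params

rows = [(1, "a"), (2, "b"), (3, "c")]
sql, params = join_inserts(
    "INSERT INTO test (num, data) VALUES ", "(%s, %s)", rows)
# sql    -> 'INSERT INTO test (num, data) VALUES (%s, %s), (%s, %s), (%s, %s)'
# params -> [1, 'a', 2, 'b', 3, 'c']
```

The server then sees one round trip for the whole batch, which is where the large speedup in the numbers above comes from.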
>>
>>
>> That's great. Just a minor point: I wouldn't overload executemany() with
>> this feature but would add a new method, UNLESS the semantics are exactly
>> the same, especially regarding session isolation. Also, right now psycopg
>> keeps track of the number of affected rows across executemany() calls:
>> I'd like not to lose that, because losing it is a breaking change to the
>> API.
>
> It seems to me that the semantics would stay the same, even in the
> presence of volatile functions. However, unfortunately, rowcount would
> break. That's just sad.
>
> We can, no problem, add an extra argument to executemany: page_size,
> defaulting to 1 (the previous behaviour), which could be bumped. It's sad
> that the default cannot be 100.
>
> Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491)
> that SQLAlchemy actually uses the aggregated rowcount for concurrency
> control.
>
> So, how much of a deal-breaker is it? Can we afford to lose the aggregated
> rowcount to obtain a juicy speedup in default usage, or would we rather
> leave the behaviour untouched and have people "opt in for speed"?
>
> ponder, ponder...
>
> Pondered: as the feature has had little testing and I don't want to delay
> releasing 2.7 further, I'd rather release it with a page_size default of 1.
> People could use it and report any failures they hit with page_size > 1.
> If testing turns out positive and the database behaves ok, we could think
> about changing the default in the future. We may also want to drop the
> aggregated rowcount eventually, but with better planning, e.g. to allow
> SQLAlchemy to ignore the aggregated rowcount from psycopg >= 2.8...
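The page_size argument Daniele proposes would, roughly, split the parameter list into pages and send one joined statement per page, so page_size=1 reproduces the current one-statement-per-row behaviour. A sketch of that paging logic under those assumptions (helper names hypothetical, not the psycopg implementation):

```python
# Sketch of page_size semantics for executemany: with page_size=1,
# one statement per parameter tuple (current behaviour); with a
# larger page_size, one joined statement per page.

def paginate(argslist, page_size):
    """Yield successive pages of at most page_size parameter tuples."""
    for i in range(0, len(argslist), page_size):
        yield argslist[i:i + page_size]

def executemany_paged(execute, argslist, page_size=1):
    """Send one joined statement per page; return the number of
    statements (round trips) actually issued."""
    statements = 0
    for page in paginate(argslist, page_size):
        execute(page)  # stand-in for building and running one joined statement
        statements += 1
    return statements

sent = []
n = executemany_paged(sent.append, list(range(10)), page_size=4)
# 10 rows in pages of 4 -> 3 statements: [0..3], [4..7], [8..9]
```

With page_size > 1 the driver can no longer sum a per-row rowcount across individual statements, which is exactly why the aggregated rowcount breaks.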

SQLAlchemy can definitely ignore the aggregated rowcount, as most DBAPIs
don't support it anyway, so we can flip the flag off once we know exactly
which psycopg version breaks it. The ORM prefers to use executemany in most
cases, unless the mapping has specified a versioning column, in which case
it has to use the method that supplies an accurate rowcount.

Ideally, being able to control whether we get "aggregated rowcount" or
"speed" via an alternate API / flags / etc. would be nice. It seems
SQLAlchemy will need downstream changes to support this in any case.
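The versioning-column case described above relies on a per-statement rowcount: the UPDATE is qualified by the version the transaction last read, so if it matches zero rows, a concurrent writer got there first. A minimal sketch of that optimistic-concurrency check, assuming a hypothetical table and a stand-in cursor (names are illustrative, not SQLAlchemy internals):

```python
# Sketch of why an accurate rowcount matters for a versioning column:
# the UPDATE's WHERE clause pins the expected version, and a rowcount
# other than 1 signals a concurrent modification.

class StaleDataError(Exception):
    """Raised when an optimistic-concurrency UPDATE matched no row."""

def versioned_update(cursor, obj_id, old_version, new_data):
    cursor.execute(
        "UPDATE t SET data = %s, version = version + 1 "
        "WHERE id = %s AND version = %s",
        (new_data, obj_id, old_version))
    if cursor.rowcount != 1:
        raise StaleDataError(
            "expected to update 1 row, matched %d" % cursor.rowcount)

class FakeCursor:
    # Stand-in for a DBAPI cursor: pretend another transaction already
    # bumped the version, so the qualified UPDATE matches nothing.
    rowcount = 0
    def execute(self, sql, params):
        pass

try:
    versioned_update(FakeCursor(), obj_id=1, old_version=3, new_data="x")
    raised = False
except StaleDataError:
    raised = True
```

With a batched executemany that only reports an aggregated (or no) rowcount, this per-row check cannot be made, which is why the ORM falls back to single-statement execution for versioned mappings.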


>
> How does it sound?
>
> -- Daniele
>
>

