Thread: [psycopg] speed concerns with executemany()
I'm getting more and more regular complaints among users of SQLAlchemy of the relatively slow speed of the cursor.executemany() call in psycopg2. In almost all cases, these users have discovered that Postgresql is entirely capable of running an INSERT or UPDATE of many values with a high degree of speed using a single statement with a form like this: INSERT INTO table (a, b, c) VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9), ... whereas if they run the same form using a single VALUES insert and sending the parameters as a sequence via executemany(), they will observe hyperbolically slow speeds - today, someone claims they can run approximately 200 sets of three integers each using the multiple VALUES approach in approximately .02 seconds, whereas running 200 values into a single executemany() call of the otherwise identical INSERT statement, they are claiming takes 20 seconds; that is, 100000% slower. I'm not really sure how that's even possible, considering the single INSERT with many VALUES is a much larger string to send over the network and be parsed by the server, if the overhead of a single INSERT is .02 seconds, we would think an executemany() of 200 INSERT statements each with a single parameter set would be at most four seconds. Obviously something is wrong with these users' environment, although I will note that the relative speed of psycopg2 executemany() over a 1G network is still pretty bad, compared to both sending a single INSERT with a large VALUES clause as well as compared to the executemany() speed of DBAPIs (even pure Python) for other databases like MySQL, I can provide some quick benchmarks if that's helpful. I understand that psycopg2 does not use prepared statements, and I have dim recollections that internal use of prepared statements for executemany is not on the roadmap for psycopg2. However, I'm still not sure what I should be telling my users when I get reports of these vastly slower results with executemany(). 
I'm not asking that psycopg2 change anything, I'm just looking to understand what the heck is going on when people are reporting this. Should I: 1. tell them they have a network issue that is causing executemany() to have a problem? (even though I can also observe executemany() is kind of slow, though not as slow as these people are reporting) 2. tell them there's some known issue, vacuuming / indexes/ or something that is known to have this effect? 3. tell them that yes, they should use multiple-VALUES within a single INSERT (which would eventually lead to strong pressure on me to reinvent executemany() within SQLAlchemy's psycopg2 dialect to use this form) ? (also if so, why is this the case? can this claimed 100000% slowdown be real?) 4. use a hack to actually make my own prepared statements within executemany() (I vaguely recall some recipe that you can get a prepared statement going with psycopg2 by rolling it on the outside) ? 5. Other reasons that executemany() is known to sometimes be extremely slow? I'm purposely trying to stay out of the realm of picking apart the libpq internals, assuming psycopg2 devs can shed some light what's going on here. Thanks for any guidance you can offer!
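For concreteness, the two forms being compared can be sketched as plain string-building. This is a minimal editor's illustration, not code from the thread: it assumes integer-only values and applies none of the quoting that real code should get from cursor.mogrify().

```python
# Sketch of the "many VALUES in one INSERT" form users hand-build,
# versus the per-row statement that executemany() sends once per
# parameter set. Illustration only: values are formatted as plain
# integers, with no SQL escaping.
def multi_values_insert(table, columns, rows):
    values = ", ".join(
        "(%s)" % ", ".join(str(v) for v in row) for row in rows
    )
    return "INSERT INTO %s (%s) VALUES %s" % (
        table, ", ".join(columns), values)

sql = multi_values_insert("t", ("a", "b", "c"), [(1, 2, 3), (4, 5, 6)])
# "INSERT INTO t (a, b, c) VALUES (1, 2, 3), (4, 5, 6)"
```

The single statement above replaces N round trips with one, which is the effect the users in the report are exploiting.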
Are you running with the transaction isolation level set to ISOLATION_LEVEL_AUTOCOMMIT? If so, each of those INSERTs will be in its own transaction, and thus will go through the COMMIT overhead. That by itself wouldn't explain a jump that large (in most environments), but it will definitely be *much* slower.

> On Dec 23, 2016, at 16:05, mike bayer <mike_mp@zzzcomputing.com> wrote:
>
> [...]

--
-- Christophe Pettus
xof@thebuild.com

--
Sent via psycopg mailing list (psycopg@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/psycopg
On Sat, Dec 24, 2016 at 1:09 AM, Christophe Pettus <xof@thebuild.com> wrote:
> Are you running with the transaction isolation level set to ISOLATION_LEVEL_AUTOCOMMIT? If so, each of those INSERTs will be in its own transaction, and thus will go through the COMMIT overhead. That by itself wouldn't explain a jump that large (in most environments), but it will definitely be *much* slower.

Why do you say this? Psycopg doesn't wrap statements in BEGIN/COMMIT when in autocommit mode. Are you referring to some implicit transaction created by the database?

-- Daniele
On 12/23/2016 04:29 PM, Daniele Varrazzo wrote:
> On Sat, Dec 24, 2016 at 1:09 AM, Christophe Pettus <xof@thebuild.com> wrote:
>> Are you running with the transaction isolation level set to ISOLATION_LEVEL_AUTOCOMMIT? If so, each of those INSERTs will be in its own transaction, and thus will go through the COMMIT overhead. That by itself wouldn't explain a jump that large (in most environments), but it will definitely be *much* slower.
>
> Why do you say this? Psycopg doesn't wrap statements in BEGIN/COMMIT
> when in autocommit mode. Are you referring about some implicit

I understood it did:

http://initd.org/psycopg/docs/usage.html#transactions-control

"It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM...) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property (set_isolation_level() in older versions)."

> transaction created by the database?
>
> -- Daniele

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Dec 23, 2016, at 16:58, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> I understood it did:
>
> http://initd.org/psycopg/docs/usage.html#transactions-control
>
> "It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM...) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property (set_isolation_level() in older versions)."

My somewhat garbled thought was that each of the component INSERTs in the .executemany would be getting its own transaction unless the connection was set to autocommit... but I'll admit I haven't tested it.

--
-- Christophe Pettus
xof@thebuild.com
On 12/23/2016 04:59 PM, Christophe Pettus wrote:
>> On Dec 23, 2016, at 16:58, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> I understood it did:
>>
>> http://initd.org/psycopg/docs/usage.html#transactions-control
>>
>> "It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM...) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property (set_isolation_level() in older versions)."
>
> My somewhat garbled thought was that each of the component INSERTs in the .executemany would be getting its own transaction unless the connection was set to autocommit... but I'll admit I haven't tested it.

Don't you mean?:

My somewhat garbled thought was that each of the component INSERTs in the .executemany would be getting its own transaction if the connection was set to autocommit... but I'll admit I haven't tested it.

--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Mike

On Sat, Dec 24, 2016 at 1:05 AM, mike bayer <mike_mp@zzzcomputing.com> wrote:
> I'm getting more and more regular complaints among users of SQLAlchemy of
> the relatively slow speed of the cursor.executemany() call in psycopg2. In
> almost all cases, these users have discovered that Postgresql is entirely
> capable of running an INSERT or UPDATE of many values with a high degree of
> speed using a single statement with a form like this:
>
> INSERT INTO table (a, b, c)
> VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9), ...
>
> whereas if they run the same form using a single VALUES insert and sending
> the parameters as a sequence via executemany(), they will observe
> hyperbolically slow speeds - today, someone claims they can run
> approximately 200 sets of three integers each using the multiple VALUES
> approach in approximately .02 seconds, whereas running 200 values into a
> single executemany() call of the otherwise identical INSERT statement, they
> are claiming takes 20 seconds; that is, 100000% slower. I'm not really
> sure how that's even possible, considering the single INSERT with many
> VALUES is a much larger string to send over the network and be parsed by the
> server, if the overhead of a single INSERT is .02 seconds, we would think an
> executemany() of 200 INSERT statements each with a single parameter set
> would be at most four seconds.

A x1000 slowdown sounds dodgy, yes.

> Obviously something is wrong with these users' environment, although I will
> note that the relative speed of psycopg2 executemany() over a 1G network is
> still pretty bad, compared to both sending a single INSERT with a large
> VALUES clause as well as compared to the executemany() speed of DBAPIs (even
> pure Python) for other databases like MySQL, I can provide some quick
> benchmarks if that's helpful.

You know something that could be worked out quickly?
Currently executemany boils down to the pseudocode:

    def executemany(self, stmt, argslist):
        for args in argslist:
            self.execute(stmt, args)

This is not efficient, because the statement is not prepared and because it does a roundtrip per args. We could save on the second by running something like:

    def executemany(self, stmt, argslist):
        # TODO: do this in pages of 100 args, not all together
        stmts = []
        for args in argslist:
            stmts.append(self.mogrify(stmt, args))
        self.execute(";".join(stmts))

If you are in a mood for benchmarks, could you please check if this approach is so noticeably faster that we ought to change the executemany implementation in 2.7?

This is something that only works with the old protocol, i.e. what psycopg 2.x implements. In a version implementing the "new protocol" we would likely implement executemany with a prepared statement, so this shouldn't be a concern.

> I understand that psycopg2 does not use prepared statements, and I have dim
> recollections that internal use of prepared statements for executemany is
> not on the roadmap for psycopg2. However, I'm still not sure what I should
> be telling my users when I get reports of these vastly slower results with
> executemany().
>
> I'm not asking that psycopg2 change anything, I'm just looking to understand
> what the heck is going on when people are reporting this. Should I:
>
> 1. tell them they have a network issue that is causing executemany() to have
> a problem? (even though I can also observe executemany() is kind of slow,
> though not as slow as these people are reporting)

As I said, while executemany is not a screaming arrow, I don't expect it to be x1000 slower either.

> 2. tell them there's some known issue, vacuuming / indexes / or something
> that is known to have this effect?

That could be, but rather than with you or me they should have a chat in pgsql-general or -performance to solve that.

> 3. tell them that yes, they should use multiple-VALUES within a single
> INSERT (which would eventually lead to strong pressure on me to reinvent
> executemany() within SQLAlchemy's psycopg2 dialect to use this form)? (also
> if so, why is this the case? can this claimed 100000% slowdown be real?)

Even if we had a better implemented executemany, I doubt it would be implemented with an INSERT ... VALUES list, because that would mean manipulating the SQL anyway. What we would do is prepare the statement and then execute it repeatedly. To my knowledge there is no libpq function to execute a prepared statement repeatedly with an array of arrays of values (https://www.postgresql.org/docs/9.3/static/libpq-exec.html). I'd be very happy if someone could prove me wrong on that.

My bet is that `cur.execute(";".join(cur.mogrify(stmt, args) for args in argslist))` on the old protocol will be faster than prepare + `for args in argslist: execute(prepared, args)` on the new protocol: we would be trading parsing time for network roundtrips and, while I don't have benchmarks, the latter smell more expensive than the former.

> 4. use a hack to actually make my own prepared statements within
> executemany() (I vaguely recall some recipe that you can get a prepared
> statement going with psycopg2 by rolling it on the outside) ?

Yes, I have a sort of recipe for a preparing cursor here: https://gist.github.com/dvarrazzo/3797445 but I don't think it would make executemany faster, because it's still one network roundtrip per parameter set and one statement to parse (likely an EXECUTE instead of an INSERT).

> 5. Other reasons that executemany() is known to sometimes be extremely slow?
>
> I'm purposely trying to stay out of the realm of picking apart the libpq
> internals, assuming psycopg2 devs can shed some light what's going on here.
> Thanks for any guidance you can offer!

Hope my notes are useful.
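A minimal sketch of the "rolling it on the outside" recipe from point 4, as an editor's illustration (the statement name `ins`, the table, and the integer-only formatting are all hypothetical): a server-side PREPARE is sent once, then one EXECUTE per parameter set. Note that there is still one statement per row to send and parse, which is why this is not expected to beat the ';'.join() approach.

```python
# Hypothetical sketch: generate the PREPARE/EXECUTE statement stream a
# prepared-statement executemany would send. One EXECUTE per row means
# the per-statement round trip is not saved, only the INSERT parsing.
def prepared_statements(argslist):
    yield "PREPARE ins (int, int, int) AS INSERT INTO t VALUES ($1, $2, $3)"
    for args in argslist:
        yield "EXECUTE ins (%s, %s, %s)" % args

stmts = list(prepared_statements([(1, 2, 3), (4, 5, 6)]))
# stmts[1] == "EXECUTE ins (1, 2, 3)"
```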
While I haven't really thrown much brain juice at the slow executemany problem (because COPY will always be faster, and that's what I tend to use... but that has adaptation and interface problems of its own), if you can help me with some testing, and if nobody thinks that my ';'.join() solution is totally daft, I think we could roll this trick out quite quickly.

As for a definition of "quickly": I am currently on holiday, which means that I'm working at wrapping up psycopg 2.7 instead of $JOB. My plan is to release 2.7 within the first days of January.

-- Daniele
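The ';'.join() proposal above, with its paging TODO filled in, can be sketched standalone like this. A stub stands in for cursor.mogrify() so the sketch runs without a database; with psycopg2 each yielded page would be passed to cur.execute(). The page size of 100 matches the TODO in the earlier message.

```python
# Standalone sketch of the paged ";".join() executemany idea. "mogrify"
# here is a stub doing plain %-formatting (integers only, no quoting);
# real code would use cur.mogrify(stmt, args) for safe escaping.
def joined_pages(mogrify, stmt, argslist, page_size=100):
    """Yield one semicolon-joined multi-statement string per page."""
    page = []
    for args in argslist:
        page.append(mogrify(stmt, args))
        if len(page) >= page_size:
            yield ";".join(page)
            page = []
    if page:  # flush the final, possibly short, page
        yield ";".join(page)

mogrify = lambda stmt, args: stmt % args  # stub for cursor.mogrify
pages = list(joined_pages(mogrify, "INSERT INTO t VALUES (%s, %s)",
                          [(1, 2), (3, 4), (5, 6)], page_size=2))
# pages == ["INSERT INTO t VALUES (1, 2);INSERT INTO t VALUES (3, 4)",
#           "INSERT INTO t VALUES (5, 6)"]
```

Paging keeps the joined string bounded in size while still amortizing the round-trip cost over many rows.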
On 12/23/2016 04:05 PM, mike bayer wrote:
> [...]
>
> 3. tell them that yes, they should use multiple-VALUES within a single
> INSERT (which would eventually lead to strong pressure on me to reinvent
> executemany() within SQLAlchemy's psycopg2 dialect to use this form) ?
> (also if so, why is this the case? can this claimed 100000% slowdown
> be real?)

I have to go with Christophe's explanation. They are seeing the effects of 200 separate transactions, though like he stated later this is more an assumption than something I have tested.

> 4. use a hack to actually make my own prepared statements within
> executemany() (I vaguely recall some recipe that you can get a prepared
> statement going with psycopg2 by rolling it on the outside) ?
>
> 5. Other reasons that executemany() is known to sometimes be extremely
> slow?
>
> I'm purposely trying to stay out of the realm of picking apart the libpq
> internals, assuming psycopg2 devs can shed some light what's going on
> here. Thanks for any guidance you can offer!

--
Adrian Klaver
adrian.klaver@aklaver.com
On Sat, Dec 24, 2016 at 2:05 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> I have to go with Christophe's explanation. They are seeing the effects of
> 200 separate transactions, though like he stated later this is more an
> assumption than something I have tested.

My wild wild wild assumption is that they have a trigger on the table that gets triggered 200 times instead of once.

The only thing I am sure of is that psycopg doesn't use any BEGIN specifically in executemany.

-- Daniele
On 12/23/2016 05:09 PM, Daniele Varrazzo wrote:
> On Sat, Dec 24, 2016 at 2:05 AM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>> I have to go with Christophe's explanation. They are seeing the effects of
>> 200 separate transactions, though like he stated later this is more an
>> assumption than something I have tested.
>
> My wild wild wild assumption is that they have a trigger on the table
> that gets triggered 200 times instead of one.
>
> The only thing I am sure of is that psycopg doesn't use any BEGIN
> specifically in executemany.

I see that now:

In [5]: val_list = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [12]: sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"

In [16]: con.set_session(autocommit=True)

In [17]: cur = con.cursor()

In [18]: cur.executemany(sql, val_list)

[unknown]-2016-12-23 17:17:10.829 PST-0 LOG:  connection received: host=::1 port=37352
aklaver-2016-12-23 17:17:10.830 PST-0 LOG:  connection authorized: user=aklaver database=production
aklaver-2016-12-23 17:17:35.373 PST-0 LOG:  statement: INSERT INTO psycopg_table VALUES(1, 2, 3)
aklaver-2016-12-23 17:17:35.416 PST-0 LOG:  statement: INSERT INTO psycopg_table VALUES(4, 5, 6)
aklaver-2016-12-23 17:17:35.457 PST-0 LOG:  statement: INSERT INTO psycopg_table VALUES(7, 8, 9)

Yet the data was available immediately in another session without an explicit COMMIT, so how is psycopg2 committing the INSERTs?

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Dec 23, 2016, at 17:23, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Yet the data was available immediately in another session without an explicit COMMIT, so how is psycopg2 committing the INSERTs?

autocommit = True means psycopg2 doesn't issue an initial BEGIN, so each INSERT commits when done (just like a top-level psql session)... thus "autocommit."

--
-- Christophe Pettus
xof@thebuild.com
On 12/23/2016 05:37 PM, Christophe Pettus wrote:
>> On Dec 23, 2016, at 17:23, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> Yet the data was available immediately in another session without an explicit COMMIT, so how is psycopg2 committing the INSERTs?
>
> autocommit = True means psycopg2 doesn't issue an initial BEGIN, so each INSERT commits when done (just like a top-level psql session)... thus "autocommit."

Alright, that I get. Still, the practical outcome is that each INSERT is being done in a transaction (an implicit one), so the transaction overhead comes into play. Or am I missing something?

--
Adrian Klaver
adrian.klaver@aklaver.com
> On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Alright, that I get. Still, the practical outcome is that each INSERT is being done in a transaction (an implicit one), so the transaction overhead comes into play. Or am I missing something?

Nope, not missing a thing. The theory (and it is only that) is that when they do the .executemany(), each of those INSERTs pays the transaction overhead, while if they do one big INSERT, just that one statement does.

--
-- Christophe Pettus
xof@thebuild.com
On 12/23/16 7:04 PM, Daniele Varrazzo wrote: > In my knowledge there is no libpq > function to execute repeatedly a prepared statement with an array of > arrays of values I don't think there is, but the community might welcome one. Though, I think the first question you'll get is "why not just use async commands". Somewhat related to this, I recently heard a complaint that Pandas.DataFrame.from_sql() was so incredibly slow on Postgres that people were moving to storing data in things like Hadoop just for doing data science modelling. I'm not looking into that right now because it turns out moving data into plpython is almost as slow as via libpq in some simple testing (which strikes me as rather absurd), but it'd be worth looking into improving the libpq case as well. I suspect the way to do that would be to create a tuple of lists, instead of a list of tuples (or dicts, or whatever). Single row mode[1] would probably be helpful too... 1: https://www.postgresql.org/docs/current/static/libpq-single-row-mode.html -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 12/23/2016 06:57 PM, Christophe Pettus wrote:
>> On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> Alright, that I get. Still, the practical outcome is that each INSERT is being done in a transaction (an implicit one), so the transaction overhead comes into play. Or am I missing something?
>
> Nope, not missing a thing. The theory (and it is only that) is that when they do the .executemany(), each of those INSERTs pays the transaction overhead, while if they do one big INSERT, just that one statement does.

Just ran a quick and dirty test using IPython %timeit.

With a list of 200 tuples, each of which had 3 integers, INSERTing into:

test=> \d psycopg_table
Table "public.psycopg_table"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |

The results were:

sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"

Without autocommit:

In [65]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 12.5 ms per loop

With autocommit:

In [72]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 1.71 s per loop

--
Adrian Klaver
adrian.klaver@aklaver.com
Sending stuff in big batches + autocommit (fast transactions) + few network calls is performance 101, I thought. I think what "executemany" does should be documented (it looked suspicious when I saw it a long time ago, which is why I didn't use it).
> On Sat, Dec 24, 2016 at 6:00 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> [...]
The implementation of executemany as described by me a few days ago is available in this gist, not heavily tested:

https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e

I would like to know if anyone sees any shortcoming in this new implementation.

-- Daniele

On Sun, Dec 25, 2016 at 10:11 AM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> Sending stuff in big-batches + autocommit (fast transactions) + few network
> calls is performance 101 I thought. I think the "executemany" should be
> documented what it does (it looked suspicious when I saw it long time ago,
> why I didn't use it).
>
> [...]
> On Dec 30, 2016, at 14:24, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
>
> The implementation of executemany as described by me a few days ago is
> available in this gist, not heavily tested:
>
> https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e
>
> I would like to know if anyone sees any shortcoming in this new implementation.

Seems fine to me! I wish there was a way of feeding the queries down the pipe asynchronously, rather than having to build a gigantic string, but that's probably more trouble than the feature is worth.

--
-- Christophe Pettus
xof@thebuild.com
On 12/30/2016 02:24 PM, Daniele Varrazzo wrote:
> The implementation of executemany as described by me a few days ago is
> available in this gist, not heavily tested:
>
> https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e
>
> I would like to know if anyone sees any shortcoming in this new implementation.

A quick test. I added an argument to change the page_size on the command line:

With NRECS=10000:

aklaver@tito:~> python psycopg_executemany.py -p 10
classic: 0.800544023514 sec
joined: 0.514330863953 sec
aklaver@tito:~> python psycopg_executemany.py -p 100
classic: 0.780461072922 sec
joined: 0.473304986954 sec
aklaver@tito:~> python psycopg_executemany.py -p 1000
classic: 0.820818901062 sec
joined: 0.488647937775 sec

With NRECS=100000:

aklaver@tito:~> python psycopg_executemany.py -p 10
classic: 7.78319811821 sec
joined: 4.18683385849 sec
aklaver@tito:~> python psycopg_executemany.py -p 100
classic: 7.75992202759 sec
joined: 4.06096816063 sec
aklaver@tito:~> python psycopg_executemany.py -p 1000
classic: 7.76269102097 sec
joined: 4.12301802635 sec

The relative difference between the classic and joined versions seems to hold; you just do not seem to get much benefit from changing the page_size. Not sure how much that matters, and you do get a benefit from the joined solution.

--
Adrian Klaver
adrian.klaver@aklaver.com
Since Python lists are backed by fixed-size arrays that are reallocated to a larger size when they fill up, it's better to build the list in one go (e.g. with a comprehension) instead of appending item by item, which repeatedly fills the array, allocates a bigger one, and copies the elements over.
example:
`sqls = [self.mogrify(sql, args) for args in argslist]`

On Sat, Dec 31, 2016 at 12:55 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> [...]
On 12/30/2016 06:42 PM, Christophe Pettus wrote: > >> On Dec 30, 2016, at 14:24, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote: >> >> The implementation of executemany as described by me a few days ago is >> available in this gist, not heavily tested: >> >> https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e >> >> I would like to know if anyone sees any shortcoming in this new implementation. > > Seems fine to me! I wish there was a way of feeding the queries down the pipe asynchronously, rather than having to builda gigantic string, but that's probably more trouble than the feature is worth. you'd need to be careful with that as a series of multiple parameter sets may have dependencies on each other, not to mention people might find it surprising that sequences / defaults / SERIAL etc. aren't firing off in the order in which parameter sets were given. > > -- > -- Christophe Pettus > xof@thebuild.com >
On 12/24/2016 12:00 AM, Adrian Klaver wrote: > On 12/23/2016 06:57 PM, Christophe Pettus wrote: >> >>> On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com> >>> wrote: >>> Alright that I get. Still the practical outcome is each INSERT is >>> being done in a transaction (an implicit one) so the transaction >>> overhead comes into play. Or am I missing something? >> >> Nope, not missing a thing. The theory (and it is only that) is that >> when they do the .executemany(), each of those INSERTs pays the >> transaction overhead, while if they do one big INSERT, just that one >> statement does. > > Just ran a quick and dirty test using IPython %timeit. > > With a list of 200 tuples each which had 3 integers INSERTing into: > test=> \d psycopg_table > Table "public.psycopg_table" > Column | Type | Modifiers > --------+---------+----------- > a | integer | > b | integer | > c | integer | > > > The results where: > > sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)" > > Without autocommit: > > In [65]: timeit -n 10 cur.executemany(sql, l) > 10 loops, best of 3: 12.5 ms per loop > > > With autocommit: > > In [72]: timeit -n 10 cur.executemany(sql, l) > 10 loops, best of 3: 1.71 s per loop please ensure you run this test with statements passing over a real network connection and not localhost. makes a significant difference. > > >> >> -- >> -- Christophe Pettus >> xof@thebuild.com >> > >
On 01/01/2017 11:14 AM, mike bayer wrote: > > > On 12/24/2016 12:00 AM, Adrian Klaver wrote: >> On 12/23/2016 06:57 PM, Christophe Pettus wrote: >>> >>>> On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com> >>>> wrote: >>>> Alright that I get. Still the practical outcome is each INSERT is >>>> being done in a transaction (an implicit one) so the transaction >>>> overhead comes into play. Or am I missing something? >>> >>> Nope, not missing a thing. The theory (and it is only that) is that >>> when they do the .executemany(), each of those INSERTs pays the >>> transaction overhead, while if they do one big INSERT, just that one >>> statement does. >> >> Just ran a quick and dirty test using IPython %timeit. >> >> With a list of 200 tuples each which had 3 integers INSERTing into: >> test=> \d psycopg_table >> Table "public.psycopg_table" >> Column | Type | Modifiers >> --------+---------+----------- >> a | integer | >> b | integer | >> c | integer | >> >> >> The results where: >> >> sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)" >> >> Without autocommit: >> >> In [65]: timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 12.5 ms per loop >> >> >> With autocommit: >> >> In [72]: timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 1.71 s per loop > > > please ensure you run this test with statements passing over a real > network connection and not localhost. makes a significant difference. Same code across network, client in Bellingham WA, server in Fremont CA: Without autocommit: In [51]: %timeit -n 10 cur.executemany(sql, l) 10 loops, best of 3: 8.22 s per loop With autocommit: In [56]: %timeit -n 10 cur.executemany(sql, l) 10 loops, best of 3: 8.38 s per loop > > > > > >> >> >>> >>> -- >>> -- Christophe Pettus >>> xof@thebuild.com >>> >> >> -- Adrian Klaver adrian.klaver@aklaver.com
On Sun, Jan 1, 2017 at 8:12 PM, mike bayer <mike_mp@zzzcomputing.com> wrote: > > you'd need to be careful with that as a series of multiple parameter sets > may have dependencies on each other, not to mention people might find it > surprising that sequences / defaults / SERIAL etc. aren't firing off in the > order in which parameter sets were given. I don't expect the parameters to be interpreted in a different way by the server: I think the order would be maintained. -- Daniele
On Sun, Jan 1, 2017 at 9:33 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > Same code across network, client in Bellingham WA, server in Fremont CA: > > Without autocommit: > > In [51]: %timeit -n 10 cur.executemany(sql, l) > 10 loops, best of 3: 8.22 s per loop > > > With autocommit: > > In [56]: %timeit -n 10 cur.executemany(sql, l) > 10 loops, best of 3: 8.38 s per loop Adrian, have you got a benchmark "classic vs. joined" on remote network? Thank you. -- Daniele
On 01/02/2017 05:05 AM, Daniele Varrazzo wrote: > On Sun, Jan 1, 2017 at 9:33 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> Same code across network, client in Bellingham WA, server in Fremont CA: >> >> Without autocommit: >> >> In [51]: %timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 8.22 s per loop >> >> >> With autocommit: >> >> In [56]: %timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 8.38 s per loop > > Adrian, have you got a benchmark "classic vs. joined" on remote > network? Thank you. Knew I was forgetting something, thanks for the reminder. Will set that up and report back. > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com
On 01/02/2017 05:05 AM, Daniele Varrazzo wrote: > On Sun, Jan 1, 2017 at 9:33 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > >> Same code across network, client in Bellingham WA, server in Fremont CA: >> >> Without autocommit: >> >> In [51]: %timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 8.22 s per loop >> >> >> With autocommit: >> >> In [56]: %timeit -n 10 cur.executemany(sql, l) >> 10 loops, best of 3: 8.38 s per loop > > Adrian, have you got a benchmark "classic vs. joined" on remote > network? Thank you. With NRECS=10000 and page size=100: aklaver@tito:~> python psycopg_executemany.py -p 100 classic: 427.618795156 sec joined: 7.55754685402 sec > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com
On Mon, Jan 02, 2017 at 07:35:46AM -0800, Adrian Klaver wrote: >>> In [56]: %timeit -n 10 cur.executemany(sql, l) Someone along the way was asking for a possibility to "feed in" the 'l' values. Any chance 'l' can be supported to be a generator expression ? Thanks for these two great developments ! Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > With NRECS=10000 and page size=100: > > aklaver@tito:~> python psycopg_executemany.py -p 100 > classic: 427.618795156 sec > joined: 7.55754685402 sec Ugh! :D -- Daniele
On Mon, Jan 2, 2017 at 4:46 PM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > On Mon, Jan 02, 2017 at 07:35:46AM -0800, Adrian Klaver wrote: > >>>> In [56]: %timeit -n 10 cur.executemany(sql, l) > > Someone along the way was asking for a possibility to "feed > in" the 'l' values. > > Any chance 'l' can be supported to be a generator expression ? The function would consume only page_size records at a time. This is to avoid creating a humongous query in memory and then on the server, but has the nice effect of requesting only as many records from the sequence. -- Daniele
On 01/02/2017 08:07 AM, Daniele Varrazzo wrote: > On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> >> With NRECS=10000 and page size=100: >> >> aklaver@tito:~> python psycopg_executemany.py -p 100 >> classic: 427.618795156 sec >> joined: 7.55754685402 sec > > Ugh! :D Well it does show the benefit of the joined approach. > > -- Daniele > -- Adrian Klaver adrian.klaver@aklaver.com
On Mon, Jan 02, 2017 at 05:10:10PM +0100, Daniele Varrazzo wrote: > > Any chance 'l' can be supported to be a generator expression ? > > The function would consume only page_size records at time. This is to > avoid creating a humongous query in memory and then on the server, but > has the nice effect of requesting only as many records from the > sequence. Nice. Can said sequence be a generator (IOW not needing to support len() upfront) ? Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Jan 2, 2017 at 5:16 PM, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote: > On Mon, Jan 02, 2017 at 05:10:10PM +0100, Daniele Varrazzo wrote: > >> > Any chance 'l' can be supported to be a generator expression ? >> >> The function would consume only page_size records at time. This is to >> avoid creating a humongous query in memory and then on the server, but >> has the nice effect of requesting only as many records from the >> sequence. > > Nice. > > Can said sequence be a generator (IOW not needing to support > len() upfront) ? Correct: see the implementation of paginate() in the gist at <https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e>: the sequence is only consumed via next(). Final implementation may change but accessing the sequence only as a generator is a desired feature for me too. -- Daniele
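The lazy pagination Daniele describes can be sketched with a small generator. This is an approximation of the gist's paginate(), not the exact code: it pulls items only as each page is requested, so a plain generator with no len() works fine.

```python
from itertools import islice

def paginate(seq, page_size):
    """Consume an iterable lazily, yielding lists of at most page_size items.

    No len() is required: items are pulled from the iterator only as
    each page is built, so generators are supported naturally.
    """
    it = iter(seq)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

# A generator expression as input -- consumed page by page, never all at once.
pages = list(paginate((n * n for n in range(5)), 2))
```

Each page can then be mogrified and sent as one joined query, keeping at most page_size rendered statements in memory at any moment.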
> On 01/02/2017 08:07 AM, Daniele Varrazzo wrote: > > On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > > > > > With NRECS=10000 and page size=100: > > > > > > aklaver@tito:~> python psycopg_executemany.py -p 100 > > > classic: 427.618795156 sec > > > joined: 7.55754685402 sec > > > > Ugh! :D While 7 as such may be a lot, a decrease from 427 to 7 is simply brilliant. Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Jan 02, 2017 at 05:24:47PM +0100, Daniele Varrazzo wrote: > > Can said sequence be a generator (IOW not needing to support > > len() upfront) ? > > Correct: see the implementation of paginate() in the gist at > <https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e>: > the sequence is only consumed via next(). Ah, missed that, sorry. Great work, Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On 01/02/2017 08:27 AM, Karsten Hilbert wrote: >> On 01/02/2017 08:07 AM, Daniele Varrazzo wrote: >>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >>>> >>>> With NRECS=10000 and page size=100: >>>> >>>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>>> classic: 427.618795156 sec >>>> joined: 7.55754685402 sec >>> >>> Ugh! :D > > While 7 as such may be a lot, a decrease from 427 to 7 is > simply brilliant. Also the server I am running this against is a stock install running on a VM that just backs an issue tracker. No optimizations. Also on the client end we are experiencing a winter storm that is making things sort of flaky. So I would take this as a relative comparison, not an absolute truth. > > Karsten > -- Adrian Klaver adrian.klaver@aklaver.com
El 02/01/17 a las 17:07, Daniele Varrazzo escribió:
> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> With NRECS=10000 and page size=100:
>>
>> aklaver@tito:~> python psycopg_executemany.py -p 100
>> classic: 427.618795156 sec
>> joined: 7.55754685402 sec

Hello,

There is a third option that provides a small improvement: generate a single SQL statement with multiple VALUES.

- Test with local database:

classic: 1.53970813751 sec
joined: 0.564052820206 sec
joined values: 0.175103187561 sec

- Test with db on an internet server:

classic: 236.342775822 sec
joined: 6.08789801598 sec
joined values: 4.49090409279 sec

I often need to move data between different internet servers (SQL Server <-> PostgreSQL). In my experience this is the fastest way to move hundreds of thousands of data records.

I attach the sample modified with the executemany3 function.

(Sorry for my bad English)

Regards.
El 02/01/17 a las 20:33, Oswaldo escribió: > El 02/01/17 a las 17:07, Daniele Varrazzo escribió: >> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver >> <adrian.klaver@aklaver.com> wrote: >>> >>> With NRECS=10000 and page size=100: >>> >>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>> classic: 427.618795156 sec >>> joined: 7.55754685402 sec >> > > Hello, > > There is a third option that provides a small improvement: generate a > single sql with multple values. > > - Test with local database: > > classic: 1.53970813751 sec > joined: 0.564052820206 sec > joined values: 0.175103187561 sec > > - Test with db on an internet server > > classic: 236.342775822 sec > joined: 6.08789801598 sec > joined values: 4.49090409279 sec > > I often need to move data between different internet servers (sql server > <-> Postgresql). In my experience this is the fastest way to move > hundreds of thousands of data records. > > I attach the sample modified with it executemany3 function. > Sorry, I forgot the file.
Attachment
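The attached script isn't reproduced in the archive, but the "joined values" idea — a single INSERT whose VALUES clause carries many records — can be sketched like this. `fake_mogrify` and `build_values_insert` are illustrative stand-ins (the real code would use `cursor.mogrify()`, which quotes values safely):

```python
def fake_mogrify(tmpl, args):
    # Naive placeholder substitution, for illustration only.
    return tmpl % tuple(repr(a) for a in args)

def build_values_insert(prefix, argslist):
    """Build one multi-record statement:
    INSERT INTO t (a, b, c) VALUES (...), (...), ...

    prefix is the statement up to and including 'VALUES '.
    """
    # One '(%s, %s, ...)' template sized to the width of the records.
    tmpl = '(%s)' % ', '.join(['%s'] * len(argslist[0]))
    return prefix + ', '.join(fake_mogrify(tmpl, args) for args in argslist)

stmt = build_values_insert(
    "INSERT INTO t (a, b, c) VALUES ",
    [(1, 2, 3), (4, 5, 6)],
)
```

Compared to the semicolon-joined variant, the server parses a single statement, which is why this form tends to come out faster in the benchmarks above; the trade-off is that it only works for INSERT.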
On 02/01/17 17:07, Daniele Varrazzo wrote: > On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >> With NRECS=10000 and page size=100: >> >> aklaver@tito:~> python psycopg_executemany.py -p 100 >> classic: 427.618795156 sec >> joined: 7.55754685402 sec > Ugh! :D That's great. Just a minor point: I wouldn't overload executemany() with this feature but add a new method, UNLESS the semantics are exactly the same, especially regarding session isolation. Also, right now psycopg keeps track of the number of affected rows over executemany() calls: I'd like to not lose that, because it would be a breaking change to the API. federico -- Federico Di Gregorio federico.digregorio@dndg.it DNDG srl http://dndg.it We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. -- D.E.Knuth
On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog@dndg.it> wrote: > On 02/01/17 17:07, Daniele Varrazzo wrote: >> >> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> >> wrote: >>> >>> With NRECS=10000 and page size=100: >>> >>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>> classic: 427.618795156 sec >>> joined: 7.55754685402 sec >> >> Ugh! :D > > That's great. Just a minor point: I won't overload executemany() with this > feature but add a new method UNLESS the semantics are exactly the same > especially regarding session isolation. Also, right now psycopg keeps track > of the number of affected rows over executemany() calls: I'd like to not > lose that because it is a breaking change to the API. It seems to me that the semantics would stay the same, even in the presence of volatile functions. However, unfortunately, rowcount would break. That's just sad. We can easily add an extra argument to executemany(): page_size, defaulting to 1 (the previous behaviour), which could be bumped. It's sad the default cannot be 100. Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491) that SQLAlchemy actually uses the aggregated rowcount for concurrency control. So, how much of a deal-breaker is it? Can we afford losing the aggregated rowcount to obtain a juicy speedup in default usage, or would we rather leave the behaviour untouched and have people "opting in for speed"? ponder, ponder... Pondered: as the feature has had little testing and I don't want to delay releasing 2.7 further, I'd rather release it with a page_size default of 1. People could use it and report any failures if they use a page_size > 1. If testing shows that the database behaves ok, we could think about changing the default in the future. We may want to drop the aggregated rowcount in the future, but with better planning, e.g. to allow SQLAlchemy to ignore the aggregated rowcount from psycopg >= 2.8... How does it sound? -- Daniele
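To see why a page_size greater than 1 breaks the aggregated rowcount: the server reports one status per query sent, and for a multi-statement query only the last statement's count is visible. A minimal simulation follows; `FakeCursor` and its method names are invented for illustration and assume that "last statement wins" behaviour:

```python
class FakeCursor:
    """Simulates rowcount behaviour of paged executemany (illustrative)."""

    def __init__(self):
        self.rowcount = -1

    def execute_joined(self, statements):
        # A multi-statement query reports only the status of its last
        # statement; each INSERT here affects exactly one row.
        self.rowcount = 1

    def executemany_paged(self, statements, page_size):
        total = 0
        for i in range(0, len(statements), page_size):
            self.execute_joined(statements[i:i + page_size])
            total += self.rowcount
        self.rowcount = total

stmts = ['INSERT ...'] * 10

cur = FakeCursor()
cur.executemany_paged(stmts, page_size=1)
classic = cur.rowcount   # one query per statement: the sum is correct

cur = FakeCursor()
cur.executemany_paged(stmts, page_size=5)
paged = cur.rowcount     # only the last INSERT of each page is counted
```

With page_size=1 the aggregate is right (10 rows); with page_size=5 the same workload reports only 2, which is exactly the breakage SQLAlchemy's concurrency check would trip over.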
On 01/05/2017 11:00 AM, Daniele Varrazzo wrote: > On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog@dndg.it> wrote: >> On 02/01/17 17:07, Daniele Varrazzo wrote: >>> >>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> >>> wrote: >>>> >>>> With NRECS=10000 and page size=100: >>>> >>>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>>> classic: 427.618795156 sec >>>> joined: 7.55754685402 sec >>> >>> Ugh! :D >> >> >> That's great. Just a minor point: I won't overload executemany() with this >> feature but add a new method UNLESS the semantics are exactly the same >> especially regarding session isolation. Also, right now psycopg keeps track >> of the number of affected rows over executemany() calls: I'd like to not >> lose that because it is a breaking change to the API. > > It seems to me that the semantics would stay the same, even in > presence of volatile functions. However unfortunately rowcount would > break. That's just sad. > > We can have no problem an extra argument to executemany: page_size > defaulting to 1 (previous behaviour) which could be bumped. It's sad > the default cannot be 100. > > Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491) > that SQLAlchemy actually uses the aggregated rowcount for concurrency > control. > > So, how much it is of a deal-breaker? Can we afford losing aggregated > rowcount to obtain a juicy speedup in default usage, or we'd rather > leave the behaviour untouched but having people "opting in for speed"? > > ponder, ponder... > > Pondered: as the features had little test and I don't want to delay > releasing 2.7 further, I'd rather release the feature with a page_size > default of 1. People could use it and report eventual failures if they > use a page_size > 1. If tests turn out to be positive that the > database behaves ok we could think about changing the default in the > future. 
We may want to drop the aggregated rowcount in the future but > with better planning, e.g. to allow SQLAlchemy to ignore aggregated > rowcount from psycopg >= 2.8... > > How does it sound? Works for me. > > -- Daniele > > -- Adrian Klaver adrian.klaver@aklaver.com
On 05/01/17 20:00, Daniele Varrazzo wrote: > On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog@dndg.it> wrote: >> On 02/01/17 17:07, Daniele Varrazzo wrote: >>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> >>> wrote: >>>> With NRECS=10000 and page size=100: >>>> >>>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>>> classic: 427.618795156 sec >>>> joined: 7.55754685402 sec >>> Ugh! :D >> >> That's great. Just a minor point: I won't overload executemany() with this >> feature but add a new method UNLESS the semantics are exactly the same >> especially regarding session isolation. Also, right now psycopg keeps track >> of the number of affected rows over executemany() calls: I'd like to not >> lose that because it is a breaking change to the API. > It seems to me that the semantics would stay the same, even in > presence of volatile functions. However unfortunately rowcount would > break. That's just sad. > > We can have no problem an extra argument to executemany: page_size > defaulting to 1 (previous behaviour) which could be bumped. It's sad > the default cannot be 100. > > Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491) > that SQLAlchemy actually uses the aggregated rowcount for concurrency > control. > > So, how much it is of a deal-breaker? Can we afford losing aggregated > rowcount to obtain a juicy speedup in default usage, or we'd rather > leave the behaviour untouched but having people "opting in for speed"? > > ponder, ponder... > > Pondered: as the features had little test and I don't want to delay > releasing 2.7 further, I'd rather release the feature with a page_size > default of 1. People could use it and report eventual failures if they > use a page_size > 1. If tests turn out to be positive that the > database behaves ok we could think about changing the default in the > future. We may want to drop the aggregated rowcount in the future but > with better planning, e.g. 
to allow SQLAlchemy to ignore aggregated > rowcount from psycopg >= 2.8... > > How does it sound? Fine for me. federico -- Federico Di Gregorio federico.digregorio@dndg.it DNDG srl http://dndg.it Purtroppo i creazionisti non si sono ancora estinti. -- vodka
On 01/05/2017 02:00 PM, Daniele Varrazzo wrote: > On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog@dndg.it> wrote: >> On 02/01/17 17:07, Daniele Varrazzo wrote: >>> >>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> >>> wrote: >>>> >>>> With NRECS=10000 and page size=100: >>>> >>>> aklaver@tito:~> python psycopg_executemany.py -p 100 >>>> classic: 427.618795156 sec >>>> joined: 7.55754685402 sec >>> >>> Ugh! :D >> >> >> That's great. Just a minor point: I won't overload executemany() with this >> feature but add a new method UNLESS the semantics are exactly the same >> especially regarding session isolation. Also, right now psycopg keeps track >> of the number of affected rows over executemany() calls: I'd like to not >> lose that because it is a breaking change to the API. > > It seems to me that the semantics would stay the same, even in > presence of volatile functions. However unfortunately rowcount would > break. That's just sad. > > We can have no problem an extra argument to executemany: page_size > defaulting to 1 (previous behaviour) which could be bumped. It's sad > the default cannot be 100. > > Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491) > that SQLAlchemy actually uses the aggregated rowcount for concurrency > control. > > So, how much it is of a deal-breaker? Can we afford losing aggregated > rowcount to obtain a juicy speedup in default usage, or we'd rather > leave the behaviour untouched but having people "opting in for speed"? > > ponder, ponder... > > Pondered: as the features had little test and I don't want to delay > releasing 2.7 further, I'd rather release the feature with a page_size > default of 1. People could use it and report eventual failures if they > use a page_size > 1. If tests turn out to be positive that the > database behaves ok we could think about changing the default in the > future. 
We may want to drop the aggregated rowcount in the future but > with better planning, e.g. to allow SQLAlchemy to ignore aggregated > rowcount from psycopg >= 2.8... SQLAlchemy can definitely ignore the aggregated rowcount as most DBAPIs don't support it anyway, so we can flip the flag off if we know exactly what psycopg version breaks it. The ORM in most cases prefers to use executemany in any case unless the mapping has specified a versioning column, in which case it has to use the method that supplies accurate rowcount. Ideally, being able to control whether we get "aggregated rowcount" or "speed" via an alternate API / flags / etc. would be nice. Seems like SQLAlchemy will need downstream changes to support this in any case. > > How does it sound? > > -- Daniele > >
On 1/9/17 10:04 AM, mike bayer wrote: > SQLAlchemy can definitely ignore the aggregated rowcount as most DBAPIs > don't support it anyway, so we can flip the flag off if we know exactly > what psycopg version breaks it. ISTM it'd be better to add an attribute to indicate whether rowcount was available or not, especially since it would be available with a page size of 1? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
On 01/09/2017 11:45 AM, Jim Nasby wrote: > On 1/9/17 10:04 AM, mike bayer wrote: >> SQLAlchemy can definitely ignore the aggregated rowcount as most DBAPIs >> don't support it anyway, so we can flip the flag off if we know exactly >> what psycopg version breaks it. > > ISTM it'd be better to add an attribute to indicate whether rowcount was > available or not, especially since it would be available with a page > size of 1? so it's a dialect-level flag which on a SQLAlchemy engine you'd see as engine.dialect.supports_sane_multi_rowcount. When psycopg2 first adds the page size API, we can begin supporting it by detecting the availability of the flag via psycopg2 version info, as well as the default for this value - if psycopg2 is setting it to a number greater than 1 by default, we'd want to keep that default. If these two things are true then the supports_sane_multi_rowcount flag would be turned off for the dialect that imports this version of psycopg2. SQLAlchemy at the very least needs to respond if/when psycopg2 changes this default upstream since if we don't, it will break our own system and tests. The next addition to SQLAlchemy would be explicit API that allows this value to be configured both at dialect startup time as well as a per-statement execution option. supports_sane_multi_rowcount would be flagged based on how this is set up. There may need to be changes to ensure supports_sane_multi_rowcount is consulted local to a statement execution context, if we allow on-the-fly modifications to it.
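A rough sketch of the version-gating Mike describes. The helper names are hypothetical, the version cutoff is only a placeholder for whatever release actually changes the default, and no psycopg2 import is needed for the version-parsing part:

```python
def parse_version(v):
    """'2.7.1 (dt dec pq3 ext)' -> (2, 7, 1); psycopg2.__version__ style."""
    return tuple(int(p) for p in v.split()[0].split('.'))

def supports_sane_multi_rowcount(psycopg2_version, default_page_size):
    """Hypothetical dialect-level decision: aggregated rowcount stays
    correct only when one statement is sent per parameter set."""
    if default_page_size > 1 and parse_version(psycopg2_version) >= (2, 7):
        return False
    return True

flag_old = supports_sane_multi_rowcount('2.6.2', 1)
flag_new = supports_sane_multi_rowcount('2.7.0 (dt dec pq3 ext)', 100)
```

The second half of Mike's proposal — a per-statement execution option that restores page_size=1 when an accurate rowcount is needed — would then flip this flag locally for that one statement.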
On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: >>> Same code across network, client in Bellingham WA, server in Fremont CA: >>> >>> Without autocommit: >>> >>> In [51]: %timeit -n 10 cur.executemany(sql, l) >>> 10 loops, best of 3: 8.22 s per loop >>> >>> >>> With autocommit: >>> >>> In [56]: %timeit -n 10 cur.executemany(sql, l) >>> 10 loops, best of 3: 8.38 s per loop >> >> Adrian, have you got a benchmark "classic vs. joined" on remote >> network? Thank you. > > With NRECS=10000 and page size=100: > > aklaver(at)tito:~> python psycopg_executemany.py -p 100 > classic: 427.618795156 sec > joined: 7.55754685402 sec This is really interesting. I have long been using a utility I put together to insert using BINARY COPY. In fact I just brushed it up a bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy> I'm curious to run a benchmark against the improved executemany. I'd hoped that pgcopy would be generally useful, but it may no longer be necessary. A fast executemany() certainly suits more use cases. Best, Aryeh Leib Taurog
On Thu, Jan 19, 2017 at 02:23:15PM +0200, Aryeh Leib Taurog wrote: > >> > >> Adrian, have you got a benchmark "classic vs. joined" on remote > >> network? Thank you. > > > > With NRECS=10000 and page size=100: > > > > aklaver(at)tito:~> python psycopg_executemany.py -p 100 > > classic: 427.618795156 sec > > joined: 7.55754685402 sec > > This is really interesting. I have long been using a utility I put > together to insert using BINARY COPY. In fact I just brushed it up a > bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy> I added binary copy to the benchmark: <https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d> On local machine, with NRECS=100000 and page size=100: classic: 4.26264309883 sec joined: 2.34096288681 sec pgcopy: 0.412513971329 sec Over network, with NRECS=10000 and page size=100: classic: 716.759769917 sec joined: 15.141461134 sec pgcopy: 3.70594096184 sec
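For comparison, the COPY route that pgcopy exploits can be sketched with text-format COPY (pgcopy itself uses binary COPY, which is faster still). `format_copy` is an illustrative helper, not part of psycopg2 or pgcopy:

```python
import io

def format_copy(rows):
    """Render rows as tab-separated text suitable for COPY ... FROM STDIN.

    Uses the COPY text-format defaults: tab as separator, \\N for NULL.
    """
    buf = io.StringIO()
    for row in rows:
        buf.write('\t'.join(r'\N' if v is None else str(v) for v in row))
        buf.write('\n')
    buf.seek(0)
    return buf

buf = format_copy([(1, 2, 3), (4, None, 6)])
data = buf.getvalue()

# With a real connection one would then run:
#   cur.copy_from(format_copy(rows), 'psycopg_table', columns=('a', 'b', 'c'))
```

COPY streams all rows through a single command, which is why it beats even the joined executemany in the benchmark above; the cost is that it only does bulk loading, with none of executemany's generality.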
On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:
> On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>> classic: 427.618795156 sec
>> joined: 7.55754685402 sec
>
> This is really interesting. I have long been using a utility I put
> together to insert using BINARY COPY. In fact I just brushed it up a
> bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy>
>
> I'm curious to run a benchmark against the improved executemany. I'd
> hoped that pgcopy would be generally useful, but it may no longer be
> necessary. A fast executemany() certainly suits more use cases.

(Sorry, meant to write this message earlier but forgot it in my drafts.)

There's always the case that a sequence of:

1) psycopg executemany as is now: separate statements
   - insert into table values (...);
   - insert into table values (...);
   - insert into table values (...);

is slower than

2) psycopg executemany as proposed: a single statement containing
   - insert into table values (...);
     insert into table values (...);
     insert into table values (...);

which is slower than

3) a single insert with many params. Plays well with PQexecParams but would need some form of generation by the client
   - insert into table values (...), (...), (...);

which is slower than

4) copy.

While the proposed executemany is a nice low hanging fruit it will break on PQexecParams and it's far from being optimal anyway. Wonder if there is a way to help users at least to have 3 without bothering with mogrify (which would break too with the PQexecParams switch).

Brainstorming from here: expect no consistency.
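[Editor's note: for the simple positional case, strategy 3 above can be built client-side without mogrify - fold all parameter records into one VALUES clause and a flat parameter tuple, leaving the actual quoting to the driver. A minimal sketch; fold_values is a hypothetical helper, not psycopg2 API.]

```python
def fold_values(sql_prefix, argslist):
    """Build the strategy-3 statement: one INSERT, many placeholder groups.

    Returns the folded SQL plus a flat parameter tuple suitable for a
    single cursor.execute() call, so parameters still travel out-of-band
    (compatible with parameter-passing APIs like PQexecParams).
    """
    if not argslist:
        raise ValueError("argslist must not be empty")
    template = "(" + ", ".join(["%s"] * len(argslist[0])) + ")"
    sql = sql_prefix + ", ".join([template] * len(argslist))
    params = tuple(p for args in argslist for p in args)
    return sql, params

sql, params = fold_values("insert into table values ", [(1, 2, 3), (4, 5, 6)])
# sql    == "insert into table values (%s, %s, %s), (%s, %s, %s)"
# params == (1, 2, 3, 4, 5, 6)
```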
Another good property of 2 is that it supports any statements: update, delete, select (which may call a stored procedure repeatedly to perform some data manipulation) whereas a manipulation into form 3 is specific to inserts (which can be a starting point for a fast update/select anyway: fast-insert into a temp table, then select or update with a join).

fast-forward to last message:

On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:
> Over network, with NRECS=10000 and page size=100:
> classic: 716.759769917 sec
> joined: 15.141461134 sec
> pgcopy: 3.70594096184 sec

Aryeh thank you for this benchmark. Could you please add a test like (code untested):

    def insertmany(self, sql, argslist, page_size=100):
        tmpl = None
        for page in paginate(argslist, page_size=page_size):
            if tmpl is None and page:
                tmpl = '(%s)' % ','.join(['%s'] * len(page[0]))
            self.execute(sql % ",".join(
                self.mogrify(tmpl, args) for args in page))

where sql should be an insert with a single %s placeholder, which would be replaced by one or more argument records, and see where we end up? I expect somewhere between "joined" and "pgcopy" but wonder closer to which.

-- Daniele
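[Editor's note: the insertmany() sketch above relies on a paginate() helper that isn't shown in the message. A minimal version, assuming the intended behavior is "yield the args in chunks of page_size", could be:]

```python
from itertools import islice

def paginate(seq, page_size=100):
    """Yield successive lists of at most page_size items from seq."""
    it = iter(seq)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

list(paginate(range(5), page_size=2))
# -> [[0, 1], [2, 3], [4]]
```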
On Mon, Jan 30, 2017 at 11:18:36AM +0000, Daniele Varrazzo wrote:
> While the proposed executemany is a nice low hanging fruit it will
> break on PQexecParams and it's far from being optimal anyway. Wonder
> if there is a way to help users at least to have 3 without bothering
> with mogrify.
>
> Another good property of 2 is that it supports any statements: update,
> delete, select (which may call a stored procedure repeatedly to
> perform some data manipulation) whereas a manipulation into form 3 is
> specific to inserts (which can be a starting point for a fast
> update/select anyway: fast-insert into a temp table, then select or
> update with a join).

I haven't dug into PQexecParams, but it seems to me that these alternate strategies could be made available outside of psycopg2. I modified the benchmark script so they are just functions which operate on a standard cursor object. They are simple enough to be included as "recipes" in the psycopg2 docs and/or in a separate package (like pgcopy).

<https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d>

> Aryeh thank you for this benchmark. Could you please add a test like
> (code untested):
>
>     def insertmany(self, sql, argslist, page_size=100):
>         [...]
>
> where sql should be an insert with a single %s placeholder, which
> would be replaced by one or more argument records, and see where we
> end up? I expect somewhere between "joined" and "pgcopy" but wonder
> closer to which.

This one was over a transatlantic connection, NRECS=5000:
classic: 761.322767019 sec
joined: 14.6529989243 sec
folded: 12.4037430286 sec
pgcopy: 2.85529208183 sec
On Mon, Jan 30, 2017 at 11:18:36AM +0000, Daniele Varrazzo wrote:
> 3) a single insert with many params. Plays well with PQexecParams
> but would need some form of generation by the client
> - insert into table values (...), (...), (...);
>
> While the proposed executemany is a nice low hanging fruit it will
> break on PQexecParams and it's far from being optimal anyway. Wonder
> if there is a way to help users at least to have 3 without bothering
> with mogrify (which would break too with the PQexecParams switch).

Supporting a general case would surely require parsing the sql statement to some extent, but a simple insert could be done quite easily, particularly if you change the call signature so the caller does your work for you:

    import operator
    from functools import reduce  # builtin on Python 2

    def insert_batch(cur, sql, template, args):
        argslist = list(args)
        sql_full = sql + ','.join([template] * len(argslist))
        cur.execute(sql_full, reduce(operator.add, argslist))

    insert_batch(cur, "insert into testmany (num, data) values ",
                 "(%s, %s)", data)
On Mon, Jan 30, 2017 at 9:51 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:
> I haven't dug into PQexecParams, but it seems to me that these
> alternate strategies could be made available outside of psycopg2.

I got to the same conclusion, so I've implemented the functions execute_batch() and execute_values() (resp. what you call "joined" and "folded" in your benchmarks) in the extras module, and left the semantics of executemany() untouched (no extra parameters, or rowcount breaking only if going batched... just too confusing).

Implementation, docs, tests in `this commit`__.

.. __: https://github.com/psycopg/psycopg2/commit/a95fd3df1abc0282f1c47fa2170191f037c3c8de

I also thought about implementing an execute_prepared() function, which would run PREPARE, then EXECUTE in a loop (possibly batched as in execute_batch), and finally DEALLOCATE. The implementation doesn't seem trivial because:

- parameters %s and %(name)s would have to be replaced with $1, $2, ..., with matching unescaping of literal % and escaping of literal $;
- in case of error, calling DEALLOCATE is tricky: if the connection is not autocommit the transaction is failed; the function could use a savepoint, but then it would clear the error state too; not calling DEALLOCATE would leave the prepared statement there and, depending on the name chosen for the statement (e.g. 'psycopg_%s' % id(cursor)), would make another execute_prepared() fail... it should be possible to wrap the PREPARE in a savepoint to deal with this problem;
- if the connection is autocommit none of the above is needed.

ISTM that it's easier to leave the users to call PREPARE / execute_batch('EXECUTE ...') / DEALLOCATE themselves: knowing the context in which the sequence is called wouldn't require parameter mangling, and the error management would be simpler for them.

Thoughts? Shall we merge this stuff? Any feedback is welcome.

-- Daniele
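[Editor's note: the first difficulty listed above - rewriting %s placeholders as $1, $2, ... - can be sketched for the positional-only case. This hypothetical helper ignores named %(name)s parameters and the $-escaping issue the message mentions, so it illustrates the idea rather than solving the full problem.]

```python
import re

def to_dollar_params(sql):
    """Rewrite positional %s placeholders as $1, $2, ...

    '%%' becomes a literal '%', since the PREPARE'd statement is sent
    to the server without psycopg-style placeholder escaping.
    """
    counter = [0]

    def repl(match):
        if match.group(0) == "%%":
            return "%"
        counter[0] += 1
        return "$%d" % counter[0]

    return re.sub(r"%%|%s", repl, sql)

to_dollar_params("insert into t values (%s, %s)")
# -> "insert into t values ($1, $2)"
to_dollar_params("select '100%%', %s")
# -> "select '100%', $1"
```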
On Wed, Feb 01, 2017 at 02:21:37AM +0000, Daniele Varrazzo wrote:
> I got to the same conclusion, so I've implemented the functions
> execute_batch() and execute_values() (resp. what you call "joined"
> and "folded" in your benchmarks) in the extras module, and left the
> semantics of executemany() untouched.
>
> [...]
>
> Thoughts? Shall we merge this stuff? Any feedback is welcome.

I think it looks great. I agree that execute_prepared() would be tricky, and it doesn't seem to provide much performance benefit over the methods you've already implemented.

I have run the following benchmarks for UPDATE across the Atlantic:

1. classic executemany
2. "joined" - statements executed in batches
3. "prepared/joined" - as you suggested above
4. pgcopy - copy into a temp table, then UPDATE FROM temp table

<https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d#file-updatemany-py-L69>

With NRECS = 5000:
classic: 794.658465862 sec
joined: 11.6829760075 sec
prepared: 10.1489500999 sec
pgcopy: 2.68695497513 sec
On 1/31/17 6:21 PM, Daniele Varrazzo wrote:
> ISTM that it's easier to leave the users to call
> PREPARE/execute_batch('EXECUTE')/DEALLOCATE: knowing the context in
> which the sequence is called wouldn't require parameters mangling and
> the error management would be simpler for them.

I think a context handler for prepare would be very handy, so that you didn't need to remember to dealloc. There might be a way to avoid the separate prepare too.

I agree that trying to magically convert % to $ and back is nothing but trouble. If users are using prepared statements, they need to understand the $ syntax.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
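[Editor's note: Jim's context-handler idea could look roughly like the sketch below - a hypothetical helper, not psycopg2 API. It is demonstrated against a recording stub standing in for a real cursor; note that Daniele's caveat above about running DEALLOCATE inside a failed transaction would still apply on a real connection.]

```python
from contextlib import contextmanager

@contextmanager
def prepared(cur, name, statement):
    """PREPARE on entry, DEALLOCATE on exit, so the dealloc isn't forgotten."""
    cur.execute("PREPARE %s AS %s" % (name, statement))
    try:
        yield name
    finally:
        cur.execute("DEALLOCATE %s" % name)

# Stub cursor so the sketch runs without a server; a real psycopg2
# cursor would send these statements to PostgreSQL.
class StubCursor:
    def __init__(self):
        self.log = []

    def execute(self, sql, params=None):
        self.log.append(sql)

cur = StubCursor()
with prepared(cur, "ins", "INSERT INTO testmany (num, data) VALUES ($1, $2)"):
    # on a real cursor this could be execute_batch(cur, "EXECUTE ins (%s, %s)", data)
    cur.execute("EXECUTE ins (%s, %s)", (1, "x"))
# cur.log now holds the PREPARE, EXECUTE and DEALLOCATE statements in order
```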
I've merged the branch containing these functions to master. Docs for the new functions are at http://initd.org/psycopg/docs/extras.html#fast-execution-helpers There is still space for some final tweak, so feedback is still accepted. Thank you everybody for the constructive discussion. -- Daniele
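[Editor's note: by way of illustration, execute_values() folds a page of rows into the statement's single %s placeholder. The expansion below uses plain string formatting purely to show the shape of the generated SQL - the real function builds each row with cursor.mogrify() for proper quoting, so this is not a safe substitute.]

```python
rows = [(1, "a"), (2, "b"), (3, "c")]

# Roughly what a call like
#     execute_values(cur, "INSERT INTO testmany (num, data) VALUES %s", rows)
# sends to the server for one page of rows:
values = ", ".join("(%d, '%s')" % r for r in rows)
sql = "INSERT INTO testmany (num, data) VALUES %s" % values
# sql == "INSERT INTO testmany (num, data) VALUES (1, 'a'), (2, 'b'), (3, 'c')"
```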