Re: [psycopg] speed concerns with executemany() - Mailing list psycopg

From Dorian Hoxha
Subject Re: [psycopg] speed concerns with executemany()
Msg-id CANsFX04okXoU9W2ij7ujzyD0Pp+ONcbAEHO9CSuQORkkejAo=g@mail.gmail.com
In response to Re: [psycopg] speed concerns with executemany()  (Adrian Klaver <adrian.klaver@aklaver.com>)
List psycopg
Since Python lists are dynamic arrays that over-allocate and have to reallocate and copy their contents each time they outgrow their capacity, it's better to build the list in one go instead of appending to it element by element.
example:
`sqls = [self.mogrify(sql, args) for args in argslist]`
Even better, make it a tuple instead of a list if it's never mutated.
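A quick illustration of the difference (`cur`, `sql`, and `argslist` here are stand-ins, not code from the gist):

    # appending one element at a time triggers periodic
    # reallocate-and-copy as the underlying array fills up
    sqls = []
    for args in argslist:
        sqls.append(cur.mogrify(sql, args))

    # a comprehension builds the whole list in a single pass
    sqls = [cur.mogrify(sql, args) for args in argslist]

    # a tuple works just as well if the result is never mutated
    sqls = tuple(cur.mogrify(sql, args) for args in argslist)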

On Sat, Dec 31, 2016 at 12:55 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/30/2016 02:24 PM, Daniele Varrazzo wrote:
The implementation of executemany as described by me a few days ago is
available in this gist, not heavily tested:

https://gist.github.com/dvarrazzo/4204cca5d1cd7c9c95bc814d81e6b83e

I would like to know if anyone sees any shortcoming in this new implementation.
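For readers who don't open the gist: the core idea is roughly the following (a minimal sketch under my own naming; the gist's actual code differs):

    # sketch of the "joined" executemany: mogrify each statement,
    # join a page of them with ';', and send each page in one round trip
    # (mogrify returns bytes on Python 3, str on Python 2)
    def executemany_joined(cur, sql, argslist, page_size=100):
        for i in range(0, len(argslist), page_size):
            page = argslist[i:i + page_size]
            cur.execute(b';'.join(cur.mogrify(sql, args) for args in page))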

A quick test. I added an argument to change the page_size on the command line:

With NRECS=10000:

aklaver@tito:~> python psycopg_executemany.py -p 10
classic: 0.800544023514 sec
joined: 0.514330863953 sec
aklaver@tito:~> python psycopg_executemany.py -p 100
classic: 0.780461072922 sec
joined: 0.473304986954 sec
aklaver@tito:~> python psycopg_executemany.py -p 1000
classic: 0.820818901062 sec
joined: 0.488647937775 sec


With NRECS=100000:

aklaver@tito:~> python psycopg_executemany.py -p 10
classic: 7.78319811821 sec
joined: 4.18683385849 sec
aklaver@tito:~> python psycopg_executemany.py -p 100
classic: 7.75992202759 sec
joined: 4.06096816063 sec
aklaver@tito:~> python psycopg_executemany.py -p 1000
classic: 7.76269102097 sec
joined: 4.12301802635 sec


The relative difference between classic and joined seems to hold, but changing the page_size does not yield much additional benefit. Not sure how much that matters; you do get a clear benefit from the joined solution either way.



-- Daniele

On Sun, Dec 25, 2016 at 10:11 AM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
Sending stuff in big batches + autocommit (short transactions) + few network
calls is performance 101, I thought. What "executemany" actually does should be
documented (it looked suspicious when I saw it a long time ago, which is why I
didn't use it).

On Sat, Dec 24, 2016 at 6:00 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 12/23/2016 06:57 PM, Christophe Pettus wrote:


On Dec 23, 2016, at 18:55, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Alright, that I get. Still, the practical outcome is that each INSERT is being
done in its own transaction (an implicit one), so the transaction overhead comes
into play. Or am I missing something?


Nope, not missing a thing.  The theory (and it is only that) is that when
they do the .executemany(), each of those INSERTs pays the transaction
overhead, while if they do one big INSERT, just that one statement does.
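As an aside, that "one big INSERT" variant can be built with a single multi-row VALUES list, so the per-statement overhead is paid once. A sketch (`rows` and the reuse of psycopg_table from the test below are my assumptions):

    # one statement -> one transaction's worth of overhead, one round trip
    values = b','.join(cur.mogrify("(%s, %s, %s)", row) for row in rows)
    cur.execute(b"INSERT INTO psycopg_table VALUES " + values)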


Just ran a quick and dirty test using IPython %timeit.

With a list of 200 tuples, each containing 3 integers, INSERTing into:
test=> \d psycopg_table
 Table "public.psycopg_table"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |


The results were:

sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"

Without autocommit:

In [65]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 12.5 ms per loop


With autocommit:

In [72]: timeit -n 10 cur.executemany(sql, l)
10 loops, best of 3: 1.71 s per loop
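For reference, a minimal reconstruction of that test (the connection string and the data are assumptions, not Adrian's actual script):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    sql = "INSERT INTO psycopg_table VALUES(%s, %s, %s)"
    l = [(i, i + 1, i + 2) for i in range(200)]  # 200 rows of 3 integers

    # without autocommit: all 200 INSERTs share one implicit transaction
    conn.autocommit = False
    cur.executemany(sql, l)   # timed above with %timeit -n 10
    conn.commit()             # autocommit can't be toggled mid-transaction

    # with autocommit: each INSERT commits in its own transaction,
    # which is where the ~100x slowdown comes from
    conn.autocommit = True
    cur.executemany(sql, l)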



--
Christophe Pettus
xof@thebuild.com



--
Adrian Klaver
adrian.klaver@aklaver.com







--
Adrian Klaver
adrian.klaver@aklaver.com
