Re: [psycopg] speed concerns with executemany() - Mailing list psycopg
From | Aryeh Leib Taurog |
---|---|
Subject | Re: [psycopg] speed concerns with executemany() |
Date | |
Msg-id | 20170130215151.GA7081@deb76.aryehleib.com Whole thread Raw |
In response to | Re: [psycopg] speed concerns with executemany() (Daniele Varrazzo <daniele.varrazzo@gmail.com>) |
Responses |
Re: [psycopg] speed concerns with executemany()
(Daniele Varrazzo <daniele.varrazzo@gmail.com>)
|
List | psycopg |
On Mon, Jan 30, 2017 at 11:18:36AM +0000, Daniele Varrazzo wrote: > On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog > <python@aryehleib.com> wrote: > > On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote: > > >> aklaver(at)tito:~> python psycopg_executemany.py -p 100 > >> classic: 427.618795156 sec > >> joined: 7.55754685402 sec > > > > This is really interesting. I have long been using a utility I put > > together to insert using BINARY COPY. In fact I just brushed it up a > > bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy> > > > > I'm curious to run a benchmark against the improved executemany. I'd > > hoped that pgcopy would be generally useful, but it may no longer be > > necessary. A fast executemany() certainly suits more use cases. > > (Sorry, mant to write this message earlier but forgot it in my drafts.) > > There's always the case that a sequence of: > > 1) psycopg executemany as is now: separate statements > - insert into table values (...); > - insert into table values (...); > - insert into table values (...); > > is slower than > > 2) psycopg executemany as proposed: a single statement containing > - insert into table values (...); > insert into table values (...); > insert into table values (...); > > which is slower than > > 3) a single insert with many params. Plays well with PQexecParams > but would need some form of generation by the client > - insert into table values (...), (...), (...); > > which is slower than > > 4) copy. > > While the proposed executemany is a nice low hanging fruit it will > break on PQexecParams and it's far from being optimal anyway. Wonder > if there is a way to help users at least to have 3 without bothering > with mogrify (due to break too with the PQexecParams switch). > > Brainstorming from here: expect no consistency. > > Another good property of 2 is that it supports any statements: update, > delete, select (which may call a stored procedure repeatedly to > perform some data manipulation) whereas a manipulation into form 3 is > specific to inserts (which can be a starting point for a fast > update/select anyway: fast-insert into a temp table, then select or > update with a join). I haven't dug into PQexecParams, but it seems to me that these alternate strategies could be made available outside of psycopg2. I modified the benchmark script so they are just functions which operate on a standard cursor object. They are simple enough to be included as "recipes" in the psycopg2 docs and/or in a separate package (like pgcopy). <https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d> > fast-forward to last message: > > On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote: > > > Over network, with NRECS=10000 and page size=100: > > classic: 716.759769917 sec > > joined: 15.141461134 sec > > pgcopy: 3.70594096184 sec > > Aryeh thank you for this benchmark. Could you please add a test like > (code untested): > > def insertmany(self, sql, argslist, page_size=100): > tmpl = None > for page in paginate(argslist, page_size=page_size): > if tmpl is None and page: > tmpl = '(%s)' % ','.join([%s] * len(page[0])) > self.execute(sql % ",".join(self.mogrify(tmpl, args) for > args in page)) > > where sql should be an insert with a single %s placeholder, which > would be replaced by one or more arguments records, and see where we > end up? I expect somewhere between "joined" and "pgcopy" but wonder > closer to which. This one was over a transatlantic connection, NRECS=5000 classic: 761.322767019 sec joined: 14.6529989243 sec folded: 12.4037430286 sec pgcopy: 2.85529208183 sec