Re: [psycopg] speed concerns with executemany() - Mailing list psycopg

From Aryeh Leib Taurog
Subject Re: [psycopg] speed concerns with executemany()
Date
Msg-id 20170130215151.GA7081@deb76.aryehleib.com
Whole thread Raw
In response to Re: [psycopg] speed concerns with executemany()  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Responses Re: [psycopg] speed concerns with executemany()  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
List psycopg
On Mon, Jan 30, 2017 at 11:18:36AM +0000, Daniele Varrazzo wrote:
> On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog
> <python@aryehleib.com> wrote:
> > On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> >> aklaver(at)tito:~> python psycopg_executemany.py -p 100
> >> classic: 427.618795156 sec
> >> joined: 7.55754685402 sec
> >
> > This is really interesting.  I have long been using a utility I put
> > together to insert using BINARY COPY.  In fact I just brushed it up a
> > bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy>
> >
> > I'm curious to run a benchmark against the improved executemany.  I'd
> > hoped that pgcopy would be generally useful, but it may no longer be
> > necessary.  A fast executemany() certainly suits more use cases.
>
> (Sorry, mant to write this message earlier but forgot it in my drafts.)
>
> There's always the case that a sequence of:
>
> 1) psycopg executemany as is now: separate statements
> - insert into table values (...);
> - insert into table values (...);
> - insert into table values (...);
>
> is slower than
>
> 2) psycopg executemany as proposed: a single statement containing
> - insert into table values (...);
>   insert into table values (...);
>   insert into table values (...);
>
> which is slower than
>
> 3) a single insert with many params. Plays well with PQexecParams
>    but would need some form of generation by the client
> - insert into table values (...), (...), (...);
>
> which is slower than
>
> 4) copy.
>
> While the proposed executemany is a nice low hanging fruit it will
> break on PQexecParams and it's far from being optimal anyway. Wonder
> if there is a way to help users at least to have 3 without bothering
> with mogrify (due to break too with the PQexecParams switch).
>
> Brainstorming from here: expect no consistency.
>
> Another good property of 2 is that it supports any statements: update,
> delete, select (which may call a stored procedure repeatedly to
> perform some data manipulation) whereas a manipulation into form 3 is
> specific to inserts (which can be a starting point for a fast
> update/select anyway: fast-insert into a temp table, then select or
> update with a join).

I haven't dug into PQexecParams, but it seems to me that these
alternate strategies could be made available outside of psycopg2.
I modified the benchmark script so they are just functions which
operate on a standard cursor object.  They are simple enough to be
included as "recipes" in the psycopg2 docs and/or in a separate
package (like pgcopy).

<https://gist.github.com/altaurog/84668e034646fb354b5de81bb86a580d>

> fast-forward to last message:
>
> On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python@aryehleib.com> wrote:
>
> > Over network, with NRECS=10000 and page size=100:
> > classic: 716.759769917 sec
> > joined: 15.141461134 sec
> > pgcopy: 3.70594096184 sec
>
> Aryeh thank you for this benchmark. Could you please add a test like
> (code untested):
>
>     def insertmany(self, sql, argslist, page_size=100):
>         tmpl = None
>         for page in paginate(argslist, page_size=page_size):
>             if tmpl is None and page:
>                 tmpl = '(%s)' % ','.join([%s] * len(page[0]))
>             self.execute(sql % ",".join(self.mogrify(tmpl, args) for
> args in page))
>
> where sql should be an insert with a single %s placeholder, which
> would be replaced by one or more arguments records, and see where we
> end up? I expect somewhere between "joined" and "pgcopy" but wonder
> closer to which.

This one was over a transatlantic connection, NRECS=5000
classic: 761.322767019 sec
joined: 14.6529989243 sec
folded: 12.4037430286 sec
pgcopy: 2.85529208183 sec


psycopg by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Re: [psycopg] speed concerns with executemany()
Next
From: Aryeh Leib Taurog
Date:
Subject: Re: [psycopg] speed concerns with executemany()