[psycopg] speed concerns with executemany() - Mailing list psycopg

From mike bayer
Subject [psycopg] speed concerns with executemany()
Date
Msg-id 060e2dde-fef6-1c31-bb09-097a13121375@zzzcomputing.com
Whole thread Raw
Responses Re: [psycopg] speed concerns with executemany()  (Christophe Pettus <xof@thebuild.com>)
Re: [psycopg] speed concerns with executemany()  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: [psycopg] speed concerns with executemany()  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
List psycopg
I'm getting more and more regular complaints among users of SQLAlchemy
of the relatively slow speed of the cursor.executemany() call in
psycopg2.   In almost all cases, these users have discovered that
Postgresql is entirely capable of running an INSERT or UPDATE of many
values with a high degree of speed using a single statement with a form
like this:

INSERT INTO table (a, b, c)
VALUES  (1, 2, 3), (4, 5, 6), (7, 8, 9), ...

whereas if they run the same form using a single VALUES insert and
sending the parameters as a sequence via executemany(), they will
observe hyperbolically slow speeds - today, someone claims they can run
approximately 200 sets of three integers each using the multiple VALUES
approach in approximately .02 seconds, whereas running 200 values into a
single executemany() call of the otherwise identical INSERT statement,
they are claiming takes 20 seconds; that is, 100000% slower.   I'm not
really sure how that's even possible, considering the single INSERT with
many VALUES is a much larger string to send over the network and be
parsed by the server, if the overhead of a single INSERT is .02 seconds,
we would think an executemany() of 200 INSERT statements each with a
single parameter set would be at most four seconds.

Obviously something is wrong with these users' environment, although I
will note that the relative speed of psycopg2 executemany() over a 1G
network is still pretty bad, compared to both sending a single INSERT
with a large VALUES clause as well as compared to the executemany()
speed of DBAPIs (even pure Python) for other databases like MySQL, I can
provide some quick benchmarks if that's helpful.

I understand that psycopg2 does not use prepared statements, and I have
dim recollections that internal use of prepared statements for
executemany is not on the roadmap for psycopg2.  However, I'm still not
sure what I should be telling my users when I get reports of these
vastly slower results with executemany().

I'm not asking that psycopg2 change anything, I'm just looking to
understand what the heck is going on when people are reporting this.
Should I:

1. tell them they have a network issue that is causing executemany() to
have a problem?  (even though I can also observe executemany() is kind
of slow, though not as slow as these people are reporting)

2. tell them there's some known issue, vacuuming / indexes/ or something
that is known to have this effect?

3. tell them that yes, they should use multiple-VALUES within a single
INSERT (which would eventually lead to strong pressure on me to reinvent
executemany() within SQLAlchemy's psycopg2 dialect to use this form) ?
(also if so, why is this the case?   can this claimed 100000% slowdown
be real?)

4. use a hack to actually make my own prepared statements within
executemany() (I vaguely recall some recipe that you can get a prepared
statement going with psycopg2 by rolling it on the outside) ?

5. Other reasons that executemany() is known to sometimes be extremely slow?

I'm purposely trying to stay out of the realm of picking apart the libpq
internals, assuming psycopg2 devs can shed some light what's going on
here.   Thanks for any guidance you can offer!





psycopg by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: [psycopg] Does psycopg2 support Kerberos for Postgres?
Next
From: Christophe Pettus
Date:
Subject: Re: [psycopg] speed concerns with executemany()