Thread: Psycopg and prepared SQL statements

Psycopg and prepared SQL statements

From

Nicolas Boullis

Date:

16 May 2014, 15:22:38

Hi,

Lately, I’ve been designing and writing a WSGI webapp for which
performance matters. This webapp queries a PostgreSQL database.

As performance matters, I am using a ThreadedConnectionPool from
psycopg2.pool, but I also want to have my SQL statements prepared.

I read
  http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
but I could not find a way to mix PreparingCursor with the connection
pool.
Each statement should definitely be prepared only once for each
connection, and I could not find a way to attach the prepared cursors to
the connections in the pool.

So i designed a class that wraps SQL statements, and a class whose
instances are connection factories that prepare the needed statements.

Here is the result of my work, wit a simple example.

All comments are welcome. I think it woul be nice if psycopg could offer
a way to use prepared statements.


Cheers,

--
Nicolas Boullis

Attachment

prepared_statements.py

Re: Psycopg and prepared SQL statements

From

Christophe Pettus

Date:

16 May 2014, 15:25:21

On May 16, 2014, at 2:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote:

> As performance matters, I am using a ThreadedConnectionPool from
> psycopg2.pool, but I also want to have my SQL statements prepared.

Why?

This an honest question.  In general, prepared statements do not provide significant performance improvements in
PostgreSQL.

--
-- Christophe Pettus
   xof@thebuild.com

Re: Psycopg and prepared SQL statements

From

Nicolas Boullis

Date:

16 May 2014, 15:34:37

On Fri, May 16, 2014 at 02:25:14PM +0200, Christophe Pettus wrote:
>
>
> On May 16, 2014, at 2:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote:
>
> > As performance matters, I am using a ThreadedConnectionPool from
> > psycopg2.pool, but I also want to have my SQL statements prepared.
>
> Why?
>
> This an honest question.  In general, prepared statements do not provide significant performance improvements in
PostgreSQL.

Well, I benchmarked my webapp without prepared statements, and then
with.
Without prepared statements, I could handle around 70 requests/second.
With prepared statements, I could handle around 140 requests/second.
(And without the connection pool, I could handle around 35
requests/second.)

In my opinion, that’s a good enough reason for me to use prepared
statements.

Cheers,

--
Nicolas Boullis

Re: Psycopg and prepared SQL statements

From

Daniele Varrazzo

Date:

17 May 2014, 00:00:10

On Fri, May 16, 2014 at 1:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote:
> Hi,
>
> Lately, I’ve been designing and writing a WSGI webapp for which
> performance matters. This webapp queries a PostgreSQL database.
>
> As performance matters, I am using a ThreadedConnectionPool from
> psycopg2.pool, but I also want to have my SQL statements prepared.
>
> I read
>   http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
> but I could not find a way to mix PreparingCursor with the connection
> pool.
> Each statement should definitely be prepared only once for each
> connection, and I could not find a way to attach the prepared cursors to
> the connections in the pool.
>
> So i designed a class that wraps SQL statements, and a class whose
> instances are connection factories that prepare the needed statements.
>
> Here is the result of my work, wit a simple example.
>
> All comments are welcome. I think it woul be nice if psycopg could offer
> a way to use prepared statements.

Hello Nicolas,

as stated in the article that was a first idea. One of my concerns is
exactly the fact that the statement preparation is a connection's
property and giving the prepared state to the cursor makes it more
"precious" than what you'd like.

I'd be happier to have statements prepared inside the connection too
(although the interface to require the preparation may still be on the
cursor). This begs the question: what happens with too many prepared
statements? This is not a problem for a program that has no
dynamically generated queries, but for one that does it would lead to
unbound use of resources.

I think it could be interesting to have a preparing connection, with
cursors offering some "prepare" interface as in the article you have
linked but where the state of the prepared connection is kept by the
connection. This could be used by a subclass which instead
automatically prepares every statement: easier to use but not suitable
for programs generating dynamic queries.

So I think I'd have the following classes:

PreparingConnection: connection subclass keeping the client-side state
of the statements that were prepared
PreparingCursor: cursor subclass allowing to manually prepare statements
AutoPreparingCursor: PreparingCursor subclass that automatically
prepares everything passed to its execute[many]().

The classes should be mixin-able with other cursor subclasses so that
one could have a beast preparing and returning named tuples and so on.

In the next days/weeks I'll try to make some experiments along this line.

Putting auto-preparing stuff inside a pool would automatically have
statements prepared at the first usage of each connection/query combo.
If this is not sufficient and one doesn't want to pay the price of the
occasional preparation but have all the statements prepared upfront I
think a specific subclass or wrapper as you did could be a good
option.

-- Daniele

Re: Psycopg and prepared SQL statements

From

Nicolas Boullis

Date:

20 May 2014, 02:18:25

Hello Daniele,

Thanks for your answer.

On Fri, May 16, 2014 at 09:59:45PM +0100, Daniele Varrazzo wrote:
>
> as stated in the article that was a first idea. One of my concerns is
> exactly the fact that the statement preparation is a connection's
> property and giving the prepared state to the cursor makes it more
> "precious" than what you'd like.
>
> I'd be happier to have statements prepared inside the connection too
> (although the interface to require the preparation may still be on the
> cursor). This begs the question: what happens with too many prepared
> statements? This is not a problem for a program that has no
> dynamically generated queries, but for one that does it would lead to
> unbound use of resources.

Sorry if my patch made you think I thought all SQL statements should be
prepared, but I think that, by default, SQL statements should be
executed directly without being prepared first, and that the developper
who uses psycopg should to it explicitely if (s)he wants the statement
to be prepared first.

As for your question, I have no idea how PostgreSQL behaves with many
prepared statements and, as a rule of thumb, I’d say that dynamically
generated quereies should not be prepared.

Moreover, as I understand it, it is sometimes more efficient to let
PostgreSQL optimize a request for a given set of parameter values than
to optimize it first and then use it as is for the values.

> I think it could be interesting to have a preparing connection, with
> cursors offering some "prepare" interface as in the article you have
> linked but where the state of the prepared connection is kept by the
> connection. This could be used by a subclass which instead
> automatically prepares every statement: easier to use but not suitable
> for programs generating dynamic queries.
>
> So I think I'd have the following classes:
>
> PreparingConnection: connection subclass keeping the client-side state
> of the statements that were prepared
> PreparingCursor: cursor subclass allowing to manually prepare statements

Sorry for asking, but I don’t understand the point of that
PreparingCursor subclass. How would it be different from “normal”
cursors?

> AutoPreparingCursor: PreparingCursor subclass that automatically
> prepares everything passed to its execute[many]().
>
> The classes should be mixin-able with other cursor subclasses so that
> one could have a beast preparing and returning named tuples and so on.
>
> In the next days/weeks I'll try to make some experiments along this line.
>
> Putting auto-preparing stuff inside a pool would automatically have
> statements prepared at the first usage of each connection/query combo.
> If this is not sufficient and one doesn't want to pay the price of the
> occasional preparation but have all the statements prepared upfront I
> think a specific subclass or wrapper as you did could be a good
> option.

As far as I am concerned, I would not care to pay the extra price of the
occasional preparation the first time a user uses to my webapp.
But I guess that, each time I use execute[many](), it will have to check
whether the statement has already been prepared. Do you have an idea how
expensive this test might be? My point when I designed my classes was to
avoid such systematic overhead.

By the way, I was willing to use a NamedTupleCursor cursor, but the
overhead was to high for me. I have not investigated yet, but I guess it
would be more efficient if the Record namedtuple was defined only once
for a given statement, and then re-used the next times it is used.

Cheers,

--
Nicolas Boullis