Thread: Psycopg and prepared SQL statements
Hi, Lately, I’ve been designing and writing a WSGI webapp for which performance matters. This webapp queries a PostgreSQL database. As performance matters, I am using a ThreadedConnectionPool from psycopg2.pool, but I also want to have my SQL statements prepared. I read http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/ but I could not find a way to mix PreparingCursor with the connection pool. Each statement should definitely be prepared only once for each connection, and I could not find a way to attach the prepared cursors to the connections in the pool. So i designed a class that wraps SQL statements, and a class whose instances are connection factories that prepare the needed statements. Here is the result of my work, wit a simple example. All comments are welcome. I think it woul be nice if psycopg could offer a way to use prepared statements. Cheers, -- Nicolas Boullis
Attachment
On May 16, 2014, at 2:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote: > As performance matters, I am using a ThreadedConnectionPool from > psycopg2.pool, but I also want to have my SQL statements prepared. Why? This an honest question. In general, prepared statements do not provide significant performance improvements in PostgreSQL. -- -- Christophe Pettus xof@thebuild.com
On Fri, May 16, 2014 at 02:25:14PM +0200, Christophe Pettus wrote: > > > On May 16, 2014, at 2:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote: > > > As performance matters, I am using a ThreadedConnectionPool from > > psycopg2.pool, but I also want to have my SQL statements prepared. > > Why? > > This an honest question. In general, prepared statements do not provide significant performance improvements in PostgreSQL. Well, I benchmarked my webapp without prepared statements, and then with. Without prepared statements, I could handle around 70 requests/second. With prepared statements, I could handle around 140 requests/second. (And without the connection pool, I could handle around 35 requests/second.) In my opinion, that’s a good enough reason for me to use prepared statements. Cheers, -- Nicolas Boullis
On Fri, May 16, 2014 at 1:22 PM, Nicolas Boullis <postgresql@ilcode.fr> wrote: > Hi, > > Lately, I’ve been designing and writing a WSGI webapp for which > performance matters. This webapp queries a PostgreSQL database. > > As performance matters, I am using a ThreadedConnectionPool from > psycopg2.pool, but I also want to have my SQL statements prepared. > > I read > http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/ > but I could not find a way to mix PreparingCursor with the connection > pool. > Each statement should definitely be prepared only once for each > connection, and I could not find a way to attach the prepared cursors to > the connections in the pool. > > So i designed a class that wraps SQL statements, and a class whose > instances are connection factories that prepare the needed statements. > > Here is the result of my work, wit a simple example. > > All comments are welcome. I think it woul be nice if psycopg could offer > a way to use prepared statements. Hello Nicolas, as stated in the article that was a first idea. One of my concerns is exactly the fact that the statement preparation is a connection's property and giving the prepared state to the cursor makes it more "precious" than what you'd like. I'd be happier to have statements prepared inside the connection too (although the interface to require the preparation may still be on the cursor). This begs the question: what happens with too many prepared statements? This is not a problem for a program that has no dynamically generated queries, but for one that does it would lead to unbound use of resources. I think it could be interesting to have a preparing connection, with cursors offering some "prepare" interface as in the article you have linked but where the state of the prepared connection is kept by the connection. This could be used by a subclass which instead automatically prepares every statement: easier to use but not suitable for programs generating dynamic queries. So I think I'd have the following classes: PreparingConnection: connection subclass keeping the client-side state of the statements that were prepared PreparingCursor: cursor subclass allowing to manually prepare statements AutoPreparingCursor: PreparingCursor subclass that automatically prepares everything passed to its execute[many](). The classes should be mixin-able with other cursor subclasses so that one could have a beast preparing and returning named tuples and so on. In the next days/weeks I'll try to make some experiments along this line. Putting auto-preparing stuff inside a pool would automatically have statements prepared at the first usage of each connection/query combo. If this is not sufficient and one doesn't want to pay the price of the occasional preparation but have all the statements prepared upfront I think a specific subclass or wrapper as you did could be a good option. -- Daniele
Hello Daniele, Thanks for your answer. On Fri, May 16, 2014 at 09:59:45PM +0100, Daniele Varrazzo wrote: > > as stated in the article that was a first idea. One of my concerns is > exactly the fact that the statement preparation is a connection's > property and giving the prepared state to the cursor makes it more > "precious" than what you'd like. > > I'd be happier to have statements prepared inside the connection too > (although the interface to require the preparation may still be on the > cursor). This begs the question: what happens with too many prepared > statements? This is not a problem for a program that has no > dynamically generated queries, but for one that does it would lead to > unbound use of resources. Sorry if my patch made you think I thought all SQL statements should be prepared, but I think that, by default, SQL statements should be executed directly without being prepared first, and that the developper who uses psycopg should to it explicitely if (s)he wants the statement to be prepared first. As for your question, I have no idea how PostgreSQL behaves with many prepared statements and, as a rule of thumb, I’d say that dynamically generated quereies should not be prepared. Moreover, as I understand it, it is sometimes more efficient to let PostgreSQL optimize a request for a given set of parameter values than to optimize it first and then use it as is for the values. > I think it could be interesting to have a preparing connection, with > cursors offering some "prepare" interface as in the article you have > linked but where the state of the prepared connection is kept by the > connection. This could be used by a subclass which instead > automatically prepares every statement: easier to use but not suitable > for programs generating dynamic queries. > > So I think I'd have the following classes: > > PreparingConnection: connection subclass keeping the client-side state > of the statements that were prepared > PreparingCursor: cursor subclass allowing to manually prepare statements Sorry for asking, but I don’t understand the point of that PreparingCursor subclass. How would it be different from “normal” cursors? > AutoPreparingCursor: PreparingCursor subclass that automatically > prepares everything passed to its execute[many](). > > The classes should be mixin-able with other cursor subclasses so that > one could have a beast preparing and returning named tuples and so on. > > In the next days/weeks I'll try to make some experiments along this line. > > Putting auto-preparing stuff inside a pool would automatically have > statements prepared at the first usage of each connection/query combo. > If this is not sufficient and one doesn't want to pay the price of the > occasional preparation but have all the statements prepared upfront I > think a specific subclass or wrapper as you did could be a good > option. As far as I am concerned, I would not care to pay the extra price of the occasional preparation the first time a user uses to my webapp. But I guess that, each time I use execute[many](), it will have to check whether the statement has already been prepared. Do you have an idea how expensive this test might be? My point when I designed my classes was to avoid such systematic overhead. By the way, I was willing to use a NamedTupleCursor cursor, but the overhead was to high for me. I have not investigated yet, but I guess it would be more efficient if the Record namedtuple was defined only once for a given statement, and then re-used the next times it is used. Cheers, -- Nicolas Boullis