Thread: Allow pooled connections to list all prepared queries

Allow pooled connections to list all prepared queries

From
David Brown
Date:
Hi:

Attached is a loadable module (and a small backend patch) that allows a
client to list the available query plans (created with PREPARE, or with a
'parse' protocol message) on a particular connection.

This work was done in response to an item in the TODO:

  * Allow pooled connections to list all prepared queries

    This would allow an application inheriting a pooled connection to know
    the queries prepared in the current session.

I've done the following:

  * Extended PostgreSQL's SQL grammar to capture the original SQL query
  string for PREPARE statements. Previously, the PostgreSQL kernel provided
  access to a prepared query's original SQL, but only for statements
  prepared with a libpq "parse" message - not with a PREPARE statement.

  * Modified backend/commands/prepare.c to keep some additional statistics
  in the prepared statement hash table (plan creation time, execution
  count, etc.)

  * Added an accessor function to allow for "raw" access to the prepared
  statement hash table (necessary for sequential access).

  * Implemented a PostgreSQL function to list the available query plans on
  the current connection. This function, called pg_prepared_query_plans,
  returns a set of tuples, each of which contain a plan name, the SQL query
  string associated with the plan name, the number of times the plan has
  been executed, the plan creation time, and the plan's last access time.

This should provide a way for clients sharing a connection pool to also
share prepared query plans. When a client inherits a connection from the
pool, it can use the results of a 'select * from pg_prepared_query_plans()'
to fill a (sql -> plan_name) hash table. By probing this hash table before
executing a PREPARE, duplicate PREPAREs can be skipped, even if the initial
PREPARE was performed by a different client.

I've attached three files: one is a diff against the backend, the other two
are the loadable module (source + create script).

If anyone is interested, I've also attached a small proof-of-concept patch
for DBD::Pg - it does server-side plan caching as described above (by
leaving the prepared plans on the connection at disconnect, and filling a
hash with the list of prepared plans at connect), and uses a simple LRU
deallocation policy ($ENV{'PLANCACHE_MAX'} is the high watermark, and
$ENV{'PLANCACHE_REAP'} is the number of plans below the high watermark to
target when deallocating - both should be set prior to DBI->connect).

All of this was done while experimenting with plan caching for a database
systems course. I have a more detailed write-up (with some synthetic
benchmarks) if it would be helpful.

HTH,

- Dave


Attachment

Re: Allow pooled connections to list all prepared queries

From
Tom Lane
Date:
David Brown <dave@spoonguard.org> writes:
> Attached is a loadable module (and a small backend patch) that allows a
> client to list the available query plans (created with PREPARE, or with a
> 'parse' protocol message) on a particular connection.

>   * Extended PostgreSQL's SQL grammar to capture the original SQL query
>   string for PREPARE statements.

This seems much the ugliest and most invasive part of the patch.  I'd
suggest doing something similar to what pg_proc.c is doing:

    /* We can get the original query text from the active portal (hack...) */
    Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
    queryText = ActivePortal->sourceText;

>   * Modified backend/commands/prepare.c to keep some additional statistics
>   in the prepared statement hash table (plan creation time, execution
>   count, etc.)

The usefulness of this seems pretty dubious.  You aren't going to have a
bunch of random bits of code sharing a connection; it's going to be a
single application that probably knows perfectly well exactly which
queries it needs prepared.  So I don't think the stats will pay for
themselves.

>   * Added an accessor function to allow for "raw" access to the prepared
>   statement hash table (necessary for sequential access).

It would have been better to put the function that needs this access
into prepare.c.  There is no point in trying to hide a data structure
inside a module if we then turn around and expose the data structure
to the world...

            regards, tom lane

Re: Allow pooled connections to list all prepared queries

From
Kris Jurka
Date:

On Wed, 22 Dec 2004, David Brown wrote:

>     This would allow an application inheriting a pooled connection to know
>     the queries prepared in the current session.
>

I don't see this as all that helpful for a client interface that does the
preparation itself.  Possibly it could be used for libpq, but you
mentioned DBI which should already know what it has or has not prepared.
The idea of adding a network round trip to detect a prepared statement
seems like a performance loss, not a gain.  If this is just to avoid
repreparing the same statement then perhaps something like PREPARE OR
REPLACE would be more useful.

Kris Jurka

Re: Allow pooled connections to list all prepared queries

From
Christopher Kings-Lynne
Date:
> The usefulness of this seems pretty dubious.  You aren't going to have a
> bunch of random bits of code sharing a connection; it's going to be a
> single application that probably knows perfectly well exactly which
> queries it needs prepared.  So I don't think the stats will pay for
> themselves.

Not true.  Personally, I was looking for something like this, before I
decided to go with stored sql procs instead.

The problem is a website that uses persistent connections.

Say we have a query that takes 350ms to get threads in a forum, and half
of that is planning time.  Any particular PHP process does not know if a
previous process has already prepared that query or not.  So I have to
prepare it every time, deal with errors, etc.

However, I now use sql stored procs as basically 'cached prepared
queries', so it's no longer a problem for me.

Chris

Re: Allow pooled connections to list all prepared queries

From
Christopher Kings-Lynne
Date:
> I don't see this as all that helpful for a client interface that does the
> preparation itself.  Possibly it could be used for libpq, but you
> mentioned DBI which should already know what it has or has not prepared.
> The idea of adding a network round trip to detect a prepared statement
> seems like a performance loss, not a gain.  If this is just to avoid
> repreparing the same statement then perhaps something like PREPARE OR
> REPLACE would be more useful.

Yes, PREPARE OR REPLACE was the other thing I thought would be useful.
However, in my example that still means preparing everytime, wasting the
benefits.

You could have:

PREPARE IF NOT EXISTS

:P

Chris

Re: Allow pooled connections to list all prepared queries

From
Kris Jurka
Date:

On Sat, 25 Dec 2004, David Brown wrote:

> You're right, in the case of standalone Perl or Apache::DBI. However, if
> DBD::Pg happens to grab an already-open connection that doesn't have a
> one-to-one correspondence with a $dbh (e.g. from a Postgres connection pool,
> or from an external pooling server like DBBalancer[1]), the state of the
> connection (with respect to past PREPAREs) isn't known.
>

I'm not clear why the pooling case is different.  Speaking from the JDBC
side (which does connection pooling, but not statement pooling) there is
a proxy connection object which wraps a real connection.  So each new
connection from the pool gets a new proxy object, but the real underlying
connection stays the same.  So the real connection will know what it has
or has not prepared.  So I don't see why, simply because it's in a pool,
that it forgets what's been prepared.

Kris Jurka

Re: Allow pooled connections to list all prepared queries

From
David Brown
Date:
Hi Kris:

On Sat, Dec 25, 2004 at 07:05:07PM -0500, Kris Jurka wrote:
>I don't see this as all that helpful for a client interface that does the
>preparation itself.  Possibly it could be used for libpq, but you
>mentioned DBI which should already know what it has or has not prepared.
>The idea of adding a network round trip to detect a prepared statement
>seems like a performance loss, not a gain.  If this is just to avoid
>repreparing the same statement then perhaps something like PREPARE OR
>REPLACE would be more useful.

You're right, in the case of standalone Perl or Apache::DBI. However, if
DBD::Pg happens to grab an already-open connection that doesn't have a
one-to-one correspondence with a $dbh (e.g. from a Postgres connection pool,
or from an external pooling server like DBBalancer[1]), the state of the
connection (with respect to past PREPAREs) isn't known.

In the case of an external-to-Perl connection pool, We'd make one round trip
to the server to fill in DBD::Pg's list of prepared statements, at
DBD::Pg::db::connect() - not at every prepare (which, as you said, would be
a net loss).

(DBD:::Pg, in fact, ships with server-side prepares totally turned off. I
have some code that fixes that for the SELECT and DELETE cases, but it, like
the rest of this stuff, isn't really release-quality yet.)

[1] http://dbbalancer.sourceforge.net

Thanks,

- Dave


Re: Allow pooled connections to list all prepared queries

From
David Brown
Date:
Hi Tom:

On Fri, Dec 24, 2004 at 11:00:57AM -0500, Tom Lane wrote:
>>   * Extended PostgreSQL's SQL grammar to capture the original SQL query
>>   string for PREPARE statements.
>
>This seems much the ugliest and most invasive part of the patch.  I'd
>suggest doing something similar to what pg_proc.c is doing:

I agree. At the time I did this, I was working against the 7.4.5 release
(which, as far as I can tell, doesn't have ActivePortal defined anywhere).
I'll try your approach on a CVS snapshot.

>>   * Modified backend/commands/prepare.c to keep some additional
>>   statistics in the prepared statement hash table (plan creation time,
>>   execution count, etc.)
>
>The usefulness of this seems pretty dubious.  You aren't going to have a

>>   * Added an accessor function to allow for "raw" access to the prepared
>>   statement hash table (necessary for sequential access).
>
>It would have been better to put the function that needs this access
>into prepare.c.  There is no point in trying to hide a data structure
>inside a module if we then turn around and expose the data structure
>to the world...

I'll split the exec_count/gettimeofday stuff out into a separate patch, and
find a way to eliminate FetchPreparedStatementHtab().

Thanks,

- Dave


Re: Allow pooled connections to list all prepared queries

From
David Brown
Date:
On Mon, Dec 27, 2004 at 01:03:08PM -0500, Kris Jurka wrote:
>has not prepared.  So I don't see why, simply because it's in a pool, that
>it forgets what's been prepared.

The connection (the open socket to Postgres) doesn't forget anything.  If
you have multiple machines sharing a connection pool, though, there may not
be any guarantee that the same client-side connection object (in your
example, the proxy object) will always be associated with the same
underlying connection (the open socket to Postgres).

Multiple machines connecting through a single connection pooling server need
some way to communicate the state of a particular connection between
themselves ("state", in this case, being the list of prepared queries).
This information could be tracked by the connection pool, it could be
tracked by the clients and shared through some back channel, or it could
just be exposed via SQL (since Postgres already has it in a convenient
format).

The patch provides a way to inspect the list of prepared queries on a
connection, via SQL. It'd be up to the client to avoid this round-trip when
it isn't necessary (e.g. the single-server or single-process cases you've
mentioned, where keeping and sharing the list of prepared statements is
cheap).

At a bare minimum, it'd at least be convenient for humans to be able to
connect to a pool and look at what's been prepared. There was no way to do
this before.

- Dave


Re: Allow pooled connections to list all prepared queries

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Christopher Kings-Lynne wrote:
> Say we have a query that takes 350ms to get threads in a forum, and half
> of that is planning time.  Any particular PHP process does not know if a
> previous process has already prepared that query or not.  So I have to
> prepare it every time, deal with errors, etc.

With DBI, this is an ideal case for $dbh->prepare_cached(), which avoids
all the network trips altogether. It pretty much does this:

my %sth;
my $SQL = "SELECT door_number_three()";
if (!exists $sth{$SQL}) {
        $sth{$SQL} = $dbh->prepare($SQL);
}
$sth{$SQL}->execute();

PHP should have something similar.

A far better strategy, and one I often use myself for persistent connections
(e.g. mod_perl) is to prepare all the common queries once in the BEGIN block,
and then use prepare_cached for the lesser used ones.


David Brown wrote:
> DBD:::Pg, in fact, ships with server-side prepares totally turned off. I
> have some code that fixes that for the SELECT and DELETE cases, but it, like
> the rest of this stuff, isn't really release-quality yet

Actually, DBD::Pg "ships" with no server-side support at all. The next version
(1.40, about to be released) does have full server-side support, and when
used with prepare_cached above, should solve all of the problems mentioned
in this thread (for Perl people anyway!), with the exception of being able to
see what has already been prepared. That, however, is something application
writers should not have to worry about, which is why DBI and DBD::Pg will do
all the creation, naming, tracking, and deletion of prepared statements for
you behind the scenes.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200501020913
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFB2AMFvJuQZxSWSsgRAjcjAJ9ncSAa2N77iY9xOpyE50sbeuqzjACguKBe
HGWkn+Vv/pKKdyFiipDpdGs=
=FOAg
-----END PGP SIGNATURE-----