Thread: Prepared Statements vs. pgbouncer

Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
Hi,

I've been trying to implement pgbouncer in my environment and have
come up against a few issues with the JDBC driver.

The symptoms surface as an error redefining a the statement S_1.

  "S_1" already exists

I read the documentation and set prepareThreshhold=0, and the problem
decreased but did not go away.  Digging deeper I believe I've found
out why this occurs.  The problem only occurs when JDBC statements use
a non-zero fetchSize.  This results in the driver creating a named
statement.

Specifically the problem occurs when clients use a recycled backend.
Here's the scenario:

  * Client A creates statement S_1, fetches results.
    When finished the statement is put in a cleanup queue.
  * time passes
  * pgbouncer notices that client A is idle and reassigns backend to
    client B
  * client B tries to create statement S_1, but fails. (S_1 is already
    defined)
  ...

So the combination of lazy cleanup and the assumption of a singular
client->server backend is causing this problem.

I can see two solutions:

  1) Forcing cleanup of prepared statements when exiting a transaction
     block.  (Seems like this would be hard...)

  2) Allowing the client to specify a unique prefix for the statement
     handles.  (This seems easy, but could result in a leak of named
     statements in the scenario above)

I'd be fine with #2, since we can configure pgbouncer to recycle
backend connections to protect us against prepared statement leaks.

I'd be happy to work to make this possible.

Thanks!

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:
> the assumption of a singular
> client->server backend is causing this problem.

Well, yes, that's fairly fundamental to the client-server protocol. It
sounds like pgbouncer is breaking the protocol.

You could perhaps do transparent statement/portal renaming in pgbouncer
but that seems to be only scratching the surface of the problem.
Consider your case here:

>   * Client A creates statement S_1, fetches results.
>     When finished the statement is put in a cleanup queue.
>   * time passes
>   * pgbouncer notices that client A is idle and reassigns backend to
>     client B

Ok, now client A wakes up and wants to reuse S_1. What happens? The
backend that knows about S_1 is currently in use by B.

Changing the JDBC driver to handle a "server" that doesn't follow the
documented server protocol seems a bit backwards. I think you're going
to have to teach pgbouncer a whole lot more about statements and portals
to get this one working.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Sat, Sep 29, 2007 at 01:05:54PM +1200, Oliver Jowett wrote:
> Paul Lindner wrote:
> >the assumption of a singular
> >client->server backend is causing this problem.
>
> Well, yes, that's fairly fundamental to the client-server protocol. It
> sounds like pgbouncer is breaking the protocol.

That's a given.  Given the messages I've seen in the archives this
affects pgpool as well.  The advice offered was to downgrade jdbcv2.
This, of course only works because the v2 code does not use
server-side prepared statements.

> You could perhaps do transparent statement/portal renaming in pgbouncer
> but that seems to be only scratching the surface of the problem.
> Consider your case here:
>
> >  * Client A creates statement S_1, fetches results.
> >    When finished the statement is put in a cleanup queue.
> >  * time passes
> >  * pgbouncer notices that client A is idle and reassigns backend to
> >    client B
>
> Ok, now client A wakes up and wants to reuse S_1. What happens? The
> backend that knows about S_1 is currently in use by B.

We're currently using Transaction pooling with pgbouncer.  With
prepareThreshhold set to 0, the only cases where server-side
statements are used are with cursors. These occur inside transactions
on our end.

It seems that the driver should immediately deallocate the server-side
prepared statements it creates for cursors when it finishs fetching
data for that cursor.  This would be much better than waiting for the
garbage collector to issue the deallocation.

> Changing the JDBC driver to handle a "server" that doesn't follow the
> documented server protocol seems a bit backwards. I think you're going
> to have to teach pgbouncer a whole lot more about statements and portals
> to get this one working.

Well..  I only use pgBouncer to solve my particular issues.

Digging a little deeper I see that Postgres 8.3 will have DEALLOCATE
ALL and DISCARD ALL commands that can be used in pgbouncers health check.

Still, it would be nice if the end-user could have some control over
how the JDBC driver uses prepared statements other than
prepareThreshhold

Thanks!

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Tom Lane
Date:
Paul Lindner <lindner@inuus.com> writes:
> It seems that the driver should immediately deallocate the server-side
> prepared statements it creates for cursors when it finishs fetching
> data for that cursor.

What exactly is the argument here?  That no client should use anything
more than the fraction of the FE/BE protocol that pgbouncer currently
supports?  Pardon me for not buying into it.

            regards, tom lane

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> We're currently using Transaction pooling with pgbouncer.  With
> prepareThreshhold set to 0, the only cases where server-side
> statements are used are with cursors. These occur inside transactions
> on our end.

The driver uses server-side statements for frequently-used internal
utility commands such as BEGIN too, and the lifetime of those statements
is essentially "the whole connection", not a particular transaction.

> It seems that the driver should immediately deallocate the server-side
> prepared statements it creates for cursors when it finishs fetching
> data for that cursor.  This would be much better than waiting for the
> garbage collector to issue the deallocation.

We only rely on GC for statement deallocation if you don't explicitly
close your Statements. When you explicitly close them the server-side
statement is immediately enqueued on the reference queue, and will be
deallocated automatically when the next query is executed on the same
connection. If you just discard the statement, the enqueue is driven by GC.

There's no real reason for the driver to aggressively deallocate
statements merely because they were created so a portal could be used.
That sort of query is just as likely to be reused as any other.

>> Changing the JDBC driver to handle a "server" that doesn't follow the
>> documented server protocol seems a bit backwards. I think you're going
>> to have to teach pgbouncer a whole lot more about statements and portals
>> to get this one working.
>
> Well..  I only use pgBouncer to solve my particular issues.

And now you have two problems ;-)

> Digging a little deeper I see that Postgres 8.3 will have DEALLOCATE
> ALL and DISCARD ALL commands that can be used in pgbouncers health check.

That's going to break things even further since now the driver will have
statements that it thinks the server has prepared that will fail when
used because pgbouncer has decided it should go and deallocate them all!

-O

Re: Prepared Statements vs. pgbouncer

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> The driver uses server-side statements for frequently-used internal
> utility commands such as BEGIN too, and the lifetime of those statements
> is essentially "the whole connection", not a particular transaction.

I'm kinda hijacking the thread here, because this question is unrelated
to pgbouncer's behavior, but: have you ever done any performance
measurement to prove that preparing BEGIN/COMMIT/ROLLBACK is a good idea?
AFAICS there is only trivial parsing work to be saved, and no planning
work, and yet the overhead of storing and referencing a prepared
statement remains.  My gut feeling is that this is at best a wash and
could easily be a loss, particularly as of 8.3 which will have more
overhead to maintain prepared statements.

            regards, tom lane

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Tom Lane wrote:

> I'm kinda hijacking the thread here, because this question is unrelated
> to pgbouncer's behavior, but: have you ever done any performance
> measurement to prove that preparing BEGIN/COMMIT/ROLLBACK is a good idea?

No benchmarks. There would be essentially no difference in the driver
code either way (literally, you'd have to add code to avoid preparing
the statement) so unless named statements are much more expensive than
unnamed ones it's probably not worth worrying about. We do avoid
processing an extra network message each way (Parse/ParseComplete) with
the current code.

-O

Re: Prepared Statements vs. pgbouncer

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> I'm kinda hijacking the thread here, because this question is unrelated
>> to pgbouncer's behavior, but: have you ever done any performance
>> measurement to prove that preparing BEGIN/COMMIT/ROLLBACK is a good idea?

> No benchmarks. There would be essentially no difference in the driver
> code either way (literally, you'd have to add code to avoid preparing
> the statement) so unless named statements are much more expensive than
> unnamed ones it's probably not worth worrying about.

I don't have a reason to think it's a big loss --- I was just worried
that you were going out of your way to make this happen, when it's
probably not a win either.

            regards, tom lane

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
I appear to have stirred the pot a little too vigorously..  Let's take
a deep breath and take a step back..

First off, I really appreciate the hard work that's gone into the
design and implementation of Postgres and the JDBC driver.  I realize
that what I'm trying to do falls outside of the norms -- hopefully the
following background information will help everyone understand what
I'm trying to achieve:

The environment:

 * 100s of application servers using Torque and DBCP
 * Dozens of databases.
 * All app servers can connect to all databases.
 * Each application server may need many connections to an individual
   database.
 * App code as written will spawn multiple concurrent SELECTs to
   a single database to speed up queries on partitioned tables.

Okay..  So given those parameters we've been able to tune the system
to use about 2000 maxconns on the DBs, and a max of 8 connections from
each application server.

In spite of that the majority of connections are idle.  So we kill off
idle backends with cron..  Not ideal, but it's worked so far..

Okay... now let's double or triple the number of application servers..
That means either:
  1. Going to 4k or 6k backend maxconns.
  2. Halving or thirding the number of simultaneous conns for each app server.
  3. Use pgbouncer to allow 6k connections while actually
     lowering the number of DB backends, plus giving us some very cool
     maintenance features like redirecting connections to other hosts
     and more.

So we we're trying to implement #3.

If others have better ideas I'm all ears.

Our pgbouncer config will keep a connection on the same backend
for the duration of an individual transaction.

The only thing holding us back from deploying the pgbouncer solution
is this issue with the server-side prepared statements.

Possible solutions:

 * Use protocolVersion=2, since 7.3 does not support server side prepare..
 * Modify jdbc driver to use unique prefixes for server-side
   prepared statements.  Build my own jar and deploy.
 * Request help to solve this the 'correct' way.


I realize that this environment is not so common.  All I ask is help
in making it possible.

FWIW it seems that Oracle has something similar in 11g named DRCP:

  http://www.oracle.com/technology/tech/oci/pdf/oracledrcp11g.pdf

So maybe it's not such an uncommon case after all...

Thanks in advance for any assistence, pointers, tips on this matter.


On Fri, Sep 28, 2007 at 10:31:36PM -0400, Tom Lane wrote:
> Paul Lindner <lindner@inuus.com> writes:
> > It seems that the driver should immediately deallocate the server-side
> > prepared statements it creates for cursors when it finishs fetching
> > data for that cursor.
>
> What exactly is the argument here?  That no client should use anything
> more than the fraction of the FE/BE protocol that pgbouncer currently
> supports?  Pardon me for not buying into it.
>
>             regards, tom lane

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

>  * 100s of application servers using Torque and DBCP
>  * Dozens of databases.
>  * All app servers can connect to all databases.
>  * Each application server may need many connections to an individual
>    database.
>  * App code as written will spawn multiple concurrent SELECTs to
>    a single database to speed up queries on partitioned tables.

I'm assuming there is some reason why you can't run the same apps on
fewer appservers and share the pools. The basic problem here is that you
can't get a good global view of which connections are idle because you
have a lot of separate appservers each doing their own thing.

>   3. Use pgbouncer to allow 6k connections while actually
>      lowering the number of DB backends, plus giving us some very cool
>      maintenance features like redirecting connections to other hosts
>      and more.
>
> So we we're trying to implement #3.

That is a reasonable approach to take, that's essentially giving you a
global connection pool. However..

> The only thing holding us back from deploying the pgbouncer solution
> is this issue with the server-side prepared statements.

This is a really an issue with pgbouncer's incomplete implementation of
the protocol.

If you were going to change the driver the best place to do it is in the
V3 protocol code itself -- tell it not to use named statements at all.
The higher layers in the driver are working to a different API that
doesn't know anything about named statements at all, it just provides
hints about whether a query is likely to be re-used or not. So trying to
repair particular instances of that so they don't used named statements
is going to be fairly error-prone.

I don't see that sort of change going into the official driver though ..
as Tom says, why would we deliberately cripple use of the protocol
because a 3rd party piece of software can't handle the full protocol? So
I think you are going to be stuck with either deploying a customized
JDBC driver for your particular environment, or fixing pgbouncer so that
it properly implements the protocol.

-O

Re: Prepared Statements vs. pgbouncer

From
Gregory Stark
Date:
"Paul Lindner" <lindner@inuus.com> writes:

>   * pgbouncer notices that client A is idle and reassigns backend to
>     client B

What do you mean by "notices"?

Prepared statements are only one form of state which can persist beyond a
transaction end. I don't think you can reassign connections unless you get
some sort of explicit notice that the client is done with any state it has set
up. Either the driver supports noticing such a state because there are no
active references to its handle or the client issues a statement like RESET
ALL or something equivalent.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Paul Lindner wrote:
> I appear to have stirred the pot a little too vigorously..  Let's take
> a deep breath and take a step back..
>
> First off, I really appreciate the hard work that's gone into the
> design and implementation of Postgres and the JDBC driver.  I realize
> that what I'm trying to do falls outside of the norms -- hopefully the
> following background information will help everyone understand what
> I'm trying to achieve:

Actually, I wouldn't say it "goes outside the norm".  While the majority
of JDBC users will use J2EE connection pooling rather than pgBouncer, as
more and more users want to scale up to 1 million connections (yes,
really, we even put in a bid for a customer with this spec) J2EE pooling
won't be enough ... you'll need both.

So we're going to have to troubleshoot this sooner or later.

Actually, in that kind of an application, I don't see the theoretical
issue with S_1 being reused by different client connections.  In an
ideal world, this would give us de-facto shared prepared plans.  Or am I
misunderstanding the issue?

Also, should I understand that there now is no way in pgsql-jdbc to turn
prepared plans off, even if you want to?

--Jsoh


Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Sat, Sep 29, 2007 at 08:43:33PM +1200, Oliver Jowett wrote:
> Paul Lindner wrote:
>
> > * 100s of application servers using Torque and DBCP
> > * Dozens of databases.
> > * All app servers can connect to all databases.
> > * Each application server may need many connections to an individual
> >   database.
> > * App code as written will spawn multiple concurrent SELECTs to
> >   a single database to speed up queries on partitioned tables.
>
> I'm assuming there is some reason why you can't run the same apps on
> fewer appservers and share the pools. The basic problem here is that you
> can't get a good global view of which connections are idle because you
> have a lot of separate appservers each doing their own thing.

Partially, yes.  DBCP and Torque are supposed to take care of this,
however the idle connection reaper is buggy/doesn't work correctly for
us.  This results in app servers quickly going to their max pool size.

What other client-side connection pools are people using?  Is Sequoia
an option?  I have not tried it as of yet.

> >  3. Use pgbouncer to allow 6k connections while actually
> >     lowering the number of DB backends, plus giving us some very cool
> >     maintenance features like redirecting connections to other hosts
> >     and more.
> >
> >So we we're trying to implement #3.
>
> That is a reasonable approach to take, that's essentially giving you a
> global connection pool. However..
>
> >The only thing holding us back from deploying the pgbouncer solution
> >is this issue with the server-side prepared statements.
>
> This is a really an issue with pgbouncer's incomplete implementation of
> the protocol.

Okay, can we get the JDBC people talking with the pgbouncer people
(and the pgpool, pgcluster people too...)  I've Cc'd Marko and hope to
get him into this conversation, especially since the suggested fix of
using DEALLOCATE ALL, DISCARD ALL will probably not work with the JDBC
driver as is.

Note that each of those products has the exact same issues:

http://pgfoundry.org/pipermail/pgbouncer-general/2007-June/000004.html
http://archives.postgresql.org/pgsql-jdbc/2007-02/msg00132.php
http://pgfoundry.org/pipermail/pgcluster-general/2006-October/001070.html
http://pgfoundry.org/pipermail/pgpool-general/2006-January/000275.html

We really need to have some way of maintaining server/client
coherency...

Also, it would be in all of our best interests to solve this problem.
As it stands we could never use a load balancer of any sort to
maintain a highly reliable pool of read-only replicas.

> If you were going to change the driver the best place to do it is in the
> V3 protocol code itself -- tell it not to use named statements at all.
> The higher layers in the driver are working to a different API that
> doesn't know anything about named statements at all, it just provides
> hints about whether a query is likely to be re-used or not. So trying to
> repair particular instances of that so they don't used named statements
> is going to be fairly error-prone.
>
> I don't see that sort of change going into the official driver though ..
> as Tom says, why would we deliberately cripple use of the protocol
> because a 3rd party piece of software can't handle the full protocol? So
> I think you are going to be stuck with either deploying a customized
> JDBC driver for your particular environment, or fixing pgbouncer so that
> it properly implements the protocol.

I never said I wanted the driver crippled.  I'm just looking for a way
to make this scenario work for me.  You already provide
prepareThreshold=X for people that don't want automatic server-side
prepared statements.

How about I code up a patch that adds a new parameter
preparePrefix=XXX to DSN, and add a global utility method to set the
same?  Would you accept such a patch?

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Sat, Sep 29, 2007 at 10:36:33AM +0100, Gregory Stark wrote:
>
> "Paul Lindner" <lindner@inuus.com> writes:
>
> >   * pgbouncer notices that client A is idle and reassigns backend to
> >     client B
>
> What do you mean by "notices"?

Okay, I wasn't being clear.  In pgbouncer at the END of a commit the
backend will be put into the idle pool.  Please read the following:

https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer

> Prepared statements are only one form of state which can persist beyond a
> transaction end. I don't think you can reassign connections unless you get
> some sort of explicit notice that the client is done with any state it has set
> up. Either the driver supports noticing such a state because there are no
> active references to its handle or the client issues a statement like RESET
> ALL or something equivalent.

Okay.

How do we fix this?  Short term?  Long term?

For this specific case a long term fix might involve
transaction-scoped prepared statements.  Of course that would require
adding this feature on many levels.

Should middleware products track all prepared statements and re-send
those to each backend?  What happens when you have collisions between
names?

Should auto-generated prepared statements use a common hashing method
to insure that we don't recreate the same prepared statement over and
over?

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> How do we fix this?  Short term?  Long term?

Build a connection pooling thing that does what you want it to do and
implements the protocol *completely*.

For example the whole named statement problem goes away if the bit of
software doing the pooling keeps track of

(a) which statements were prepared under which names on the "client"
side connections (including query text & OIDs)
(b) which statements were prepared under which names on the "server" side
(c) relevant transaction and session state on both sides

Note that the names don't necessarily match up, the pool can merrily
assign its own names on the server side.

Then it doesn't matter at all what names the clients use, it's
irrelevant, they're still scoped within the connection they originated
from as the protocol expects. Internally the pool then maps them to a
new statement name on whichever real server connection it decides to
push queries to. If the statement hasn't been prepared on that
connection yet, that's fine, you just re-prepare it under a new name
from the data you have stored.

And you get a "shared" prepared statement cache per server connection
for free. You can also implement whatever logic you want for managing
this cache, garbage collection and timing out idle statements, etc, as
you see fit.

This is basically what I meant by fixing pgbouncer. No, it's not trivial
to do, but there's no technical reason why it can't be done, you'll just
need to throw development time at it.

You can also run this with a heterogenous client environment and not
have to worry about clients following some particular subset of the
protocol or cooperating over statement names and the like.

-O

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Josh Berkus wrote:

> Actually, in that kind of an application, I don't see the theoretical
> issue with S_1 being reused by different client connections.  In an
> ideal world, this would give us de-facto shared prepared plans.  Or am I
> misunderstanding the issue?

S_1 from client 1 might be a completely different query to S_1 from
client 2. The JDBC driver just numbers statements sequentially as they
are used.

> Also, should I understand that there now is no way in pgsql-jdbc to turn
> prepared plans off, even if you want to?

Right, and there never has been something like that since the V3 code
was first added back in the (7.4? 8.0?) driver. Named statements are
part of the V3 protocol, the driver expects the full protocol to be
available. As I said if you wanted you could do driver modifications to
tell the V3 protocol layer never to use named statements if you really
wanted to, but if you're talking to something that implements the full
V3 protocol (like, um, a postgresql backend..) there's no need. There
are already knobs which let you tune whether user-generated queries use
named statements or not, mostly because there are potential performance
differences (unnamed statements can benefit from knowing the actual
parameter values). For internal driver-generated queries that has not
been an issue in the past so there's no tuning knob for them.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
Your proposal below is interesting, but is a much larger scale problem
than I want to deal with.  It also requires that the middleware layer
to do deep packet inspection, which is suboptimal from a latency and
performance standpoint.

Anyway, I've been thinking about this for a little while and think the
following changes would not cripple the driver, but would provide
correct behavior in the face of a server that could potentially change
backends:

1) Use hashing to choose a prepared statement name

   If we take the hash of the prepared statement text and prefix with
   S_ we can be assured of using the same unique prepared statement
   name across all application servers.

   (And yes, I know that hashes are not perfect and collisions can
    occur.  Highly unlikely if we choose a good hash)

2) If we try to prepare a statement with an hashed name and it already
   exists then we ignore the error and continue.

3) If we receive an error while executing a prepared statement with an
   hashed name the driver will try to re-prepare the statement and
   re-execute the query.  If an error occurs after this retry step then
   error are surfaced to the caller.


If no-one has a more workable solution I'll probably go ahead and
implement the modifications I've listed above.

Of course I'd be happy to publish the patch and maintain the fork for
anyone else that might want to use middleware software with their Java
clients.


-- Paul

On Sun, Sep 30, 2007 at 11:50:10AM +1300, Oliver Jowett wrote:
> Paul Lindner wrote:
>
> >How do we fix this?  Short term?  Long term?
>
> Build a connection pooling thing that does what you want it to do and
> implements the protocol *completely*.
>
> For example the whole named statement problem goes away if the bit of
> software doing the pooling keeps track of
>
> (a) which statements were prepared under which names on the "client"
> side connections (including query text & OIDs)
> (b) which statements were prepared under which names on the "server" side
> (c) relevant transaction and session state on both sides
>
> Note that the names don't necessarily match up, the pool can merrily
> assign its own names on the server side.
>
> Then it doesn't matter at all what names the clients use, it's
> irrelevant, they're still scoped within the connection they originated
> from as the protocol expects. Internally the pool then maps them to a
> new statement name on whichever real server connection it decides to
> push queries to. If the statement hasn't been prepared on that
> connection yet, that's fine, you just re-prepare it under a new name
> from the data you have stored.
>
> And you get a "shared" prepared statement cache per server connection
> for free. You can also implement whatever logic you want for managing
> this cache, garbage collection and timing out idle statements, etc, as
> you see fit.
>
> This is basically what I meant by fixing pgbouncer. No, it's not trivial
> to do, but there's no technical reason why it can't be done, you'll just
> need to throw development time at it.
>
> You can also run this with a heterogenous client environment and not
> have to worry about clients following some particular subset of the
> protocol or cooperating over statement names and the like.
>
> -O

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> 2) If we try to prepare a statement with an hashed name and it already
>    exists then we ignore the error and continue.

Errors will cause the current transaction to fail..

> 3) If we receive an error while executing a prepared statement with an
>    hashed name the driver will try to re-prepare the statement and
>    re-execute the query.  If an error occurs after this retry step then
>    error are surfaced to the caller.

... so you can't just back off and try again.

-O

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Oliver,

> S_1 from client 1 might be a completely different query to S_1 from
> client 2. The JDBC driver just numbers statements sequentially as they
> are used.

So, how does j2EE-side connection pooling handle this?

--Josh


Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Josh Berkus wrote:
> Oliver,
>
>> S_1 from client 1 might be a completely different query to S_1 from
>> client 2. The JDBC driver just numbers statements sequentially as they
>> are used.
>
> So, how does j2EE-side connection pooling handle this?

Err.. it is dealing with JDBC connections, not protocol level stuff, so
there's no issue with statement naming. The driver just does what it
normally does (maintains a mapping of query to statement name on each
connection). The usual J2EE connection pool model is different to what
pgbouncer apparently does as clients explicitly return connections to
the pool when no longer needed -- which effectively invalidates any
PreparedStatement objects they might still be holding -- and reobtain
them when later need.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Sun, Sep 30, 2007 at 08:11:04PM +1300, Oliver Jowett wrote:
> Paul Lindner wrote:
>
> >2) If we try to prepare a statement with an hashed name and it already
> >   exists then we ignore the error and continue.
>
> Errors will cause the current transaction to fail..

Looking through the backend code and the protocol flow documenation it
appears we could just close the statement name and it won't cause an
error:

   switch (close_type) {
              case 'S':
                if (close_target[0] != '\0')
                   DropPreparedStatement(close_target, false);
                else

where

   void DropPreparedStatement(const char *stmt_name, bool showError)

So amend number 2 to say:

  2) Before preparing a statement with a hashed name, send a close
     statement to insure that we don't get an error inside of a
     transaction.

This is just as performant as the current driver which will happily
re-prepare the same SQL many times.

Sadly it appears that the protocol does not allow for arbitrary
optional data to be sent along with the query.

For example, it would be fairly easy to modify gram.y to support IF
EXISTS or OR REPLACE -- getting that into the protocol looks to be a
bit more difficult.

(FYI if anyone is interested in the following syntax let me know, If
there's interest I could spend some of my free time hacking the
backend)

  DEALLOCATE [IF EXISTS] <plan_name>;
  PREPARE [OR REPLACE] <plan_name> [(args, ...)] AS <query>


> >3) If we receive an error while executing a prepared statement with an
> >   hashed name the driver will try to re-prepare the statement and
> >   re-execute the query.  If an error occurs after this retry step then
> >   error are surfaced to the caller.
>
> ... so you can't just back off and try again.

Actually #3 isn't needed anymore if the deallocate/re-prepare steps
are carried out as mentioned above.

For me it still appears worth the effort to modify the driver to
support what I've outlined.  I do hope that you'll find any work done
useful for general consumption.

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

>   2) Before preparing a statement with a hashed name, send a close
>      statement to insure that we don't get an error inside of a
>      transaction.

But that defeats the purpose of using a named statement in the first
place -- and any potential benefit you'd get by sharing statements
between clients -- if you're going to throw it away before reusing it
every time!

> This is just as performant as the current driver which will happily
> re-prepare the same SQL many times.

No. The current driver will switch to a named statement that is prepared
*once* when it thinks there is a benefit to doing so (tunable via
prepareThreshold). Your change would mean that it would always
re-prepare statements. Unless you're talking about a statement cache so
that applications don't have to hold onto a particular PreparedStatement
object to get the benefit of reuse, which arguably is the responsibility
of the appserver (see the list archives for recent discussion of this).

> I do hope that you'll find any work done
> useful for general consumption.

Unfortunately, I still don't really see any these solutions to your
particular problem as something that'd be useful in the general driver.
I think your efforts would be better spent in teaching pgbouncer to deal
with named statements properly..

-O

Re: Prepared Statements vs. pgbouncer

From
Heikki Linnakangas
Date:
Oliver Jowett wrote:
> Josh Berkus wrote:
>> Oliver,
>>
>>> S_1 from client 1 might be a completely different query to S_1 from
>>> client 2. The JDBC driver just numbers statements sequentially as
>>> they are used.
>>
>> So, how does j2EE-side connection pooling handle this?
>
> Err.. it is dealing with JDBC connections, not protocol level stuff, so
> there's no issue with statement naming. The driver just does what it
> normally does (maintains a mapping of query to statement name on each
> connection). The usual J2EE connection pool model is different to what
> pgbouncer apparently does as clients explicitly return connections to
> the pool when no longer needed -- which effectively invalidates any
> PreparedStatement objects they might still be holding -- and reobtain
> them when later need.

The OP might be better off doing the connection multiplexing at a higher
level as well. You could use something like Virtual JDBC
(vjdbc.sourceforge.net) between the clients and a connection pool
managed in the multiplexer. There would be no problem with prepared
statements, since the JDBC driver would deal with real connections to
the database. I've never used vjdbc myself, so I don't know if it's
applicable, but something along those lines should work.

You might also consider if using pgbouncer or similar really is
necessary. Perhaps you could get away with a per-client connection pool,
with "min pool size" small enough, maybe 0, that the number of
simultaneous connections to the server would stay reasonable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Prepared Statements vs. pgbouncer

From
Till Toenges
Date:
Paul Lindner wrote:
> 1) Use hashing to choose a prepared statement name
>
>    If we take the hash of the prepared statement text and prefix with
>    S_ we can be assured of using the same unique prepared statement
>    name across all application servers.
>
>    (And yes, I know that hashes are not perfect and collisions can
>     occur.  Highly unlikely if we choose a good hash)

This sounds like a bad idea on many levels. Don't do it. Actually, i
think the driver is ok and the real problem is pgbouncer.

I have another solution to your problem:

You said that there are 100s of Servers, i assume that there is some
money behind that. Talk to the pgbouncer developers. Ask them to analyze
your problem and what it would cost to fix it. Ask them what it would
cost to make pgbouncer fully compliant, because it's probably not the
only feature you will miss. Figure out if this is cheaper than replacing
/ upgrading your hardware with something powerful enough to handle the
load without pgbouncer. Pick the cheaper solution. Done.

"Fixing" the jdbc driver is probably not a stable long term solution,
even if it helps with your most immediate problem.


Till

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Heikki,

> You might also consider if using pgbouncer or similar really is
> necessary. Perhaps you could get away with a per-client connection pool,
> with "min pool size" small enough, maybe 0, that the number of
> simultaneous connections to the server would stay reasonable.
>

It becomes necessary whenever you have a large number of appservers
connecting to the same database server.  Setting min pool size downwards
isn't really effective becuase then you're just adding connection time
delays ... exactly what you're trying to prevent by using a connection pool.

I'm working on a scalable project myself which is why I'm concerned
about this.  The reason why we're planning to rely on pgBouncer is that
we'll have both java and non-java applications connecting to the same
database, and we don't want to manage two different connection/failover
pools.  So it's important to us that PG-JDBC function with independant
connection pools.

So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?

--Josh Berkus

Re: Prepared Statements vs. pgbouncer

From
Dave Cramer
Date:
Josh Berkus wrote:
> Heikki,
>
>> You might also consider if using pgbouncer or similar really is
>> necessary. Perhaps you could get away with a per-client connection pool,
>> with "min pool size" small enough, maybe 0, that the number of
>> simultaneous connections to the server would stay reasonable.
>>
>
> It becomes necessary whenever you have a large number of appservers
> connecting to the same database server.  Setting min pool size
> downwards isn't really effective becuase then you're just adding
> connection time delays ... exactly what you're trying to prevent by
> using a connection pool.
>
> I'm working on a scalable project myself which is why I'm concerned
> about this.  The reason why we're planning to rely on pgBouncer is
> that we'll have both java and non-java applications connecting to the
> same database, and we don't want to manage two different
> connection/failover pools.  So it's important to us that PG-JDBC
> function with independant connection pools.
>
> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?
pgBouncer is broken so I'd fix it.
>
> --Josh Berkus
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Re: Prepared Statements vs. pgbouncer

From
Heikki Linnakangas
Date:
Josh Berkus wrote:
>> You might also consider if using pgbouncer or similar really is
>> necessary. Perhaps you could get away with a per-client connection pool,
>> with "min pool size" small enough, maybe 0, that the number of
>> simultaneous connections to the server would stay reasonable.
>
> It becomes necessary whenever you have a large number of appservers
> connecting to the same database server.  Setting min pool size downwards
> isn't really effective becuase then you're just adding connection time
> delays ... exactly what you're trying to prevent by using a connection
> pool.

Yeah, I know. I was trying to think of workaround for the OP. With
hundreds of app servers it's likely that a pool of 1-2 connections would
 actually be sufficient. Unless there really is thousands of
simultaneous users running queries at the same time, at which point
Postgres is going to be sweating even if you manage to multiplex the
number of connections to a few hundred.

> I'm working on a scalable project myself which is why I'm concerned
> about this.  The reason why we're planning to rely on pgBouncer is that
> we'll have both java and non-java applications connecting to the same
> database, and we don't want to manage two different connection/failover
> pools.  So it's important to us that PG-JDBC function with independant
> connection pools.
>
> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?

It's clearly pgBouncer here that's braking the protocol, so it's
pgBouncer that needs fixing. A workaround in JDBC driver would only help
JDBC apps. If you're having non-java applications in the mix, you'd have
to implement the workaround in *all* the drivers.

BTW: I just stumbled across this Greg Sabine Mullane's blog post, which
describes a workaround for DBD::Pg perl module, so we're not alone:
http://people.planetpostgresql.org/greg/index.php?/archives/110-Using-DBDPg-with-pg_bouncer.html

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Heikki,

> It's clearly pgBouncer here that's braking the protocol, so it's
> pgBouncer that needs fixing. A workaround in JDBC driver would only help
> JDBC apps. If you're having non-java applications in the mix, you'd have
> to implement the workaround in *all* the drivers.
>
> BTW: I just stumbled across this Greg Sabine Mullane's blog post, which
> describes a workaround for DBD::Pg perl module, so we're not alone:
> http://people.planetpostgresql.org/greg/index.php?/archives/110-Using-DBDPg-with-pg_bouncer.html
>

Well, I think we have the right group of people here to think about what
needs to be fixed.  One of the Sun engineers can help with it, which I
think is necessary since I doubt Marco has the time.

So, given pgBouncer's purpose, what exactly should it be doing to handle
prepared connections?  That is, how can it support the v3 protocol
without eliminating its tremendous scalability?

--Josh

Re: Prepared Statements vs. pgbouncer

From
Tom Lane
Date:
Dave Cramer <pg@fastcrypt.com> writes:
> Josh Berkus wrote:
>> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?

> pgBouncer is broken so I'd fix it.

It's an enormous mistake to imagine that prepared statements are the
only issue.  What about GUC settings and temp tables, to mention a
couple other bits of per-session state?

            regards, tom lane

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Mon, Oct 01, 2007 at 11:44:58AM -0700, Josh Berkus wrote:
> Heikki,
>
> >It's clearly pgBouncer here that's braking the protocol, so it's
> >pgBouncer that needs fixing. A workaround in JDBC driver would only help
> >JDBC apps. If you're having non-java applications in the mix, you'd have
> >to implement the workaround in *all* the drivers.
> >
> >BTW: I just stumbled across this Greg Sabine Mullane's blog post, which
> >describes a workaround for DBD::Pg perl module, so we're not alone:
> >http://people.planetpostgresql.org/greg/index.php?/archives/110-Using-DBDPg-with-pg_bouncer.html
> >
>
> Well, I think we have the right group of people here to think about what
> needs to be fixed.  One of the Sun engineers can help with it, which I
> think is necessary since I doubt Marco has the time.

> So, given pgBouncer's purpose, what exactly should it be doing to handle
> prepared connections?  That is, how can it support the v3 protocol
> without eliminating its tremendous scalability?

Here's a couple of ideas:

* Make more commands have modes that ignore errors.  This allows for
  idempotent behavior and won't result in transactions failing.

    PREPARE IF NOT EXISTS foo() AS xxxx
    PREPARE OR REPLACE foo() AS xxxx
    DEALLLOCATE IF EXISTS foo;

  Or fix the general case and allow for a way to catch errors without
  invalidating the transaction.

* make it possible to request server-side state be transaction-scoped.
  For example, prepared-statements, temporary tables, and the like
  could be cleaned up at the end of a transaction automatically.  Some
  of this behavior exists already:

    CREATE TEMP TABLE foo (integer a) on commit drop;

  I think temporary views, temporary tables, and prepared statements
  are the big things here.

* Have server-side support for 'pooled' connections.  If you request a
  pooled connection then certain features will be disabled to insure
  that server-side state is contained properly.

  This could be specified on connection, or perhaps it could be
  specified as a user attribute...

  For example, if this was in place no session state modifications
  could occur outside a transaction.

* Have some way of dumping and restoring transient state.  This would
  allow pgbouncer to preserve client state without having to intercept
  and track all of the various temporary information.  Something
  simple like this would suffice.

    SHOW CLIENTSTATE;

    clientstate
    --------------------
    aa$*afjhak1324834....

    SET CLIENTSTATE="aa$*afjhak1324834...."

  .. or ..

  Have the database itself track client state separate from backend
  processes.



--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Paul,

>     PREPARE IF NOT EXISTS foo() AS xxxx
>     PREPARE OR REPLACE foo() AS xxxx
>     DEALLLOCATE IF EXISTS foo;

This has been a request of various app developers anyway.  Mind you, they
want it mostly to support sloppy programming, but it would make it easier
to port applications from MySQL.

>   Or fix the general case and allow for a way to catch errors without
>   invalidating the transaction.

heh, that would be a major PostgreSQL change.

> * make it possible to request server-side state be transaction-scoped.
>   For example, prepared-statements, temporary tables, and the like
>   could be cleaned up at the end of a transaction automatically.  Some
>   of this behavior exists already:

This would rather limit the utility of prepared plans.

> * Have server-side support for 'pooled' connections.  If you request a
>   pooled connection then certain features will be disabled to insure
>   that server-side state is contained properly.
>
>   This could be specified on connection, or perhaps it could be
>   specified as a user attribute...
>
>   For example, if this was in place no session state modifications
>   could occur outside a transaction.

I don't see that this would fix the issue with JDBC.

> * Have some way of dumping and restoring transient state.  This would
>   allow pgbouncer to preserve client state without having to intercept
>   and track all of the various temporary information.  Something
>   simple like this would suffice.
>   .. or ..
>   Have the database itself track client state separate from backend
>   processes.

This approach seems incompatible with the goal of using pgBouncer in the
first place, namely scalability.

Basically, I see two things we need to do.  In the short term, we need some
kind of fix for pgBouncer so it at least works with JDBC.  In the long
term, we should work on support for shared prepared plans.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Mon, Oct 01, 2007 at 12:49:18PM -0700, Josh Berkus wrote:
> Paul,
>
> >     PREPARE IF NOT EXISTS foo() AS xxxx
> >     PREPARE OR REPLACE foo() AS xxxx
> >     DEALLLOCATE IF EXISTS foo;
>
> This has been a request of various app developers anyway.  Mind you, they
> want it mostly to support sloppy programming, but it would make it easier
> to port applications from MySQL.

> >   Or fix the general case and allow for a way to catch errors without
> >   invalidating the transaction.
>
> heh, that would be a major PostgreSQL change.

People underestimate the importance of idempot operations when dealing
with many real world situations.

Another alternative might be

  PREPARE foo() AS xxxxx ON COMMIT DEALLOCATE;

except I just realized that I'm recreating CURSORs :)

Why doesn't the Postgres JDBC driver use cursors for ResultSets
instead of creating prepared statements?  Is this not supported in the
v3 protocol?  (I'd answer this question myself, but no time to read
code at the moment...)

> > * make it possible to request server-side state be transaction-scoped.
> >   For example, prepared-statements, temporary tables, and the like
> >   could be cleaned up at the end of a transaction automatically.  Some
> >   of this behavior exists already:
>
> This would rather limit the utility of prepared plans.

> > * Have server-side support for 'pooled' connections.  If you request a
> >   pooled connection then certain features will be disabled to insure
> >   that server-side state is contained properly.
> >
> >   This could be specified on connection, or perhaps it could be
> >   specified as a user attribute...
> >
> >   For example, if this was in place no session state modifications
> >   could occur outside a transaction.
>
> I don't see that this would fix the issue with JDBC.

With pgbouncer you can use transaction scoping.  In a given
transaction all commands are guaranteed to go to the same backend.

If you can match your server side state with a BEGIN/END block then
you can be assured that this state is preserved for the duration of
this mini-session.

JDBC could be modified to either:

  * Only use server-side prepared statements inside transactions.
    (Or use cursors instead)

> > * Have some way of dumping and restoring transient state.  This would
> >   allow pgbouncer to preserve client state without having to intercept
> >   and track all of the various temporary information.  Something
> >   simple like this would suffice.
> >   .. or ..
> >   Have the database itself track client state separate from backend
> >   processes.
>
> This approach seems incompatible with the goal of using pgBouncer in the
> first place, namely scalability.

Right, it's the difference between a router and switch.

In my case it wouldn't matter.  I don't want to keep state on the
server, I want my app-clients to be as stateless as possible.  The
driver is subverting this desire.

Right now the benefit from pgbouncer is much better than benefits from
prepared plans.

> Basically, I see two things we need to do.  In the short term, we need some
> kind of fix for pgBouncer so it at least works with JDBC.  In the long
> term, we should work on support for shared prepared plans.

Okay.  I'm willing to lend a hand and work with anyone that's working
in large scale java Postgres environments.


--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Mon, Oct 01, 2007 at 06:52:50PM +0200, Till Toenges wrote:
> Paul Lindner wrote:
> >1) Use hashing to choose a prepared statement name
> >
> >   If we take the hash of the prepared statement text and prefix with
> >   S_ we can be assured of using the same unique prepared statement
> >   name across all application servers.
> >
> >   (And yes, I know that hashes are not perfect and collisions can
> >    occur.  Highly unlikely if we choose a good hash)
>
> This sounds like a bad idea on many levels. Don't do it. Actually, i
> think the driver is ok and the real problem is pgbouncer.

Actually the problem is that Postgres is process/session oriented in a
multithreaded event-based world.

> I have another solution to your problem:
>
> You said that there are 100s of Servers, i assume that there is some
> money behind that. Talk to the pgbouncer developers. Ask them to analyze
> your problem and what it would cost to fix it. Ask them what it would
> cost to make pgbouncer fully compliant, because it's probably not the
> only feature you will miss. Figure out if this is cheaper than replacing
> / upgrading your hardware with something powerful enough to handle the
> load without pgbouncer. Pick the cheaper solution. Done.

The problem is that the described solution puts way too much moving
parts inside of something that should be very simple.  You'd have to
recreate most of Postgresql's parsing and grammar inside of Pgbouncer
and change it from something simple into a full-featured proxy.

In my opinion the _correct_ way to fix the problem of too many bloated
idle backends is to incorporate libevent into Postgres and radically
rethink of ways to maintain cache/session coherence.

> "Fixing" the jdbc driver is probably not a stable long term solution,
> even if it helps with your most immediate problem.

Fixing the JDBC driver is the simplest and cheapest alternative.  So I
am following your advice and picking the cheaper solution :)

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Dave Cramer
Date:
Paul Lindner wrote:
On Mon, Oct 01, 2007 at 12:49:18PM -0700, Josh Berkus wrote: 
Paul,
   
    PREPARE IF NOT EXISTS foo() AS xxxx   PREPARE OR REPLACE foo() AS xxxx   DEALLLOCATE IF EXISTS foo;     
This has been a request of various app developers anyway.  Mind you, they 
want it mostly to support sloppy programming, but it would make it easier 
to port applications from MySQL.   
 
  Or fix the general case and allow for a way to catch errors without invalidating the transaction.     
heh, that would be a major PostgreSQL change.   
People underestimate the importance of idempot operations when dealing
with many real world situations.

Another alternative might be
 PREPARE foo() AS xxxxx ON COMMIT DEALLOCATE;

except I just realized that I'm recreating CURSORs :)

Why doesn't the Postgres JDBC driver use cursors for ResultSets
instead of creating prepared statements?  Is this not supported in the
v3 protocol?  (I'd answer this question myself, but no time to read
code at the moment...) 
It does use cursors if you set the fetch size appropriately. Prepared Statements are not the same as cursors.

One is a reusable statement that you can just change the parameters without reparsing the statement.
Cursors are the result of the statement being executed.
 
* make it possible to request server-side state be transaction-scoped. For example, prepared-statements, temporary tables, and the like could be cleaned up at the end of a transaction automatically.  Some of this behavior exists already:     
This would rather limit the utility of prepared plans.     
 
* Have server-side support for 'pooled' connections.  If you request a pooled connection then certain features will be disabled to insure that server-side state is contained properly.
 This could be specified on connection, or perhaps it could be specified as a user attribute...
 For example, if this was in place no session state modifications could occur outside a transaction.     
I don't see that this would fix the issue with JDBC.   
With pgbouncer you can use transaction scoping.  In a given
transaction all commands are guaranteed to go to the same backend.

If you can match your server side state with a BEGIN/END block then
you can be assured that this state is preserved for the duration of
this mini-session.

JDBC could be modified to either:
 * Only use server-side prepared statements inside transactions.   (Or use cursors instead)
 
* Have some way of dumping and restoring transient state.  This would allow pgbouncer to preserve client state without having to intercept and track all of the various temporary information.  Something simple like this would suffice. .. or .. Have the database itself track client state separate from backend processes.     
This approach seems incompatible with the goal of using pgBouncer in the 
first place, namely scalability.   
Right, it's the difference between a router and switch.

In my case it wouldn't matter.  I don't want to keep state on the
server, I want my app-clients to be as stateless as possible.  The
driver is subverting this desire.

Right now the benefit from pgbouncer is much better than benefits from
prepared plans.
 
Basically, I see two things we need to do.  In the short term, we need some 
kind of fix for pgBouncer so it at least works with JDBC.  In the long 
term, we should work on support for shared prepared plans.   
Okay.  I'm willing to lend a hand and work with anyone that's working
in large scale java Postgres environments.

 

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> Why doesn't the Postgres JDBC driver use cursors for ResultSets
> instead of creating prepared statements?  Is this not supported in the
> v3 protocol?  (I'd answer this question myself, but no time to read
> code at the moment...)

It uses portals which are the v3 equivalent of cursors.

If you are serious about solving these problems I think you really need
to go and read the v3 protocol spec at a minimum.

-O

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> The problem is that the described solution puts way too much moving
> parts inside of something that should be very simple.  You'd have to
> recreate most of Postgresql's parsing and grammar inside of Pgbouncer
> and change it from something simple into a full-featured proxy.

What? Why on earth would you need to recreate the SQL grammar inside
pgbouncer?! Justify this.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Mon, Oct 01, 2007 at 02:44:10PM -0400, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > Josh Berkus wrote:
> >> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?
>
> > pgBouncer is broken so I'd fix it.
>
> It's an enormous mistake to imagine that prepared statements are the
> only issue.  What about GUC settings and temp tables, to mention a
> couple other bits of per-session state?

What if you're in a homogenous environment and can control those
variables?

Or for another example, what if you want to create a pool of read-only
replicas and don't care which server gets the request.

What about failover situations?  With stateless clients and the
correct pooling one could seamlessly send requests to a durable
Connection, avoiding a large amount of retry logic in the individual
code.

Perl's adage "Make easy things easy and hard things possible" is
apropos here..

----------------------------------------------------------------------
In fact, here's a crazy idea: static pre-defined session state tied to
roles:

  ALTER ROLE appserver_v1 PREPARE foo() AS ....;
  ALTER ROLE appserver_v1 PREPARE xyz() AS ....;
  ALTER ROLE appserver_v1 SET SESSION stateless=true;

Of course this doesn't help for dynamically prepared statements, which
has been my problem all along...

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Oliver Jowett wrote:
> Paul Lindner wrote:
>
>> Why doesn't the Postgres JDBC driver use cursors for ResultSets
>> instead of creating prepared statements?  Is this not supported in the
>> v3 protocol?  (I'd answer this question myself, but no time to read
>> code at the moment...)
>
> It uses portals which are the v3 equivalent of cursors.

To elaborate a bit more: you need a statement before you can create a
portal, and the unnamed statement is no use for this particular case as
we need the portal to live despite reuse of the unnamed statement, and
the implicit close of the unnamed statement on reuse also causes any
dependant portal to close. So we must use a named statement so that we
can control the lifetime of the portal correctly.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Tue, Oct 02, 2007 at 10:06:50AM +1300, Oliver Jowett wrote:
> Paul Lindner wrote:
>
> >The problem is that the described solution puts way too much moving
> >parts inside of something that should be very simple.  You'd have to
> >recreate most of Postgresql's parsing and grammar inside of Pgbouncer
> >and change it from something simple into a full-featured proxy.
>
> What? Why on earth would you need to recreate the SQL grammar inside
> pgbouncer?! Justify this.

Eh? I didn't mention Sql grammar.  A proxy would at minimum have to
track and maintain connection settings and portals and recreate them
on each backend.  However a full-featured proxy could parse any GUC
statements.

In fact if you want full support for temporary tables (iffy) /
temporary views (perhaps possible) and whatnot you will have to parse
the SQL flying across the wire so you can recreate the session in it's
entirety.

Of course, I don't want nor need that.

For the record:

   Please please please note that I'm only trying to solve a particular
   problem here.  I know what I want to do is messy, ugly and a little
   impure and flies in the face of elegant design.

   If it helps think of what I'm proposing as akin to denormalization
   of a beautiful schema to achieve specific goals.

--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:

> For the record:
>
>    Please please please note that I'm only trying to solve a particular
>    problem here.  I know what I want to do is messy, ugly and a little
>    impure and flies in the face of elegant design.
>
>    If it helps think of what I'm proposing as akin to denormalization
>    of a beautiful schema to achieve specific goals.

I am going to suggest what I suggested early on in this thread again:

(1) Modify your copy of the JDBC driver so that the v3 protocol path
does not use named statements or portals
(2) Deploy your modified driver into your specific environment

I don't think anything short of the "right" solution is going to be
suitable for general purpose use.

-O


Re: Prepared Statements vs. pgbouncer

From
Oliver Jowett
Date:
Paul Lindner wrote:
> On Tue, Oct 02, 2007 at 10:06:50AM +1300, Oliver Jowett wrote:
>> Paul Lindner wrote:
>> You'd have to recreate most of Postgresql's parsing and grammar inside of Pgbouncer

>> What? Why on earth would you need to recreate the SQL grammar inside
>> pgbouncer?! Justify this.
>
> Eh? I didn't mention Sql grammar.

Sorry if I misinterpreted what "most of Postgresql's parsing and
grammar" meant then.

> A proxy would at minimum have to
> track and maintain connection settings and portals and recreate them
> on each backend.  However a full-featured proxy could parse any GUC
> statements.

I can think of a few ways around this offhand that don't require parsing
every statement going past (e.g. SHOW ALL, or individual SHOWs on the
subset of GUCs you care about, when considering whether to move a
connection). Even if you do have to inspect queries you hardly need a
full-blown lexer/parser to recognize SET, the syntax is fairly simple.

> In fact if you want full support for temporary tables (iffy) /
> temporary views (perhaps possible) and whatnot you will have to parse
> the SQL flying across the wire so you can recreate the session in it's
> entirety.
>
> Of course, I don't want nor need that.

I think it's acceptable to put some limits on what *application level*
SQL can do when running through pgbouncer, document that temporary
tables that are not ON COMMIT DROP aren't supported or whatever. You
could do the same for SET SESSION if you really wanted.

But the application usually can't do anything about the protocol level
stuff, so you pretty much have to get that right. Currently pgbouncer
does not do this.

-O

Re: Prepared Statements vs. pgbouncer

From
Paul Lindner
Date:
On Tue, Oct 02, 2007 at 10:19:56AM +1300, Oliver Jowett wrote:
> Oliver Jowett wrote:
> >Paul Lindner wrote:
> >
> >>Why doesn't the Postgres JDBC driver use cursors for ResultSets
> >>instead of creating prepared statements?  Is this not supported in the
> >>v3 protocol?  (I'd answer this question myself, but no time to read
> >>code at the moment...)
> >
> >It uses portals which are the v3 equivalent of cursors.
>
> To elaborate a bit more: you need a statement before you can create a
> portal, and the unnamed statement is no use for this particular case as
> we need the portal to live despite reuse of the unnamed statement, and
> the implicit close of the unnamed statement on reuse also causes any
> dependant portal to close. So we must use a named statement so that we
> can control the lifetime of the portal correctly.

Okay, Okay, I have read much of

  http://www.postgresql.org/docs/current/static/protocol.html

and much of

  src/backend/tcop/postgres.c

and much of

  src/backend/commands/portalcmds.c

I'm a newbie here, but it appears that PerformCursorOpen() is callable
from the SQL grammar, not directly via the protocol.

My apologies for mixing up PL/PGSQL and V3 protocol.


--
Paul Lindner        ||||| | | | |  |  |  |   |   |
lindner@inuus.com

Attachment

Re: Prepared Statements vs. pgbouncer

From
hubert depesz lubaczewski
Date:
On Mon, Oct 01, 2007 at 02:44:10PM -0400, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > Josh Berkus wrote:
> >> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?
> > pgBouncer is broken so I'd fix it.
> It's an enormous mistake to imagine that prepared statements are the
> only issue.  What about GUC settings and temp tables, to mention a
> couple other bits of per-session state?

i think that calling it broken is "a bit" far fetched.

i dont know how familiar you are with pgbouncer, but the mode in which
paul ran pgbouncer is *purposedly* not working correctly with prepared
statement.s

basically - ppgbouncer supports 3 modes:
- session pooling
- transaction pooling
- statement pooling.

description of all of them is clear in manual:

------------------------------
  Session pooling::
    Most polite method.  When client connects, a server connection
    will be assigned to it for the whole duration it stays connected.
    When client disconnects, the server connection will be put back
    into pool.

  Transaction pooling::
    Server connection is assigned to client only during a transaction.
    When PgBouncer notices that transaction is over, the server
    will be put back into pool.

  Statement pooling::
    Most aggressive method.  The server connection will be put back into
    pool immidiately after a query completes.  Multi-statement
    transactions are disallowed in this mode as they would break.
------------------------------

so, pgbouncer is not broken. if you want to keep your connection between
transactions (which is perfectly sensible) - use session pooling.

both transaction pooling and statement pooling are modes which trade
some performance for missing features.

i wouldn't suggest anyone using statement pooling, but if i would use
it, then what right do i have to complain about bad transactions?!

best regards,

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Prepared Statements vs. pgbouncer

From
"Marko Kreen"
Date:
On 10/2/07, hubert depesz lubaczewski <depesz@depesz.com> wrote:
> On Mon, Oct 01, 2007 at 02:44:10PM -0400, Tom Lane wrote:
> > Dave Cramer <pg@fastcrypt.com> writes:
> > > Josh Berkus wrote:
> > >> So where is it going to be easier to fix this ... pgBouncer, or pg-JDBC?
> > > pgBouncer is broken so I'd fix it.
> > It's an enormous mistake to imagine that prepared statements are the
> > only issue.  What about GUC settings and temp tables, to mention a
> > couple other bits of per-session state?
>
> i think that calling it broken is "a bit" far fetched.

Thanks, I think so too.  Considering all the other things that
are broken by transaction pooling, "it would be cute to have it"
is the best I can think of.

I did a quick feature matrix of things broken by pooler in general:

 https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer

Seems like the protocol-level plans is only one of them that
_could_ be worked around in pooler.  Rest will stay broken.

Coincidentally, the prepared plans happens to be the pet-feature
of JDBC, as I understand...


So, personally I don't have time to work on the feature, but
I have thought a draft design that could somewhat work in the
context of pgbouncer.  If anyone is interested to work on that,
contact me.

--
marko

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Marko,

> So, personally I don't have time to work on the feature, but
> I have thought a draft design that could somewhat work in the
> context of pgbouncer.  If anyone is interested to work on that,
> contact me.

Will this work correctly if you use Session mode for pgBouncer?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: Prepared Statements vs. pgbouncer

From
"Marko Kreen"
Date:
On 10/2/07, Josh Berkus <josh@agliodbs.com> wrote:
> > So, personally I don't have time to work on the feature, but
> > I have thought a draft design that could somewhat work in the
> > context of pgbouncer.  If anyone is interested to work on that,
> > contact me.
>
> Will this work correctly if you use Session mode for pgBouncer?

Yes, then you don't need any support from pgbouncer, only
a way to drop all prepared plans on session end, which unfortunately
is available only in 8.3 - DEALLOCATE ALL / DISCARD ALL.

--
marko

Re: Prepared Statements vs. pgbouncer

From
Josh Berkus
Date:
Marko,

> Yes, then you don't need any support from pgbouncer, only
> a way to drop all prepared plans on session end, which unfortunately
> is available only in 8.3 - DEALLOCATE ALL / DISCARD ALL.

Oh.  So on 8.2, this is still an issue becuase the plans will persist beyond
JDBC dropping the connection?  Also, are you saying that on 8.3 the client
code will have to manually issue DEALLOCATE ALL on session end?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: [Pgbouncer-general] Prepared Statements vs. pgbouncer

From
"Marko Kreen"
Date:
On 10/4/07, Josh Berkus <josh@agliodbs.com> wrote:
> > Yes, then you don't need any support from pgbouncer, only
> > a way to drop all prepared plans on session end, which unfortunately
> > is available only in 8.3 - DEALLOCATE ALL / DISCARD ALL.
>
> Oh.  So on 8.2, this is still an issue becuase the plans will persist beyond
> JDBC dropping the connection?

Yes.  JDBC could avoid it by carefully dropping all plans before
closing the connection.

> Also, are you saying that on 8.3 the client
> code will have to manually issue DEALLOCATE ALL on session end?

No, it is pgbouncer's job.  Set server_reset_query to
"DISCARD ALL".  Although that happens to be available only
in 1.1 branch of pgbouncer, which should be released any day now.

In 1.0 it can be done by server_check_query = "..", server_check_delay = 0
but that makes it unusable for it's intended purpose.

--
marko