Thread: Bind message

Bind message

From

"Do, Leon"

Date:

25 September 2006, 14:27:55

Hi All,

From my understanding, when I create a prepare statement I can execute it multiple times and to increase the efficiency. But from what I seen from the logging, Postgres always send the same message sequences over the wire.

parse,

bind,

describe,

execute

....

parse,

bind,

describe

execute

It should execute parse only one time right?

If this is not the right forum then please let me know or direct me to the right forum

thanks

Leon Do

Re: Bind message

From

Dave Cramer

Date:

25 September 2006, 14:37:16

Hi,

The default is not to re-use statements, you have add prepareThreshold=n to the url where n is greater than 0. What this means is that after n times of use the prepared statement will be re-used..

see http://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters for more details

Dave

On 25-Sep-06, at 1:27 PM, Do, Leon wrote:

Hi All,

From my understanding, when I create a prepare statement I can execute it multiple times and to increase the efficiency. But from what I seen from the logging, Postgres always send the same message sequences over the wire.
parse,
bind,
describe,
execute
....
parse,
bind,
describe
execute

It should execute parse only one time right?

If this is not the right forum then please let me know or direct me to the right forum

thanks

Leon Do

Re: Bind message

From

"Guy Rouillier"

Date:

25 September 2006, 16:01:00

Dave Cramer wrote:
> Hi,
>
>
> The default is not to re-use statements, you have add
> prepareThreshold=n to the url where n is greater than 0. What this
> means is that after n times of use the prepared statement will be
> re-used..

That's peculiar.  The whole point of prepared statements is to obtain
the optimization of statement reuse.  If PostgreSQL is going to prevent
that, I'd expect the prevention to be the exception rather than the
rule.  That is, programmers should reasonably expect that the default
behavior is statement reuse.

>
> see
>
http://jdbc.postgresql.org/documentation/head/connect.html#connection-pa
rameters
> for more details

--
Guy Rouillier

Re: Bind message

From

Kris Jurka

Date:

25 September 2006, 16:21:15

On Mon, 25 Sep 2006, Guy Rouillier wrote:

> Dave Cramer wrote:
>>
>> The default is not to re-use statements, you have add
>> prepareThreshold=n to the url where n is greater than 0. What this
>> means is that after n times of use the prepared statement will be
>> re-used..
>
> That's peculiar.  The whole point of prepared statements is to obtain
> the optimization of statement reuse.

Sometimes it is, sometimes the purpose of using a PreparedStatement is to
avoid escaping data yourself for a plain Statement execution.  There are
potential performance penalties for using a "reusable" server side
prepared statement because it plans them with generic parameters, not
those supplied in your query.  For this reason we want to be sure that
people will actually be reusing these statements before switching to this
mode.  I believe the default prepareThreshold is set to 5 despite what the
documentation says.

Kris Jurka

Re: Bind message

From

Mark Lewis

Date:

25 September 2006, 16:23:26

On Mon, 2006-09-25 at 14:00 -0500, Guy Rouillier wrote:
> Dave Cramer wrote:
> > Hi,
> >
> >
> > The default is not to re-use statements, you have add
> > prepareThreshold=n to the url where n is greater than 0. What this
> > means is that after n times of use the prepared statement will be
> > re-used..
>
> That's peculiar.  The whole point of prepared statements is to obtain
> the optimization of statement reuse.  If PostgreSQL is going to prevent
> that, I'd expect the prevention to be the exception rather than the
> rule.  That is, programmers should reasonably expect that the default
> behavior is statement reuse.

Well, yes and no.  Yes, the original intent of PreparedStatements was to
increase efficiency by reuse.  But in practice, since there's no easy
way of specifying parameters programatically except by using the
PreparedStatement interface, most programs which construct one-time-only
SQL on the fly also use PreparedStatements.

There can be a significant benefit to NOT using generic prepared
statements in the one-shot (or few-shot) case; the planner can see real
parameter values instead of placeholders, and can thus make smarter plan
decisions.

-- Mark

Re: Bind message

From

Markus Schaber

Date:

25 September 2006, 18:14:29

Hi, Kris,

Kris Jurka wrote:
> Sometimes it is, sometimes the purpose of using a PreparedStatement is
> to avoid escaping data yourself for a plain Statement execution.  There
> are potential performance penalties for using a "reusable" server side
> prepared statement because it plans them with generic parameters, not
> those supplied in your query.  For this reason we want to be sure that
> people will actually be reusing these statements before switching to
> this mode.  I believe the default prepareThreshold is set to 5 despite
> what the documentation says.

I remember having read something about delaying the actual planning
until the first parameterset arrives.

Am I mislead by my weak memory?

Thanks
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Bind message

From

"Guy Rouillier"

Date:

25 September 2006, 18:29:03

Kris Jurka wrote:
> On Mon, 25 Sep 2006, Guy Rouillier wrote:
>
>> Dave Cramer wrote:
>>>
>>> The default is not to re-use statements, you have add
>>> prepareThreshold=n to the url where n is greater than 0. What this
>>> means is that after n times of use the prepared statement will be
>>> re-used..
>>
>> That's peculiar.  The whole point of prepared statements is to obtain
>> the optimization of statement reuse.
>
> we want to be sure that people will actually be reusing these
> statements before switching to this mode.  I believe the default
> prepareThreshold is set to 5 despite what the documentation says.

If true, that's a good compromise.  It addresses the use of prepared
statements for purposes other than optimization that both you and Mark
mention, but doesn't unwittingly bite the programmer who has a statement
he intends to execute 1000 times and tries to do the right thing by
using a prepared statement.

--
Guy Rouillier

Re: Bind message

From

Oliver Jowett

Date:

25 September 2006, 18:43:20

Markus Schaber wrote:
> Hi, Kris,
>
> Kris Jurka wrote:
>
>>Sometimes it is, sometimes the purpose of using a PreparedStatement is
>>to avoid escaping data yourself for a plain Statement execution.  There
>>are potential performance penalties for using a "reusable" server side
>>prepared statement because it plans them with generic parameters, not
>>those supplied in your query.  For this reason we want to be sure that
>>people will actually be reusing these statements before switching to
>>this mode.  I believe the default prepareThreshold is set to 5 despite
>>what the documentation says.
>
>
> I remember having read something about delaying the actual planning
> until the first parameterset arrives.

The server only does this when you use the unnamed statement (i.e.
before the driver's prepareThreshold is reached), because the resulting
plan is only really useful for that particular set of parameter values,
and using a named statement implies that you want to reuse the
parse/plan results for other parameter values.

-O

Re: Bind message

From

Kris Jurka

Date:

26 September 2006, 02:31:48


On Mon, 25 Sep 2006, Kris Jurka wrote:

> I believe the default prepareThreshold is set to 5 despite what the
> documentation says.
>

Documentation updated.

Kris Jurka

Re: Bind message

From

Markus Schaber

Date:

26 September 2006, 05:49:18

Hi, Oliver,

Oliver Jowett wrote:

>> I remember having read something about delaying the actual planning
>> until the first parameterset arrives.

> The server only does this when you use the unnamed statement (i.e.
> before the driver's prepareThreshold is reached), because the resulting
> plan is only really useful for that particular set of parameter values,
> and using a named statement implies that you want to reuse the
> parse/plan results for other parameter values.

Ah. But usually, the other parameter values for the same prepared
statement produce similar query plans, because they're the same use case
on similar data.

So, at least for most of the apps we're running here, the first set of
parameters makes more sense than planning without any parameters.

Maybe this behaviour should be configurable via some session
configuration variable?

And for applications that really have two very different usecases
resulting in identical prepared statements, it's affordable to either
switch the delayed planning of, or to prepare the statement twice, with
different names and different first parameter sets.

Maybe we should crosspost this to pg_hackers?

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Bind message

From

Oliver Jowett

Date:

26 September 2006, 07:21:13

Markus Schaber wrote:

> Ah. But usually, the other parameter values for the same prepared
> statement produce similar query plans, because they're the same use case
> on similar data.

For the "wrong" values though the plan might be pathologically bad.

> Maybe this behaviour should be configurable via some session
> configuration variable?

Isn't this just a very roundabout way of overriding the planner? i.e.
what you are really doing is saying "ignore your statistics, I know more
about the parameter values I'm going to give you than I'm actually
telling you"..

-O

Re: Bind message

From

Marc Herbert

Date:

26 September 2006, 07:53:58

Kris Jurka <books@ejurka.com> writes:

>> That's peculiar.  The whole point of prepared statements is to obtain
>> the optimization of statement reuse.
>
> Sometimes it is, sometimes the purpose of using a PreparedStatement is
> to avoid escaping data yourself for a plain Statement execution.

By the way this confusion has been introduced by JDBC. In ODBC the
functions SQLBindParameter() and SQLPrepare() were not related.
 <http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcbinding_parameter_markers.asp>

Both ODBC and JDBC docs warn the user that some drivers may not fully
implement PREPARE. Exercise: define "fully" in a portable way.
Granted, you can still complain if PREPARE does nothing at all.

I remember a number of past discussions on this topic. Among others
this one: "Subject: Limit vs setMaxRows issue"
<http://archives.postgresql.org/pgsql-jdbc/2006-07/threads.php#00020>

Re: Bind message

From

Markus Schaber

Date:

26 September 2006, 09:01:24

Hi, Oliver,

Oliver Jowett wrote:

>> Ah. But usually, the other parameter values for the same prepared
>> statement produce similar query plans, because they're the same use case
>> on similar data.
>
> For the "wrong" values though the plan might be pathologically bad.

Yes, I know, and that's why I'd like to retain the possibilitiy of
preparing a new statement with a different first parameter set for
different use cases, and introduce the option of switching back to the
current behaviour.

>> Maybe this behaviour should be configurable via some session
>> configuration variable?
>
> Isn't this just a very roundabout way of overriding the planner? i.e.
> what you are really doing is saying "ignore your statistics, I know more
> about the parameter values I'm going to give you than I'm actually
> telling you"..

No, it's telling the planner "You can assume that all those queries for
my statement are similar enough that the same query plan will fit them,
so you can safe the overhead of both parsing and planning".

Don't forget that, in the current implementation, the query planner has
no choice but planning the query without any actual parameter values,
which is likely to give equal or worse results in most cases.

The use case I have in mind are geometric data lookup of mostly static
data (PostGIS objects) that are nicely distributed, and the queries all
fetch a small bounding box hitting a fraction of at most 1/1000th, more
likely 1/1000000 of the data. For them, the GIST index scan is the best
query plan.

Another option would be that the planner uses some self-learning
algorithm that analyzes the parameter sets for a given prepared
statement, and sees whether it's better to replan always, keep the same
plan, or even efficiently switch between a small set of fixed plans. But
that's out of scope for now, I'm afraid.

Thanks,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Bind message

From

Oliver Jowett

Date:

26 September 2006, 09:26:26

Markus Schaber wrote:
> Hi, Oliver,
>
> Oliver Jowett wrote:

>>Isn't this just a very roundabout way of overriding the planner? i.e.
>>what you are really doing is saying "ignore your statistics, I know more
>>about the parameter values I'm going to give you than I'm actually
>>telling you"..
>
>
> No, it's telling the planner "You can assume that all those queries for
> my statement are similar enough that the same query plan will fit them,
> so you can safe the overhead of both parsing and planning".

What I mean is, if you are prepared to say that, what you are really
wanting is to second-guess the planner and tell the server "use *THIS*
plan" explicitly .. In both approaches, you (the app developer) need
detailed knowledge about what sort of plans work for the parameters &
query you are using (for your particular server version and dataset etc).

Doing that at the Parse/Bind level doesn't seem right and in fact seems
even more errorprone than explicitly specifying a plan, since you are
relying on the planner picking a particular plan for a particular set of
parameter values that you happened to use first which may or may not
always be true depending on things like the current state of DB statistics..

-O

Re: Bind message

From

Tom Lane

Date:

26 September 2006, 10:10:45

Markus Schaber <schabi@logix-tt.com> writes:
> Don't forget that, in the current implementation, the query planner has
> no choice but planning the query without any actual parameter values,
> which is likely to give equal or worse results in most cases.

No, I think you are oversimplifying.  What the planner is supposed to do
when given a parameterized query (and no parameter value info) is to
generate a plan that won't be too terribly awful regardless of the
parameter values.  With a set of sample values, it may generate a plan
that is great for those values and utterly unusable for anything else.
Here's a simple example:

    select ... from a join b on (a.key = b.key) where a.val = ?

Given a parameter value that looks like it will match only one A row,
you will likely get a plan like this:

    Nest Loop
        Index Scan on A
            Index Cond: val = ?
        Index Scan on B
            Index Cond: b.key = a.key

which is about as fast as you can get if indeed there's only one match.
However if the parameter value matches very many A rows, this plan is
horrid.  With no value for the parameter, the planner should pick a
compromise plan (perhaps a mergejoin) that may not be as fast for the
single-match case, but will finish before doomsday in the other case.

            regards, tom lane

Re: Bind message

From

Markus Schaber

Date:

26 September 2006, 10:20:29

Hi, Tom,

Tom Lane wrote:
> Markus Schaber <schabi@logix-tt.com> writes:
>> Don't forget that, in the current implementation, the query planner has
>> no choice but planning the query without any actual parameter values,
>> which is likely to give equal or worse results in most cases.
>
> No, I think you are oversimplifying.  What the planner is supposed to do
> when given a parameterized query (and no parameter value info) is to
> generate a plan that won't be too terribly awful regardless of the
> parameter values.  With a set of sample values, it may generate a plan
> that is great for those values and utterly unusable for anything else.

That's why I want to have it configurable, and maybe even disabled by
default.

When I know that the cases are similar enough, it can be enabled, and
possibly benchmarked whether it's really faster (better or equal plan
and less planning overhead) or slower (horrible plan due to failure of
premise that the cases are similar enough).

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: Bind message

From

"Guy Rouillier"

Date:

26 September 2006, 13:06:45

Oliver Jowett wrote:
> Isn't this just a very roundabout way of overriding the planner? i.e.
> what you are really doing is saying "ignore your statistics, I know
> more about the parameter values I'm going to give you than I'm
> actually telling you"..

Is there a non-roundabout way of overriding the planner?  Oracle has
hints.  Sometimes the query analyzer benefits from a helping hand.

--
Guy Rouillier

Re: Bind message

From

Mark Lewis

Date:

26 September 2006, 13:36:05

On Tue, 2006-09-26 at 11:06 -0500, Guy Rouillier wrote:
> Oliver Jowett wrote:
> > Isn't this just a very roundabout way of overriding the planner? i.e.
> > what you are really doing is saying "ignore your statistics, I know
> > more about the parameter values I'm going to give you than I'm
> > actually telling you"..
>
> Is there a non-roundabout way of overriding the planner?  Oracle has
> hints.  Sometimes the query analyzer benefits from a helping hand.

The PG core doesn't support hints.  I believe the general feeling among
the PG developers is that their time is better spent making the planner
smarter than in supporting a hints system which would introduce its own
maintenance headaches.

-- Mark