Thread: Prepared statements considered harmful
With time, it becomes ever clearer to me that prepared SQL statements are just a really bad idea. On some days, it seems like half the performance problems in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd say that in the majority of cases the time you save parsing and planning is irrelevant compared to the possibly disastrous effects of wrong or suboptimal plans. I wonder if other people have similar experiences.

I wish we would reconsider when and how prepared statements are used. The JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the problem is really all over the place.

A couple of actions to consider:

- Never use prepared statements unless the user has turned them on. (This is the opposite of the current behavior.)

- Transparently invalidate and regenerate prepared plans more often. This could be tied to the transaction count, update activity obtained from the statistics collector, etc.

- Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Each of these or similar changes would only solve a subset of the possible problems. Possibly we need more knobs to adjust these things. But something needs to be done.

Comments?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
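A minimal psql sketch of the hazard described above (the `jobs` table, its skew, and the statement name are all invented for illustration): once a statement is prepared, a single plan has to serve every possible parameter value, so the planner cannot use the per-value statistics it would have for a literal.

```sql
-- Hypothetical skewed table: almost all rows have status = 'done'.
CREATE TABLE jobs (id int, status text);
CREATE INDEX jobs_status_idx ON jobs (status);

-- The cached plan must work for any $1, so it is chosen without
-- knowing whether $1 is the rare value or the common one:
PREPARE get_jobs (text) AS SELECT * FROM jobs WHERE status = $1;

EXPLAIN EXECUTE get_jobs('pending');               -- planned blind
EXPLAIN SELECT * FROM jobs WHERE status = 'pending';  -- planned with the literal
```

Comparing the two EXPLAIN outputs on a sufficiently skewed table shows the prepared version settling on a compromise plan that can be disastrous for the rare value.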
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. On some days, it seems like half the performance
> problems in PostgreSQL-using systems are because a bad plan was cached
> somewhere. I'd say, in the majority of cases the time you save parsing
> and planning is irrelevant compared to the possibly disastrous effects of
> wrong or suboptimal plans. I wonder if other people have similar
> experiences.

Yeah, it seems to me that many of the benefits of not planning are overrun by the effects of bad plans.

> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

I think this is the best. Some way to specify that you don't want planning to take place immediately would be good.

One question though: there is a function PQexecParams(). Does this suffer from the same problem? I imagine most interfaces would like out-of-band parameters (no escaping issues), so why do they not use this?

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
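For reference, out-of-line parameters via PQexecParams() look roughly like this (the connection string, table, and pattern are assumptions for the sketch, and error handling is abbreviated). The query and its parameters travel together in one round trip, with no long-lived named statement kept by the application; whether the server can then use the parameter values while planning the unnamed statement is exactly the crux of the question above.

```c
/* Sketch only: requires libpq and a running server to actually run. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* assumed database */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    const char *paramValues[1] = { "121%" };
    PGresult *res = PQexecParams(conn,
                                 "SELECT a FROM t1 WHERE a LIKE $1",
                                 1,           /* one parameter */
                                 NULL,        /* let the server infer its type */
                                 paramValues,
                                 NULL, NULL,  /* text-format parameters */
                                 0);          /* text-format results */

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("%d rows\n", PQntuples(res));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```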
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

How about "prepared" meaning really "prepared"... in the sense of parsed, analyzed for all sensible plans, and saved as a meta-plan which, based on current statistics and parameter values, chooses one of the considered (and cached) plans? That would be immune both to statistics changes and to parameter value changes, within certain limits. It would also be a lot more complex than a simple plan...

Cheers,
Csaba.
Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...]
>
> - Transparently invalidate and regenerate prepared plans more often. This
> could be tied to the transaction count, update activity obtained from the
> statistics collector, etc.

Well, this sounds like it is best done with the "central plan cache" idea that is floating around (I think neilc once worked on that). Once we have something like that, I would expect we can easily invalidate/regenerate plans there based on certain criteria, from obvious things like DDL changes to more subtle ones like "age of the plan", "statistics changed significantly on table foo", or "regenerate the plan every time table bla is involved".

Most of that is pure speculation, but something like that would be a very powerful thing to have.

Stefan
On Thursday, 31 August 2006 14:11, Csaba Nagy wrote:
> How about "prepared" means really "prepared"... in the sense of parsed,
> analyzed all sensible plans, and save a meta-plan which based on current
> statistics and parameter values chooses one of the considered (and
> cached) plans ?

I don't think this could solve one particularly frequent problem, which is that pattern matching queries don't get along with prepared plans if the search pattern isn't known at planning time.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 14:32, Peter Eisentraut wrote:
> I don't think this could solve one particularly frequent problem which is
> that pattern matching queries don't get along with prepared plans if the
> search pattern isn't known at planning time.

Why not? I specifically said you would prepare a few sensible plans based on statistics/expected variations of the statistics, and parameter value ranges which would trigger different plans.

So for the LIKE query case you could save two plans, one for the indexable case, one for the not indexable case. Then at runtime you choose the proper one based on the pattern value. The meta-plan I mentioned would be a collection of plans with rules to choose the right one at run time based on parameter values and perhaps the current statistics.

This of course would need a lot more preparation time than just preparing one plan, but that's why you want to do it upfront and then cache the results. A central plan repository mentioned in other posts would fit nicely here... and you could use prepared plans for non-parameterized queries too by simply considering the constants as parameters, to increase the chances for prepared plan reuse - this of course for complex enough queries.

Cheers,
Csaba.
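The meta-plan being described is essentially a small decision procedure evaluated at execute time. A toy sketch of the dispatch rule for the LIKE case (the plan names are placeholders for illustration, not PostgreSQL internals):

```python
def choose_plan(pattern: str) -> str:
    """Pick a cached plan for `WHERE a LIKE $1` based on whether the
    pattern has a fixed prefix (indexable) or starts with a wildcard."""
    if pattern and pattern[0] not in ('%', '_'):
        return "index_range_scan"   # plan 1: fixed prefix, index usable
    return "seq_scan"               # plan 2: leading wildcard, full scan

# The "meta-plan" is just this rule plus the two cached plans.
print(choose_plan("121%"))   # fixed prefix: the index plan applies
print(choose_plan("%121"))   # leading wildcard: fall back to seq scan
```

The real version would also have to consult statistics, but the shape is the same: cheap runtime checks selecting among plans that were expensive to build once.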
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...] I wonder if other people have similar
> experiences.

I have to respectfully disagree. I have used them to great effect in many of my projects. In the most extreme case, prepared statements can provide a 50% reduction or greater in overall query time... this is too good a benefit to simply discard. I worked on converted ISAM projects which would not have been possible to make efficient without prepared statements. However, you are correct that the planner does often create wacky plans which can cause disastrous results in some cases.

My major issue is that you cannot supply hints to the query engine. For example, one of my favorite tricks is to parameterize the LIMIT clause in a query, which creates a sliding window over the table for progressive readahead. Unfortunately the planner assumes 10%, which borks the plan. My workaround is to turn off enable_bitmapscan and enable_seqscan before the prepare and turn them back on afterwards.

The proposal to supply hints to statements and functions has been voted down several times due to the argument that it is better to fix the planner. I think supplying hints does fix the planner, and is a balanced solution.

merlin
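The workaround described above can be sketched like this (the table, column, and statement names are invented for the example). The enable_* settings only influence planning, so they need to be off at PREPARE time and can be restored immediately afterwards:

```sql
-- Discourage the plans we don't want while the statement is planned.
SET enable_seqscan = off;
SET enable_bitmapscan = off;

-- The parameterized LIMIT defeats the row-count estimate, so steer
-- the planner toward the index scan the sliding window needs.
PREPARE readahead (int, int) AS
    SELECT * FROM big_table WHERE id > $1 ORDER BY id LIMIT $2;

-- Restore normal planning for everything else in the session.
SET enable_seqscan = on;
SET enable_bitmapscan = on;
```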
On Thu, August 31, 2006 18:56, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. On some days, it seems like half the performance
> problems in PostgreSQL-using systems are because a bad plan was cached
> somewhere.

Is there any kind of pattern at all to this problem? Anything recognizable? A few typical pitfalls?

Without knowing much of the internals, I could imagine [waves hands in vague gestures] other options: something like recognizing major changes that upset the cost functions that went into generating a plan, and invalidating the plan based on those; or noting bad estimates somehow as they become apparent during execution, and annotating the plan with a "this assumption was a bad idea" marker so you'll do better next time.

I guess you can't go far wrong if you re-define "prepared" to mean merely "pre-parsed," but it sounds like such a waste of opportunity...

Jeroen
Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...]
>
> A couple of actions to consider:
>
> - Never use prepared statements unless the user has turned them on. (This
> is the opposite of the current behavior.)
>
> - Transparently invalidate and regenerate prepared plans more often. This
> could be tied to the transaction count, update activity obtained from the
> statistics collector, etc.
>
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Not to mention problems with outdated plans after schema changes. Using views unplanned (replanned) when used in joins could lead to improved resulting plans (e.g. if the view contains outer joins itself).

Regards,
Andreas
On Thursday, 31 August 2006 14:52, Csaba Nagy wrote:
> So for the like query case you could save 2 plans, one for the indexable
> case, one for the not indexable case. Then at runtime you choose the
> proper one based on the pattern value.

OK, why don't you work out an example. Let's look at this query:

SELECT * FROM t1 WHERE a LIKE $1;

What two plans would you prepare?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
> > How about "prepared" means really "prepared"... in the sense of
> > parsed, analyzed all sensible plans, and save a meta-plan which based
> > on current statistics and parameter values chooses one of the
> > considered (and cached) plans ?
>
> I don't think this could solve one particularly frequent
> problem which is that pattern matching queries don't get
> along with prepared plans if the search pattern isn't known
> at planning time.

I think what we would actually want is knowledge about how much difference different parameters actually make in the plan decision (the stats show an even distribution and join correlation). Then we could prepare the plan when there is not much difference, and postpone planning until we know the parameters when the difference is big.

OLTP workloads typically benefit from prepared plans, and the one plan is good for all possible inputs, so imho we cannot just assume all plans need replanning for different parameters.

Andreas
On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> The proposal to supply hints to statements and functions has been
> voted down several times due to the argument that it is better to fix
> the planner. I think supplying hints does fix the planner, and is a
> balanced solution.

Planner hints are a way to address a deficient planner. But neither a manually hinted planner nor a perfectly good planner will help if the planning decisions are based on outdated information.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:06, Jeroen T. Vermeulen wrote:
> Is there any kind of pattern at all to this problem? Anything
> recognizable? A few typical pitfalls?

If data is not distributed evenly, then any old WHERE foo = $1 is prone to be the wrong plan for half of the possible values of $1. The more data you have and the more it changes, the worse this gets.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:18, Andreas Pflug wrote:
> Not to mention problems with outdated plans after schema changes. Using
> views unplanned (replanned) when used in joins could lead to improved
> resulting plans (e.g. if the view contains outer joins itself).

Views don't contain execution plans. I don't see how this is relevant.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Merlin Moncure wrote:
> On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
>> With time, it becomes ever clearer to me that prepared SQL statements
>> are just a really bad idea. [...]
>
> I have to respectfully disagree. I have used them to great effect in
> many of my projects.

Peter doesn't propose to remove prepared statements as such. They are certainly of great value if used carefully and specifically, as in your case. The problems he's addressing stem from plans _implicitly_ created and stored.

> In the most extreme case, prepared statements can provide a 50% reduction
> or greater in overall query time... [...]
>
> My major issue is that you cannot supply hints to the query engine.

I don't believe extending this thread to the we-need-hints issue is a good idea.

Regards,
Andreas
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
> OK, why don't you work out an example. Let's look at this query:
>
> SELECT * FROM t1 WHERE a LIKE $1;
>
> What two plans would you prepare?

if substring($1 from 1 for 1) != '%' then
    use plan 1 (see below);
else
    use plan 2 (see below);
end if;

Save both plans from below with the meta-plan from above, and call it a prepared plan.

cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(10000000 * random()) from generate_series(1,10000);
INSERT 0 10000
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 55 of 55 pages, containing 10000 live rows and 0 dead rows; 3000 rows in sample, 10000 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=2.06..27.63 rows=10 width=10)
   Filter: (a ~~ '121%'::text)
   ->  Bitmap Index Scan on idx_t1_a  (cost=0.00..2.06 rows=10 width=0)
         Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)

cnagy=# explain select a from t1 where a like '%121';
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..180.00 rows=80 width=10)
   Filter: (a ~~ '%121'::text)
(2 rows)

Cheers,
Csaba.
On Thursday, 31 August 2006 15:36, Csaba Nagy wrote:
> if substring($1 from 1 for 1) != '%' then
>     use plan 1 (see below);
> else
>     use plan 2 (see below);
> end if;

Note that plan 1 can only be created if you know the actual value for $1.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 15:49, Peter Eisentraut wrote:
> Note that plan 1 can only be created if you know the actual value for $1.

Why would that be so? The plan can contain functions of $1 (both constants in plan 1 are a function of $1).

Cheers,
Csaba
On Aug 31, 2006, at 8:52, Csaba Nagy wrote:
> This of course would need a lot more preparation time than just prepare
> one plan, but that's why you want to do it upfront and then cache the
> results. A central plan repository mentioned in other posts would fit
> nicely here...

If prepared statements become more expensive to create, then it would make more sense for them to persist across sessions. All of an application's prepared statements could be cached.

-M
On Thu, Aug 31, 2006 at 08:06:57PM +0700, Jeroen T. Vermeulen wrote:
> Is there any kind of pattern at all to this problem? Anything
> recognizable? A few typical pitfalls?

Frequently I have found that preplanning will result in a horrible plan, because it is assumed parameters may be volatile while in practice they are literals. Here is a function from my database:

CREATE FUNCTION nullorblank(character varying) RETURNS boolean AS $_$
    select $1 is null or trim($1) = ''
$_$ LANGUAGE sql IMMUTABLE;

This is used in stored procedures that answer search queries. For example, let's consider one that searches products, filtered on any number of "part number", "manufacturer", or "name". If one of these is not specified, it does not restrict the query. One might write that query so:

-- $1: part number
-- $2: manufacturer
-- $3: name
SELECT * FROM product
WHERE (nullorblank($1) OR lower(partnumber) = lower($1))
  AND (nullorblank($2) OR manufacturername = $2)
  AND (nullorblank($3) OR name = $3)

The parameters will always be literal strings, taken from some form presented to the user.
If one does the parameter substitution manually, the plans are quite reasonable:

EXPLAIN ANALYZE SELECT * FROM product
WHERE (nullorblank('int2100/512') OR lower(partnumber) = lower('int2100/512'))
  AND (nullorblank('') OR manufacturername = '')
  AND (nullorblank('') OR name = '');

                               QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.502..43.507 rows=1 loops=1)
   ->  Bitmap Heap Scan on product  (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.161..43.162 rows=1 loops=1)
         Recheck Cond: (lower((partnumber)::text) = 'int2100/512'::text)
         ->  Bitmap Index Scan on product_partnumber_loweridx  (cost=0.00..15.54 rows=1867 width=0) (actual time=43.022..43.022 rows=1 loops=1)
               Index Cond: (lower((partnumber)::text) = 'int2100/512'::text)
 Total runtime: 51.626 ms
(7 rows)

The 'manufacturername' and 'name' disjuncts have been removed by simplification, since the expression is known to be true.
However, if "prepared", it's horrible:

PREPARE to_be_slow(text, text, text) AS SELECT * FROM product
WHERE (nullorblank($1) OR lower(partnumber) = lower($1))
  AND (nullorblank($2) OR manufacturername = $2)
  AND (nullorblank($3) OR name = $3);

EXPLAIN ANALYZE EXECUTE to_be_slow('int2100/512', NULL, NULL);

                               QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..22317.13 rows=1 width=254) (actual time=1115.167..1579.535 rows=1 loops=1)
   ->  Seq Scan on product  (cost=0.00..22317.12 rows=1 width=254) (actual time=1114.845..1579.211 rows=1 loops=1)
         Filter: (((($1)::character varying IS NULL) OR (btrim(($1)::text) = ''::text) OR (lower((partnumber)::text) = lower($1))) AND ((($2)::character varying IS NULL) OR (btrim(($2)::text) = ''::text) OR (manufacturername = $2)) AND ((($3)::character varying IS NULL) OR (btrim(($3)::text) = ''::text) OR ((name)::text = $3)))
 Total runtime: 1580.006 ms
(5 rows)
On Thursday, 31 August 2006 16:09, Theo Schlossnagle wrote:
> I don't chime in very often, but I do think the refusal to
> incorporate hints into the planner system is fantastically stubborn
> and nonsensical.

What is actually fantastically nonsensical about this is that the issues I outlined about prepared statements would merely become worse if planner hints were used. Then, you wouldn't only have to worry about plans that were created earlier during the session; you would be faced with plans that were created earlier during the application's development.

In general, the solutions to the prepared statement issues need to ensure that plans are created more often, not less often.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Aug 31, 2006, at 9:25 AM, Peter Eisentraut wrote:
> Planner hints are a way to address a deficient planner. But neither a
> manually hinted planner nor a perfectly good planner will help if the
> planning decisions are based on outdated information.

I don't chime in very often, but I do think the refusal to incorporate hints into the planner system is fantastically stubborn and nonsensical. I whole-heartedly agree that it is _better_ to fix the planner, but many of us have production systems and can't just go check out CVS HEAD to address our day-to-day issues, and we suffer from this decision.

There are many databases out there with better planners than PostgreSQL -- likely there always will be. Even those databases have query planner hints. Why? Because the authors of those databases had the humility to realize that the planner they designed wasn't perfect and that people _still_ need their database to perform well despite a non-optimal query plan here and there. A good query planner hint system would act as a catalyst to the improvement of the current query planner, as users could share their complex queries and associated improved query plans through hinting.

I like Postgres a lot; I think the people that work on it are very, very sharp. I do feel that the consistent refusal to allow query hinting to be introduced demonstrates an unhealthy amount of hubris that, in the end, negatively impacts users. While Postgres is missing a ton of other needed features, I rarely see the attitude that they are _unwanted_. Instead I see the "if it is important to you, go build it" attitude, which is what I would expect in an open source project.
// Theo Schlossnagle // CTO -- http://www.omniti.com/~jesus/ // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
On Thursday, 31 August 2006 16:26, Andrew Dunstan wrote:
> Cached plans etc. might have an impact, but please do not overlook the
> benefits of parameterized queries in avoiding SQL injection attacks, as
> well as often being much cleaner to code.

That might be part of the confusion. Composing queries with the variable parameters out of line is a very nice feature. But that concept is totally independent of the question whether the execution plan should be cached. The APIs (and their documentations) just don't convey that very well.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> OK, why don't you work out an example. Let's look at this query:
>
> SELECT * FROM t1 WHERE a LIKE $1;
>
> What two plans would you prepare?

Well, I guess for the case that none of the "expected" plans fit, you can always fall back to generating a new plan on the fly.

Anyway, it would of course be cool if pgsql could set an invalid flag if it detects that a certain plan performed badly (maybe even automatically cause a fresh table analysis), or if some DDL/DML was executed that likely invalidated the plan.

I am not sure if there is any philosophy that pgsql tries to adhere to. Does it want to leave the job of tuning to the DBA, or does it want to do things automatically (which always means that in some situations it will do the wrong thing)?

- tweak the planner vs. planner hints
- manually analyze vs. automatically analyze
- manual vacuum vs. autovacuum

Hmm, actually it's probably not a black and white thing, and the ultimate goal would be to offer both, with maybe some installer checkbox to default everything to "DBA-less" auto mode.

Anyway, I never liked the idea of planner hints. I think it makes much more sense to give people direct access to plans in that case, meaning they can "partially" hardcode (parameterized) plans they want. I have mentioned before that Sybase seems to have such a feature (you can dump plans, tweak them, remove pieces that should be done on the fly, and associate them with stored procedures; not sure if you can also do that for prepared statements).

regards,
Lukas
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL
> statements are just a really bad idea. On some days, it seems like
> half the performance problems in PostgreSQL-using systems are
> because a bad plan was cached somewhere. [...]
>
> Comments?

Hello. I'm attempting to understand why prepared statements would be used for long enough for tables to change to a point that a given plan will change from 'optimal' to 'disastrous'. Wouldn't this require that the tables are completely re-written, or that their data is drastically updated?

For my own tables, most of the data remains static for months on end. Data is accumulated. Small changes are made. I don't see why a prepared statement used over a 24-hour period would ever become disastrous. This suggests to me that you are doing either:

1) Maintaining prepared statements for weeks or months at a time.
2) Churning your tables up into a froth.

I'm guessing, as you mentioned JDBC, that you might be hitting 1), in the context of JDBC being used from a web application, where the application server holds a connection open for weeks or months at a time. If so, it does sound as if JDBC is doing wrong by keeping prepared queries around for that long. A time limit of an hour, or even a few minutes, would make sense.

My experience does not match yours. Prepared queries have always significantly improved my execution times. They do have a place. Whatever the scenarios you are hitting should be dealt with, possibly in JDBC.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Ottawa, Ontario, Canada
On Thu, Aug 31, 2006 at 03:36:25PM +0200, Csaba Nagy wrote:
> On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
> > OK, why don't you work out an example. Let's look at this query:
> > SELECT * FROM t1 WHERE a LIKE $1;
> > What two plans would you prepare?
>
> if substring($1 from 1 for 1) != '%' then
>     use plan 1 (see below);
> else
>     use plan 2 (see below);
> end if;

It would be cool if PostgreSQL did this, but I think it is also true that anybody (or JDBC) who tried to prepare a plan in the cases that are known to cause problems is making a mistake.

While on the 'it would be cool' subject: I think it might be cool if the prepare statement took sample arguments that could be used to prepare the plans with. "Prepare a plan that would work best with these arguments." Then JDBC could prepare both plans for you, if it was smart enough... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
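Mark's suggestion might be sketched like this (the USING SAMPLE clause is entirely hypothetical syntax invented for illustration, not anything PostgreSQL supports):

```sql
-- Hypothetical: plan the statement as if $1 were the sample value,
-- while still accepting any value at EXECUTE time.
PREPARE window_query (text) AS
    SELECT * FROM t1 WHERE a LIKE $1
    USING SAMPLE ('121%');
```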
On Thu, Aug 31, 2006 at 10:09:10AM -0400, Theo Schlossnagle wrote:
> There are many databases out there with better planners than
> PostgreSQL -- likely there will always be. Even those databases have
> query planner hints. Why? Because the authors of those databases had
> the humility to realize that the planner they designed wasn't perfect
> and that people _still_ need their database to perform well despite a
> non-optimal query plan here and there.
>
> A good query planner hint system would act as a catalyst to the
> improvement of the current query planner, as users could share their
> complex queries and associated improved query plans through hinting.

Would a hint system allow the planner to execute quicker? Eliminate
plans from consideration early, without evaluating how long they might
take to execute? Sort of possible today by toggling 'enable_seqscan'
and other such options... :-)

> I like Postgres a lot, I think the people that work on it are very
> very sharp. I do feel that the consistent refusal to allow query
> hinting to be introduced demonstrates an unhealthy amount of hubris
> that, in the end, negatively impacts users.

Hubris isn't always bad. If hints were provided, the need for the
fully automatic planner to improve would be reduced. But yes, they do
seem to be competing goals, with the user caught in between.

> While Postgres is missing a ton of other needed features, I rarely
> see the attitude that they are _unwanted_. Instead I see the "if it
> is important to you, go build it" attitude, which is what I would
> expect in an open source project.

There is also "what you submit should be maintainable, because we know
you might disappear at any time".

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Peter Eisentraut <peter_e@gmx.net> writes:
> With time, it becomes ever clearer to me that prepared SQL statements
> are just a really bad idea.

That's an overstatement, but I'll agree that they have strong
limitations.

> I'd wish that we reconsider when and how prepared statements are used.
> The JDBC interface and PL/pgSQL are frequently noticed perpetrators,
> but the problem is really all over the place.

AFAIK those are the only two places where preparation is the default
... what else were you thinking of?

> - Transparently invalidate and regenerate prepared plans more often.
> This could be tied to the transaction count, update activity obtained
> from the statistics collector, etc.

FWIW, I've assumed right along that once we have a plan-invalidation
mechanism, any ANALYZE stats update would invalidate affected plans.

> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

For plan-inval to work in all cases, we'll have to store either the
source query string or the raw grammar's output tree, before even
parse analysis. Is that what you are thinking of? It's hardly
"prepared" at all if you do that.

As noted downthread, we've confused out-of-line parameter value
shipping with prepared statements. It might be worth rejiggering the
FE/BE protocol to separate those things better.

			regards, tom lane
On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
> I'm attempting to understand why prepared statements would be used for
> long enough for tables to change to a point that a given plan will
> change from 'optimal' to 'disastrous'.
>
> Wouldn't this require that the tables are completely re-written, or
> that their data is drastically updated? For my own tables, most of the
> data remains static for months on end. Data is accumulated. Small
> changes are made. I don't see why a prepared statement used over a
> 24 hour period would ever become disastrous.

Scenario: A web application maintains a pool of connections to the
database. If the connections have to be regularly restarted due to a
postgres implementation detail (stale plans), then that is a database
deficiency.

-M
Tom Lane wrote: > As noted downthread, we've confused out-of-line parameter value shipping > with prepared statements. It might be worth rejiggering the FE/BE > protocol to separate those things better. > > Well, that's surely not going to happen in a hurry, is it? Maybe a quick fix would be a way to allow the user to turn plan caching on and off. cheers andrew
On 2006-08-31, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd wish that we reconsider when and how prepared statements are used. The >> JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the >> problem is really all over the place. > > AFAIK those are the only two places where preparation is the default RI triggers. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On Thu, Aug 31, 2006 at 11:36:36AM -0400, AgentM wrote:
> On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
> > I'm attempting to understand why prepared statements would be used for
> > long enough for tables to change to a point that a given plan will
> > change from 'optimal' to 'disastrous'.
> >
> > Wouldn't this require that the tables are completely re-written, or
> > that their data is drastically updated? For my own tables, most of the
> > data remains static for months on end. Data is accumulated. Small
> > changes are made. I don't see why a prepared statement used over a
> > 24 hour period would ever become disastrous.
>
> Scenario: A web application maintains a pool of connections to the
> database. If the connections have to be regularly restarted due to a
> postgres implementation detail (stale plans), then that is a database
> deficiency.

Or a JDBC deficiency. Nobody is forcing JDBC to reuse an automatically
prepared plan indefinitely. If it prepared the plan automatically, it
can regenerate it whenever it wishes. Does Oracle automatically
regenerate prepared plans on occasion?

I don't consider it a deficiency. PostgreSQL is doing exactly what you
asked it to do; that it isn't second-guessing you isn't a deficiency.
For all PostgreSQL knows, your data hasn't changed so drastically that
a query planned a week ago would be disastrous today; the plan you
prepared then, and chose to execute now, may still be the optimal one.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
AgentM <agentm@themactionfaction.com> writes:
> On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
>> I'm attempting to understand why prepared statements would be used for
>> long enough for tables to change to a point that a given plan will
>> change from 'optimal' to 'disastrous'.

> Scenario: A web application maintains a pool of connections to the
> database. If the connections have to be regularly restarted due to a
> postgres implementation detail (stale plans), then that is a database
> deficiency.

The two major complaints that I've seen are

* plpgsql's prepared plans don't work at all for scenarios involving
temp tables that are created and dropped in each use of the function,
because the cached plan refers to a table that no longer exists and so
must be regenerated on every successive call. Right now we tell people
they have to use EXECUTE, which is painful and gives up unnecessary
amounts of performance (because it might well be useful to cache a
plan for the lifespan of the table).

* for parameterized queries, a generic plan gives up too much
performance compared to one generated for specific constant parameter
values.

Neither of these problems have anything to do with statistics getting
stale.

			regards, tom lane
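For reference, the EXECUTE workaround for the temp-table case looks roughly like this (a hedged sketch; the function and table names are invented):

```sql
CREATE FUNCTION count_work() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    CREATE TEMP TABLE work_tt (x int);
    -- A plain "SELECT count(*) INTO n FROM work_tt" would cache a plan
    -- pointing at a table that is dropped below, breaking the next call.
    -- EXECUTE re-parses and re-plans the query string every time instead:
    EXECUTE 'SELECT count(*) FROM work_tt' INTO n;
    DROP TABLE work_tt;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```

This is exactly the cost Tom mentions: the query is planned on every call, even though a plan could usefully be cached for as long as the temp table lives.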
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> As noted downthread, we've confused out-of-line parameter value shipping
>> with prepared statements. It might be worth rejiggering the FE/BE
>> protocol to separate those things better.

> Well, that's surely not going to happen in a hurry, is it? Maybe a quick
> fix would be a way to allow the user to turn plan caching on and off.

There aren't any "quick fixes" here (at least nothing that's likely to
appear in 8.2). But I didn't mean the above suggestion as our only
response to Peter's criticism --- more that that is one of several
areas we ought to think about.

			regards, tom lane
> Neither of these problems have anything to do with statistics getting > stale. ... and the second one would benefit from a "meta-plan" facility which puts some "meta-plan" nodes on top of specific plans to dispatch based on parameter values at runtime. Incidentally, the dispatch could check the statistics assumptions too. If you don't need to do the planning for each execution, you could afford to check the assumptions for each execution instead... Cheers, Csaba.
On Aug 31, 2006, at 12:04 , Tom Lane wrote: > > The two major complaints that I've seen are > <snip> > Neither of these problems have anything to do with statistics getting > stale. Not stats-- plans. Plan invalidation has been discussed before, no? -M
On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote: > > I'd wish that we reconsider when and how prepared statements are used. The > > JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the > > problem is really all over the place. > > AFAIK those are the only two places where preparation is the default > ... what else were you thinking of? Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a version 8.0 or higher server. Or at least, that's the way I read the documentation. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote: >>> I'd wish that we reconsider when and how prepared statements are used. The >>> JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the >>> problem is really all over the place. >> AFAIK those are the only two places where preparation is the default >> ... what else were you thinking of? > > Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a > version 8.0 or higher server. > > Or at least, that's the way I read the documentation. AFAIK this is also the case for PHP PDO extension, which is bundled since PHP 5.1. regards, Lukas
Lukas Kahwe Smith wrote:
> Martijn van Oosterhout wrote:
>> On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote:
>>>> I'd wish that we reconsider when and how prepared statements are
>>>> used. The JDBC interface and PL/pgSQL are frequently noticed
>>>> perpetrators, but the problem is really all over the place.
>>> AFAIK those are the only two places where preparation is the default
>>> ... what else were you thinking of?
>>
>> Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
>> version 8.0 or higher server.
>>
>> Or at least, that's the way I read the documentation.
>
> AFAIK this is also the case for PHP PDO extension, which is bundled
> since PHP 5.1.

BTW: PDO has gotten a switch to force client side placeholder
replacement in favor of using server side prepared statements, due to
the fact that prepared statements side-step the MySQL query cache.

http://netevil.org/node.php?uuid=444a6017-0548-2459-2943-44a601714d58

BTW: I am not posting this to solicit MySQL bashing. The main reason
why PDO pushes prepared statements is the fact that they offer good
protection against SQL injection. However, obviously, in shared
nothing architectures like PHP, which does not yet have any sort of
connection/statement-pooling solution, the danger of prepared
statements becoming stale over time is small. However the problem of
running the same statement with two different parameters that require
different plans is still quite real.

regards,
Lukas
AgentM wrote: > On Aug 31, 2006, at 12:04 , Tom Lane wrote: > > The two major complaints that I've seen are > > <snip> > > > Neither of these problems have anything to do with statistics > > getting stale. > > Not stats-- plans. Plan invalidation has been discussed before, no? Plan invalidation helps with schema changes and data changes but not with parametrized queries. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, Aug 31, 2006 at 06:34:45PM +0200, Lukas Kahwe Smith wrote:
> BTW: PDO has gotten a switch to force client side placeholder
> replacement in favor of using server side prepared statements due to the
> fact that prepared statements side-step the MySQL query cache.

Perl DBD::Pg also has a switch to force one way or the other.

However (as has been stated already) people are confusing prepared
statements with out-of-line parameters. Even DBI uses the phrase
"prepare" for setting up statements, whereas this doesn't actually
require a server-side prepare; all it needs is out-of-line parameters.
I see from the source that DBD::Pg does use PQexecParams() sometimes,
so maybe it does support out-of-line parameters...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> > The proposal to supply hints to statements and functions has been
> > voted down several times due to the argument that it is better to fix
> > the planner. I think supplying hints does fix the planner, and is a
> > balanced solution.
>
> Planner hints are a way to address a deficient planner. But neither a
> manually hinted planner nor a perfectly good planner will help if the
> planning decisions are based on outdated information.

Right. Anyways, it's clearer now what you are suggesting, and I think
your idea regarding implicitly generated plans has some merit. The
major annoyance for me is I have to force a disconnect anytime there
is a schema search_path change. Query hints, which I still think would
make my life much easier, do not have much to do with the thrust of
your argument.

I think maybe we could add some intelligence to implicit plan
generation, perhaps guarded by a GUC:

implicit_plan_generation = [none, smart, all]

with smart meaning some defined events, including perhaps:
* creation or deletion of a temp table
* duration of time
* user invocation
* manipulation of search_path

just thinking out loud here,
merlin
Peter Eisentraut <peter_e@gmx.net> writes:
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Then you would be going very much against the user's expectations.

Driver interfaces expose very clearly to the user an explicit
interface to prepare and execute a query separately. What you're
proposing is to go behind the user's back and do what he's gone out of
his way to tell you not to do. You can always choose to prepare your
queries immediately before use. Most drivers even supply an interface
to do so in a single step for convenience.

If you've gone to the trouble of saving the prepared query handle, you
very much do NOT want the database spontaneously deciding to change
the behaviour of that query (even just the performance behaviour)
without warning.

In fact, somewhere down the list of my personal wishlist for Postgres
is plan stability, which would let the DBA control exactly when plans
could change.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
> Then you would be going very much against the user's expectations.
>
> Driver interfaces expose very clearly to the user an explicit
> interface to prepare and execute a query separately. What you're
> proposing is to go behind the user's back and do what he's gone out
> of his way to tell you not to do. You can always choose to prepare
> your queries immediately before use. Most drivers even supply an
> interface to do so in a single step for convenience.

Let's verify that. JDBC and PL/pgSQL have been mentioned.

The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation. In fact, it doesn't say what
"pre-compiled" means at all.

For PL/pgSQL, you simply write a query and all the preparing action
happens implicitly. There is nothing explicit about that interface.

So if users have certain expectations here, they're just making them up.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, Aug 31, 2006 at 06:43:38PM +0100, Gregory Stark wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > - Redefine "prepared" to mean "parsed" rather than "parsed and planned".
>
> Then you would be going very much against the user's expectations.
>
> Driver interfaces expose very clearly to the user an explicit interface to
> prepare and execute a query separately. What you're proposing is to go behind
> the user's back and do what he's gone out of his way to tell you not to do.
> You can always choose to prepare your queries immediately before use. Most
> drivers even supply an interface to do so in a single step for convenience.

Is that really so? Under Perl DBI, the only way to get a statement
handle is to "prepare" it. Yet I don't want to use server-side
prepares because I know of the problems they cause. The "single-step"
approach provides no statement handle at all, which has several
drawbacks.

People are encouraged to use prepared statements for clarity and
security reasons, not speed. I would really like an option to choose
between:

- slightly more planning time but always good plans
- plan once and be unforgiving if the plan doesn't work with the
  parameters

I'd take the first option any day, but that's just the types of
queries I'm doing.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote:
> On Thu, Aug 31, 2006 at 06:43:38PM +0100, Gregory Stark wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".
>>
>> Then you would be going very much against the user's expectations.
>>
>> Driver interfaces expose very clearly to the user an explicit interface to
>> prepare and execute a query separately. What your proposing is to go behind
>> the user's back and do what he's gone out of his way to tell you not to do.
>> You can always choose to prepare your queries immediately before use. Most
>> drivers even supply an interface to do so in a single step for convenience.
>
> Is that really so? Under Perl DBI, the only way to get a statement
> handle is to "prepare" it. Yet I don't want to use server-side prepares
> because I know of the problems it causes. The "single-step" approach
> provides no statement handle at all, which has several drawbacks.
>
> People are encouraged to use prepared stataments for clarity and
> security reasons, not speed. I would really like an option to choose
> between:
>
> - slightly more planning time but always good plans
> - plan once and be unforgiving if the plan doesn't work with the
>   parameters
>
> I'd take the first option anyday, but that's just the types of queries
> I'm doing.
>
> Have a nice day,

According to the docs you can actually choose between server side
prepare or not on a per call basis. It contains this example:

  $sth->{pg_server_prepare} = 1;
  $sth->execute(22);
  $sth->{pg_server_prepare} = 0;
  $sth->execute(44);
  $sth->{pg_server_prepare} = 1;
  $sth->execute(66);

cheers

andrew
On Thu, 2006-08-31 at 10:09 -0400, Theo Schlossnagle wrote:
> On Aug 31, 2006, at 9:25 AM, Peter Eisentraut wrote:
> > On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> >> The proposal to supply hints to statements and functions has been
> >> voted down several times due to the argument that it is better to fix
> >> the planner. I think supplying hints does fix the planner, and is a
> >> balanced solution.
>
> There are many databases out there with better planners than
> PostgreSQL -- likely there will always be. Even those databases have
> query planner hints. Why? Because the authors of those database had
> the humility to realize that the planner they designed wasn't perfect
> and that people _still_ need their database to perform well despite a
> non-optimal query plan here and there.

You can see a related discussion here:

http://archives.postgresql.org/pgsql-hackers/2006-08/msg00463.php

What I understood from that thread was that the concept of planner
hints was not completely rejected. I think the most likely outcome (if
any hinting system is implemented) is some mechanism to state the hint
in a separate SQL declaration rather than inside the query itself.
This can still result in potentially stale (or very stale) plans, but
at least you don't have to change your application every time you
modify the "hints".

However, as far as I know, this has not progressed beyond the
brainstorming stage. I think many people are still very skeptical of
various implementations of planner hints, but there is some reasonable
level of discussion.

Regards,
	Jeff Davis
Peter Eisentraut <peter_e@gmx.net> writes:
> Gregory Stark wrote:
>> Driver interfaces expose very clearly to the user an explicit
>> interface to prepare and execute a query separately.

> The JDBC documentation merely contains statements of the sort "A SQL
> statement with or without IN parameters can be pre-compiled and stored
> in a PreparedStatement object. This object can then be used to
> efficiently execute this statement multiple times." There is
> absolutely no indication that the execution plan of the statement is
> computed at the time of preparation.

The key word there is "efficiently". I think it is a reasonable
presumption on the user's part that a query done this way will have
less overhead than just resubmitting the raw query each time.

The important thing I see here is that JDBC allows use of IN
parameters with or without a PreparedStatement (no?). So they've
separated the concepts of out-of-line parameters and "preparing" a
statement. That's the distinction we have unfortunately fudged in the
V3 protocol.

The protocol does let you use OOL parameters without retaining a
prepared plan, thanks to the hack introduced later to not plan the
unnamed statement at Parse time, but that's definitely a bit of a wart
on the original protocol design. Maybe it's good enough, or maybe not.

			regards, tom lane
On Thu, Aug 31, 2006 at 02:58:48PM -0400, Tom Lane wrote:
> The protocol does let you use OOL parameters without retaining a
> prepared plan, thanks to the hack introduced later to not plan the
> unnamed statement at Parse time, but that's definitely a bit of a wart
> on the original protocol design. Maybe it's good enough, or maybe not.

Urk, so it was a hack. Unfortunately it seems something you can't
really change without changing the protocol.

So what are the options now? A GUC like so:

prepare_means_plan = [true|false]

So then a prepare will always parse straightaway, but you can choose
whether or not you want to plan straightaway or at bind time. Would
this be acceptable?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
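To make the proposal concrete, usage might look something like this (prepare_means_plan is Martijn's hypothetical GUC; no such setting exists in PostgreSQL):

```sql
SET prepare_means_plan = false;  -- hypothetical: PREPARE would only parse
PREPARE q(text) AS SELECT * FROM t1 WHERE a LIKE $1;
EXECUTE q('abc%');  -- the plan would be chosen here, seeing the actual value
```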
On Thu, Aug 31, 2006 at 02:16:32PM -0400, Andrew Dunstan wrote:
> According to the docs you can actually choose between server side
> prepare or not on a per call basis. It contains this example:

Yeah, but it also contains this:

  Using prepared statements is in theory quite a bit faster: not only
  does the PostgreSQL backend have to prepare the query only once, but
  DBD::Pg no longer has to worry about quoting each value before
  sending it to the server.

Which just continues the misconception: you can avoid worrying about
quoting each value and still not use server-side prepares. There's a
third option which is not made clear (and it's not clear if it's
available via DBI).

Basically, unnamed "prepares" are not planned until bind time; named
statements are planned at prepare time. The question is, do you want
to be able to defer planning for named statements also?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
On 31-Aug-06, at 2:58 PM, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Gregory Stark wrote:
>>> Driver interfaces expose very clearly to the user an explicit
>>> interface to prepare and execute a query separately.
>
>> The JDBC documentation merely contains statements of the sort "A SQL
>> statement with or without IN parameters can be pre-compiled and stored
>> in a PreparedStatement object. This object can then be used to
>> efficiently execute this statement multiple times." There is
>> absolutely no indication that the execution plan of the statement is
>> computed at the time of preparation.
>
> The key word there is "efficiently". I think it is a reasonable
> presumption on the user's part that a query done this way will have
> less overhead than just resubmitting the raw query each time.
>
> The important thing I see here is that JDBC allows use of IN
> parameters with or without a PreparedStatement (no?).

No, not that I am aware of. You can create a statement, and execute
it, but you need a PreparedStatement to set IN parameters.

> So they've separated the concepts of out-of-line parameters and
> "preparing" a statement. That's the distinction we have unfortunately
> fudged in the V3 protocol.
>
> The protocol does let you use OOL parameters without retaining a
> prepared plan, thanks to the hack introduced later to not plan the
> unnamed statement at Parse time, but that's definitely a bit of a wart
> on the original protocol design. Maybe it's good enough, or maybe not.
>
> regards, tom lane

Dave
On Thu, 2006-08-31 at 20:01, Peter Eisentraut wrote:
> For PL/pgSQL, you simply write a query and all the preparing action
> happens implicitly. There is nothing explicit about that interface.
>
> So if users have certain expectations here, they're just making them up.

Or basing them on experience. I for one would not like it at all if
all my queries (select * from plpgsqlfunc()) just magically became
slower by 10-50%.

If there will be an option not to plan/optimise prepared statements, I
would certainly expect it to be off by default.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Martijn van Oosterhout <kleptog@svana.org> writes: > So what are the options now? A GUC like so: > prepare_means_plan = [true|false] > So then a prepare will always parse straightaway, but you can choose > whether or not you want to plan straightaway or at bind time. That seems like just a kluge, as you'd typically want query-by-query control, and a GUC setting isn't convenient for that. It's entirely possible that the current protocol definition is Good Enough, assuming that client-library designers are aware of the implications of using named vs unnamed statements (which I bet not all of 'em are). You *can* have either behavior today, so far as client-issued queries go. The area that seems to need work more drastically is controlling what happens with queries inside plpgsql. regards, tom lane
mark@mark.mielke.cc writes: > Does Oracle automatically regenerate prepared plans on occasion? Not due to statistics changes, only if your schema changes. (caveat: I last used Oracle back at 8i) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Peter Eisentraut <peter_e@gmx.net> writes:
> Let's verify that. JDBC and PL/pgSQL have been mentioned.
>
> The JDBC documentation merely contains statements of the sort "A SQL
> statement with or without IN parameters can be pre-compiled and stored
> in a PreparedStatement object. This object can then be used to
> efficiently execute this statement multiple times." There is
> absolutely no indication that the execution plan of the statement is
> computed at the time of preparation. In fact, it doesn't say
> what "pre-compiled" means at all.

I didn't say you were violating the technical definition in the
specification. I said you're going against expectations.

This is the problem with being dogmatic about abstraction boundaries.
Obviously someone who doesn't know what's going on under the hood has
no specific expectations about what "pre-compiling" might mean. But
the reality is that you can't effectively use a database without
understanding what query plans are, and users do have expectations
about behaviour below the abstraction barrier.

If you don't think "pre-compiled" and "efficiently execute multiple
times" translate into "generates a query plan so it doesn't have to go
through that process to execute the query", I think you're in a very
small minority.

> For PL/pgSQL, you simply write a query and all the preparing action
> happens implicitly. There is nothing explicit about that interface.

Well, that's sort of the inherent problem with PL/pgSQL and the way it
mixes up the procedural language with SQL. I guess the natural
extension of questioning PL/pgSQL would be to wonder why subqueries in
SQL queries don't get replanned every time they're executed. The data
distribution could certainly change partway through.

> So if users have certain expectations here, they're just making them up.

Well, that's what makes them "expectations" rather than promises.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Martijn van Oosterhout <kleptog@svana.org> writes: > > Then you would be going very much against the user's expectations. > > > > Driver interfaces expose very clearly to the user an explicit interface to > > prepare and execute a query separately. What your proposing is to go behind > > the user's back and do what he's gone out of his way to tell you not to do. > > You can always choose to prepare your queries immediately before use. Most > > drivers even supply an interface to do so in a single step for convenience. > > Is that really so? Under Perl DBI, the only way to get a statement > handle is to "prepare" it. Sure, but you can prepare it right before you use it and throw it away instead of keeping it around. The server has to prepare the query sometime. The v3 protocol just gives you control over when that happens, but it doesn't force you to do it at any particular time. -- greg
On Thu, Aug 31, 2006 at 07:04:07PM -0400, Gregory Stark wrote:
> The server has to prepare the query sometime. The v3 protocol just gives you
> control over when that happens, but it doesn't force you to do it at any
> particular time.

Not really. All named prepares are planned straight away; all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote: > On Thu, Aug 31, 2006 at 07:04:07PM -0400, Gregory Stark wrote: >> The server has to prepare the query sometime. The v3 protocol just gives you >> control over when that happens, but it doesn't force you to do it at any >> particular time. > > Not really. All named prepares are planned straight away, all unnamed > ones are planned at bind time. Therefore you cannot have more than one > parsed-but-not-planned prepared query at a time. In a connection pool > scenario there's no way to share such plans since you can't tell which > query has been prepared. That's not forcing, but it's an asymmetry we > could do without. AFAIK since Oracle 9i planning is always deferred until the first execution. This way they hope to get a better plan, which a single generic plan obviously could not provide if the selectivity varies greatly. So are the plans generated without taking any bound values into account more stable in performance (albeit at times slower than what would have been produced if the value had been known)? Either way, tying the question of when to plan to whether the prepared statement is named or unnamed seems unexpected. regards, Lukas
> >>> I'd wish that we reconsider when and how prepared statements are > >>> used. The JDBC interface and PL/pgSQL are frequently noticed > >>> perpetrators, but the problem is really all over the place. > >> AFAIK those are the only two places where preparation is > the default > >> ... what else were you thinking of? > > > > Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a > > version 8.0 or higher server. > > > > Or at least, that's the way I read the documentation. Yeah, but if you close the statement or leave the scope of the statement variable, the plan is gone. So it is doing exactly what I would expect. You write $stmt->prepare('select 1'); what else would you expect? There are enough other functions to get a result without a plan sticking around, like $db->selectrow_array .... Andreas
On Fri, Sep 01, 2006 at 09:26:24AM +0200, Lukas Kahwe Smith wrote: > AFAIK since Oracle 9i planning is always deferred until the first > execution. This way they hope to get a better plan, which would > obviously not be possible if the selectivity varies greatly. Interesting. > So are the plans generated without taking any bound values into account > more stable in performance (albeit at times slower than what would have > been produced if the value would have been known)? Possibly, though they might uniformly suck or be uniformly good... > Either way mixing the question of when to prepare the plan with the > prepared statement being named or unnamed seems unexpected. Yes, well, when the V3 protocol was designed, separating the parse and plan phases wasn't considered. The exact commit that got us to where we are now is here: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tcop/postgres.c#rev1.420 That was over two years ago; I'm not sure whether any of the ideas mentioned in there have been implemented since. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Thu, August 31, 2006 21:41, Phil Frost wrote: >> Is there any kind of pattern at all to this problem? Anything >> recognizable? A few typical pitfalls? > > Frequently I have found preplanning will result in a horrible plan > because it is assumed parameters may be volatile while in practice they > are literals. Here is a function from my database: That's a very common thing in processor design as well, and there's a standard trick for it: the saturating two-bit counter. It tends to work pretty well for branch prediction, value prediction, etc. Usually it's the first thing you reach for, so of course somebody may already have tried it here and found it didn't work. In this particular case it might be applied something like this: for each parameter in a prepared statement you cache a predictor value, plus a "confidence counter" saying (more or less--see below) how many times in succession that value has repeated. Let's say each of the counters counts from 0 to 3 inclusive, with its confidence threshold right in the middle, between 1 and 2. On every invocation, you check each parameter value against the corresponding predictor value. If it's identical, you increment its counter (provided it can be incremented any further). If it isn't, you decrement its counter, and if the counter ends up below its confidence threshold, you replace the predictor value with the new parameter value. Then, whenever any new planning needs to be done (I'll get to that in a moment), you see which counters are above their confidence thresholds. In your new planning you assume that all parameters with confident predictions will remain pseudo-constant for the next few invocations. Of course there's a problem when parameters do not match predicted values. That's where having one or two backup plans could come in handy. You could keep your original, fully-generalized plan around. If plans are cheap enough to store, you could try to keep a cache of old plans for the same query.
The great thing about keeping some backup plans around is that a pseudo-constant parameter can have a different value once in a while, then flick back to its old habits without invalidating all your efforts. Your usually-unused search fields are a good example. You may also have two stable parameter patterns with different sets of pseudo-constants competing for your attention. It's not perfect, and it clearly has its pathological cases--but if it works well enough overall, the bad cases could be caught and handled as exceptions. Confidence counters can be tweaked to lose confidence more easily than they gain it, or vice versa. Some meta-confidence scheme may catch the worst offenders. I won't go into that now--first I'll shut up and wait for others to point out what I've missed. :) Jeroen
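The per-parameter predictor described above is simple enough to sketch directly. The following Python fragment is purely illustrative (the class name and 0-3 counter range are assumptions matching the "two-bit, threshold between 1 and 2" description, not anything from the thread):

```python
class ParamPredictor:
    """Saturating two-bit confidence counter for one statement parameter.

    Counter states run 0..3; states 2 and 3 are above the confidence
    threshold, i.e. the parameter is treated as pseudo-constant.
    """

    def __init__(self):
        self.value = None    # current predicted parameter value
        self.counter = 0     # saturating counter, 0..3

    def confident(self):
        """True when the prediction is above the confidence threshold."""
        return self.counter >= 2

    def observe(self, param):
        """Feed in the parameter of a new invocation.

        Returns True if it matched the current prediction.  On a miss the
        counter is decremented, and the predictor value is replaced only
        once confidence has dropped below the threshold.
        """
        if param == self.value:
            self.counter = min(3, self.counter + 1)   # saturate at 3
            return True
        self.counter = max(0, self.counter - 1)       # saturate at 0
        if self.counter < 2:
            self.value = param                        # give up on old value
        return False
```

Note how a single stray value decrements the counter without immediately discarding the prediction, which is exactly the "different value once in a while, then flick back to its old habits" tolerance described above.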
On Fri, Sep 01, 2006 at 03:56:19PM +0700, Jeroen T. Vermeulen wrote: > That's a very common thing in processor design as well, and there's a > standard trick for it: the saturating two-bit counter. It tends to work > pretty well for branch prediction, value prediction etc. Usually it's the > first thing you reach for, so of course somebody may already have tried it > here and found it didn't work. Interesting thought. It might be worth trying. But my big question: is all this testing and counting actually going to be faster than just replanning? Postgresql's planner is not that slow. > Of course there's a problem when parameters do not match predicted values. > That's where having one or two backup plans could come in handy. You > could keep your original, fully-generalized plan around. If plans are > cheap enough to store, you could try to keep a cache of old plans for the > same query. The great thing about keeping some backup plans around is > that a pseudo-constant parameter can have a different value once in a > while, then flick back to its old habits without invalidating all your > efforts. Your usually-unused search fields are a good example. You may > also have two stable parameter patterns with different sets of > pseudo-constants competing for your attention. The thing is that the number of possible plans is going to be proportional to factorial(number of tables). Once you have 3 tables you're going to have at least a dozen possible plans, probably more. What the best plan is depends strongly on what the parameters are. Anyway, your plan assumes that you have information to work with. The current system plans prepared queries with no information at all about parameters, and people are advocating to keep it that way. I think a good first step would be to plan on first execution, like Oracle does. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
> Anyway, your plan assumes that you have information to work > with. The current system plans prepared queries with no > information at all about parameters and people are advocating > to keep it that way. I think a good first step would be the > plan on first execution, like Oracle does. Yup, it is also possible to try to find an obvious plan and only delay planning (or part of the plan) when different inputs make a big difference (like MaxDB and Informix). Andreas
> > I don't chime in very often, but I do think the refusal to incorporate > > hints into the planner system is fantastically stubborn and > > nonsensical. > > What is actually fantastically nonsensical about this is that > the issues I outlined about prepared statements would merely > become worse if planner hints were used. Then, you wouldn't > only have to worry about plans that were created earlier > during the session, you would be faced with plans that were > created earlier during the application's development. In > general, the solutions to the prepared statement issues need > to effect that the plans are created more often, not less often. I have yet to see one of our partial Informix hints (where the planner does its usual job, just with one path's costs lowered or elevated) fall foul of an unanticipated change in the underlying data. Thus I don't buy the argument that hints are always bad. Of course their use should be extremely rare and well thought out. Most of the time SQL tuning involves a concerted effort between the programmer and a db performance expert, usually resulting in rewritten SQL or program logic without adding hints. I can see arguments for hints the dba can set himself centrally on the server, but in my experience chances for substantial improvement are very limited in that case. Andreas
On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote: > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. In the best case (which of course would have to be very frequent for any of this to matter in the first place) it's mainly just a short loop comparing the call's parameter values to their counterparts stored with the plan and updating those two-bit confidence counters. You wouldn't *believe* how simple you have to keep these things in processor architecture. :-) > The thing is that number of possible plans is going to be proportional > to factorial(number of tables). Once you have 3 tables you're going to > have at least a dozen possible plans, probably more. What the best plan > is depends strongly on what the parameters are. Of course. That's the whole point: to end up with a small but effective subset of all those possible plans. I'd guess that you could cover even most of the nasty cases with a maximum of three plans or so per prepared statement, including the original fully-generalized one. The plans could be replaced on an LRU basis, which isn't very costly for three or so entries. > Anyway, your plan assumes that you have information to work with. The > current system plans prepared queries with no information at all about > parameters and people are advocating to keep it that way. I think a > good first step would be the plan on first execution, like Oracle does. Yes, delaying things a bit can help a lot sometimes. That's also what JIT compilers in JVMs do, for instance. FWIW, libpqxx doesn't prepare statements until they're first called anyway. But if this choice to discard parameter information is exactly what causes a lot of the bad plans in the first place, as Peter says, what's wrong with putting it to use instead?
For those cases, you're pretty much screwed by definition as long as you fail to do so. And it's not like what I'm suggesting is very difficult! The real question is whether it's worthwhile. To find that out, we'd need to estimate four factors: coverage (how often you'd get a useful prediction), accuracy (how often that prediction would be accurate), cost of misprediction (near-zero compared to current situation, assuming we keep the generalized plans handy), and savings for correct prediction (in our case, benefit of planning for a constant instead of a variable minus the cost of re-planning which you say isn't very expensive). Based on what Peter and you tell me about cost, the main worries here are coverage and accuracy. Coverage and accuracy can be extracted (and tweaked!) relatively easily if we have logs of prepared-statement executions in a wide variety of real-life applications. Listings of consecutive prepared-statement invocations (statement name plus parameter values) are all that's needed. Do we have any logs like that? If we do, I'll be more than happy to run some simulations and see if the idea shows any promise. Like I said, there's every chance that it doesn't. It was just an off-the-cuff suggestion and if it's no good I'll have no problems saying so. But there's not much point sitting around arguing over theoretical merits if they're that easy to quantify! Jeroen
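The simulation proposed above is cheap to prototype. A hedged Python sketch (my own illustration; the threshold and limit parameters are assumptions) that replays one parameter's values from such an invocation log and reports coverage and accuracy as defined above:

```python
def simulate(log, threshold=2, limit=3):
    """Replay one parameter's values from an invocation log.

    Returns (coverage, accuracy): the fraction of calls for which the
    two-bit counter offered a confident prediction, and the fraction of
    those predictions that turned out to be correct.
    """
    value, counter = None, 0
    offered = hits = 0
    for param in log:
        if counter >= threshold:            # predictor is confident
            offered += 1
            if param == value:
                hits += 1
        # Standard saturating-counter update, as in the earlier post.
        if param == value:
            counter = min(limit, counter + 1)
        else:
            counter = max(0, counter - 1)
            if counter < threshold:
                value = param
    coverage = offered / len(log)
    accuracy = hits / offered if offered else 0.0
    return coverage, accuracy
```

On a log that is constant apart from one blip, coverage climbs with log length while accuracy stays high; on a log of all-distinct values, both stay at zero. Running this over real prepared-statement logs would answer the worthwhileness question directly.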
Martijn van Oosterhout <kleptog@svana.org> writes: >> The server has to prepare the query sometime. The v3 protocol just gives you >> control over when that happens, but it doesn't force you to do it at any >> particular time. > Not really. All named prepares are planned straight away, all unnamed > ones are planned at bind time. Therefore you cannot have more than one > parsed-but-not-planned prepared query at a time. In a connection pool > scenario there's no way to share such plans since you can't tell which > query has been prepared. That's not forcing, but it's an asymmetry we > could do with out. Sure, but how much does it really matter? If you don't want the plan saved for reuse, merely avoiding retransmission of the query text does not seem like a major win. Having had time to think about it, I no longer think the protocol design is a blocking bug for this problem area. It's something we could improve when we are ready to design protocol V4, but it does not seem in itself enough reason to make a new protocol (with all the pain that entails). regards, tom lane
On Fri, Sep 01, 2006 at 11:53:11AM +0200, Martijn van Oosterhout wrote: > On Fri, Sep 01, 2006 at 03:56:19PM +0700, Jeroen T. Vermeulen wrote: > > That's a very common thing in processor design as well, and there's a > > standard trick for it: the saturating two-bit counter. It tends to work > > pretty well for branch prediction, value prediction etc. Usually it's the > > first thing you reach for, so of course somebody may already have tried it > > here and found it didn't work. > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. The difference between a pre-planned query and a plan-each-time query, for me, seems to be a minimum of around 0.3-0.5 ms. This is on a fairly modern AMD X2 3800+. If the tests and counting are kept simple, I don't see why they would take anywhere near that long. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Martijn van Oosterhout <kleptog@svana.org> writes: > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. > ... > The thing is that number of possible plans is going to be proportional > to factorial(number of tables). Yeah. One of the reasons the planner is acceptably fast is that it is aggressive about discarding candidate plans as soon as they are clearly inferior to other plans. Tracking multiple plans that might be optimal under varying assumptions about the query parameters would make things exponentially slower. regards, tom lane
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote: > >> Interesting thought. It might be worth trying. But my big question: is >> all this testing and counting actually going to be faster than just >> replanning? Postgresql's planner is not that slow. > > In the best case (which of course would have to be very frequent for any > of this to matter in the first place) it's mainly just a short loop > comparing the call's parameter values to their counterparts stored with > the plan and update those two-bit confidence counters. You wouldn't > *believe* how simple you have to keep these things in processor > architecture. :-) I think the slow part is trying to figure out whether to count the current call as a hit or a miss. How do you determine whether the plan you're running is the best plan without replanning the query? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Fri, September 1, 2006 21:30, Tom Lane wrote: > Yeah. One of the reasons the planner is acceptably fast is that it is > aggressive about discarding candidate plans as soon as they are clearly > inferior to other plans. Tracking multiple plans that might be optimal > under varying assumptions about the query parameters would make things > exponentially slower. AFAICS the planner shouldn't be affected at all--it'd just be invoked more often as and when the need for new plans became apparent. Not "exponentially" (that's an overused word anyway) but "proportionally" to that. I've been assuming that once you have a plan, storing it is not very expensive. If, say, doubling the number of plans stored with a session's prepared statements incurs some serious cost (apart from the planning itself, of course) then that changes things. Jeroen
On Fri, Sep 01, 2006 at 10:18:37AM -0400, Tom Lane wrote: > Sure, but how much does it really matter? If you don't want the plan > saved for reuse, merely avoiding retransmission of the query text does > not seem like a major win. Having had time to think about it, I no > longer think the protocol design is a blocking bug for this problem > area. It's something we could improve when we are ready to design > protocol V4, but it does not seem in itself enough reason to make a > new protocol (with all the pain that entails). Well, I see the documentation does actually describe the situation fairly well, so perhaps the right approach is to get interface writers to appreciate the difference and indicate if named or unnamed prepares are used. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, September 1, 2006 22:14, Gregory Stark wrote: > I think the slow part is trying to figure out whether to count the current > call as a hit or a miss. How do you determine whether the plan you're > running is the best plan without replanning the query? The question of knowing which plan is best _based on what's in the actual tables_ would remain unsolved, just as it always was. The scheme addresses only the opportunity to optimize for pseudo-constant parameters. It treats the existing planner as a black box. If you find a solution to the problem of inaccurate statistics, it'll probably be more or less orthogonal to what I'm describing: you could have one or the other, but combining them shouldn't be much harder. I don't think telling hits from misses would be all that hard. Let's say a prepared statement of yours is being invoked, and you're evaluating a candidate plan. Each parameter is in one of two sets: those "predicted" by the plan to have certain values (let's call them P), and those "not predicted" by the plan because their confidence counters were below the threshold (I'm tempted to call this set NP, but let's make it Q instead). Whether a parameter is in P or in Q can be derived from its confidence counter. In my previous example, you just take its most-significant bit. * For any parameter in P, if the actual value does not match the plan's prediction, you have a miss. Can't use this plan. Use another if you have one that applies (such as your regular old non-specialized plan--that always applies), or if not, write a new one! If you get through this without finding a mismatch, congratulations: you have a hit. The plan you're looking at is applicable to your call. But now we see if we can do better: * For any parameter in Q, if its value would have been predicted correctly but its counter was below the confidence threshold, you increment the counter. If that lifts the counter above the threshold, you have room for improving on this plan.
It means there's a good chance you can re-plan for the case that this parameter is also a pseudo-constant, without the effort being wasted. Of course you could also set a minimum number of invocations between re-plannings to get a more long-term view (e.g. different parameters being recognized as pseudo-constants in subsequent calls--you may not want to re-plan for each of those calls). So which plan do you execute if you have more than one applicable candidate? We can see what works well. As a starter I would definitely pick the one with the larger P (smaller Q), breaking ties in favour of the most recently generated plan. I'm assuming we only want one plan for a given P. We'd probably want to limit the number of candidate plans per statement to some very small, fixed number--somewhere between one and four, I'd say; or maybe one generalized plan plus up to two specialized ones. With numbers like that, none of this should be very expensive. A simple linear match against 1-4 candidates may be more effective than any attempt to be clever. I must admit I haven't thought through all of the consequences of caching more than one specialized plan per statement. For example, we could give every cached plan its own set of confidence counters, and match an incoming invocation against each of those; or we could keep just one "most likely" plan with its associated predictor state, and only consider previously generated plans if we either miss or find room for improvement in the predictor. Jeroen
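The selection rule sketched above (an applicable plan with the largest P wins, ties going to the most recent) is easy to state in code. In this hypothetical Python illustration, each candidate plan is reduced to just its set P, a dict mapping parameter index to predicted value; the generic plan is the empty dict:

```python
def applicable(plan_preds, args):
    """A cached plan applies iff every parameter it specialized on
    (its set P) matches the call's actual argument."""
    return all(args[i] == v for i, v in plan_preds.items())


def choose_plan(plans, args):
    """Pick an applicable candidate with the largest P.

    `plans` is ordered oldest-first, so with >= the most recently
    generated plan wins ties.  The generic plan {} always applies,
    so a non-empty candidate list always yields a choice.
    """
    best = None
    for plan in plans:
        if applicable(plan, args) and (best is None or len(plan) >= len(best)):
            best = plan
    return best
```

With one to four candidates per statement, as suggested above, this linear scan is trivially cheap compared to planning.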
On Fri, Sep 01, 2006 at 04:14:32PM +0100, Gregory Stark wrote: > >> Interesting thought. It might be worth trying. But my big question: is > >> all this testing and counting actually going to be faster than just > >> replanning? Postgresql's planner is not that slow. > > > > In the best case (which of course would have to be very frequent for any > > of this to matter in the first place) it's mainly just a short loop > > comparing the call's parameter values to their counterparts stored with > > the plan and update those two-bit confidence counters. You wouldn't > > *believe* how simple you have to keep these things in processor > > architecture. :-) > > I think the slow part is trying to figure out whether to count the current > call as a hit or a miss. How do you determine whether the plan you're running > is the best plan without replanning the query? Simply looking at estimated row counts/cost versus what actually happened would probably suffice. It'd at least be a great start. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
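One hedged way to turn that estimated-versus-actual comparison into a replan trigger (entirely my own sketch; the order-of-magnitude factor is an arbitrary assumption, not anything proposed in the thread):

```python
def needs_replan(estimated_rows, actual_rows, factor=10.0):
    """Return True when a cached plan's row estimate was off by more
    than `factor` in either direction, suggesting the plan was built
    on assumptions the data no longer supports."""
    if estimated_rows <= 0 or actual_rows <= 0:
        # Degenerate counts: replan unless both really were empty.
        return estimated_rows != actual_rows
    ratio = actual_rows / estimated_rows
    return ratio > factor or ratio < 1.0 / factor
```

A check like this costs one division per execution, so it sidesteps the "is the bookkeeping cheaper than replanning?" objection raised earlier in the thread.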
On Fri, Sep 01, 2006 at 10:18:37AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > >> The server has to prepare the query sometime. The v3 protocol just gives you > >> control over when that happens, but it doesn't force you to do it at any > >> particular time. > > > Not really. All named prepares are planned straight away, all unnamed > > ones are planned at bind time. Therefore you cannot have more than one > > parsed-but-not-planned prepared query at a time. In a connection pool > > scenario there's no way to share such plans since you can't tell which > > query has been prepared. That's not forcing, but it's an asymmetry we > > could do with out. > > Sure, but how much does it really matter? If you don't want the plan > saved for reuse, merely avoiding retransmission of the query text does > not seem like a major win. Having had time to think about it, I no > longer think the protocol design is a blocking bug for this problem > area. It's something we could improve when we are ready to design > protocol V4, but it does not seem in itself enough reason to make a > new protocol (with all the pain that entails). That should either go into the TODO, or a "V4 wishlist"... -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net