Thread: Prepared statements considered harmful
With time, it becomes ever clearer to me that prepared SQL statements are just a really bad idea. On some days, it seems like half the performance problems in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd say that in the majority of cases the time you save parsing and planning is irrelevant compared to the possibly disastrous effects of wrong or suboptimal plans. I wonder if other people have similar experiences.

I wish we would reconsider when and how prepared statements are used. The JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the problem is really all over the place.

A couple of actions to consider:

- Never use prepared statements unless the user has turned them on. (This is the opposite of the current behavior.)

- Transparently invalidate and regenerate prepared plans more often. This could be tied to the transaction count, update activity obtained from the statistics collector, etc.

- Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Each of these or similar changes would only solve a subset of the possible problems. Possibly we need more knobs to adjust these things. But something needs to be done.

Comments?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
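A minimal psql sketch of the hazard described above (the `jobs` table, its skew, and the statement name are all invented for illustration): once a statement is prepared, a single plan has to serve every possible parameter value, so the planner cannot use the per-value statistics it would have for a literal.

```sql
-- Hypothetical skewed table: almost all rows have status = 'done'.
CREATE TABLE jobs (id int, status text);
CREATE INDEX jobs_status_idx ON jobs (status);

-- The cached plan must work for any $1, so it is chosen without
-- knowing whether $1 is the rare value or the common one:
PREPARE get_jobs (text) AS SELECT * FROM jobs WHERE status = $1;

EXPLAIN EXECUTE get_jobs('pending');               -- planned blind
EXPLAIN SELECT * FROM jobs WHERE status = 'pending';  -- planned with the literal
```

Comparing the two EXPLAIN outputs on a sufficiently skewed table shows the prepared version settling on a compromise plan that can be disastrous for the rare value.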
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. On some days, it seems like half the performance
> problems in PostgreSQL-using systems are because a bad plan was cached
> somewhere. I'd say, in the majority of cases the time you save parsing
> and planning is irrelevant compared to the possibly disastrous effects of
> wrong or suboptimal plans. I wonder if other people have similar
> experiences.

Yeah, it seems to me that many of the benefits of not planning are overrun by the effects of bad plans.

> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

I think this is the best. Some way to specify that you don't want planning to take place immediately would be good.

One question though: there is a function PQexecParams(). Does this suffer from the same problem? I imagine most interfaces would like out-of-band parameters (no escaping issues), so why do they not use this?

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
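For reference, out-of-line parameters via PQexecParams() look roughly like this (the connection string, table, and pattern are assumptions for the sketch, and error handling is abbreviated). The query and its parameters travel together in one round trip, with no long-lived named statement kept by the application; whether the server can then use the parameter values while planning the unnamed statement is exactly the crux of the question above.

```c
/* Sketch only: requires libpq and a running server to actually run. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");  /* assumed database */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    const char *paramValues[1] = { "121%" };
    PGresult *res = PQexecParams(conn,
                                 "SELECT a FROM t1 WHERE a LIKE $1",
                                 1,           /* one parameter */
                                 NULL,        /* let the server infer its type */
                                 paramValues,
                                 NULL, NULL,  /* text-format parameters */
                                 0);          /* text-format results */

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("%d rows\n", PQntuples(res));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```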
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

How about "prepared" meaning really "prepared"... in the sense of parsed, analyzed for all sensible plans, and saved as a meta-plan which, based on current statistics and parameter values, chooses one of the considered (and cached) plans? That would be immune both to statistics changes and to parameter value changes, within certain limits. It would also be a lot more complex than a simple plan...

Cheers,
Csaba.
Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...]
>
> - Transparently invalidate and regenerate prepared plans more often. This
> could be tied to the transaction count, update activity obtained from the
> statistics collector, etc.

Well, this sounds like it is best done with the "central plan cache" idea that is floating around (I think neilc once worked on that). Once we have something like that, I would expect we can easily invalidate/regenerate plans there based on certain criteria, from obvious things like DDL changes to more subtle ones like "age of the plan", "statistics changed significantly on table foo", or "regenerate the plan every time table bla is involved".

Most of that is pure speculation, but something like that would be a very powerful thing to have.

Stefan
On Thursday, 31 August 2006 14:11, Csaba Nagy wrote:
> How about "prepared" means really "prepared"... in the sense of parsed,
> analyzed all sensible plans, and save a meta-plan which based on current
> statistics and parameter values chooses one of the considered (and
> cached) plans ?

I don't think this could solve one particularly frequent problem, which is that pattern matching queries don't get along with prepared plans if the search pattern isn't known at planning time.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 14:32, Peter Eisentraut wrote:
> I don't think this could solve one particularly frequent problem which is
> that pattern matching queries don't get along with prepared plans if the
> search pattern isn't known at planning time.

Why not? I specifically said you would prepare a few sensible plans based on statistics/expected variations of the statistics, and parameter value ranges which would trigger different plans.

So for the LIKE query case you could save two plans, one for the indexable case, one for the not indexable case. Then at runtime you choose the proper one based on the pattern value. The meta-plan I mentioned would be a collection of plans with rules to choose the right one at run time based on parameter values and perhaps the current statistics.

This of course would need a lot more preparation time than just preparing one plan, but that's why you want to do it upfront and then cache the results. A central plan repository mentioned in other posts would fit nicely here... and you could use prepared plans for non-parameterized queries too by simply considering the constants as parameters, to increase the chances for prepared plan reuse - this of course for complex enough queries.

Cheers,
Csaba.
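The meta-plan being described is essentially a small decision procedure evaluated at execute time. A toy sketch of the dispatch rule for the LIKE case (the plan names are placeholders for illustration, not PostgreSQL internals):

```python
def choose_plan(pattern: str) -> str:
    """Pick a cached plan for `WHERE a LIKE $1` based on whether the
    pattern has a fixed prefix (indexable) or starts with a wildcard."""
    if pattern and pattern[0] not in ('%', '_'):
        return "index_range_scan"   # plan 1: fixed prefix, index usable
    return "seq_scan"               # plan 2: leading wildcard, full scan

# The "meta-plan" is just this rule plus the two cached plans.
print(choose_plan("121%"))   # fixed prefix: the index plan applies
print(choose_plan("%121"))   # leading wildcard: fall back to seq scan
```

The real version would also have to consult statistics, but the shape is the same: cheap runtime checks selecting among plans that were expensive to build once.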
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...] I wonder if other people have similar
> experiences.

I have to respectfully disagree. I have used them to great effect in many of my projects. In the most extreme case, prepared statements can provide a 50% reduction or greater in overall query time... this is too good a benefit to simply discard. I worked on converted ISAM projects which would not have been possible to make efficient without prepared statements. However, you are correct that the planner does often create wacky plans which can cause disastrous results in some cases.

My major issue is that you cannot supply hints to the query engine. For example, one of my favorite tricks is to parameterize the LIMIT clause in a query, which creates a sliding window over the table for progressive readahead. Unfortunately the planner assumes 10%, which borks the plan. My workaround is to turn off enable_bitmapscan and enable_seqscan before the prepare and turn them back on afterwards.

The proposal to supply hints to statements and functions has been voted down several times due to the argument that it is better to fix the planner. I think supplying hints does fix the planner, and is a balanced solution.

merlin
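The workaround described above can be sketched like this (the table, column, and statement names are invented for the example). The enable_* settings only influence planning, so they need to be off at PREPARE time and can be restored immediately afterwards:

```sql
-- Discourage the plans we don't want while the statement is planned.
SET enable_seqscan = off;
SET enable_bitmapscan = off;

-- The parameterized LIMIT defeats the row-count estimate, so steer
-- the planner toward the index scan the sliding window needs.
PREPARE readahead (int, int) AS
    SELECT * FROM big_table WHERE id > $1 ORDER BY id LIMIT $2;

-- Restore normal planning for everything else in the session.
SET enable_seqscan = on;
SET enable_bitmapscan = on;
```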
On Thu, August 31, 2006 18:56, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. On some days, it seems like half the performance
> problems in PostgreSQL-using systems are because a bad plan was cached
> somewhere.

Is there any kind of pattern at all to this problem? Anything recognizable? A few typical pitfalls?

Without knowing much of the internals, I could imagine [waves hands in vague gestures] other options: something like recognizing major changes that upset the cost functions that went into generating a plan, and invalidating the plan based on those; or noting bad estimates somehow as they become apparent during execution, and annotating the plan with a "this assumption was a bad idea" marker so you'll do better next time.

I guess you can't go far wrong if you re-define "prepared" to mean merely "pre-parsed," but it sounds like such a waste of opportunity...

Jeroen
Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL statements are
> just a really bad idea. [...]
>
> A couple of actions to consider:
>
> - Never use prepared statements unless the user has turned them on. (This
> is the opposite of the current behavior.)
>
> - Transparently invalidate and regenerate prepared plans more often. This
> could be tied to the transaction count, update activity obtained from the
> statistics collector, etc.
>
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Not to mention problems with outdated plans after schema changes. Using views unplanned (replanned) when used in joins could lead to improved resulting plans (e.g. if the view contains outer joins itself).

Regards,
Andreas
On Thursday, 31 August 2006 14:52, Csaba Nagy wrote:
> So for the like query case you could save 2 plans, one for the indexable
> case, one for the not indexable case. Then at runtime you choose the
> proper one based on the pattern value.

OK, why don't you work out an example. Let's look at this query:

SELECT * FROM t1 WHERE a LIKE $1;

What two plans would you prepare?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
> > How about "prepared" means really "prepared"... in the sense of
> > parsed, analyzed all sensible plans, and save a meta-plan which based
> > on current statistics and parameter values chooses one of the
> > considered (and cached) plans ?
>
> I don't think this could solve one particularly frequent
> problem which is that pattern matching queries don't get
> along with prepared plans if the search pattern isn't known
> at planning time.

I think what we would actually want is knowledge about how much difference different parameters actually make in the plan decision (the stats show an even distribution and join correlation). Then we could prepare the plan when there is not much difference, and postpone planning until we know the parameters when the difference is big.

OLTP workloads typically benefit from prepared plans, and the one plan is good for all possible inputs, so imho we cannot just assume all plans need replanning for different parameters.

Andreas
On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> The proposal to supply hints to statements and functions has been
> voted down several times due to the argument that it is better to fix
> the planner. I think supplying hints does fix the planner, and is a
> balanced solution.

Planner hints are a way to address a deficient planner. But neither a manually hinted planner nor a perfectly good planner will help if the planning decisions are based on outdated information.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:06, Jeroen T. Vermeulen wrote:
> Is there any kind of pattern at all to this problem? Anything
> recognizable? A few typical pitfalls?

If data is not distributed evenly, then any old WHERE foo = $1 is prone to be the wrong plan for half of the possible values of $1. The more data you have and the more it changes, the worse this gets.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:18, Andreas Pflug wrote:
> Not to mention problems with outdated plans after schema changes. Using
> views unplanned (replanned) when used in joins could lead to improved
> resulting plans (e.g. if the view contains outer joins itself).

Views don't contain execution plans. I don't see how this is relevant.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Merlin Moncure wrote:
> On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
>> With time, it becomes ever clearer to me that prepared SQL statements
>> are just a really bad idea. [...]
>
> I have to respectfully disagree. I have used them to great effect in
> many of my projects.

Peter doesn't propose to remove prepared statements as such. They are certainly of great value if used carefully and specifically, as in your case. The problems he's addressing stem from plans _implicitly_ created and stored.

> In the most extreme case, prepared statements can provide a 50% reduction
> or greater in overall query time... [...]
>
> My major issue is that you cannot supply hints to the query engine.

I don't believe extending this thread to the we-need-hints issue is a good idea.

Regards,
Andreas
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
> OK, why don't you work out an example. Let's look at this query:
>
> SELECT * FROM t1 WHERE a LIKE $1;
>
> What two plans would you prepare?

if substring($1 from 1 for 1) != '%' then
    use plan 1 (see below);
else
    use plan 2 (see below);
end if;

Save both plans from below with the meta-plan from above, and call it a prepared plan.

cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(10000000 * random()) from generate_series(1,10000);
INSERT 0 10000
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 55 of 55 pages, containing 10000 live rows and 0 dead rows; 3000 rows in sample, 10000 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=2.06..27.63 rows=10 width=10)
   Filter: (a ~~ '121%'::text)
   ->  Bitmap Index Scan on idx_t1_a  (cost=0.00..2.06 rows=10 width=0)
         Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)

cnagy=# explain select a from t1 where a like '%121';
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..180.00 rows=80 width=10)
   Filter: (a ~~ '%121'::text)
(2 rows)

Cheers,
Csaba.
On Thursday, 31 August 2006 15:36, Csaba Nagy wrote:
> if substring($1 from 1 for 1) != '%' then
>     use plan 1 (see below);
> else
>     use plan 2 (see below);
> end if;

Note that plan 1 can only be created if you know the actual value for $1.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 15:49, Peter Eisentraut wrote:
> Note that plan 1 can only be created if you know the actual value for $1.

Why would that be so? The plan can contain functions of $1 (both constants in plan 1 are a function of $1).

Cheers,
Csaba
On Aug 31, 2006, at 8:52, Csaba Nagy wrote:
> This of course would need a lot more preparation time than just prepare
> one plan, but that's why you want to do it upfront and then cache the
> results. A central plan repository mentioned in other posts would fit
> nicely here...

If prepared statements become more expensive to create, then it would make more sense for them to persist across sessions. All of an application's prepared statements could be cached.

-M
On Thu, Aug 31, 2006 at 08:06:57PM +0700, Jeroen T. Vermeulen wrote:
> Is there any kind of pattern at all to this problem? Anything
> recognizable? A few typical pitfalls?

Frequently I have found that preplanning will result in a horrible plan, because it is assumed parameters may be volatile while in practice they are literals. Here is a function from my database:

CREATE FUNCTION nullorblank(character varying) RETURNS boolean AS $_$
    select $1 is null or trim($1) = ''
$_$ LANGUAGE sql IMMUTABLE;

This is used in stored procedures that answer search queries. For example, let's consider one that searches products, filtered on any number of "part number", "manufacturer", or "name". If one of these is not specified, it does not restrict the query. One might write that query so:

-- $1: part number
-- $2: manufacturer
-- $3: name
SELECT * FROM product
WHERE (nullorblank($1) OR lower(partnumber) = lower($1))
  AND (nullorblank($2) OR manufacturername = $2)
  AND (nullorblank($3) OR name = $3)

The parameters will always be literal strings, taken from some form presented to the user.
If one does the parameter substitution manually, the plans are quite reasonable:

EXPLAIN ANALYZE SELECT * FROM product
WHERE (nullorblank('int2100/512') OR lower(partnumber) = lower('int2100/512'))
  AND (nullorblank('') OR manufacturername = '')
  AND (nullorblank('') OR name = '');

                               QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.502..43.507 rows=1 loops=1)
   ->  Bitmap Heap Scan on product  (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.161..43.162 rows=1 loops=1)
         Recheck Cond: (lower((partnumber)::text) = 'int2100/512'::text)
         ->  Bitmap Index Scan on product_partnumber_loweridx  (cost=0.00..15.54 rows=1867 width=0) (actual time=43.022..43.022 rows=1 loops=1)
               Index Cond: (lower((partnumber)::text) = 'int2100/512'::text)
 Total runtime: 51.626 ms
(7 rows)

The 'manufacturername' and 'name' disjuncts have been removed by simplification, since the expression is known to be true.
However, if "prepared", it's horrible:

PREPARE to_be_slow(text, text, text) AS SELECT * FROM product
WHERE (nullorblank($1) OR lower(partnumber) = lower($1))
  AND (nullorblank($2) OR manufacturername = $2)
  AND (nullorblank($3) OR name = $3);

EXPLAIN ANALYZE EXECUTE to_be_slow('int2100/512', NULL, NULL);

                               QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..22317.13 rows=1 width=254) (actual time=1115.167..1579.535 rows=1 loops=1)
   ->  Seq Scan on product  (cost=0.00..22317.12 rows=1 width=254) (actual time=1114.845..1579.211 rows=1 loops=1)
         Filter: (((($1)::character varying IS NULL) OR (btrim(($1)::text) = ''::text) OR (lower((partnumber)::text) = lower($1))) AND ((($2)::character varying IS NULL) OR (btrim(($2)::text) = ''::text) OR (manufacturername = $2)) AND ((($3)::character varying IS NULL) OR (btrim(($3)::text) = ''::text) OR ((name)::text = $3)))
 Total runtime: 1580.006 ms
(5 rows)
On Thursday, 31 August 2006 16:09, Theo Schlossnagle wrote:
> I don't chime in very often, but I do think the refusal to
> incorporate hints into the planner system is fantastically stubborn
> and nonsensical.

What is actually fantastically nonsensical about this is that the issues I outlined about prepared statements would merely become worse if planner hints were used. Then, you wouldn't only have to worry about plans that were created earlier during the session; you would be faced with plans that were created earlier during the application's development.

In general, the solutions to the prepared statement issues need to ensure that plans are created more often, not less often.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Aug 31, 2006, at 9:25 AM, Peter Eisentraut wrote:
> Planner hints are a way to address a deficient planner. But neither a
> manually hinted planner nor a perfectly good planner will help if the
> planning decisions are based on outdated information.

I don't chime in very often, but I do think the refusal to incorporate hints into the planner system is fantastically stubborn and nonsensical. I whole-heartedly agree that it is _better_ to fix the planner, but many of us have production systems and can't just go check out CVS HEAD to address our day-to-day issues, and we suffer from this decision.

There are many databases out there with better planners than PostgreSQL -- likely there always will be. Even those databases have query planner hints. Why? Because the authors of those databases had the humility to realize that the planner they designed wasn't perfect and that people _still_ need their database to perform well despite a non-optimal query plan here and there. A good query planner hint system would act as a catalyst to the improvement of the current query planner, as users could share their complex queries and associated improved query plans through hinting.

I like Postgres a lot; I think the people that work on it are very, very sharp. I do feel that the consistent refusal to allow query hinting to be introduced demonstrates an unhealthy amount of hubris that, in the end, negatively impacts users. While Postgres is missing a ton of other needed features, I rarely see the attitude that they are _unwanted_. Instead I see the "if it is important to you, go build it" attitude, which is what I would expect in an open source project.
// Theo Schlossnagle // CTO -- http://www.omniti.com/~jesus/ // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
On Thursday, 31 August 2006 16:26, Andrew Dunstan wrote:
> Cached plans etc. might have an impact, but please do not overlook the
> benefits of parameterized queries in avoiding SQL injection attacks, as
> well as often being much cleaner to code.

That might be part of the confusion. Composing queries with the variable parameters out of line is a very nice feature. But that concept is totally independent of the question whether the execution plan should be cached. The APIs (and their documentations) just don't convey that very well.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> OK, why don't you work out an example. Let's look at this query:
>
> SELECT * FROM t1 WHERE a LIKE $1;
>
> What two plans would you prepare?

Well, I guess for the case that none of the "expected" plans fit, you can always fall back to generating a new plan on the fly.

Anyway, it would of course be cool if pgsql could set an invalid flag if it detects that a certain plan performed badly (maybe even automatically cause a fresh table analysis), or if some DDL/DML was executed that likely invalidated the plan.

I am not sure if there is any philosophy that pgsql tries to adhere to. Does it want to leave the job of tuning to the DBA, or does it want to do things automatically (which always means that in some situations it will do the wrong thing)?

- tweak the planner vs. planner hints
- manually analyze vs. automatically analyze
- manual vacuum vs. autovacuum

Hmm, actually it's probably not a black and white thing, and the ultimate goal would be to offer both, with maybe some installer checkbox to default everything to "DBA-less" auto mode.

Anyway, I never liked the idea of planner hints. I think it makes much more sense to give people direct access to plans in that case, meaning they can "partially" hardcode (parameterized) plans they want. I have mentioned before that Sybase seems to have such a feature (you can dump plans, tweak them, remove pieces that should be done on the fly, and associate them with stored procedures; not sure if you can also do that for prepared statements).

regards,
Lukas
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
> With time, it becomes ever clearer to me that prepared SQL
> statements are just a really bad idea. On some days, it seems like
> half the performance problems in PostgreSQL-using systems are
> because a bad plan was cached somewhere. [...]
>
> Comments?

Hello. I'm attempting to understand why prepared statements would be used for long enough for tables to change to a point that a given plan will change from 'optimal' to 'disastrous'. Wouldn't this require that the tables are completely re-written, or that their data is drastically updated?

For my own tables, most of the data remains static for months on end. Data is accumulated. Small changes are made. I don't see why a prepared statement used over a 24-hour period would ever become disastrous. This suggests to me that you are doing either:

1) Maintaining prepared statements for weeks or months at a time.
2) Churning your tables up into a froth.

I'm guessing, as you mentioned JDBC, that you might be hitting 1), in the context of JDBC being used from a web application, where the application server holds a connection open for weeks or months at a time. If so, it does sound as if JDBC is doing wrong by keeping prepared queries around for that long. A time limit of an hour, or even a few minutes, would make sense.

My experience does not match yours. Prepared queries have always significantly improved my execution times. They do have a place. Whatever the scenarios you are hitting should be dealt with, possibly in JDBC.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Ottawa, Ontario, Canada
On Thu, Aug 31, 2006 at 03:36:25PM +0200, Csaba Nagy wrote:
> On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
> > OK, why don't you work out an example. Let's look at this query:
> > SELECT * FROM t1 WHERE a LIKE $1;
> > What two plans would you prepare?
>
> if substring($1 from 1 for 1) != '%' then
>     use plan 1 (see below);
> else
>     use plan 2 (see below);
> end if;

It would be cool if PostgreSQL did this, but I think it is also true that anybody (or JDBC) who tried to prepare a plan in the cases that are known to cause problems is making a mistake.

While on the 'it would be cool' subject: I think it might be cool if the prepare statement took sample arguments that could be used to prepare the plans with. "Prepare a plan that would work best with these arguments." Then JDBC could prepare both plans for you, if it was smart enough... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
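Mark's suggestion might be sketched like this (the USING SAMPLE clause is entirely hypothetical syntax invented for illustration, not anything PostgreSQL supports):

```sql
-- Hypothetical: plan the statement as if $1 were the sample value,
-- while still accepting any value at EXECUTE time.
PREPARE window_query (text) AS
    SELECT * FROM t1 WHERE a LIKE $1
    USING SAMPLE ('121%');
```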
On Thu, Aug 31, 2006 at 10:09:10AM -0400, Theo Schlossnagle wrote:
> There are many databases out there with better planners than
> PostgreSQL -- likely there will always be. Even those databases have
> query planner hints. Why? Because the authors of those databases had
> the humility to realize that the planner they designed wasn't perfect
> and that people _still_ need their database to perform well despite a
> non-optimal query plan here and there.
>
> A good query planner hint system would act as a catalyst to the
> improvement of the current query planner, as users could share their
> complex queries and associated improved query plans through hinting.

Would a hint system allow the planner to execute quicker? Eliminate
plans from consideration early, without evaluating how long they might
take to execute? Sort of possible today by toggling 'enable_seqscan'
and other such options... :-)

> I like Postgres a lot, I think the people that work on it are very
> very sharp. I do feel that the consistent refusal to allow query
> hinting to be introduced demonstrates an unhealthy amount of hubris
> that, in the end, negatively impacts users.

Hubris isn't always bad. If hints were provided, the need for the
fully automatic planner to improve would be reduced. But yes, they do
seem to be competing goals, with the user caught in between.

> While Postgres is missing a ton of other needed features, I rarely
> see the attitude that they are _unwanted_. Instead I see the "if it
> is important to you, go build it" attitude, which is what I would
> expect in an open source project.

There is also "what you submit should be maintainable, because we know
you might disappear at any time".

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Peter Eisentraut <peter_e@gmx.net> writes:
> With time, it becomes ever clearer to me that prepared SQL statements
> are just a really bad idea.

That's an overstatement, but I'll agree that they have strong
limitations.

> I'd wish that we reconsider when and how prepared statements are used.
> The JDBC interface and PL/pgSQL are frequently noticed perpetrators,
> but the problem is really all over the place.

AFAIK those are the only two places where preparation is the default
... what else were you thinking of?

> - Transparently invalidate and regenerate prepared plans more often.
> This could be tied to the transaction count, update activity obtained
> from the statistics collector, etc.

FWIW, I've assumed right along that once we have a plan-invalidation
mechanism, any ANALYZE stats update would invalidate affected plans.

> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

For plan-inval to work in all cases, we'll have to store either the
source query string or the raw grammar's output tree, before even
parse analysis. Is that what you are thinking of? It's hardly
"prepared" at all if you do that.

As noted downthread, we've confused out-of-line parameter value
shipping with prepared statements. It might be worth rejiggering the
FE/BE protocol to separate those things better.

			regards, tom lane
On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
> I'm attempting to understand why prepared statements would be used for
> long enough for tables to change to a point that a given plan will
> change from 'optimal' to 'disastrous'.
>
> Wouldn't this require that the tables are completely re-written, or
> that their data is drastically updated? For my own tables, most of the
> data remains static for months on end. Data is accumulated. Small
> changes are made. I don't see why a prepared statement used over a
> 24 hour period would ever become disastrous.

Scenario: A web application maintains a pool of connections to the
database. If the connections have to be regularly restarted due to a
postgres implementation detail (stale plans), then that is a database
deficiency.

-M
Tom Lane wrote: > As noted downthread, we've confused out-of-line parameter value shipping > with prepared statements. It might be worth rejiggering the FE/BE > protocol to separate those things better. > > Well, that's surely not going to happen in a hurry, is it? Maybe a quick fix would be a way to allow the user to turn plan caching on and off. cheers andrew
On 2006-08-31, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd wish that we reconsider when and how prepared statements are used. The >> JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the >> problem is really all over the place. > > AFAIK those are the only two places where preparation is the default RI triggers. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On Thu, Aug 31, 2006 at 11:36:36AM -0400, AgentM wrote:
> On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
> > I'm attempting to understand why prepared statements would be used for
> > long enough for tables to change to a point that a given plan will
> > change from 'optimal' to 'disastrous'.
> >
> > Wouldn't this require that the tables are completely re-written, or
> > that their data is drastically updated? For my own tables, most of the
> > data remains static for months on end. Data is accumulated. Small
> > changes are made. I don't see why a prepared statement used over a
> > 24 hour period would ever become disastrous.
>
> Scenario: A web application maintains a pool of connections to the
> database. If the connections have to be regularly restarted due to a
> postgres implementation detail (stale plans), then that is a database
> deficiency.

Or a JDBC deficiency. Nobody is forcing JDBC to reuse an automatically
prepared plan indefinitely. If it prepared the plan automatically, it
can regenerate it whenever it wishes. Does Oracle automatically
regenerate prepared plans on occasion?

I don't consider it a deficiency. PostgreSQL is doing exactly what you
asked it to do; that it isn't second-guessing you isn't a deficiency.
For all PostgreSQL knows, your data hasn't changed so drastically that
a query planned a week ago would be disastrous today; the plan you
prepared then, and chose to execute now, may still be the optimal one.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
AgentM <agentm@themactionfaction.com> writes:
> On Aug 31, 2006, at 11:18 , mark@mark.mielke.cc wrote:
>> I'm attempting to understand why prepared statements would be used for
>> long enough for tables to change to a point that a given plan will
>> change from 'optimal' to 'disastrous'.

> Scenario: A web application maintains a pool of connections to the
> database. If the connections have to be regularly restarted due to a
> postgres implementation detail (stale plans), then that is a database
> deficiency.

The two major complaints that I've seen are

* plpgsql's prepared plans don't work at all for scenarios involving
temp tables that are created and dropped in each use of the function,
because the cached plan refers to a table that no longer exists and so
must be regenerated on every successive call. Right now we tell people
they have to use EXECUTE, which is painful and gives up unnecessary
amounts of performance (because it might well be useful to cache a
plan for the lifespan of the table).

* for parameterized queries, a generic plan gives up too much
performance compared to one generated for specific constant parameter
values.

Neither of these problems have anything to do with statistics getting
stale.

			regards, tom lane
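For reference, the EXECUTE workaround for the temp-table case looks roughly like this (a hedged sketch; the function and table names are invented):

```sql
CREATE FUNCTION count_work() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    CREATE TEMP TABLE work_tt (x int);
    -- A plain "SELECT count(*) INTO n FROM work_tt" would cache a plan
    -- pointing at a table that is dropped below, breaking the next call.
    -- EXECUTE re-parses and re-plans the query string every time instead:
    EXECUTE 'SELECT count(*) FROM work_tt' INTO n;
    DROP TABLE work_tt;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```

This is exactly the cost Tom mentions: the query is planned on every call, even though a plan could usefully be cached for as long as the temp table lives.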
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> As noted downthread, we've confused out-of-line parameter value shipping
>> with prepared statements. It might be worth rejiggering the FE/BE
>> protocol to separate those things better.

> Well, that's surely not going to happen in a hurry, is it? Maybe a quick
> fix would be a way to allow the user to turn plan caching on and off.

There aren't any "quick fixes" here (at least nothing that's likely to
appear in 8.2). But I didn't mean the above suggestion as our only
response to Peter's criticism --- more that that is one of several
areas we ought to think about.

			regards, tom lane
> Neither of these problems have anything to do with statistics getting > stale. ... and the second one would benefit from a "meta-plan" facility which puts some "meta-plan" nodes on top of specific plans to dispatch based on parameter values at runtime. Incidentally, the dispatch could check the statistics assumptions too. If you don't need to do the planning for each execution, you could afford to check the assumptions for each execution instead... Cheers, Csaba.
On Aug 31, 2006, at 12:04 , Tom Lane wrote: > > The two major complaints that I've seen are > <snip> > Neither of these problems have anything to do with statistics getting > stale. Not stats-- plans. Plan invalidation has been discussed before, no? -M
On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote: > > I'd wish that we reconsider when and how prepared statements are used. The > > JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the > > problem is really all over the place. > > AFAIK those are the only two places where preparation is the default > ... what else were you thinking of? Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a version 8.0 or higher server. Or at least, that's the way I read the documentation. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote: >>> I'd wish that we reconsider when and how prepared statements are used. The >>> JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the >>> problem is really all over the place. >> AFAIK those are the only two places where preparation is the default >> ... what else were you thinking of? > > Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a > version 8.0 or higher server. > > Or at least, that's the way I read the documentation. AFAIK this is also the case for PHP PDO extension, which is bundled since PHP 5.1. regards, Lukas
Lukas Kahwe Smith wrote:
> Martijn van Oosterhout wrote:
>> On Thu, Aug 31, 2006 at 11:27:18AM -0400, Tom Lane wrote:
>>>> I'd wish that we reconsider when and how prepared statements are
>>>> used. The JDBC interface and PL/pgSQL are frequently noticed
>>>> perpetrators, but the problem is really all over the place.
>>> AFAIK those are the only two places where preparation is the default
>>> ... what else were you thinking of?
>>
>> Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
>> version 8.0 or higher server.
>>
>> Or at least, that's the way I read the documentation.
>
> AFAIK this is also the case for PHP PDO extension, which is bundled
> since PHP 5.1.

BTW: PDO has gotten a switch to force client side placeholder
replacement in favor of using server side prepared statements, due to
the fact that prepared statements side-step the MySQL query cache.

http://netevil.org/node.php?uuid=444a6017-0548-2459-2943-44a601714d58

BTW: I am not posting this to solicit MySQL bashing. The main reason
why PDO pushes prepared statements is the fact that they offer good
protection against SQL injection. However, obviously, in shared
nothing architectures like PHP, which does not yet have any sort of
connection/statement-pooling solution, the danger of prepared
statements becoming stale over time is small. However the problem of
running the same statement with two different parameters that require
different plans is still quite real.

regards,
Lukas
AgentM wrote: > On Aug 31, 2006, at 12:04 , Tom Lane wrote: > > The two major complaints that I've seen are > > <snip> > > > Neither of these problems have anything to do with statistics > > getting stale. > > Not stats-- plans. Plan invalidation has been discussed before, no? Plan invalidation helps with schema changes and data changes but not with parametrized queries. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, Aug 31, 2006 at 06:34:45PM +0200, Lukas Kahwe Smith wrote:
> BTW: PDO has gotten a switch to force client side placeholder
> replacement in favor of using server side prepared statements due to the
> fact that prepared statements side-step the MySQL query cache.

Perl DBD::Pg also has a switch to force one way or the other.

However (as has been stated already) people are confusing prepared
statements with out-of-line parameters. Even DBI uses the phrase
"prepare" for setting up statements, whereas this doesn't actually
require a server-side prepare; all it needs is out-of-line parameters.
I see from the source that DBD::Pg does use PQexecParams() sometimes,
so maybe it does support out-of-line parameters...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> > The proposal to supply hints to statements and functions has been
> > voted down several times due to the argument that it is better to fix
> > the planner. I think supplying hints does fix the planner, and is a
> > balanced solution.
>
> Planner hints are a way to address a deficient planner. But neither a
> manually hinted planner nor a perfectly good planner will help if the
> planning decisions are based on outdated information.

Right. Anyways, it's clearer now what you are suggesting, and I think
your idea regarding implicitly generated plans has some merit. The
major annoyance for me is I have to force a disconnect anytime there
is a schema search_path change. Query hints, which I still think would
make my life much easier, do not have much to do with the thrust of
your argument.

I think maybe we could add some intelligence to implicit plan
generation, perhaps guarded by a GUC:

implicit_plan_generation = [none, smart, all]

with smart meaning some defined events, including perhaps:
* creation or deletion of a temp table
* duration of time
* user invocation
* manipulation of search_path

just thinking out loud here,
merlin
Peter Eisentraut <peter_e@gmx.net> writes:
> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Then you would be going very much against the user's expectations.

Driver interfaces expose very clearly to the user an explicit
interface to prepare and execute a query separately. What you're
proposing is to go behind the user's back and do what he's gone out of
his way to tell you not to do. You can always choose to prepare your
queries immediately before use. Most drivers even supply an interface
to do so in a single step for convenience.

If you've gone to the trouble of saving the prepared query handle, you
very much do NOT want the database spontaneously deciding to change
the behaviour of that query (even just the performance behaviour)
without warning.

In fact, somewhere down the list of my personal wishlist for Postgres
is plan stability, which would let the DBA control exactly when plans
could change.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
> Then you would be going very much against the user's expectations.
>
> Driver interfaces expose very clearly to the user an explicit
> interface to prepare and execute a query separately. What you're
> proposing is to go behind the user's back and do what he's gone out
> of his way to tell you not to do. You can always choose to prepare
> your queries immediately before use. Most drivers even supply an
> interface to do so in a single step for convenience.

Let's verify that. JDBC and PL/pgSQL have been mentioned.

The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation. In fact, it doesn't say what
"pre-compiled" means at all.

For PL/pgSQL, you simply write a query and all the preparing action
happens implicitly. There is nothing explicit about that interface.

So if users have certain expectations here, they're just making them up.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, Aug 31, 2006 at 06:43:38PM +0100, Gregory Stark wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > - Redefine "prepared" to mean "parsed" rather than "parsed and planned".
>
> Then you would be going very much against the user's expectations.
>
> Driver interfaces expose very clearly to the user an explicit interface to
> prepare and execute a query separately. What you're proposing is to go behind
> the user's back and do what he's gone out of his way to tell you not to do.
> You can always choose to prepare your queries immediately before use. Most
> drivers even supply an interface to do so in a single step for convenience.

Is that really so? Under Perl DBI, the only way to get a statement
handle is to "prepare" it. Yet I don't want to use server-side
prepares because I know of the problems they cause. The "single-step"
approach provides no statement handle at all, which has several
drawbacks.

People are encouraged to use prepared statements for clarity and
security reasons, not speed. I would really like an option to choose
between:

- slightly more planning time but always good plans
- plan once and be unforgiving if the plan doesn't work with the
  parameters

I'd take the first option any day, but that's just the types of
queries I'm doing.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote:
> On Thu, Aug 31, 2006 at 06:43:38PM +0100, Gregory Stark wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> - Redefine "prepared" to mean "parsed" rather than "parsed and planned".
>>
>> Then you would be going very much against the user's expectations.
>>
>> Driver interfaces expose very clearly to the user an explicit interface to
>> prepare and execute a query separately. What your proposing is to go behind
>> the user's back and do what he's gone out of his way to tell you not to do.
>> You can always choose to prepare your queries immediately before use. Most
>> drivers even supply an interface to do so in a single step for convenience.
>
> Is that really so? Under Perl DBI, the only way to get a statement
> handle is to "prepare" it. Yet I don't want to use server-side prepares
> because I know of the problems it causes. The "single-step" approach
> provides no statement handle at all, which has several drawbacks.
>
> People are encouraged to use prepared stataments for clarity and
> security reasons, not speed. I would really like an option to choose
> between:
>
> - slightly more planning time but always good plans
> - plan once and be unforgiving if the plan doesn't work with the
>   parameters
>
> I'd take the first option anyday, but that's just the types of queries
> I'm doing.
>
> Have a nice day,

According to the docs you can actually choose between server side
prepare or not on a per call basis. It contains this example:

  $sth->{pg_server_prepare} = 1;
  $sth->execute(22);
  $sth->{pg_server_prepare} = 0;
  $sth->execute(44);
  $sth->{pg_server_prepare} = 1;
  $sth->execute(66);

cheers

andrew
On Thu, 2006-08-31 at 10:09 -0400, Theo Schlossnagle wrote:
> On Aug 31, 2006, at 9:25 AM, Peter Eisentraut wrote:
> > On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
> >> The proposal to supply hints to statements and functions has been
> >> voted down several times due to the argument that it is better to fix
> >> the planner. I think supplying hints does fix the planner, and is a
> >> balanced solution.
>
> There are many databases out there with better planners than
> PostgreSQL -- likely there will always be. Even those databases have
> query planner hints. Why? Because the authors of those database had
> the humility to realize that the planner they designed wasn't perfect
> and that people _still_ need their database to perform well despite a
> non-optimal query plan here and there.

You can see a related discussion here:

http://archives.postgresql.org/pgsql-hackers/2006-08/msg00463.php

What I understood from that thread was that the concept of planner
hints was not completely rejected. I think the most likely outcome (if
any hinting system is implemented) is some mechanism to state the hint
in a separate SQL declaration rather than inside the query itself.
This can still result in potentially stale (or very stale) plans, but
at least you don't have to change your application every time you
modify the "hints".

However, as far as I know, this has not progressed beyond the
brainstorming stage. I think many people are still very skeptical of
various implementations of planner hints, but there is some reasonable
level of discussion.

Regards,
	Jeff Davis
Peter Eisentraut <peter_e@gmx.net> writes:
> Gregory Stark wrote:
>> Driver interfaces expose very clearly to the user an explicit
>> interface to prepare and execute a query separately.

> The JDBC documentation merely contains statements of the sort "A SQL
> statement with or without IN parameters can be pre-compiled and stored
> in a PreparedStatement object. This object can then be used to
> efficiently execute this statement multiple times." There is
> absolutely no indication that the execution plan of the statement is
> computed at the time of preparation.

The key word there is "efficiently". I think it is a reasonable
presumption on the user's part that a query done this way will have
less overhead than just resubmitting the raw query each time.

The important thing I see here is that JDBC allows use of IN
parameters with or without a PreparedStatement (no?). So they've
separated the concepts of out-of-line parameters and "preparing" a
statement. That's the distinction we have unfortunately fudged in the
V3 protocol.

The protocol does let you use OOL parameters without retaining a
prepared plan, thanks to the hack introduced later to not plan the
unnamed statement at Parse time, but that's definitely a bit of a wart
on the original protocol design. Maybe it's good enough, or maybe not.

			regards, tom lane
On Thu, Aug 31, 2006 at 02:58:48PM -0400, Tom Lane wrote:
> The protocol does let you use OOL parameters without retaining a
> prepared plan, thanks to the hack introduced later to not plan the
> unnamed statement at Parse time, but that's definitely a bit of a wart
> on the original protocol design. Maybe it's good enough, or maybe not.

Urk, so it was a hack. Unfortunately it seems something you can't
really change without changing the protocol.

So what are the options now? A GUC like so:

prepare_means_plan = [true|false]

So then a prepare will always parse straightaway, but you can choose
whether or not you want to plan straightaway or at bind time. Would
this be acceptable?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
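To make the proposal concrete, usage might look something like this (prepare_means_plan is Martijn's hypothetical GUC; no such setting exists in PostgreSQL):

```sql
SET prepare_means_plan = false;  -- hypothetical: PREPARE would only parse
PREPARE q(text) AS SELECT * FROM t1 WHERE a LIKE $1;
EXECUTE q('abc%');  -- the plan would be chosen here, seeing the actual value
```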
On Thu, Aug 31, 2006 at 02:16:32PM -0400, Andrew Dunstan wrote:
> According to the docs you can actually choose between server side
> prepare or not on a per call basis. It contains this example:

Yeah, but it also contains this:

  Using prepared statements is in theory quite a bit faster: not only
  does the PostgreSQL backend have to prepare the query only once, but
  DBD::Pg no longer has to worry about quoting each value before
  sending it to the server.

Which just continues the misconception: you can avoid worrying about
quoting each value and still not use server-side prepares. There's a
third option which is not made clear (and it's not clear if it's
available via DBI).

Basically, unnamed "prepares" are not planned until bind time; named
statements are planned at prepare time. The question is, do you want
to be able to defer planning for named statements also?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
On 31-Aug-06, at 2:58 PM, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Gregory Stark wrote:
>>> Driver interfaces expose very clearly to the user an explicit
>>> interface to prepare and execute a query separately.
>
>> The JDBC documentation merely contains statements of the sort "A SQL
>> statement with or without IN parameters can be pre-compiled and stored
>> in a PreparedStatement object. This object can then be used to
>> efficiently execute this statement multiple times." There is
>> absolutely no indication that the execution plan of the statement is
>> computed at the time of preparation.
>
> The key word there is "efficiently". I think it is a reasonable
> presumption on the user's part that a query done this way will have
> less overhead than just resubmitting the raw query each time.
>
> The important thing I see here is that JDBC allows use of IN
> parameters with or without a PreparedStatement (no?).

No, not that I am aware of. You can create a statement, and execute
it, but you need a PreparedStatement to set IN parameters.

> So they've separated the concepts of out-of-line parameters and
> "preparing" a statement. That's the distinction we have unfortunately
> fudged in the V3 protocol.
>
> The protocol does let you use OOL parameters without retaining a
> prepared plan, thanks to the hack introduced later to not plan the
> unnamed statement at Parse time, but that's definitely a bit of a wart
> on the original protocol design. Maybe it's good enough, or maybe not.
>
> regards, tom lane

Dave
On Thu, 2006-08-31 at 20:01, Peter Eisentraut wrote:
> For PL/pgSQL, you simply write a query and all the preparing action
> happens implicitly. There is nothing explicit about that interface.
>
> So if users have certain expectations here, they're just making them up.

Or basing them on experience. I for one would not like it at all if
all my queries (select * from plpgsqlfunc()) just magically became
slower by 10-50%.

If there will be an option not to plan/optimise prepared statements, I
would certainly expect it to be off by default.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Martijn van Oosterhout <kleptog@svana.org> writes: > So what are the options now? A GUC like so: > prepare_means_plan = [true|false] > So then a prepare will always parse straightaway, but you can choose > whether or not you want to plan straightaway or at bind time. That seems like just a kluge, as you'd typically want query-by-query control, and a GUC setting isn't convenient for that. It's entirely possible that the current protocol definition is Good Enough, assuming that client-library designers are aware of the implications of using named vs unnamed statements (which I bet not all of 'em are). You *can* have either behavior today, so far as client-issued queries go. The area that seems to need work more drastically is controlling what happens with queries inside plpgsql. regards, tom lane
mark@mark.mielke.cc writes: > Does Oracle automatically regenerate prepared plans on occasion? Not due to statistics changes, only if your schema changes. (caveat: I last used Oracle back at 8i) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Peter Eisentraut <peter_e@gmx.net> writes:
> Let's verify that. JDBC and PL/pgSQL have been mentioned.
>
> The JDBC documentation merely contains statements of the sort "A SQL
> statement with or without IN parameters can be pre-compiled and stored
> in a PreparedStatement object. This object can then be used to
> efficiently execute this statement multiple times." There is
> absolutely no indication that the execution plan of the statement is
> computed at the time of preparation. In fact, it doesn't say
> what "pre-compiled" means at all.

I didn't say you were violating the technical definition in the
specification. I said you're going against expectations.

This is the problem with being dogmatic about abstraction boundaries.
Obviously someone who doesn't know what's going on under the hood has
no specific expectations about what "pre-compiling" might mean. But
the reality is that you can't effectively use a database without
understanding what query plans are, and users do have expectations
about behaviour below the abstraction barrier.

If you don't think "pre-compiled" and "efficiently execute multiple
times" translate into "generates a query plan so it doesn't have to go
through that process to execute the query", I think you're in a very
small minority.

> For PL/pgSQL, you simply write a query and all the preparing action
> happens implicitly. There is nothing explicit about that interface.

Well, that's sort of the inherent problem with PL/pgSQL and the way it
mixes up the procedural language with SQL. I guess the natural
extension of questioning PL/pgSQL would be to wonder why subqueries in
SQL queries don't get replanned every time they're executed. The data
distribution could certainly change partway through.

> So if users have certain expectations here, they're just making them up.

Well, that's what makes them "expectations" rather than promises.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Martijn van Oosterhout <kleptog@svana.org> writes: > > Then you would be going very much against the user's expectations. > > > > Driver interfaces expose very clearly to the user an explicit interface to > > prepare and execute a query separately. What your proposing is to go behind > > the user's back and do what he's gone out of his way to tell you not to do. > > You can always choose to prepare your queries immediately before use. Most > > drivers even supply an interface to do so in a single step for convenience. > > Is that really so? Under Perl DBI, the only way to get a statement > handle is to "prepare" it. Sure, but you can prepare it right before you use it and throw it away instead of keeping it around. The server has to prepare the query sometime. The v3 protocol just gives you control over when that happens, but it doesn't force you to do it at any particular time. -- greg
On Thu, Aug 31, 2006 at 07:04:07PM -0400, Gregory Stark wrote:
> The server has to prepare the query sometime. The v3 protocol just gives you
> control over when that happens, but it doesn't force you to do it at any
> particular time.

Not really. All named prepares are planned straight away; all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote: > On Thu, Aug 31, 2006 at 07:04:07PM -0400, Gregory Stark wrote: >> The server has to prepare the query sometime. The v3 protocol just gives you >> control over when that happens, but it doesn't force you to do it at any >> particular time. > > Not really. All named prepares are planned straight away, all unnamed > ones are planned at bind time. Therefore you cannot have more than one > parsed-but-not-planned prepared query at a time. In a connection pool > scenario there's no way to share such plans since you can't tell which > query has been prepared. That's not forcing, but it's an asymmetry we > could do without. AFAIK since Oracle 9i planning is always deferred until the first execution. This way they hope to get a better plan, which a single generic plan obviously could not provide if the selectivity varies greatly. So are the plans generated without taking any bound values into account more stable in performance (albeit at times slower than what would have been produced if the value had been known)? Either way, tying the question of when to plan to whether the prepared statement is named or unnamed seems unexpected. regards, Lukas
> >>> I'd wish that we reconsider when and how prepared statements are > >>> used. The JDBC interface and PL/pgSQL are frequently noticed > >>> perpetrators, but the problem is really all over the place. > >> AFAIK those are the only two places where preparation is > the default > >> ... what else were you thinking of? > > > > Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a > > version 8.0 or higher server. > > > > Or at least, that's the way I read the documentation. Yeah, but if you close the statement or leave the scope of the statement variable, the plan is gone. So it is doing exactly what I would expect. You write $stmt->prepare('select 1'); what else would you expect? There are enough other functions to get a result without a plan sticking around, like $db->selectrow_array .... Andreas
On Fri, Sep 01, 2006 at 09:26:24AM +0200, Lukas Kahwe Smith wrote: > AFAIK since Oracle 9i planning is always deferred until the first > execution. This way they hope to get a better plan, which would > obviously not be possible if the selectivity varies greatly. Interesting. > So are the plans generated without taking any bound values into account > more stable in performance (albeit at times slower than what would have > been produced if the value would have been known)? Possibly, though they might uniformly suck or be uniformly good... > Either way mixing the question of when to prepare the plan with the > prepared statement being named or unnamed seems unexpected. Yes, well, when the V3 protocol was designed, separating the parse and plan phases wasn't considered. The exact commit that got us to where we are now is here: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tcop/postgres.c#rev1.420 That was over two years ago; I'm not sure whether any of the ideas mentioned in there have been implemented since. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Thu, August 31, 2006 21:41, Phil Frost wrote: >> Is there any kind of pattern at all to this problem? Anything >> recognizable? A few typical pitfalls? > > Frequently I have found preplanning will result in a horrible plan > because it is assumed parameters may be volatile while in practice they > are literals. Here is a function from my database: That's a very common thing in processor design as well, and there's a standard trick for it: the saturating two-bit counter. It tends to work pretty well for branch prediction, value prediction, etc. Usually it's the first thing you reach for, so of course somebody may already have tried it here and found it didn't work. In this particular case it might be applied something like this: for each parameter in a prepared statement you cache a predictor value, plus a "confidence counter" saying (more or less--see below) how many times in succession that value has repeated. Let's say each of the counters counts from 0 to 3 inclusive, with its confidence threshold right in the middle, between 1 and 2. On every invocation, you check each parameter value against the corresponding predictor value. If it's identical, you increment its counter (provided it can be incremented any further). If it isn't, you decrement its counter, and if the counter ends up below its confidence threshold, you replace the predictor value with the new parameter value. Then, whenever any new planning needs to be done (I'll get to that in a moment), you see which counters are above their confidence thresholds. In your new planning you assume that all parameters with confident predictions will remain pseudo-constant for the next few invocations. Of course there's a problem when parameters do not match predicted values. That's where having one or two backup plans could come in handy. You could keep your original, fully-generalized plan around. If plans are cheap enough to store, you could try to keep a cache of old plans for the same query.
The great thing about keeping some backup plans around is that a pseudo-constant parameter can have a different value once in a while, then flick back to its old habits without invalidating all your efforts. Your usually-unused search fields are a good example. You may also have two stable parameter patterns with different sets of pseudo-constants competing for your attention. It's not perfect, and it clearly has its pathological cases--but if it works well enough overall, the bad cases could be caught and handled as exceptions. Confidence counters can be tweaked to lose confidence more easily than they gain it, or vice versa. Some meta-confidence scheme may catch the worst offenders. I won't go into that now--first I'll shut up and wait for others to point out what I've missed. :) Jeroen
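The per-parameter predictor described above is simple enough to sketch directly. The following Python fragment is purely illustrative (the class name and 0-3 counter range are assumptions matching the "two-bit, threshold between 1 and 2" description, not anything from the thread):

```python
class ParamPredictor:
    """Saturating two-bit confidence counter for one statement parameter.

    Counter states run 0..3; states 2 and 3 are above the confidence
    threshold, i.e. the parameter is treated as pseudo-constant.
    """

    def __init__(self):
        self.value = None    # current predicted parameter value
        self.counter = 0     # saturating counter, 0..3

    def confident(self):
        """True when the prediction is above the confidence threshold."""
        return self.counter >= 2

    def observe(self, param):
        """Feed in the parameter of a new invocation.

        Returns True if it matched the current prediction.  On a miss the
        counter is decremented, and the predictor value is replaced only
        once confidence has dropped below the threshold.
        """
        if param == self.value:
            self.counter = min(3, self.counter + 1)   # saturate at 3
            return True
        self.counter = max(0, self.counter - 1)       # saturate at 0
        if self.counter < 2:
            self.value = param                        # give up on old value
        return False
```

Note how a single stray value decrements the counter without immediately discarding the prediction, which is exactly the "different value once in a while, then flick back to its old habits" tolerance described above.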
On Fri, Sep 01, 2006 at 03:56:19PM +0700, Jeroen T. Vermeulen wrote: > That's a very common thing in processor design as well, and there's a > standard trick for it: the saturating two-bit counter. It tends to work > pretty well for branch prediction, value prediction etc. Usually it's the > first thing you reach for, so of course somebody may already have tried it > here and found it didn't work. Interesting thought. It might be worth trying. But my big question: is all this testing and counting actually going to be faster than just replanning? Postgresql's planner is not that slow. > Of course there's a problem when parameters do not match predicted values. > That's where having one or two backup plans could come in handy. You > could keep your original, fully-generalized plan around. If plans are > cheap enough to store, you could try to keep a cache of old plans for the > same query. The great thing about keeping some backup plans around is > that a pseudo-constant parameter can have a different value once in a > while, then flick back to its old habits without invalidating all your > efforts. Your usually-unused search fields are a good example. You may > also have two stable parameter patterns with different sets of > pseudo-constants competing for your attention. The thing is that the number of possible plans is going to be proportional to factorial(number of tables). Once you have 3 tables you're going to have at least a dozen possible plans, probably more. What the best plan is depends strongly on what the parameters are. Anyway, your plan assumes that you have information to work with. The current system plans prepared queries with no information at all about parameters, and people are advocating to keep it that way. I think a good first step would be to plan on first execution, like Oracle does. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
> Anyway, your plan assumes that you have information to work > with. The current system plans prepared queries with no > information at all about parameters and people are advocating > to keep it that way. I think a good first step would be the > plan on first execution, like Oracle does. Yup, it is also possible to try to find an obvious plan and only delay planning (or part of the plan) when different inputs make a big difference (like MaxDB and Informix). Andreas
> > I don't chime in very often, but I do think the refusal to incorporate > > hints into the planner system is fantastically stubborn and > > nonsensical. > > What is actually fantastically nonsensical about this is that > the issues I outlined about prepared statements would merely > become worse if planner hints were used. Then, you wouldn't > only have to worry about plans that were created earlier > during the session, you would be faced with plans that were > created earlier during the application's development. In > general, the solutions to the prepared statement issues need > to effect that the plans are created more often, not less often. I have yet to see one of our partial Informix hints (where the planner does its usual job, just with one path's costs lowered or elevated) fall foul of an unanticipated change in the underlying data. Thus I don't buy the argument that hints are always bad. Of course their use should be extremely rare and well thought out. Most of the time SQL tuning involves a concerted effort between the programmer and a db performance expert, usually resulting in rewritten SQL or program logic without adding hints. I can see arguments for hints the dba can set himself centrally on the server, but in my experience chances for substantial improvement are very limited in that case. Andreas
On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote: > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. In the best case (which of course would have to be very frequent for any of this to matter in the first place) it's mainly just a short loop comparing the call's parameter values to their counterparts stored with the plan and updating those two-bit confidence counters. You wouldn't *believe* how simple you have to keep these things in processor architecture. :-) > The thing is that number of possible plans is going to be proportional > to factorial(number of tables). Once you have 3 tables you're going to > have at least a dozen possible plans, probably more. What the best plan > is depends strongly on what the parameters are. Of course. That's the whole point: to end up with a small but effective subset of all those possible plans. I'd guess that you could cover even most of the nasty cases with a maximum of three plans or so per prepared statement, including the original fully-generalized one. The plans could be replaced on an LRU basis, which isn't very costly for three or so entries. > Anyway, your plan assumes that you have information to work with. The > current system plans prepared queries with no information at all about > parameters and people are advocating to keep it that way. I think a > good first step would be the plan on first execution, like Oracle does. Yes, delaying things a bit can help a lot sometimes. That's also what JIT compilers in JVMs do, for instance. FWIW, libpqxx doesn't prepare statements until they're first called anyway. But if this choice to discard parameter information is exactly what causes a lot of the bad plans in the first place, as Peter says, what's wrong with putting it to use instead?
For those cases, you're pretty much screwed by definition as long as you fail to do so. And it's not like what I'm suggesting is very difficult! The real question is whether it's worthwhile. To find that out, we'd need to estimate four factors: coverage (how often you'd get a useful prediction), accuracy (how often that prediction would be accurate), cost of misprediction (near-zero compared to current situation, assuming we keep the generalized plans handy), and savings for correct prediction (in our case, benefit of planning for a constant instead of a variable minus the cost of re-planning which you say isn't very expensive). Based on what Peter and you tell me about cost, the main worries here are coverage and accuracy. Coverage and accuracy can be extracted (and tweaked!) relatively easily if we have logs of prepared-statement executions in a wide variety of real-life applications. Listings of consecutive prepared-statement invocations (statement name plus parameter values) are all that's needed. Do we have any logs like that? If we do, I'll be more than happy to run some simulations and see if the idea shows any promise. Like I said, there's every chance that it doesn't. It was just an off-the-cuff suggestion and if it's no good I'll have no problems saying so. But there's not much point sitting around arguing over theoretical merits if they're that easy to quantify! Jeroen
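The simulation proposed above is cheap to prototype. A hedged Python sketch (my own illustration; the threshold and limit parameters are assumptions) that replays one parameter's values from such an invocation log and reports coverage and accuracy as defined above:

```python
def simulate(log, threshold=2, limit=3):
    """Replay one parameter's values from an invocation log.

    Returns (coverage, accuracy): the fraction of calls for which the
    two-bit counter offered a confident prediction, and the fraction of
    those predictions that turned out to be correct.
    """
    value, counter = None, 0
    offered = hits = 0
    for param in log:
        if counter >= threshold:            # predictor is confident
            offered += 1
            if param == value:
                hits += 1
        # Standard saturating-counter update, as in the earlier post.
        if param == value:
            counter = min(limit, counter + 1)
        else:
            counter = max(0, counter - 1)
            if counter < threshold:
                value = param
    coverage = offered / len(log)
    accuracy = hits / offered if offered else 0.0
    return coverage, accuracy
```

On a log that is constant apart from one blip, coverage climbs with log length while accuracy stays high; on a log of all-distinct values, both stay at zero. Running this over real prepared-statement logs would answer the worthwhileness question directly.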
Martijn van Oosterhout <kleptog@svana.org> writes: >> The server has to prepare the query sometime. The v3 protocol just gives you >> control over when that happens, but it doesn't force you to do it at any >> particular time. > Not really. All named prepares are planned straight away, all unnamed > ones are planned at bind time. Therefore you cannot have more than one > parsed-but-not-planned prepared query at a time. In a connection pool > scenario there's no way to share such plans since you can't tell which > query has been prepared. That's not forcing, but it's an asymmetry we > could do with out. Sure, but how much does it really matter? If you don't want the plan saved for reuse, merely avoiding retransmission of the query text does not seem like a major win. Having had time to think about it, I no longer think the protocol design is a blocking bug for this problem area. It's something we could improve when we are ready to design protocol V4, but it does not seem in itself enough reason to make a new protocol (with all the pain that entails). regards, tom lane
On Fri, Sep 01, 2006 at 11:53:11AM +0200, Martijn van Oosterhout wrote: > On Fri, Sep 01, 2006 at 03:56:19PM +0700, Jeroen T. Vermeulen wrote: > > That's a very common thing in processor design as well, and there's a > > standard trick for it: the saturating two-bit counter. It tends to work > > pretty well for branch prediction, value prediction etc. Usually it's the > > first thing you reach for, so of course somebody may already have tried it > > here and found it didn't work. > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. The difference between a pre-planned query and a plan-each-time query, for me, seems to be a minimum of around 0.3-0.5 ms. This is on a fairly modern AMD X2 3800+. If the tests and counting are kept simple, I don't see why they would take anywhere near that long. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Martijn van Oosterhout <kleptog@svana.org> writes: > Interesting thought. It might be worth trying. But my big question: is > all this testing and counting actually going to be faster than just > replanning? Postgresql's planner is not that slow. > ... > The thing is that number of possible plans is going to be proportional > to factorial(number of tables). Yeah. One of the reasons the planner is acceptably fast is that it is aggressive about discarding candidate plans as soon as they are clearly inferior to other plans. Tracking multiple plans that might be optimal under varying assumptions about the query parameters would make things exponentially slower. regards, tom lane
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote: > >> Interesting thought. It might be worth trying. But my big question: is >> all this testing and counting actually going to be faster than just >> replanning? Postgresql's planner is not that slow. > > In the best case (which of course would have to be very frequent for any > of this to matter in the first place) it's mainly just a short loop > comparing the call's parameter values to their counterparts stored with > the plan and update those two-bit confidence counters. You wouldn't > *believe* how simple you have to keep these things in processor > architecture. :-) I think the slow part is trying to figure out whether to count the current call as a hit or a miss. How do you determine whether the plan you're running is the best plan without replanning the query? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Fri, September 1, 2006 21:30, Tom Lane wrote: > Yeah. One of the reasons the planner is acceptably fast is that it is > aggressive about discarding candidate plans as soon as they are clearly > inferior to other plans. Tracking multiple plans that might be optimal > under varying assumptions about the query parameters would make things > exponentially slower. AFAICS the planner shouldn't be affected at all--it'd just be invoked more often as and when the need for new plans became apparent. Not "exponentially" (that's an overused word anyway) but "proportionally" to that. I've been assuming that once you have a plan, storing it is not very expensive. If, say, doubling the number of plans stored with a session's prepared statements incurs some serious cost (apart from the planning itself, of course) then that changes things. Jeroen
On Fri, Sep 01, 2006 at 10:18:37AM -0400, Tom Lane wrote: > Sure, but how much does it really matter? If you don't want the plan > saved for reuse, merely avoiding retransmission of the query text does > not seem like a major win. Having had time to think about it, I no > longer think the protocol design is a blocking bug for this problem > area. It's something we could improve when we are ready to design > protocol V4, but it does not seem in itself enough reason to make a > new protocol (with all the pain that entails). Well, I see the documentation does actually describe the situation fairly well, so perhaps the right approach is to get interface writers to appreciate the difference and indicate if named or unnamed prepares are used. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, September 1, 2006 22:14, Gregory Stark wrote: > I think the slow part is trying to figure out whether to count the current > call as a hit or a miss. How do you determine whether the plan you're > running is the best plan without replanning the query? The question of knowing which plan is best _based on what's in the actual tables_ would remain unsolved, just as it always was. The scheme addresses only the opportunity to optimize for pseudo-constant parameters. It treats the existing planner as a black box. If you find a solution to the problem of inaccurate statistics, it'll probably be more or less orthogonal to what I'm describing: you could have one or the other, but combining them shouldn't be much harder. I don't think telling hits from misses would be all that hard. Let's say a prepared statement of yours is being invoked, and you're evaluating a candidate plan. Each parameter is in one of two sets: those "predicted" by the plan to have certain values (let's call them P), and those "not predicted" by the plan because their confidence counters were below the threshold (I'm tempted to call this set NP, but let's make it Q instead). Whether a parameter is in P or in Q can be derived from its confidence counter. In my previous example, you just take its most-significant bit. * For any parameter in P, if the actual value does not match the plan's prediction, you have a miss. Can't use this plan. Use another if you have one that applies (such as your regular old non-specialized plan--that always applies), or if not, write a new one! If you get through this without finding a mismatch, congratulations: you have a hit. The plan you're looking at is applicable to your call. But now we see if we can do better: * For any parameter in Q, if its value would have been predicted correctly but its counter was below the confidence threshold, you increment the counter. If that lifts the counter above the threshold, you have room for improving on this plan.
It means there's a good chance you can re-plan for the case that this parameter is also a pseudo-constant, without the effort being wasted. Of course you could also set a minimum number of invocations between re-plannings to get a more long-term view (e.g. different parameters being recognized as pseudo-constants in subsequent calls--you may not want to re-plan for each of those calls). So which plan do you execute if you have more than one applicable candidate? We can see what works well. As a starter I would definitely pick the one with the larger P (smaller Q), breaking ties in favour of the most recently generated plan. I'm assuming we only want one plan for a given P. We'd probably want to limit the number of candidate plans per statement to some very small, fixed number--somewhere between one and four, I'd say; or maybe one generalized plan plus up to two specialized ones. With numbers like that, none of this should be very expensive. A simple linear match against 1-4 candidates may be more effective than any attempt to be clever. I must admit I haven't thought through all of the consequences of caching more than one specialized plan per statement. For example, we could give every cached plan its own set of confidence counters, and match an incoming invocation against each of those; or we could keep just one "most likely" plan with its associated predictor state, and only consider previously generated plans if we either miss or find room for improvement in the predictor. Jeroen
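The selection rule sketched above (an applicable plan with the largest P wins, ties going to the most recent) is easy to state in code. In this hypothetical Python illustration, each candidate plan is reduced to just its set P, a dict mapping parameter index to predicted value; the generic plan is the empty dict:

```python
def applicable(plan_preds, args):
    """A cached plan applies iff every parameter it specialized on
    (its set P) matches the call's actual argument."""
    return all(args[i] == v for i, v in plan_preds.items())


def choose_plan(plans, args):
    """Pick an applicable candidate with the largest P.

    `plans` is ordered oldest-first, so with >= the most recently
    generated plan wins ties.  The generic plan {} always applies,
    so a non-empty candidate list always yields a choice.
    """
    best = None
    for plan in plans:
        if applicable(plan, args) and (best is None or len(plan) >= len(best)):
            best = plan
    return best
```

With one to four candidates per statement, as suggested above, this linear scan is trivially cheap compared to planning.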
On Fri, Sep 01, 2006 at 04:14:32PM +0100, Gregory Stark wrote: > >> Interesting thought. It might be worth trying. But my big question: is > >> all this testing and counting actually going to be faster than just > >> replanning? Postgresql's planner is not that slow. > > > > In the best case (which of course would have to be very frequent for any > > of this to matter in the first place) it's mainly just a short loop > > comparing the call's parameter values to their counterparts stored with > > the plan and update those two-bit confidence counters. You wouldn't > > *believe* how simple you have to keep these things in processor > > architecture. :-) > > I think the slow part is trying to figure out whether to count the current > call as a hit or a miss. How do you determine whether the plan you're running > is the best plan without replanning the query? Simply looking at estimated row counts/cost versus what actually happened would probably suffice. It'd at least be a great start. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
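One hedged way to turn that estimated-versus-actual comparison into a replan trigger (entirely my own sketch; the order-of-magnitude factor is an arbitrary assumption, not anything proposed in the thread):

```python
def needs_replan(estimated_rows, actual_rows, factor=10.0):
    """Return True when a cached plan's row estimate was off by more
    than `factor` in either direction, suggesting the plan was built
    on assumptions the data no longer supports."""
    if estimated_rows <= 0 or actual_rows <= 0:
        # Degenerate counts: replan unless both really were empty.
        return estimated_rows != actual_rows
    ratio = actual_rows / estimated_rows
    return ratio > factor or ratio < 1.0 / factor
```

A check like this costs one division per execution, so it sidesteps the "is the bookkeeping cheaper than replanning?" objection raised earlier in the thread.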
On Fri, Sep 01, 2006 at 10:18:37AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > >> The server has to prepare the query sometime. The v3 protocol just gives you > >> control over when that happens, but it doesn't force you to do it at any > >> particular time. > > > Not really. All named prepares are planned straight away, all unnamed > > ones are planned at bind time. Therefore you cannot have more than one > > parsed-but-not-planned prepared query at a time. In a connection pool > > scenario there's no way to share such plans since you can't tell which > > query has been prepared. That's not forcing, but it's an asymmetry we > > could do with out. > > Sure, but how much does it really matter? If you don't want the plan > saved for reuse, merely avoiding retransmission of the query text does > not seem like a major win. Having had time to think about it, I no > longer think the protocol design is a blocking bug for this problem > area. It's something we could improve when we are ready to design > protocol V4, but it does not seem in itself enough reason to make a > new protocol (with all the pain that entails). That should either go into the TODO, or a "V4 wishlist"... -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net