Thread: Poor plan choice in prepared statement
Hi, I am re-posting my question here after trying to find a solution in the PHP pgsql list with no luck.

I am experiencing some performance issues that I think stem from prepared statements. I have a pretty simple query:

SELECT cl.idOffer, cl.idaffiliate, cl.subid, cl.datetime
FROM click AS cl LEFT JOIN conversion AS co ON cl.clickgenerated = co.clickgenerated
WHERE cl."date" >= '2008-12-01'
AND cl."date" <= '2008-12-23'
AND cl.idAffiliate = 49978
LIMIT 10;

Run times:
- sub-second when issued from the command line (not prepared)
- sub-second from our PHP application if I prepend the query with "EXPLAIN ANALYZE"; the resulting plan is the same as when it runs quickly from the command line
- sub-second if I remove the "prepare" function from the PHP application and execute the query with the variables in-line
- 200+ seconds when run from the command line inside a prepared statement (see query below)
- 200+ seconds when run from our application within the PHP PDO prepared function (same as preparing from the command line)

I would like to continue to use bind variables to prevent SQL injection, but I'd like to force a plan re-parse for every single query (if necessary?). Does anyone have any ideas?

postgresql 8.2.11, freshly loaded db, vacuumed and analyzed. All indices in the plans are regular b-trees.
-- good plan, from non-prepared statement --

dev=# EXPLAIN ANALYZE SELECT cl.idOffer,cl.idaffiliate ,cl.subid,cl.datetime
dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickgenerated = co.clickgenerated
dev-# WHERE cl."date" >= '2008-12-01'
dev-# AND cl."date" <= '2008-12-23'
dev-# AND cl.idAffiliate = 49978
dev-# LIMIT 10;

 Limit  (cost=0.00..30.52 rows=10 width=48) (actual time=0.253..0.429 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..2613524.29 rows=856328 width=48) (actual time=0.251..0.416 rows=10 loops=1)
         ->  Append  (cost=0.00..77406.15 rows=19179 width=80) (actual time=0.226..0.256 rows=10 loops=1)
               ->  Seq Scan on click cl  (cost=0.00..15.07 rows=1 width=80) (actual time=0.001..0.001 rows=0 loops=1)
                     Filter: ((date >= '2008-12-01'::date) AND (date <= '2008-12-23'::date) AND (idaffiliate = 49978))
               ->  Bitmap Heap Scan on click_00030 cl  (cost=406.48..77391.08 rows=19178 width=46) (actual time=0.222..0.241 rows=10 loops=1)
                     Recheck Cond: (idaffiliate = 49978)
                     Filter: ((date >= '2008-12-01'::date) AND (date <= '2008-12-23'::date))
                     ->  Bitmap Index Scan on click_00030_idaffiliate_idx  (cost=0.00..401.68 rows=21355 width=0) (actual time=0.138..0.138 rows=484 loops=1)
                           Index Cond: (idaffiliate = 49978)
         ->  Index Scan using conversion_clickgenerated_idx on "conversion" co  (cost=0.00..131.67 rows=45 width=12) (actual time=0.014..0.014 rows=0 loops=10)
               Index Cond: ((cl.clickgenerated)::text = (co.clickgenerated)::text)
 Total runtime: 0.495 ms

-- bad plan, from prepared statement --

dev=# prepare fooplan (date,date,int,int) as
dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate, cl.idCreative AS creative, cl.subid, cl.datetime
dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated = co.clickGenerated
dev-# WHERE cl."date" >= $1
dev-# AND cl."date" <= $2
dev-# AND cl.idAffiliate = $3
dev-# LIMIT $4;
ERROR: prepared statement "fooplan" already exists
dev=# explain analyze execute fooplan ('2008-12-01','2008-12-23',49978,10);
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..15275.74 rows=3277 width=227) (actual time=201350.494..201350.765 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..152771.39 rows=32773 width=227) (actual time=201350.489..201350.748 rows=10 loops=1)
         ->  Append  (cost=0.00..16031.56 rows=734 width=180) (actual time=201350.438..201350.542 rows=10 loops=1)
               ->  Seq Scan on click cl  (cost=0.00..15.07 rows=1 width=180) (actual time=0.001..0.001 rows=0 loops=1)
                     Filter: ((date >= $1) AND (date <= $2) AND (idaffiliate = $3))
               ->  Bitmap Heap Scan on click_00028 cl  (cost=6701.97..8179.43 rows=372 width=177) (actual time=0.060..0.060 rows=0 loops=1)
                     Recheck Cond: ((idaffiliate = $3) AND (date >= $1) AND (date <= $2))
                     ->  BitmapAnd  (cost=6701.97..6701.97 rows=372 width=0) (actual time=0.058..0.058 rows=0 loops=1)
                           ->  Bitmap Index Scan on click_00028_idaffiliate_idx  (cost=0.00..1384.22 rows=74456 width=0) (actual time=0.038..0.038 rows=86 loops=1)
                                 Index Cond: (idaffiliate = $3)
                           ->  Bitmap Index Scan on click_00028_date_idx  (cost=0.00..5317.31 rows=253151 width=0) (actual time=0.010..0.010 rows=0 loops=1)
                                 Index Cond: ((date >= $1) AND (date <= $2))
               ->  Bitmap Heap Scan on click_00030 cl  (cost=6383.03..7811.35 rows=360 width=141) (actual time=201350.373..201350.466 rows=10 loops=1)
                     Recheck Cond: ((idaffiliate = $3) AND (date >= $1) AND (date <= $2))
                     ->  BitmapAnd  (cost=6383.03..6383.03 rows=360 width=0) (actual time=201350.280..201350.280 rows=0 loops=1)
                           ->  Bitmap Index Scan on click_00030_idaffiliate_idx  (cost=0.00..1337.96 rows=72058 width=0) (actual time=0.101..0.101 rows=484 loops=1)
                                 Index Cond: (idaffiliate = $3)
                           ->  Bitmap Index Scan on click_00030_date_idx  (cost=0.00..5044.64 rows=240312 width=0) (actual time=201347.064..201347.064 rows=43352606 loops=1)
                                 Index Cond: ((date >= $1) AND (date <= $2))
               ->  Index Scan using click_current_massive_idx on click_current cl  (cost=0.00..25.70 rows=1 width=152) (never executed)
                     Index Cond: ((date >= $1) AND (date <= $2) AND (idaffiliate = $3))
         ->  Index Scan using conversion_clickgenerated_idx on "conversion" co  (cost=0.00..185.51 rows=45 width=16) (actual time=0.015..0.016 rows=0 loops=10)
               Index Cond: ((cl.clickgenerated)::text = (co.clickgenerated)::text)
 Total runtime: 201350.887 ms
(24 rows)

Time: 201351.556 ms

dev=# deallocate fooplan;
DEALLOCATE

Thanks!
On Tue, Dec 30, 2008 at 1:59 PM, bricklen <bricklen@gmail.com> wrote:
> Hi, I am re-posting my question here after trying to find a solution
> in the PHP pgsql list with no luck.
>
> I am experiencing some performance issues that I think are stemming
> from prepared statements. I have a pretty simple query:
>
> -- bad plan, from prepared statement --
> dev=# prepare fooplan (date,date,int,int) as
> dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate,
> cl.idCreative AS creative, cl.subid, cl.datetime
> dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated
> = co.clickGenerated
> dev-# WHERE cl."date" >= $1
> dev-# AND cl."date" <= $2
> dev-# AND cl.idAffiliate = $3
> dev-# LIMIT $4;

Your problem is that the query as written is hard to plan. The database has no idea what you will pass in; it has to guess. (IMO, it almost always guesses wrong... I think it should assume 1 row returned.) Also, the DB has no idea at plan time what you want to pass in for the dates.

What indexes do you have on click?

merlin
Hi Merlin,

On Tue, Dec 30, 2008 at 11:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Dec 30, 2008 at 1:59 PM, bricklen <bricklen@gmail.com> wrote:
>> Hi, I am re-posting my question here after trying to find a solution
>> in the PHP pgsql list with no luck.
>>
>> I am experiencing some performance issues that I think are stemming
>> from prepared statements. I have a pretty simple query:
>>
>> -- bad plan, from prepared statement --
>> dev=# prepare fooplan (date,date,int,int) as
>> dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate,
>> cl.idCreative AS creative, cl.subid, cl.datetime
>> dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated
>> = co.clickGenerated
>> dev-# WHERE cl."date" >= $1
>> dev-# AND cl."date" <= $2
>> dev-# AND cl.idAffiliate = $3
>> dev-# LIMIT $4;
>
> Your problem is that the query as written is hard to plan. The
> database has no idea what you pass in, it has to guess. (IMO, It
> almost always guesses wrong...I think it should assume 1 row
> returned). Also, the db has no idea what you want to pass in at plan
> time for date.
>
> what indexes do you have on click?
>
> merlin

"click" is a partitioned table, but the child tables are all the same.
Here is the current partition:

dev=# \d click_current
     Column     |            Type             |                     Modifiers
----------------+-----------------------------+----------------------------------------------------
 id             | bigint                      | not null default nextval('click_id_seq'::regclass)
 idaffiliate    | integer                     | not null
 idsite         | integer                     | not null
 idoffer        | integer                     | not null
 idcreative     | integer                     | not null
 idoptimizer    | integer                     |
 clickgenerated | character varying           | not null
 subid          | character varying           |
 datetime       | timestamp without time zone | not null
 date           | date                        |
 ip             | inet                        | not null
 xip            | inet                        |
 referrer       | text                        |
 countrycode    | character varying           |
 timestamp      | timestamp without time zone | not null
Indexes:
    "click_current_pk" PRIMARY KEY, btree (id)
    "click_current_clickgenerated_idx" btree (clickgenerated)
    "click_current_date_idx" btree (date)
    "click_current_idoffer_idx" btree (idoffer)
    "click_current_massive_idx" btree (date, idaffiliate, idsite, idoffer, idcreative, idoptimizer, subid)
Check constraints:
    "click_current_date_chk" CHECK (date > '2008-12-29'::date)
On Tue, Dec 30, 2008 at 12:42 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Dec 30, 2008 at 1:59 PM, bricklen <bricklen@gmail.com> wrote:
>> Hi, I am re-posting my question here after trying to find a solution
>> in the PHP pgsql list with no luck.
>>
>> I am experiencing some performance issues that I think are stemming
>> from prepared statements. I have a pretty simple query:
>>
>> -- bad plan, from prepared statement --
>> dev=# prepare fooplan (date,date,int,int) as
>> dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate,
>> cl.idCreative AS creative, cl.subid, cl.datetime
>> dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated
>> = co.clickGenerated
>> dev-# WHERE cl."date" >= $1
>> dev-# AND cl."date" <= $2
>> dev-# AND cl.idAffiliate = $3
>> dev-# LIMIT $4;
>
> Your problem is that the query as written is hard to plan. The
> database has no idea what you pass in, it has to guess. (IMO, It
> almost always guesses wrong...I think it should assume 1 row
> returned). Also, the db has no idea what you want to pass in at plan
> time for date.

One of the things you can try here is to build your query then execute it so it has to be planned each time.
Hi Scott,

On Tue, Dec 30, 2008 at 12:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Dec 30, 2008 at 12:42 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Dec 30, 2008 at 1:59 PM, bricklen <bricklen@gmail.com> wrote:
>>> I am experiencing some performance issues that I think are stemming
>>> from prepared statements. I have a pretty simple query:
>>>
>>> -- bad plan, from prepared statement --
>>> dev=# prepare fooplan (date,date,int,int) as
>>> dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate,
>>> cl.idCreative AS creative, cl.subid, cl.datetime
>>> dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated
>>> = co.clickGenerated
>>> dev-# WHERE cl."date" >= $1
>>> dev-# AND cl."date" <= $2
>>> dev-# AND cl.idAffiliate = $3
>>> dev-# LIMIT $4;
>>
>> Your problem is that the query as written is hard to plan. The
>> database has no idea what you pass in, it has to guess.
>
> One of the things you can try here is to build your query then execute
> it so it has to be planned each time.

Yeah, I've tested that in the application itself and it worked correctly. I am trying to discover a way to use bind variables in PHP without using the prepare function (to block SQL injection), or, if I must use the prepare function, to force it to replan each time somehow. That's part of where I'm stuck (and I'm no PHP guy).
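One option that may be worth testing (a sketch, not something tried in this thread; the connection details are invented, and it assumes your PHP build's pdo_pgsql driver honors the attribute) is PDO's emulated prepares. With PDO::ATTR_EMULATE_PREPARES enabled, PDO quotes and interpolates the bound values client-side instead of creating a server-side prepared statement, so the code keeps its bind variables (and injection protection) while the server plans each query with the real values:

```php
<?php
// Hypothetical connection settings; the key line is ATTR_EMULATE_PREPARES.
$pdo = new PDO('pgsql:host=localhost;dbname=dev', 'appuser', 'secret');

// Interpolate safely-quoted parameter values client-side rather than
// creating a server-side prepared statement, so the planner sees literals.
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);

$stmt = $pdo->prepare(
    'SELECT cl.idOffer, cl.idAffiliate, cl.subid, cl.datetime
     FROM click AS cl
     LEFT JOIN conversion AS co ON cl.clickgenerated = co.clickgenerated
     WHERE cl."date" >= :start AND cl."date" <= :end
       AND cl.idAffiliate = :aff
     LIMIT 10');

$stmt->execute(array(':start' => '2008-12-01',
                     ':end'   => '2008-12-23',
                     ':aff'   => 49978));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

The trade-off is that the parameters no longer travel out-of-line; pg_query_params() keeps them out-of-line, which is generally preferable where it is available.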
There is no way to force Postgres to re-plan a prepared statement. In many cases, this would be a hugely beneficial feature (perhaps part of the definition of the statement?).

I have had similar issues, and had to code the application to prevent SQL injection (Postgres $ quotes and other stuff are helpful, but not always adequate or easy). With the current state of things, you'll often have to do your SQL injection defense in your application due to this weakness in prepared statements.

I have also had a case where one query would take a couple hundred ms to parse, but was fairly fast to plan and execute (1/3 the parse cost) -- yet another case where a prepared statement that re-plans each execution would be helpful. At least you can prevent SQL injection and cut the parse cost. It's not all about the cost of planning the query.

________________________________________
From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] On Behalf Of bricklen [bricklen@gmail.com]
Sent: Tuesday, December 30, 2008 12:14 PM
To: Scott Marlowe
Cc: Merlin Moncure; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Poor plan choice in prepared statement

Hi Scott,

On Tue, Dec 30, 2008 at 12:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Dec 30, 2008 at 12:42 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Dec 30, 2008 at 1:59 PM, bricklen <bricklen@gmail.com> wrote:
>>> I am experiencing some performance issues that I think are stemming
>>> from prepared statements. I have a pretty simple query:
>>>
>>> -- bad plan, from prepared statement --
>>> dev=# prepare fooplan (date,date,int,int) as
>>> dev-# SELECT cl.idOffer AS campaign, cl.idAffiliate AS affiliate,
>>> cl.idCreative AS creative, cl.subid, cl.datetime
>>> dev-# FROM click AS cl LEFT JOIN conversion AS co ON cl.clickGenerated
>>> = co.clickGenerated
>>> dev-# WHERE cl."date" >= $1
>>> dev-# AND cl."date" <= $2
>>> dev-# AND cl.idAffiliate = $3
>>> dev-# LIMIT $4;
>>
>> Your problem is that the query as written is hard to plan. The
>> database has no idea what you pass in, it has to guess. (IMO, It
>> almost always guesses wrong...I think it should assume 1 row
>> returned). Also, the db has no idea what you want to pass in at plan
>> time for date.
>
> One of the things you can try here is to build your query then execute
> it so it has to be planned each time.

Yeah, I've tested that in the application itself and it worked correctly. I am trying to discover a way to use bind variables in PHP without using the prepare function (to block sql injection), or if I must use the prepare function, then force it to replan each time somehow. That's part of where I'm stuck (and I'm no php guy).
On Tue, Dec 30, 2008 at 1:09 PM, Scott Carey <scott@richrelevance.com> wrote:
> There is no way to force Postgres to re-plan a prepared statement. In many
> cases, this would be a hugely beneficial feature (perhaps part of the
> definition of the statement?).
>
> I have had similar issues, and had to code the application to prevent SQL
> injection (Postgres $ quotes and other stuff are helpful, but not always
> adequate or easy). With the current state of things, you'll often have to
> do your SQL injection defense in your application due to this weakness in
> prepared statements.
>
> I have also had a case where one query would take a couple hundred ms to
> parse, but was fairly fast to plan and execute (1/3 the parse cost) -- yet
> another case where a prepared statement that re-plans each execution would
> be helpful. At least you can prevent SQL injection and cut the parse cost.
> It's not all about the cost of planning the query.

Can you point out any standard ways of preventing SQL injection (within or without PHP)? I would really rather not have to roll my own input sanitization code if there are standard methods available. For example, addslashes is apparently not recommended (according to a warning in the postgresql docs at http://wiki.postgresql.org/wiki/8.1.4_et._al._Security_Release_FAQ).

Thanks!

Bricklen
Scott Carey <scott@richrelevance.com> writes:
> I have also had a case where one query would take a couple hundred ms to
> parse, but was fairly fast to plan and execute (1/3 the parse cost) -- yet
> another case where a prepared statement that re-plans each execution would
> be helpful. At least you can prevent SQL injection and cut the parse cost.
> It's not all about the cost of planning the query.

The point of a prepared statement IMHO is to do the planning only once. There's necessarily a tradeoff between that and having a plan that's perfectly adapted to specific parameter values.

Reasonable client-side APIs should provide the option to use out-of-line parameters, which is what you want to prevent SQL injection, without hard-wiring that to the orthogonal concept of statements whose plan is prepared in advance. In libpq, for instance, PQexecParams() does that.

			regards, tom lane
Hi Tom,

On Tue, Dec 30, 2008 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The point of a prepared statement IMHO is to do the planning only once.
> There's necessarily a tradeoff between that and having a plan that's
> perfectly adapted to specific parameter values.

I agree, and normally it wouldn't be an issue. In this particular case, we are seeing response time go from sub-second with non-prepared queries to over 200 seconds with prepared queries. Note that this is not an isolated case in our application, merely the numbers from this particular example.

> Reasonable client-side APIs should provide the option to use out-of-line
> parameters, which is what you want to prevent SQL injection, without
> hard-wiring that to the orthogonal concept of statements whose plan is
> prepared in advance. In libpq, for instance, PQexecParams() does that.

Again, I agree completely. What I am after, I guess, are some pointers on where to look for that with regards to PHP. Whatever I turn up I will turn over to our developers, but before I do that I want to be sure I am giving them the correct advice.

Thanks,

Bricklen
On Tue, 30 Dec 2008, Tom Lane wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> I have also had a case where one query would take a couple hundred ms to
>> parse, but was fairly fast to plan and execute (1/3 the parse cost) -- yet
>> another case where a prepared statement that re-plans each execution would
>> be helpful.
>
> The point of a prepared statement IMHO is to do the planning only once.
> There's necessarily a tradeoff between that and having a plan that's
> perfectly adapted to specific parameter values.

Actually, it does two things:

1. planning only once
2. parsing only once

I suspect that when this was initially set up, the expectation was that planning was the expensive thing that should be avoided. In this case, a post earlier in the thread identified parsing of the query as the expensive part (planning + execution was 1/3 the cost of the parsing).

Since there is no pre-parsed interface for queries, it may make sense to set up a way to have a query pre-parsed, but not pre-planned, for cases like this.

David Lang
david@lang.hm writes:
> since there is not a pre-parsed interface for queries, it may make sense to
> setup a way to have the query pre-parsed, but not pre-planned for cases like
> this.

What would be more interesting would be to have plans that take into account the outlier values and have alternative plans for those values.

One aspect that hasn't been discussed here is whether it's only certain outlier arguments that cause Postgres to choose the poor plan for you, or whether it always chooses it for all the sets of arguments you actually use. If it's the former, then it's possible you're only being bitten if the first time you prepare the query happens to have one of these outlier sets of parameters.

I forget what version this went in, but I think it was before 8.2 that Postgres started using the first set of arguments seen to plan the query. This is usually an improvement over just guessing, but if that first set is unusual it could lead to strange results.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On Tue, Dec 30, 2008 at 7:59 PM, bricklen <bricklen@gmail.com> wrote:
> I would like to continue to use bind variables to prevent sql
> injection, but I'd like to force a plan re-parse for every single
> query (if necessary?)

As far as I understand your problem, you don't care about using prepared statements. If so, you can:
- either use pg_query_params(): http://www.php.net/manual/en/function.pg-query-params.php
- or use unnamed prepared statements for the queries you don't want a cached plan for, if for some reason you really need to use prepared statements in a few cases: you can specify an empty string as the plan name. The condition for this one is that you need to upgrade to a recent version of 8.3, as postponed planning of unnamed prepared statements is a new feature of 8.3 and was broken in 8.3.0 and 8.3.1.

--
Guillaume
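A minimal sketch of the pg_query_params() approach (the connection string is invented; the query is the one from the thread). The values travel out-of-line as $1..$3, so there is no string interpolation to inject into and no named server-side plan kept around:

```php
<?php
// Hypothetical connection string; adjust for your environment.
$conn = pg_connect('host=localhost dbname=dev user=appuser');

// Parameters are sent out-of-line ($1..$3), so SQL injection through
// the values is not possible and no named prepared statement is created.
$result = pg_query_params($conn,
    'SELECT cl.idOffer, cl.idAffiliate, cl.subid, cl.datetime
     FROM click AS cl
     LEFT JOIN conversion AS co ON cl.clickgenerated = co.clickgenerated
     WHERE cl."date" >= $1 AND cl."date" <= $2 AND cl.idAffiliate = $3
     LIMIT 10',
    array('2008-12-01', '2008-12-23', 49978));

while ($row = pg_fetch_assoc($result)) {
    // process $row
}
```

Note that whether the server plans with the actual parameter values still depends on the server version, as discussed above for unnamed statements.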
Hi Guillaume,

On Wed, Dec 31, 2008 at 1:12 AM, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> As far as I understand your problem, you don't care about using
> prepared statements.
>
> If so, you can:
> - either use pg_query_params():
> http://www.php.net/manual/en/function.pg-query-params.php
> - or use an unnamed prepared statement if, for some reason, you really
> need to use prepared statements in a few cases: you can specify an
> empty string as plan name. The condition for this one is that you need
> to upgrade to a recent version of 8.3 as postponed planning of unnamed
> prepared statements is a new feature of 8.3 and was broken in 8.3.0
> and 8.3.1.

I will look into the pg_query_params() function to see if it works for us. I don't think your second suggestion is going to work for us, since we are stuck on 8.2 for the foreseeable future. Thanks for the tips though, I appreciate it!

Bricklen
Tom Lane wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> I have also had a case where one query would take a couple hundred
>> ms to parse, but was fairly fast to plan and execute (1/3 the parse
>> cost) -- yet another case where a prepared statement that re-plans
>> each execution would be helpful. At least you can prevent SQL
>> injection and cut the parse cost. It's not all about the cost of
>> planning the query.
>
> The point of a prepared statement IMHO is to do the planning only once.
> There's necessarily a tradeoff between that and having a plan that's
> perfectly adapted to specific parameter values.

I think it has been shown enough times that the performance drop caused by a worse plan can be orders of magnitude worse than what's gained by producing the plan only once. It does not seem a bad idea to provide a way to carry out only the parse phase, and postpone planning until the parameters have been received.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Dec 31, 2008 at 11:01 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> The point of a prepared statement IMHO is to do the planning only once.
>> There's necessarily a tradeoff between that and having a plan that's
>> perfectly adapted to specific parameter values.
>
> I think it has been shown enough times that the performance drop caused
> by a worse plan can be orders of magnitude worse than what's gained by
> producing the plan only once. It does not seem a bad idea to provide a
> way to carry out only the parse phase, and postpone planning until the
> parameters have been received.

+1.

...Robert
On Wed, Dec 31, 2008 at 5:01 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I think it has been shown enough times that the performance drop caused
> by a worse plan can be orders of magnitude worse than what's gained by
> producing the plan only once. It does not seem a bad idea to provide a
> way to carry out only the parse phase, and postpone planning until the
> parameters have been received.

It's already done in 8.3 for unnamed plans, isn't it?

--
Guillaume
On Thu, 1 Jan 2009, Guillaume Smet wrote:
> On Wed, Dec 31, 2008 at 5:01 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> It does not seem a bad idea to provide a way to carry out only the
>> parse phase, and postpone planning until the parameters have been
>> received.
>
> It's already done in 8.3 for unnamed plans, isn't it?

Forgive my ignorance here, but if it's unnamed, how can you reference it later to take advantage of the parsing? I may just be not understanding the terms being used here.

David Lang
On Thu, Jan 1, 2009 at 9:24 PM, <david@lang.hm> wrote:
> forgive my ignorance here, but if it's unnamed how can you reference it
> later to take advantage of the parsing?

You can't. That's what unnamed prepared statements are for.

It's not obvious to me that the parsing phase is worth any "caching". From my experience, the planning phase takes far more time on complex queries.

--
Guillaume
On Thu, 1 Jan 2009, Guillaume Smet wrote:
> On Thu, Jan 1, 2009 at 9:24 PM, <david@lang.hm> wrote:
>> forgive my ignorance here, but if it's unnamed how can you reference it
>> later to take advantage of the parsing?
>
> You can't. That's what unnamed prepared statements are for.
>
> It's not obvious to me that the parsing phase is worth any "caching".
> From my experience, the planning phase takes far more time on complex
> queries.

The poster who started this thread had a query where the parsing phase took significantly longer than the planning stage.

David Lang
david@lang.hm writes:
> the poster who started this thread had a query where the parsing phase
> took significantly longer than the planning stage.

That was an anecdote utterly unsupported by evidence.

			regards, tom lane
> david@lang.hm writes:
>> the poster who started this thread had a query where the parsing phase
>> took significantly longer than the planning stage.
>
> That was an anecdote utterly unsupported by evidence.
>
> regards, tom lane
The issue of prepared statements having atrocious query plans has hit me again. I feel very strongly about this topic and the need for Postgres to have a usable option that allows prepared statements to be re-planned based on their inputs. Pardon my bluntness, but in the current situation the system is brain-dead in many important use cases and data sets.
I believe my statement referenced above was about parsing time relative to the remaining time, not parsing compared to planning. But either way, it's a minor detail, and it's not important for justifying the need for the enhancement here.
Yeah, it's anecdotal from your perspective. Go ahead and ignore all that if you wish.
I am making several points in this message that are independent of such evidence, IMHO.
I have tried to rearrange this message so that the anecdotal narrative is at the end, after the first dashed line.
Unnamed prepared statements do solve much of the problem in theory, since the most common issue is typically poor execution plans, or the lack of a clean way to deal with SQL injection and write less bug-prone client code; parsing being very expensive is rarer. But there IS a performance savings, not insignificant for many workloads, to be had by avoiding the utterly avoidable parsing.
HOWEVER:
What is overlooked WRT unnamed prepared statements is that they are hard to use, and changing client code or behavior is difficult, error-prone, and sometimes impossible. Not all client APIs play nicely with them at the moment (see Postgres' JDBC). The behavior has some global tweaks, but these are useless in many situations where you need behavior that varies.
Every time the answer to a problem is to change the client behavior, I ask myself if the DB could have a better default or configuration parameter so that clients don't have to change. Some database instances have dozens of applications and hundreds of client types including ad-hoc usage. Some client code is legacy code that simply can't be changed.
Changing the clients is a lot harder than changing a db parameter, or configuring a new default for a particular db user. If the client must change, adding something like SET prepare_level = 'parse_only' is the least intrusive and easiest to test, but I stress again that in many real-world cases the client is not flexible.
A session-level parameter that controls prepared statement behavior defaults (never cache by default? parse cache only? parse cache and plan cache?) would be a blessing. A DBA could administer a fix to certain problems without having to force clients to change behavior or wait for new client API versions with fixes.
That reminds me: isn't there a global parameter that can force prepared statements not to be cached? Does that make them all behave as if they are unnamed, or are they simply re-created each time? I believe I tried this in the past and the prepared statements were unable to use the parameter values for partition-table selection, suggesting the latter.
Typically, I run into the issue with queries in a back-end process that operates on large data sets or partitioned tables. Prepared statements essentially kill performance by several orders of magnitude (think: scan 1 partition table versus scan 5000). However, my recent issue is brutally simple.
I need to have an auto-complete web form, and thus need a query with a portion like
WHERE name LIKE 'userenteredtext%'
Thus, I make an index with varchar_pattern_ops and off we go! ... Or not. Works fine with explicit queries, but not a prepared query. Unfortunately, this is highly prone to SQL injection, and the industry standard way to deal with this is by parameterization.
http://www.owasp.org/index.php/Guide_to_SQL_Injection
(That site is a wealth of information, tools, and tests on the topic; for example: http://www.owasp.org/index.php/Testing_for_SQL_Injection.)
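To make the failure mode concrete, here is a hedged psql-style sketch (the table, index, and pattern are invented, and the plan comments describe the behavior I would expect rather than captured output):

```sql
-- With a literal pattern, the planner can see the anchored prefix and
-- use the varchar_pattern_ops index:
CREATE INDEX person_name_pattern_idx ON person (name varchar_pattern_ops);

EXPLAIN SELECT * FROM person WHERE name LIKE 'smi%';
-- expected: Index Scan using person_name_pattern_idx

-- In a prepared statement the pattern is a parameter, unknown at plan
-- time, so the planner cannot prove the pattern is left-anchored and
-- falls back to a filter over a full scan:
PREPARE autocomplete (varchar) AS
    SELECT * FROM person WHERE name LIKE $1 || '%';

EXPLAIN EXECUTE autocomplete ('smi');
-- expected: Seq Scan on person, Filter: (name ~~ ($1 || '%'))
```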
Prepared statements are a blessing from a client code perspective, preventing all sorts of bugs and catching many others early, on the client side. Not being able to use them because it causes the database to execute very bad query plans is a flaw in the database, not the client.
------------------------------------------------------------------------------------
Unnamed prepared statements did not work for me when I tried them as a solution (8.3.2, supposedly after the fix). I was in a hurry to fix my issue, and just moved on when they were plainly not working. It is possible I did something wrong back then. They are also poorly supported by many client APIs -- when they did not work for me, I supposed it was a JDBC issue or perhaps user error, but maybe it was server side? The documentation on both sides is not entirely clear on what should really be happening. And I could not figure out how to debug where the problem was. Can you even test an unnamed prepared statement in psql?
I had no time to waste and changed a lot of the client code instead, and have since been less interested, until the topic came up in this thread and then, after letting this message sit as a draft for a month, I ran into the "varchar_pattern_ops + prepared statement = index? what index?" issue.
--------
On queries that are overly parse-heavy:
It is very difficult to tease apart parse time from plan time in Postgres (genrally speaking, getting performance data from Postgres is much harder than the commercial DB's I've used). However, my experience with commercial DBs that have running counters of time spent in various operations is that parsing can be upwards of 50% of the total CPU usage with some workloads. Decision Support type stuff where many light-weight queries are executed, is where I have seen that with Oracle. In Oracle you can find out exactly how much of CPU time was spent parsing versus planning versus other stuff in aggregate over a time interval. I don't suspect that Postgres has a parser that is orders of magnitude faster. But its true I don't have direct evidence to share right now teasing apart plan from parse time.
So yes, unnamed prepared statements are potentially part (but not all) of a solution, provided there was good control over their use in client API’s (which there isn’t and won’t be, since they are non-standard), or there is a useful non-global way to configure them.
Planning is long on most queries that take long to parse, but this will vary. (We have several queries that access tables with either only primary key indexes or no indexes, lots of embedded 'stuff' that is naturally more parser than planner heavy like CASE statements and value constraints/checks/modifiers on columns without indexes and not in the where clause, and some are approximately 2k character behemoths).
-----
On Performance Improvements gained by avoiding prepared statements:
I gained about a factor of 5x to 50x in performance by changing a lot of code to avoid prepared statements. Some of the worst cases involved partitioned tables, where planning issues are particularly troublesome. (lets scan 1000 tables for no good reason! Hooray!)
I never posted here about my struggles with prepared statements and execution performance, I knew it wasn't going to change and I had to fix the problem in a few days time. One of the queries changed had about 7 sub-selects in it, but the eventual plan can be very fast for some cases, returning almost no data, and we run this query repeatedly with slightly different parameters. So with some parameters the execution time dominates by orders of magnitude, and for most parameter combinations the execution time is almost none of it. Of course, now we just write a new query for all the variants, else the performance is unacceptable. This is not too difficult of a change because these are not a SQL injection worry, although it has complicated the client code and test cases.
---
Being able to avoid these problems and let programmers use prepared statements would be a good thing, and so I think a solution more useable and flexible than the current unnamed prepared statements would be great. And if the avoidable and redundant re-parsing can be avoided too, its win-win.
Being able to cut parsing out of the loop in many cases to improve performance, should be able to stand up on its own as a legitimate improvement. If it is permanently bound to planning, it is permanently bound to significant caveats. Decoupling the two and providing a means of control over these WRT prepared statements and related features has much merit IMO.
>> the poster who started this thread had a query where the parsing phase
>> took significantly longer than the planning stage.
> That was an anecdote utterly unsupported by evidence.
regards, tom lane
The issue of prepared statements producing atrocious query plans has hit me again. I feel very strongly that Postgres needs a working, user-friendly option that lets prepared statements re-plan based on their inputs. Pardon my bluntness, but in the current situation the system is brain dead for many important use cases and data sets.
I believe the statement referenced above compared parsing time to the remaining time, not parsing to planning. Either way, it's a minor detail, and it isn't needed to justify the enhancement here.
Yeah, it's anecdotal from your perspective. Go ahead and ignore all that if you wish.
**** I am making several points in this message that are independent of such evidence, IMHO.
I have tried to rearrange this message so that the anecdotal narrative is at the end, after the first dashed line.
Unnamed prepared statements do solve much of the problem in theory, since the most common issue is poor execution plans, or the inability to deal cleanly with SQL injection and write less bug-prone client code. Parsing being very expensive is rarer, but there IS a performance saving, and not an insignificant one for many workloads, to be had by avoiding the utterly avoidable re-parsing.
HOWEVER:
What is overlooked WRT unnamed prepared statements is that they are hard to use, and changing client code or behavior is difficult, error-prone, and sometimes impossible. Not all client APIs play nicely with them at the moment (see Postgres' JDBC driver). There are some global tweaks to the behavior, but these are useless in the many situations where you need behavior that varies.
Every time the answer to a problem is to change the client behavior, I ask myself if the DB could have a better default or configuration parameter so that clients don't have to change. Some database instances have dozens of applications and hundreds of client types including ad-hoc usage. Some client code is legacy code that simply can't be changed.
Changing the clients is a lot harder than changing a db parameter, or configuring a new default for a particular db user. If the client must change, adding something like SET prepare_level = 'parse_only' is the least intrusive and easiest to test, but I stress again that in many real-world cases the client is not flexible.
A session-level parameter that controls prepared statement behavior defaults (never cache by default? parse cache only? parse cache and plan cache?) would be a blessing. A DBA could administer a fix to certain problems without having to force clients to change behavior or wait for new client API versions with fixes.
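To make the proposal concrete, usage might look something like this (the parameter name prepare_level and its values are purely hypothetical; this is the suggested enhancement, not an existing feature):

```sql
-- Hypothetical: per-session default for prepared-statement behavior
-- 'parse_only' would cache the parse tree but re-plan on every EXECUTE
SET prepare_level = 'parse_only';

-- Hypothetical: a per-user default a DBA could set without touching
-- any client code at all
ALTER USER reporting_app SET prepare_level = 'parse_only';
```

The point is that either statement could be applied server-side, per session or per role, without waiting for client API fixes.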
That reminds me: isn't there a global parameter that forces prepared statements not to be cached? Does that make them all behave as if they were unnamed, or are they simply re-created each time? I believe I tried this in the past and the prepared statements were still unable to use the parameter values for partition selection, suggesting the latter.
Typically, I run into the issue with queries in a back-end process that operates on large data sets or partitioned tables. Prepared statements essentially kill performance by several orders of magnitude (think: scanning 1 partition versus 5000). However, my recent issue is brutally simple.
I need to have an auto-complete web form, and thus need a query with a portion like
WHERE name LIKE 'userenteredtext%'
Thus, I make an index with varchar_pattern_ops and off we go! ... Or not. It works fine with literal queries, but not with a prepared query. Unfortunately, interpolating user input directly is highly prone to SQL injection, and the industry-standard way to deal with that is parameterization.
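A minimal reproduction (table and index names hypothetical) looks like this. With a literal pattern the planner sees the left-anchored prefix and can use the varchar_pattern_ops index; with a parameter it cannot prove anything about the pattern at plan time, so the generic plan falls back to a sequential scan:

```sql
-- Hypothetical setup for the auto-complete query
CREATE INDEX person_name_pattern_idx ON person (name varchar_pattern_ops);

-- Literal pattern: the planner can rewrite the LIKE into the range
-- condition (name >= 'abc' AND name < 'abd') and use the index
EXPLAIN SELECT name FROM person WHERE name LIKE 'abc%';

-- Prepared statement: $1 is unknown at plan time, so no range condition
-- can be derived and the plan typically degrades to a seq scan
PREPARE autocomplete (text) AS
  SELECT name FROM person WHERE name LIKE $1;
EXPLAIN EXECUTE autocomplete ('abc%');
```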
http://www.owasp.org/index.php/Guide_to_SQL_Injection
(that site is a wealth of information, tools, and tests on the topic, for example: http://www.owasp.org/index.php/Testing_for_SQL_Injection).
Prepared statements are a blessing from a client code perspective, preventing all sorts of bugs and catching many others early, on the client side. Not being able to use them because it causes the database to execute very bad query plans is a flaw in the database, not the client.
------------------------------------------------------------------------------------
Unnamed prepared statements did not work for me when I tried them as a solution (8.3.2, supposedly after the fix). I was in a hurry to fix my issue, and just moved on when they were plainly not working. It is possible I did something wrong back then. They are also poorly supported by many client APIs -- when they did not work for me, I supposed it was a JDBC issue or perhaps user error, but maybe it was server side? The documentation on both sides is not entirely clear on what should really be happening. And I could not figure out how to debug where the problem was. Can you even test an unnamed prepared statement in psql?
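As far as I can tell, the answer is no: SQL-level PREPARE always requires a statement name, and the unnamed statement exists only at the frontend/backend protocol level, so psql alone cannot exercise it (table and statement names below are hypothetical):

```sql
-- SQL-level PREPARE always takes a name; there is no syntax for the
-- *unnamed* statement, which lives only in the wire protocol
PREPARE mystmt (text) AS SELECT name FROM person WHERE name LIKE $1;
EXECUTE mystmt ('abc%');

-- To exercise the unnamed statement you need a protocol-level client,
-- e.g. libpq's PQprepare(conn, "", query, nParams, NULL) with an empty
-- statement name, or a driver that uses it under the covers
```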
I had no time to waste, so I changed a lot of the client code instead, and had been less interested since, until the topic came up in this thread. Then, after letting this message sit as a draft for a month, I ran into the "varchar_pattern_ops + prepared statement = index, what index?" issue.
--------
On queries that are overly parse-heavy:
It is very difficult to tease apart parse time from plan time in Postgres (generally speaking, getting performance data out of Postgres is much harder than with the commercial DBs I've used). However, my experience with commercial DBs that keep running counters of time spent in various operations is that parsing can be upwards of 50% of total CPU usage with some workloads. Decision-support work, where many light-weight queries are executed, is where I have seen this with Oracle; there you can find out exactly how much CPU time was spent parsing versus planning versus other work, in aggregate over a time interval. I don't suspect that Postgres has a parser that is orders of magnitude faster, but it's true I don't have direct evidence to share right now teasing apart plan time from parse time.
So yes, unnamed prepared statements are potentially part (but not all) of a solution, provided there is good control over their use in client APIs (which there isn't and won't be, since they are non-standard), or there is a useful non-global way to configure them.
Planning takes long on most queries that take long to parse, but this varies. (We have several queries that access tables with only primary-key indexes or no indexes at all, containing lots of embedded 'stuff' that is naturally more parser-heavy than planner-heavy, such as CASE statements and value constraints/checks/modifiers on columns that have no indexes and are not in the WHERE clause; some are roughly 2k-character behemoths.)
-----
On Performance Improvements gained by avoiding prepared statements:
I gained about a factor of 5x to 50x in performance by changing a lot of code to avoid prepared statements. Some of the worst cases involved partitioned tables, where planning issues are particularly troublesome. (Let's scan 1000 tables for no good reason! Hooray!)
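The partitioned-table case is easy to reproduce with constraint exclusion (table and column names hypothetical, 8.x-style inheritance partitioning). With a literal date the planner excludes non-matching children at plan time; with a parameter the generic plan must include every child:

```sql
-- Parent plus range-partitioned children with CHECK constraints
CREATE TABLE click (idaffiliate int, "date" date);
CREATE TABLE click_200812
  (CHECK ("date" >= '2008-12-01' AND "date" < '2009-01-01'))
  INHERITS (click);
-- ... one child per month ...

SET constraint_exclusion = on;

-- Literal: only click_200812 appears in the plan
EXPLAIN SELECT * FROM click WHERE "date" = '2008-12-15';

-- Prepared: "date" = $1 cannot be checked against the CHECK constraints
-- at plan time, so the generic plan appends a scan of every child table
PREPARE clicks_on (date) AS SELECT * FROM click WHERE "date" = $1;
EXPLAIN EXECUTE clicks_on ('2008-12-15');
```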
I never posted here about my struggles with prepared statements and execution performance; I knew it wasn't going to change and I had to fix the problem in a few days' time. One of the changed queries had about 7 sub-selects, but the right plan can be very fast for some parameter sets, returning almost no data, and we run this query repeatedly with slightly different parameters. So with some parameters the execution time dominates by orders of magnitude, while for most parameter combinations the execution time is almost none of it. Of course, now we just write a new query for every variant, else the performance is unacceptable. This was not too difficult a change because these queries are not a SQL-injection worry, although it has complicated the client code and test cases.
---
Being able to avoid these problems and let programmers use prepared statements would be a good thing, so I think a solution more usable and flexible than the current unnamed prepared statements would be great. And if the avoidable, redundant re-parsing can be eliminated too, it's win-win.
Being able to cut parsing out of the loop in many cases to improve performance should stand on its own as a legitimate improvement. If it is permanently bound to planning, it is permanently bound to significant caveats. Decoupling the two and providing a means of control over them WRT prepared statements and related features has much merit, IMO.