Thread: prepared query performs much worse than regular query
Hi everyone,

I use DBD::Pg to interface with our 8.4.2 database, but for a particular query, performance is horrible. I'm assuming that the behavior of $dbh->prepare is as if I did PREPARE foo AS (query), so I did an EXPLAIN ANALYZE on the command line:

> db_alpha=# prepare foo6 as (SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = $1 AND me.assignment = $2 ) ));
> PREPARE
> db_alpha=# explain analyze execute foo6('-1', '8996557');
>                                                                          QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on submissions me  (cost=38.84..42.85 rows=1 width=70) (actual time=346567.665..346567.665 rows=0 loops=1)
>    Recheck Cond: ((assignment = $2) AND (owner = $1))
>    ->  BitmapAnd  (cost=38.84..38.84 rows=1 width=0) (actual time=346567.642..346567.642 rows=0 loops=1)
>          ->  Bitmap Index Scan on submissions_assignment_idx  (cost=0.00..19.27 rows=177 width=0) (actual time=0.038..0.038 rows=2 loops=1)
>                Index Cond: (assignment = $2)
>          ->  Bitmap Index Scan on submissions_owner_idx  (cost=0.00..19.32 rows=184 width=0) (actual time=346566.501..346566.501 rows=28977245 loops=1)
>                Index Cond: (owner = $1)
>  Total runtime: 346567.757 ms
> (8 rows)

Now, if I run it without preparing it--just run it directly on the command line--I get this plan:

> db_alpha=# explain analyze SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node, me.grade, me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner = -1 AND me.assignment = 8996557 ) )
> db_alpha-# ;
>                                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
>  Index Scan using submissions_assignment_idx on submissions me  (cost=0.00..549.15 rows=36 width=70) (actual time=0.021..0.021 rows=0 loops=1)
>    Index Cond: (assignment = 8996557)
>    Filter: (owner = (-1))
>  Total runtime: 0.042 ms
> (4 rows)

submissions has ~124 million rows, and owner -1 is a placeholder in my database, to fulfill a foreign key requirement. I tried REINDEXing submissions_owner_idx and performing a VACUUM ANALYZE on the submissions table, but nothing seems to make a difference for this query. One other thing to note is that if I use any other value for the owner column, it comes back really fast (< 0.04 ms).

Any ideas why the query planner chooses a different query plan when using prepared statements?

--Richard
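For reference, a quick way to measure how many rows that placeholder owner actually covers (a sketch against the submissions table from the plans above; it scans the whole table, so expect it to be slow at ~124M rows):

```sql
-- Count rows held by the placeholder owner vs. the table as a whole.
-- (8.4-compatible; FILTER aggregates did not exist yet.)
SELECT sum(CASE WHEN owner = -1 THEN 1 ELSE 0 END) AS placeholder_rows,
       count(*)                                    AS total_rows
FROM submissions;
```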
On Fri, May 21, 2010 at 4:53 PM, Richard Yen <dba@richyen.com> wrote:
> Any ideas why the query planner chooses a different query plan when using prepared statements?

A prepared plan is the best one the planner can come up with *in general* for the query in question. If the distribution of the values you're querying against -- in your case, "owner" and "assignment" -- isn't relatively uniform, that plan is going to be suboptimal, if not downright pathological, for the values at the extremes of the distribution.

Looking at your prepared plan, it seems that, on average, there are 177 rows for every "assignment", and 184 per "owner". As it turns out, though, nearly a quarter of your table has an "owner" of -1. It's not terribly surprising, with a table that big and a distribution skew of that magnitude, that this query plan, with these arguments, ends up pretty firmly in the "pathological" category.

rls

-- 
:wq
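That kind of skew is visible in the statistics ANALYZE collects. A sketch, assuming the table and column names from the original query:

```sql
-- most_common_vals / most_common_freqs show which owner values dominate;
-- n_distinct drives the average-case estimate (the rows=184 in the plan).
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'submissions'
  AND attname = 'owner';
```

If -1 appears in most_common_vals with a frequency near 0.25, the planner can account for the skew when it sees the literal value, but a generic prepared plan has to fall back on the average-case estimate.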
On Fri, 21 May 2010, Richard Yen wrote:
> Any ideas why the query planner chooses a different query plan when using prepared statements?

This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the values were available. The whole idea is to avoid the planning cost each time the query is executed, but if your data is unusual it can result in worse plans.

Matthew

-- 
Existence is a convenient concept to designate all of the files that an executable program can potentially process.  -- Fortran77 standard
On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> On Fri, 21 May 2010, Richard Yen wrote:
>> Any ideas why the query planner chooses a different query plan when using prepared statements?
>
> This is a FAQ. Preparing a statement makes Postgres create a plan, without
> knowing the values that you will plug in, so it will not be as optimal as
> if the values were available. The whole idea is to avoid the planning cost
> each time the query is executed, but if your data is unusual it can
> result in worse plans.

Two things I disagree with:

1. The "whole idea" is not just to avoid planning cost. It is also to easily avoid SQL injection, reduce query parse time, and to make client code cleaner and more re-usable.

2. The data does not need to be "unusual". It just needs to have a skewed distribution. Skewed is not unusual (well, it would be for a primary key :P ).

Maybe the planner could note that a prepared query parameter is on a high-skew column and build a handful of plans to choose from, or just partially re-plan on the skewed column with each execution. Or make it easier for a user to have a prepared statement that re-plans the query each time. Even just a per-connection parameter "SET prepared.query.cacheplan = FALSE".

> Matthew
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
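Until something like that exists, one workaround on 8.4 is to wrap the query in a PL/pgSQL function that builds the statement dynamically: EXECUTE plans the statement fresh on every call, and inlining the values lets the planner use the column statistics for them. A sketch (the function and parameter names here are made up):

```sql
CREATE OR REPLACE FUNCTION submissions_for(p_owner integer, p_assignment integer)
RETURNS SETOF submissions AS $$
BEGIN
  -- EXECUTE re-plans each call; embedding the values as literals lets the
  -- planner pick an owner=-1-aware plan instead of a generic one.
  RETURN QUERY EXECUTE
    'SELECT * FROM submissions WHERE owner = ' || p_owner::text ||
    ' AND assignment = ' || p_assignment::text;
END;
$$ LANGUAGE plpgsql;
```

From DBD::Pg specifically, setting the pg_server_prepare database-handle attribute to 0 has a similar effect: placeholders are interpolated client-side, so the server never plans a parameterized statement.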
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote:
> On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> > On Fri, 21 May 2010, Richard Yen wrote:
> > > Any ideas why the query planner chooses a different query plan when using prepared statements?
> >
> > This is a FAQ. Preparing a statement makes Postgres create a plan, without
> > knowing the values that you will plug in, so it will not be as optimal as
> > if the values were available. The whole idea is to avoid the planning cost
> > each time the query is executed, but if your data is unusual it can
> > result in worse plans.
>
> Maybe the planner could note a prepared query parameter is on a high skew
> column and build a handful of plans to choose from, or just partially
> re-plan on the skewed column with each execution. Or make it easier for a
> user to have a prepared statement that re-plans the query each time. Even
> just a per connection parameter "SET prepared.query.cacheplan = FALSE"

There was talk in this year's developers' meeting of doing this replanning you've suggested ("Re(?)plan parameterized plans with actual parameter values" on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting, specifically). This wouldn't show up until at least 9.1, but it's something people are thinking about.

-- 
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com