Thread: prepared query performs much worse than regular query

prepared query performs much worse than regular query

From
Richard Yen
Date:
Hi everyone,

I use DBD::Pg to interface with our 8.4.2 database, but for a particular query, performance is horrible.  I'm assuming
thatthe behavior of $dbh->prepare is as if I did PREPARE foo AS (query), so I did an explain analyze in the
commandline:
> db_alpha=# prepare foo6 as (SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node,
me.grade,me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner =
$1AND me.assignment = $2 ) )); 
> PREPARE
> db_alpha=# explain analyze execute foo6('-1', '8996557');
>                                                                         QUERY PLAN
                                     
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on submissions me  (cost=38.84..42.85 rows=1 width=70) (actual time=346567.665..346567.665 rows=0
loops=1)
>    Recheck Cond: ((assignment = $2) AND (owner = $1))
>    ->  BitmapAnd  (cost=38.84..38.84 rows=1 width=0) (actual time=346567.642..346567.642 rows=0 loops=1)
>          ->  Bitmap Index Scan on submissions_assignment_idx  (cost=0.00..19.27 rows=177 width=0) (actual
time=0.038..0.038rows=2 loops=1) 
>                Index Cond: (assignment = $2)
>          ->  Bitmap Index Scan on submissions_owner_idx  (cost=0.00..19.32 rows=184 width=0) (actual
time=346566.501..346566.501rows=28977245 loops=1) 
>                Index Cond: (owner = $1)
>  Total runtime: 346567.757 ms
> (8 rows)


Now, if I run it without preparing it--just run it directly in the commandline--I get this plan:
> db_alpha=# explain analyze SELECT me.id, me.assignment, me.title, me.x_firstname, me.x_lastname, me.owner, me.node,
me.grade,me.folder, me.word_count, me.char_length, me.char_count, me.page_count FROM submissions me WHERE ( ( owner =
-1AND me.assignment = 8996557 ) ) 
> db_alpha-# ;
>                                                                      QUERY PLAN
                               
>
-----------------------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using submissions_assignment_idx on submissions me  (cost=0.00..549.15 rows=36 width=70) (actual
time=0.021..0.021rows=0 loops=1) 
>    Index Cond: (assignment = 8996557)
>    Filter: (owner = (-1))
>  Total runtime: 0.042 ms
> (4 rows)

submissions has ~124 million rows, and owner -1 is a placeholder in my database, to fulfill a foreign key requirement.
Itried REINDEXing submissions_owner_idx and performing a VACUUM ANALYZE on the submissions table, but nothing seems to
makea difference for this query.  One other thing to note is that if I use any other value for the owner column, it
comesback really fast (< 0.04 ms). 

Any ideas why the query planner chooses a different query plan when using prepared statements?

--Richard

Re: prepared query performs much worse than regular query

From
Rosser Schwarz
Date:
On Fri, May 21, 2010 at 4:53 PM, Richard Yen <dba@richyen.com> wrote:
> Any ideas why the query planner chooses a different query plan when using prepared statements?

A prepared plan is the best one the planner can come up with *in
general* for the query in question.  If the distribution of the values
you're querying against -- in your case, "owner" and "assignment" --
aren't relatively uniform, that plan is going to be suboptimal, if not
downright pathological, for the more outlying-ly distributed values.

Looking at your prepared plan, it seems that, on average, there are
177 rows for every "assignment", and 184 per "owner".  As it turns
out, though, nearly a quarter of your table has an "owner" of -1.
It's not terribly surprising, with a table that big and a distribution
skew of that magnitude, that this query plan, with these arguments,
ends up pretty firmly in the "pathological" category.

rls

--
:wq

Re: prepared query performs much worse than regular query

From
Matthew Wakeling
Date:
On Fri, 21 May 2010, Richard Yen wrote:
> Any ideas why the query planner chooses a different query plan when using prepared statements?

This is a FAQ. Preparing a statement makes Postgres create a plan, without
knowing the values that you will plug in, so it will not be as optimal as
if the values were available. The whole idea is to avoid the planning cost
each time the query is executed, but if your data is unusual it can
result in worse plans.

Matthew

--
 Existence is a convenient concept to designate all of the files that an
 executable program can potentially process.   -- Fortran77 standard

Re: prepared query performs much worse than regular query

From
Scott Carey
Date:
On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:

> On Fri, 21 May 2010, Richard Yen wrote:
>> Any ideas why the query planner chooses a different query plan when using prepared statements?
>
> This is a FAQ. Preparing a statement makes Postgres create a plan, without
> knowing the values that you will plug in, so it will not be as optimal as
> if the values were available. The whole idea is to avoid the planning cost
> each time the query is executed, but if your data is unusual it can
> result in worse plans.
>

Two things I disagree with.
1. The "whole idea" is not just to avoid planning cost.  It is also to easily avoid SQL injection, reduce query parse
time,and to make client code cleaner and more re-usable. 
2. The data does not need to be "unusual".  It just needs to have a skewed distribution.  Skewed is not unusual (well,
itwould be for a primary key :P ). 

Maybe the planner could note a prepared query parameter is on a high skew column and build a handful of plans to choose
from,or just partially re-plan on the skewed column with each execution.   
Or make it easier for a user to have a prepared statement that re-plans the query each time.  Even just a per
connectionparameter "SET prepared.query.cacheplan = FALSE" 

> Matthew
>
> --
> Existence is a convenient concept to designate all of the files that an
> executable program can potentially process.   -- Fortran77 standard
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


Re: prepared query performs much worse than regular query

From
Joshua Tolley
Date:
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote:
> On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> > On Fri, 21 May 2010, Richard Yen wrote:
> >> Any ideas why the query planner chooses a different query plan when using prepared statements?
> >
> > This is a FAQ. Preparing a statement makes Postgres create a plan, without
> > knowing the values that you will plug in, so it will not be as optimal as
> > if the values were available. The whole idea is to avoid the planning cost
> > each time the query is executed, but if your data is unusual it can
> > result in worse plans.
> >
> Maybe the planner could note a prepared query parameter is on a high skew
> column and build a handful of plans to choose from, or just partially
> re-plan on the skewed column with each execution.  Or make it easier for a
> user to have a prepared statement that re-plans the query each time.  Even
> just a per connection parameter "SET prepared.query.cacheplan = FALSE"

There was talk in this year's developers' meeting of doing this replanning
you've suggested. ("Re(?)plan parameterized plans with actual parameter
values" on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting,
specificall). This wouldn't show up until at least 9.1, but it's something
people are thinking about.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

Attachment