Re: Bad Query Plan with Range Query - Mailing list pgsql-performance

From Mark Williams
Subject Re: Bad Query Plan with Range Query
Date
Msg-id 4DA8893B.8050403@jivesoftware.com
Whole thread Raw
In response to Re: Bad Query Plan with Range Query  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Bad Query Plan with Range Query
List pgsql-performance
Thanks for the response guys. There is something else which confuses me.
If I re-write the query like this:

explain SELECT messageID FROM Message WHERE modificationDate >=
1302627793988 ORDER BY modificationDate LIMIT 1;
                                            QUERY PLAN
-------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..2.97 rows=1 width=16)
    ->  Index Scan using jvmssg_mdate_idx on message
(cost=0.00..3705.59 rows=1249 width=16)
          Index Cond: (modificationdate >= 1302627793988::bigint)
(3 rows)

I also get a better plan. However, this is not always the case. On some
other instances we still get a sequential scan on the primary key.




On 04/15/2011 10:54 AM, Kevin Grittner wrote:
> Mark Williams<mark.williams@jivesoftware.com>  wrote:
>
>> explain SELECT min(messageID) FROM Message
>>    WHERE modificationDate>= 1302627793988;
>
>> For some reason it is deciding to scan the primary key column of
>> the table. This results in scanning the entire table
>
> No, it scans until it finds the first row where modificationDate>=
> 1302627793988, at which point the scan is done because it's doing an
> ascending scan on what you want the min() of.  You might have a clue
> that the first such row will be ten million rows into the scan, but
> the optimizer doesn't know that.  It's assuming that rows which meet
> that condition are scattered randomly through the primary key range.
> It thinks that it will, on average, need to scan 1249 rows to find a
> match.
>
> The patch Ken referenced causes the alternative to be assigned a
> more accurate (and lower) cost, which tips the scales in favor of
> that plan -- at least for the case you've tried; but this seems to
> me to be another case related to the correlation of values.  It's a
> new and different form of it, but it seems at least somewhat
> related.  It might be a good example for those working on
> multi-column statistics to keep in mind.
>
> -Kevin


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Bad Query Plan with Range Query
Next
From: "Kevin Grittner"
Date:
Subject: Re: Bad Query Plan with Range Query