Re: Cost of sort/order by not estimated by the query planner - Mailing list pgsql-hackers

From Laurent Laborde
Subject Re: Cost of sort/order by not estimated by the query planner
Date
Msg-id 8a1bfe660912020313w2438eb14ie8237ee5461c3d9@mail.gmail.com
Whole thread Raw
In response to Cost of sort/order by not estimated by the query planner  (Laurent Laborde <kerdezixe@gmail.com>)
List pgsql-hackers
hummm.... Adding pgsql-perf :)

On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde <kerdezixe@gmail.com> wrote:
> Friendly greetings !
> I use postgresql 8.3.6.
>
> here is a few info about the table i'm querying :
> -------------------------------------------------------------
> - select count(*) from _article : 17301610
> - select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729
>
>
> Here are both request with problems :
> --------------------------------------------------
>
> QUERY 1 : Very fast !
> -------------
>
> explain SELECT *
> FROM   _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 500;
>                                             QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Limit  (cost=66114.13..66115.38 rows=500 width=1114)
>   ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
>         Sort Key: id
>         ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
> rows=17296 width=1114)
>               Recheck Cond: (bitfield && B'1'::bit varying)
>               ->  Bitmap Index Scan on idx_article_bitfield
> (cost=0.00..134.00 rows=17296 width=0)
>                     Index Cond: (bitfield && B'1'::bit varying)
>
>
>
>
> QUERY 2 : Endless ... (more than 30mn... i stopped the query)
> -------------
>
> explain SELECT *
> FROM   _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 5;
>                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..2042.87 rows=5 width=1114)
>   ->  Index Scan using _article_pkey on _article
> (cost=0.00..7066684.46 rows=17296 width=1114)
>         Filter: (bitfield && B'1'::bit varying)
> (3 rows)
>
>
> With LIMIT 5 and LIMIT 500, the query plan are differents.
> Postgresql estimate that it can do a a simple index scan to find only 5 row.
> With more than LIMIT ~400 it estimate that it's faster to do a more
> complex plan.
> and it make sense !
>
> The problem is in the order by, of course.
> If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
> index scan.
> At limit 500 (without order) it still use an index scan and it is
> slightly slower.
> At limit 5000 (without order) it switch to a Bitmap Index Scan +
> Bitmap Heap Scan and it's slower but acceptable (5.275 ms)
>
> Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
> Sort/ORDER BY ?
> Of course, by ignoring the order, both query plan are right and the
> choice for thoses differents plans totally make sense.
>
> But... if the planner would be kind enough to considerate the cost of
> the order by, it would certainly choose the Bitmap Index + Bitmap Heap
> scan for the limit 5.
> And not an index_scan pkey !
>
> I have set the statistics to 1000 for _article.bitfield, just in case
> (and ran a vacuum analyze), it doesn't change anything.
>
> Is that a bug ? any Idea ?
>
> Thank you :)
>
> --
> Laurent "ker2x" Laborde
> Sysadmin & DBA at http://www.over-blog.com/
>



--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Hot Standby remaining issues
Next
From: Heikki Linnakangas
Date:
Subject: Re: Hot Standby remaining issues