Cost of sort/order by not estimated by the query planner - Mailing list pgsql-hackers

From Laurent Laborde
Subject Cost of sort/order by not estimated by the query planner
Date
Msg-id 8a1bfe660911300854h63ffb39anfeb667da662cb37c@mail.gmail.com
Whole thread Raw
Responses Re: Cost of sort/order by not estimated by the query planner  (Laurent Laborde <kerdezixe@gmail.com>)
Re: Cost of sort/order by not estimated by the query planner  (Hitoshi Harada <umi.tanuki@gmail.com>)
List pgsql-hackers
Friendly greetings !
I use postgresql 8.3.6.

here is a few info about the table i'm querying :
-------------------------------------------------------------
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729


Here are both request with problems :
--------------------------------------------------

QUERY 1 : Very fast !
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 500;                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------Limit
(cost=66114.13..66115.38rows=500 width=1114)  ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)        Sort
Key:id        ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
 
rows=17296 width=1114)              Recheck Cond: (bitfield && B'1'::bit varying)              ->  Bitmap Index Scan on
idx_article_bitfield
(cost=0.00..134.00 rows=17296 width=0)                    Index Cond: (bitfield && B'1'::bit varying)




QUERY 2 : Endless ... (more than 30mn... i stopped the query)
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 5;                                          QUERY PLAN
-------------------------------------------------------------------------------------------------Limit
(cost=0.00..2042.87rows=5 width=1114)  ->  Index Scan using _article_pkey on _article
 
(cost=0.00..7066684.46 rows=17296 width=1114)        Filter: (bitfield && B'1'::bit varying)
(3 rows)


With LIMIT 5 and LIMIT 500, the query plan are differents.
Postgresql estimate that it can do a a simple index scan to find only 5 row.
With more than LIMIT ~400 it estimate that it's faster to do a more
complex plan.
and it make sense !

The problem is in the order by, of course.
If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
index scan.
At limit 500 (without order) it still use an index scan and it is
slightly slower.
At limit 5000 (without order) it switch to a Bitmap Index Scan +
Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
Sort/ORDER BY ?
Of course, by ignoring the order, both query plan are right and the
choice for thoses differents plans totally make sense.

But... if the planner would be kind enough to considerate the cost of
the order by, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the limit 5.
And not an index_scan pkey !

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a vacuum analyze), it doesn't change anything.

Is that a bug ? any Idea ?

Thank you :)

-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/


pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: set the cost of an aggregate function
Next
From: Bruce Momjian
Date:
Subject: Re: enable-thread-safety defaults?