Thread: Cost of sort/order by not estimated by the query planner

Cost of sort/order by not estimated by the query planner

From
Laurent Laborde
Date:
Friendly greetings !
I use postgresql 8.3.6.

here is a few info about the table i'm querying :
-------------------------------------------------------------
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729


Here are both request with problems :
--------------------------------------------------

QUERY 1 : Very fast !
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 500;                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------Limit
(cost=66114.13..66115.38rows=500 width=1114)  ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)        Sort
Key:id        ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
 
rows=17296 width=1114)              Recheck Cond: (bitfield && B'1'::bit varying)              ->  Bitmap Index Scan on
idx_article_bitfield
(cost=0.00..134.00 rows=17296 width=0)                    Index Cond: (bitfield && B'1'::bit varying)




QUERY 2 : Endless ... (more than 30mn... i stopped the query)
-------------

explain SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 5;                                          QUERY PLAN
-------------------------------------------------------------------------------------------------Limit
(cost=0.00..2042.87rows=5 width=1114)  ->  Index Scan using _article_pkey on _article
 
(cost=0.00..7066684.46 rows=17296 width=1114)        Filter: (bitfield && B'1'::bit varying)
(3 rows)


With LIMIT 5 and LIMIT 500, the query plan are differents.
Postgresql estimate that it can do a a simple index scan to find only 5 row.
With more than LIMIT ~400 it estimate that it's faster to do a more
complex plan.
and it make sense !

The problem is in the order by, of course.
If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
index scan.
At limit 500 (without order) it still use an index scan and it is
slightly slower.
At limit 5000 (without order) it switch to a Bitmap Index Scan +
Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
Sort/ORDER BY ?
Of course, by ignoring the order, both query plan are right and the
choice for thoses differents plans totally make sense.

But... if the planner would be kind enough to considerate the cost of
the order by, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the limit 5.
And not an index_scan pkey !

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a vacuum analyze), it doesn't change anything.

Is that a bug ? any Idea ?

Thank you :)

-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/


Re: Cost of sort/order by not estimated by the query planner

From
Laurent Laborde
Date:
hummm.... Adding pgsql-perf :)

On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde <kerdezixe@gmail.com> wrote:
> Friendly greetings !
> I use postgresql 8.3.6.
>
> here is a few info about the table i'm querying :
> -------------------------------------------------------------
> - select count(*) from _article : 17301610
> - select count(*) from _article WHERE (_article.bitfield && getbit(0)) : 6729
>
>
> Here are both request with problems :
> --------------------------------------------------
>
> QUERY 1 : Very fast !
> -------------
>
> explain SELECT *
> FROM   _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 500;
>                                             QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  Limit  (cost=66114.13..66115.38 rows=500 width=1114)
>   ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
>         Sort Key: id
>         ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
> rows=17296 width=1114)
>               Recheck Cond: (bitfield && B'1'::bit varying)
>               ->  Bitmap Index Scan on idx_article_bitfield
> (cost=0.00..134.00 rows=17296 width=0)
>                     Index Cond: (bitfield && B'1'::bit varying)
>
>
>
>
> QUERY 2 : Endless ... (more than 30mn... i stopped the query)
> -------------
>
> explain SELECT *
> FROM   _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 5;
>                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..2042.87 rows=5 width=1114)
>   ->  Index Scan using _article_pkey on _article
> (cost=0.00..7066684.46 rows=17296 width=1114)
>         Filter: (bitfield && B'1'::bit varying)
> (3 rows)
>
>
> With LIMIT 5 and LIMIT 500, the query plan are differents.
> Postgresql estimate that it can do a a simple index scan to find only 5 row.
> With more than LIMIT ~400 it estimate that it's faster to do a more
> complex plan.
> and it make sense !
>
> The problem is in the order by, of course.
> If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
> index scan.
> At limit 500 (without order) it still use an index scan and it is
> slightly slower.
> At limit 5000 (without order) it switch to a Bitmap Index Scan +
> Bitmap Heap Scan and it's slower but acceptable (5.275 ms)
>
> Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
> Sort/ORDER BY ?
> Of course, by ignoring the order, both query plan are right and the
> choice for thoses differents plans totally make sense.
>
> But... if the planner would be kind enough to considerate the cost of
> the order by, it would certainly choose the Bitmap Index + Bitmap Heap
> scan for the limit 5.
> And not an index_scan pkey !
>
> I have set the statistics to 1000 for _article.bitfield, just in case
> (and ran a vacuum analyze), it doesn't change anything.
>
> Is that a bug ? any Idea ?
>
> Thank you :)
>
> --
> Laurent "ker2x" Laborde
> Sysadmin & DBA at http://www.over-blog.com/
>



--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

Re: Cost of sort/order by not estimated by the query planner

From
Hitoshi Harada
Date:
2009/12/1 Laurent Laborde <kerdezixe@gmail.com>:
> The problem is in the order by, of course.
> If i remove the "order by" the LIMIT 5 is faster (0.044 ms) and do an
> index scan.
> At limit 500 (without order) it still use an index scan and it is
> slightly slower.
> At limit 5000 (without order) it switch to a Bitmap Index Scan +
> Bitmap Heap Scan and it's slower but acceptable (5.275 ms)
>
> Why, with the "QUERY 2", postgresql doesn't estimate the cost of the
> Sort/ORDER BY ?
> Of course, by ignoring the order, both query plan are right and the
> choice for thoses differents plans totally make sense.

It's because the result of IndexScan is already sorted by demanded
key, whereas the one of BitmapIndexScan isn't. But I'm not sure why
the query lasts more than 30 minutes...


Regards,

-- 
Hitoshi Harada


Re: Cost of sort/order by not estimated by the query planner

From
Laurent Laborde
Date:
'morning !

And here is the query plan for :
---------------------------------------
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 5;
Limit  (cost=0.00..2238.33 rows=5 width=1099) (actual
time=17548636.326..17548837.082 rows=5 loops=1)  ->  Index Scan using _article_pkey on _article
(cost=0.00..7762964.53 rows=17341 width=1099) (actual
time=17548636.324..17548837.075 rows=5 loops=1)        Filter: (bitfield && B'1'::bit varying)Total runtime:
17548837.154ms
 


Versus the "limit 500" query plan :
-------------------------------------------
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.id ASC
LIMIT 500;
Limit  (cost=66229.90..66231.15 rows=500 width=1099) (actual
time=1491.905..1492.146 rows=500 loops=1)  ->  Sort  (cost=66229.90..66273.25 rows=17341 width=1099) (actual
time=1491.904..1492.008 rows=500 loops=1)        Sort Key: id        Sort Method:  top-N heapsort  Memory: 603kB
-> Bitmap Heap Scan on _article  (cost=138.67..65365.82
 
rows=17341 width=1099) (actual time=777.489..1487.120 rows=6729
loops=1)              Recheck Cond: (bitfield && B'1'::bit varying)              ->  Bitmap Index Scan on
idx_article_bitfield
(cost=0.00..134.33 rows=17341 width=0) (actual time=769.799..769.799
rows=6729 loops=1)                    Index Cond: (bitfield && B'1'::bit varying)Total runtime: 1630.690 ms


I will read (and try to understand) all you said yesterday and reply
as soon as i can :)
Thank you !

-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/


Re: Cost of sort/order by not estimated by the query planner

From
Laurent Laborde
Date:
The table is clustered by by blog_id.
So, for testing purpose, i tried an ORDER BY blog_id.

limit 500 :
-------------
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.blog_id ASC
LIMIT 500;
Limit  (cost=66229.90..66231.15 rows=500 width=1099) (actual
time=9.368..9.580 rows=500 loops=1)  ->  Sort  (cost=66229.90..66273.25 rows=17341 width=1099) (actual
time=9.367..9.443 rows=500 loops=1)        Sort Key: blog_id        Sort Method:  top-N heapsort  Memory: 660kB
-> Bitmap Heap Scan on _article  (cost=138.67..65365.82
 
rows=17341 width=1099) (actual time=0.905..4.042 rows=6729 loops=1)              Recheck Cond: (bitfield && B'1'::bit
varying)             ->  Bitmap Index Scan on idx_article_bitfield
 
(cost=0.00..134.33 rows=17341 width=0) (actual time=0.772..0.772
rows=6729 loops=1)                    Index Cond: (bitfield && B'1'::bit varying)Total runtime: 9.824 ms

Limit 5 :
----------
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield && getbit(0))
ORDER BY _article.blog_id ASC
LIMIT 5;
Limit  (cost=0.00..1419.22 rows=5 width=1099) (actual
time=125076.420..280419.143 rows=5 loops=1)  ->  Index Scan using idx_article_blog_id on _article
(cost=0.00..4922126.37 rows=17341 width=1099) (actual
time=125076.419..280419.137 rows=5 loops=1)        Filter: (bitfield && B'1'::bit varying)Total runtime: 280419.241 ms


-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/