Inoptimal query plan for max() and multicolumn index - Mailing list pgsql-performance

From Vladimir Kulev
Subject Inoptimal query plan for max() and multicolumn index
Date
Msg-id BANLkTimsNMhnAHk6Wz7gARfdkTmz2QKJ7g@mail.gmail.com
Whole thread Raw
Responses Re: Inoptimal query plan for max() and multicolumn index
Re: Inoptimal query plan for max() and multicolumn index
Re: Inoptimal query plan for max() and multicolumn index
List pgsql-performance
Hi all!
Please, just look at these query explanations and try to explain why
planner does so (PostgreSQL 8.4).
There is an index on table sms (number, timestamp).

And three fast & simple queries:
=# explain analyze select max(timestamp) from sms where number='5502712';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.269..0.270
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.259..0.260 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.253..0.253
rows=1 loops=1)
                 Index Cond: ((number)::text = '5502712'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.342 ms

=# explain analyze select max(timestamp) from sms where number='5802693';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.425..0.426
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.413..0.414 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.409..0.409
rows=1 loops=1)
                 Index Cond: ((number)::text = '5802693'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.513 ms

=# explain analyze select max(timestamp) from sms where number='5802693';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.425..0.426
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.413..0.414 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.409..0.409
rows=1 loops=1)
                 Index Cond: ((number)::text = '5802693'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.513 ms



But this does not work:
# explain analyze select max(timestamp) from sms where number in
('5502712','5802693','5801981');

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15912.30..15912.31 rows=1 width=8) (actual
time=587.952..587.954 rows=1 loops=1)
   ->  Bitmap Heap Scan on sms  (cost=1413.02..15758.71 rows=61432
width=8) (actual time=34.266..491.853 rows=59078 loops=1)
         Recheck Cond: ((number)::text = ANY
('{5502712,5802693,5801981}'::text[]))
         ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..1397.67 rows=61432 width=0) (actual time=30.778..30.778
rows=59078 loops=1)
               Index Cond: ((number)::text = ANY
('{5502712,5802693,5801981}'::text[]))
 Total runtime: 588.199 ms

And this too:
# explain analyze select max(timestamp) from sms where
number='5502712' or number='5802693' or number='5801981';

------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16205.75..16205.76 rows=1 width=8) (actual
time=851.204..851.205 rows=1 loops=1)
   ->  Bitmap Heap Scan on sms  (cost=1473.31..16052.17 rows=61432
width=8) (actual time=68.233..745.004 rows=59090 loops=1)
         Recheck Cond: (((number)::text = '5502712'::text) OR
((number)::text = '5802693'::text) OR ((number)::text =
'5801981'::text))
         ->  BitmapOr  (cost=1473.31..1473.31 rows=61592 width=0)
(actual time=64.992..64.992 rows=0 loops=1)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..40.27 rows=1579 width=0) (actual time=0.588..0.588 rows=59
loops=1)
                     Index Cond: ((number)::text = '5502712'::text)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..40.27 rows=1579 width=0) (actual time=0.266..0.266 rows=59
loops=1)
                     Index Cond: ((number)::text = '5802693'::text)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..1346.69 rows=58434 width=0) (actual time=64.129..64.129
rows=58972 loops=1)
                     Index Cond: ((number)::text = '5801981'::text)
 Total runtime: 853.176 ms


According to planner cost estimations - it has enough data to
understand that it is better to aggregate maximum from three
subqueries. I suppose it's not a bug but not implemented feature -
maybe there is already something about it on roadmap?


--
Vladimir Kulev
Mobile: +7 (921) 555-44-22

Jabber: me@lightoze.net

Skype: lightoze

pgsql-performance by date:

Previous
From: Jose Ildefonso Camargo Tolosa
Date:
Subject: Re: Large rows number, and large objects
Next
From: Julius Tuskenis
Date:
Subject: Re: generating a large XML document