Thread: Inoptimal query plan for max() and multicolumn index

Inoptimal query plan for max() and multicolumn index

From
Vladimir Kulev
Date:
Hi all!
Please, just look at these query explanations and try to explain why
planner does so (PostgreSQL 8.4).
There is an index on table sms (number, timestamp).

And three fast & simple queries:
=# explain analyze select max(timestamp) from sms where number='5502712';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.269..0.270
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.259..0.260 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.253..0.253
rows=1 loops=1)
                 Index Cond: ((number)::text = '5502712'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.342 ms

=# explain analyze select max(timestamp) from sms where number='5802693';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.425..0.426
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.413..0.414 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.409..0.409
rows=1 loops=1)
                 Index Cond: ((number)::text = '5802693'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.513 ms

=# explain analyze select max(timestamp) from sms where number='5802693';

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=3.79..3.80 rows=1 width=0) (actual time=0.425..0.426
rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..3.79 rows=1 width=8) (actual
time=0.413..0.414 rows=1 loops=1)
           ->  Index Scan Backward using sms_number_timestamp on sms
(cost=0.00..5981.98 rows=1579 width=8) (actual time=0.409..0.409
rows=1 loops=1)
                 Index Cond: ((number)::text = '5802693'::text)
                 Filter: ("timestamp" IS NOT NULL)
 Total runtime: 0.513 ms



But this does not work:
# explain analyze select max(timestamp) from sms where number in
('5502712','5802693','5801981');

------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15912.30..15912.31 rows=1 width=8) (actual
time=587.952..587.954 rows=1 loops=1)
   ->  Bitmap Heap Scan on sms  (cost=1413.02..15758.71 rows=61432
width=8) (actual time=34.266..491.853 rows=59078 loops=1)
         Recheck Cond: ((number)::text = ANY
('{5502712,5802693,5801981}'::text[]))
         ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..1397.67 rows=61432 width=0) (actual time=30.778..30.778
rows=59078 loops=1)
               Index Cond: ((number)::text = ANY
('{5502712,5802693,5801981}'::text[]))
 Total runtime: 588.199 ms

And this too:
# explain analyze select max(timestamp) from sms where
number='5502712' or number='5802693' or number='5801981';

------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16205.75..16205.76 rows=1 width=8) (actual
time=851.204..851.205 rows=1 loops=1)
   ->  Bitmap Heap Scan on sms  (cost=1473.31..16052.17 rows=61432
width=8) (actual time=68.233..745.004 rows=59090 loops=1)
         Recheck Cond: (((number)::text = '5502712'::text) OR
((number)::text = '5802693'::text) OR ((number)::text =
'5801981'::text))
         ->  BitmapOr  (cost=1473.31..1473.31 rows=61592 width=0)
(actual time=64.992..64.992 rows=0 loops=1)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..40.27 rows=1579 width=0) (actual time=0.588..0.588 rows=59
loops=1)
                     Index Cond: ((number)::text = '5502712'::text)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..40.27 rows=1579 width=0) (actual time=0.266..0.266 rows=59
loops=1)
                     Index Cond: ((number)::text = '5802693'::text)
               ->  Bitmap Index Scan on sms_number_timestamp
(cost=0.00..1346.69 rows=58434 width=0) (actual time=64.129..64.129
rows=58972 loops=1)
                     Index Cond: ((number)::text = '5801981'::text)
 Total runtime: 853.176 ms


According to planner cost estimations - it has enough data to
understand that it is better to aggregate maximum from three
subqueries. I suppose it's not a bug but not implemented feature -
maybe there is already something about it on roadmap?


--
Vladimir Kulev
Mobile: +7 (921) 555-44-22

Jabber: me@lightoze.net

Skype: lightoze

Re: Inoptimal query plan for max() and multicolumn index

From
"Kevin Grittner"
Date:
Vladimir Kulev <me@lightoze.net> wrote:

> # explain analyze select max(timestamp) from sms where number in
> ('5502712','5802693','5801981');

> According to planner cost estimations - it has enough data to
> understand that it is better to aggregate maximum from three
> subqueries. I suppose it's not a bug but not implemented feature

Yeah, you're hoping for an optimization which hasn't been
implemented.

I expect you're hoping for a plan similar to what this gives you?:

explain analyze select greatest(
  (select max(timestamp) from sms where number = '5502712'),
  (select max(timestamp) from sms where number = '5802693'),
  (select max(timestamp) from sms where number = '5801981'));

-Kevin

Re: Inoptimal query plan for max() and multicolumn index

From
Vladimir Kulev
Date:
Yes, exactly :)

On Mon, Jun 20, 2011 at 7:41 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> I expect you're hoping for a plan similar to what this gives you?:
>
> explain analyze select greatest(
>  (select max(timestamp) from sms where number = '5502712'),
>  (select max(timestamp) from sms where number = '5802693'),
>  (select max(timestamp) from sms where number = '5801981'));

--
Vladimir Kulev
Mobile: +7 (921) 555-44-22

Jabber: me@lightoze.net

Skype: lightoze

Re: Inoptimal query plan for max() and multicolumn index

From
"F. BROUARD / SQLpro"
Date:
Le 20/06/2011 18:08, Vladimir Kulev a écrit :
>
> Yes, exactly :)

SQL Server does it but PG does not. Expect this for the future....

So try to rewrite the query like this :

select max(timestamp) from sms where number = '5502712'
UNIUON ALL,
select max(timestamp) from sms where number = '5802693'
UNION ALL
select max(timestamp) from sms where number = '5801981'

To see what happen to the query plan !

A +

>
> On Mon, Jun 20, 2011 at 7:41 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov>  wrote:
>> I expect you're hoping for a plan similar to what this gives you?:
>>
>> explain analyze select greatest(
>>   (select max(timestamp) from sms where number = '5502712'),
>>   (select max(timestamp) from sms where number = '5802693'),
>>   (select max(timestamp) from sms where number = '5801981'));
>


--
Frédéric BROUARD - expert SGBDR et SQL - MVP SQL Server - 06 11 86 40 66
Le site sur le langage SQL et les SGBDR  :  http://sqlpro.developpez.com
Enseignant Arts & Métiers PACA, ISEN Toulon et CESI/EXIA Aix en Provence
Audit, conseil, expertise, formation, modélisation, tuning, optimisation
*********************** http://www.sqlspot.com *************************


Re: Inoptimal query plan for max() and multicolumn index

From
Gaetano Mendola
Date:
On 20/06/2011 07:35, Vladimir Kulev wrote:

> But this does not work:
> # explain analyze select max(timestamp) from sms where number in
> ('5502712','5802693','5801981');

Try to rewrite that query this way:

explain analyze select timestamp from sms where number in
('5502712','5802693','5801981') order by timestamp desc limit 1;


Regards
Gaetano Mendola



Re: Inoptimal query plan for max() and multicolumn index

From
Gaetano Mendola
Date:
On 20/06/2011 07:35, Vladimir Kulev wrote:

> But this does not work:
> # explain analyze select max(timestamp) from sms where number in
> ('5502712','5802693','5801981');

Try to rewrite that query this way:

explain analyze select timestamp from sms where number in
('5502712','5802693','5801981') order by timestamp desc limit 1;


Regards
Gaetano Mendola