Re: min/max performance inequality. - Mailing list pgsql-general

From Pawel Veselov
Subject Re: min/max performance inequality.
Date
Msg-id CAMnJ+BdFqTGxBOc0OQ0MEL82q0yhkUmM2wE301xsDFLMgG31NA@mail.gmail.com
Whole thread Raw
In response to Re: min/max performance inequality.  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Thanks Jeff (and Tom)

On Wed, Jan 7, 2015 at 3:34 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:
Hi.

I was wondering how come there is such a drastic difference between finding max and min. Seems like "index scan backwards" is really bad... The table is freshly re-indexed just in case. I added a count(*) in there, forcing the seq scan, and it's even better than the backwards index scan...

db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid = 4814;

It crawls the data in rowdate order (either forward or reverse) until it finds the first 4814.  Crawling forward it finds 4814 very early. Crawling backwards it has to pass through a bunch of non-4814 before it finds the first 4814.

This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more modern version of postgresql (9.2 or above) there would be another line for "Rows Removed by Filter:" which would tell the story of what is going on.

Yeah, there is 10x more rows on when going backwards
 

If you have a composite index on (blockid, rowdate), it would help make this much faster, as it can go directly to the desired row.

That does help a lot. So, when does postgres use a more-dimensional index, even if not all dimensions are engaged (as there is an index that involves those 2 fields, and more)? I definitely see it do that in some cases...

Even with that index, however, there is still a good difference in time (the interest is theoretical at this point, as I found a better way to extract that data anyway).

On a newer db.

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=521.54..521.55 rows=1 width=0) (actual time=39.770..39.770 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.55..521.54 rows=1 width=13) (actual time=39.765..39.766 rows=1 loops=1)
           ->  Index Scan using rowdate_r_agrio on r_agrio  (cost=0.55..303738.47 rows=583 width=13) (actual time=39.763..39.763 rows=1 loops=1)
                 Index Cond: ((rowdate)::text IS NOT NULL)
                 Filter: (blockid = 4814::numeric)
                 Rows Removed by Filter: 37246
 Total runtime: 39.798 ms
(8 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=521.54..521.55 rows=1 width=0) (actual time=1497.377..1497.378 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.55..521.54 rows=1 width=13) (actual time=1497.371..1497.372 rows=1 loops=1)
           ->  Index Scan Backward using rowdate_r_agrio on r_agrio  (cost=0.55..303738.47 rows=583 width=13) (actual time=1497.370..1497.370 rows=1 loops=1)
                 Index Cond: ((rowdate)::text IS NOT NULL)
                 Filter: (blockid = 4814::numeric)
                 Rows Removed by Filter: 317739
 Total runtime: 1497.407 ms
(8 rows)
db=> CREATE INDEX concurrently xxx on r_agrio(rowdate,blockid);
CREATE INDEX

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=85.05..85.06 rows=1 width=0) (actual time=17.585..17.585 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..85.05 rows=1 width=13) (actual time=17.580..17.581 rows=1 loops=1)
           ->  Index Only Scan using xxx on r_agrio  (cost=0.43..37827.09 rows=447 width=13) (actual time=17.578..17.578 rows=1 loops=1)
                 Index Cond: ((rowdate IS NOT NULL) AND (blockid = 4814::numeric))
                 Heap Fetches: 0
 Total runtime: 17.616 ms
(7 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=85.04..85.05 rows=1 width=0) (actual time=89.141..89.142 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..85.04 rows=1 width=13) (actual time=89.135..89.136 rows=1 loops=1)
           ->  Index Only Scan Backward using xxx on r_agrio  (cost=0.43..37823.09 rows=447 width=13) (actual time=89.134..89.134 rows=1 loops=1)
                 Index Cond: ((rowdate IS NOT NULL) AND (blockid = 4814::numeric))
                 Heap Fetches: 1
 Total runtime: 89.173 ms
(7 rows)


 

Cheers,

 

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: min/max performance inequality.
Next
From: Michael Paquier
Date:
Subject: Re: How to exclude building/installing contrib modules on Windows