BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables - Mailing list pgsql-bugs

From Ranga Gopalan
Subject BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables
Date
Msg-id 201007061820.o66IK6Xd099664@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5543
Logged by:          Ranga Gopalan
Email address:      ranga_gopalan@hotmail.com
PostgreSQL version: 8.4.4
Operating system:   Linux x86-64
Description:        Poor performance - Index scan backwards not used for
order by desc with partitioned tables
Details:

My problem is regarding ORDER BY / LIMIT query behavior when using
partitioning.

I have a large table (about 100 columns, several million rows) partitioned
by a column called day (which is the date stored as yyyymmdd - say 20100502
for May 2nd 2010 etc.). Say the main table  is called FACT_TABLE and each
child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502,
FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on
it to CHECK (day = yyyymmdd).

The query pattern I am looking at is (I have tried to simplify the column
names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507  # selecting for a week
ORDER BY F2 desc
LIMIT 100


This is what is happening:

When I query from the specific day's (child) table, I get what I expect - a
descending Index scan and good performance.

# explain  select F1 from FACT_TABLE_20100502 where day = 20100502 order by
F2 desc limit 100;
                                                                    QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------------------
--
 Limit  (cost=0.00..4.81 rows=100 width=41)
   ->  Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
(cost=0.00..90355.89 rows=1876985 width=41
)
         Filter: (day = 20100502)



BUT:

When I do the same query against the parent table it is much slower - two
things seem to happen - one is that the descending scan of the index is not
done and secondly there seems to be a separate sort/limit at the end - i.e.
all data from all partitions is retrieved and then sorted and limited - This
seems to be much less efficient than doing a descending scan on each
partition and limiting the results and then combining and reapplying the
limit at the end.

explain  select F1 from FACT_TABLE where day = 20100502 order by F2 desc
limit 100;
                                                                    QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------------------
---
 Limit  (cost=20000084948.01..20000084948.01 rows=100 width=41)
   ->  Sort  (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
         Sort Key: public.FACT_TABLE.F2
         ->  Result  (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
               ->  Append  (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
                     ->  Seq Scan on FACT_TABLE
(cost=10000000000.00..10000000010.02 rows=1 width=186)
                           Filter: (day = 20100502)
                     ->  Seq Scan on FACT_TABLE_20100502 FACT_TABLE
(cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
                           Filter: (day = 20100502)
(9 rows)

pgsql-bugs by date:

Previous
From: Merlin Moncure
Date:
Subject: ERROR: cannot handle unplanned sub-select
Next
From: Tom Lane
Date:
Subject: Re: ERROR: cannot handle unplanned sub-select