Thread: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables

The following bug has been logged online:

Bug reference:      5543
Logged by:          Ranga Gopalan
Email address:      ranga_gopalan@hotmail.com
PostgreSQL version: 8.4.4
Operating system:   Linux x86-64
Description:        Poor performance - Index scan backwards not used for
order by desc with partitioned tables
Details:

My problem is regarding ORDER BY / LIMIT query behavior when using
partitioning.

I have a large table (about 100 columns, several million rows) partitioned
by a column called day (which is the date stored as yyyymmdd - say 20100502
for May 2nd 2010 etc.). Say the main table  is called FACT_TABLE and each
child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502,
FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on
it to CHECK (day = yyyymmdd).

The query pattern I am looking at is (I have tried to simplify the column
names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507  # selecting for a week
ORDER BY F2 desc
LIMIT 100


This is what is happening:

When I query from the specific day's (child) table, I get what I expect - a
descending Index scan and good performance.

# explain  select F1 from FACT_TABLE_20100502 where day = 20100502 order by
F2 desc limit 100;
                                                                    QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------------------
--
 Limit  (cost=0.00..4.81 rows=100 width=41)
   ->  Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
(cost=0.00..90355.89 rows=1876985 width=41
)
         Filter: (day = 20100502)



BUT:

When I do the same query against the parent table it is much slower - two
things seem to happen - one is that the descending scan of the index is not
done and secondly there seems to be a separate sort/limit at the end - i.e.
all data from all partitions is retrieved and then sorted and limited - This
seems to be much less efficient than doing a descending scan on each
partition and limiting the results and then combining and reapplying the
limit at the end.

explain  select F1 from FACT_TABLE where day = 20100502 order by F2 desc
limit 100;
                                                                    QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------------------
---
 Limit  (cost=20000084948.01..20000084948.01 rows=100 width=41)
   ->  Sort  (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
         Sort Key: public.FACT_TABLE.F2
         ->  Result  (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
               ->  Append  (cost=10000000000.00..20000084230.64 rows=1876986
width=41)
                     ->  Seq Scan on FACT_TABLE
(cost=10000000000.00..10000000010.02 rows=1 width=186)
                           Filter: (day = 20100502)
                     ->  Seq Scan on FACT_TABLE_20100502 FACT_TABLE
(cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
                           Filter: (day = 20100502)
(9 rows)
On Tue, Jul 6, 2010 at 2:20 PM, Ranga Gopalan <ranga_gopalan@hotmail.com> w=
rote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05543
> Logged by: =A0 =A0 =A0 =A0 =A0Ranga Gopalan
> Email address: =A0 =A0 =A0ranga_gopalan@hotmail.com
> PostgreSQL version: 8.4.4
> Operating system: =A0 Linux x86-64
> Description: =A0 =A0 =A0 =A0Poor performance - Index scan backwards not u=
sed for
> order by desc with partitioned tables
> Details:
>
> My problem is regarding ORDER BY / LIMIT query behavior when using
> partitioning.
>
> I have a large table (about 100 columns, several million rows) partitioned
> by a column called day (which is the date stored as yyyymmdd - say 201005=
02
> for May 2nd 2010 etc.). Say the main table =A0is called FACT_TABLE and ea=
ch
> child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502,
> FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created=
 on
> it to CHECK (day =3D yyyymmdd).
>
> The query pattern I am looking at is (I have tried to simplify the column
> names for readability):
>
> SELECT F1 from FACT_TABLE
> where day >=3D 20100502 and day <=3D 20100507 =A0# selecting for a week
> ORDER BY F2 desc
> LIMIT 100
>
>
> This is what is happening:
>
> When I query from the specific day's (child) table, I get what I expect -=
 a
> descending Index scan and good performance.
>
> # explain =A0select F1 from FACT_TABLE_20100502 where day =3D 20100502 or=
der by
> F2 desc limit 100;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY
> PLAN
>
> -------------------------------------------------------------------------=
---
> --------------------------------------------------------------------
> --
> =A0Limit =A0(cost=3D0.00..4.81 rows=3D100 width=3D41)
> =A0 -> =A0Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
> (cost=3D0.00..90355.89 rows=3D1876985 width=3D41
> )
> =A0 =A0 =A0 =A0 Filter: (day =3D 20100502)
>
>
>
> BUT:
>
> When I do the same query against the parent table it is much slower - two
> things seem to happen - one is that the descending scan of the index is n=
ot
> done and secondly there seems to be a separate sort/limit at the end - i.=
e.
> all data from all partitions is retrieved and then sorted and limited - T=
his
> seems to be much less efficient than doing a descending scan on each
> partition and limiting the results and then combining and reapplying the
> limit at the end.
>
> explain =A0select F1 from FACT_TABLE where day =3D 20100502 order by F2 d=
esc
> limit 100;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY
> PLAN
>
> -------------------------------------------------------------------------=
---
> --------------------------------------------------------------------
> ---
> =A0Limit =A0(cost=3D20000084948.01..20000084948.01 rows=3D100 width=3D41)
> =A0 -> =A0Sort =A0(cost=3D20000084948.01..20000084994.93 rows=3D1876986 w=
idth=3D41)
> =A0 =A0 =A0 =A0 Sort Key: public.FACT_TABLE.F2
> =A0 =A0 =A0 =A0 -> =A0Result =A0(cost=3D10000000000.00..20000084230.64 ro=
ws=3D1876986
> width=3D41)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Append =A0(cost=3D10000000000.00..20000=
084230.64 rows=3D1876986
> width=3D41)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE
> (cost=3D10000000000.00..10000000010.02 rows=3D1 width=3D186)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010=
0502)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE_2010=
0502 FACT_TABLE
> (cost=3D10000000000.00..10000084220.62 rows=3D1876985 width=3D4
> 1)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010=
0502)
> (9 rows)

Does it help if you put a CHECK (false) constraint on the parent table?

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes:
> Does it help if you put a CHECK (false) constraint on the parent table?

It won't --- it'll still result in an append plan even if there's only
one surviving child.

This is one of many things that seem to me to not make sense to tackle
until we have an explicit notion of partitioning.  Having the planner
try to prove from individual constraints that it could get a correctly
sorted Append result without an explicit sort step would be hugely
expensive, and complicated --- imagine even trying to pick out the
relevant indexes without any infrastructure to help identify them.
With a partitioned structure we could understand that a-priori.

            regards, tom lane
On Tue, Jul 27, 2010 at 7:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Does it help if you put a CHECK (false) constraint on the parent table?
>
> It won't --- it'll still result in an append plan even if there's only
> one surviving child.
>
> This is one of many things that seem to me to not make sense to tackle
> until we have an explicit notion of partitioning. =A0Having the planner
> try to prove from individual constraints that it could get a correctly
> sorted Append result without an explicit sort step would be hugely
> expensive, and complicated --- imagine even trying to pick out the
> relevant indexes without any infrastructure to help identify them.
> With a partitioned structure we could understand that a-priori.

Hmm, I thought we had something that made it behave more like the
non-partitioned case when there is only one surviving partition.  But
I agree that, perhaps apart from that special case, there's not much
hope of improving this until we have more infrastructure.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company