Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables
Date
Msg-id AANLkTi=R_bCQ7UKpKqcoKMcXQuNgtMfcHKVFmTbozBny@mail.gmail.com
Whole thread Raw
In response to BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables  ("Ranga Gopalan" <ranga_gopalan@hotmail.com>)
Responses Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Tue, Jul 6, 2010 at 2:20 PM, Ranga Gopalan <ranga_gopalan@hotmail.com> w=
rote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05543
> Logged by: =A0 =A0 =A0 =A0 =A0Ranga Gopalan
> Email address: =A0 =A0 =A0ranga_gopalan@hotmail.com
> PostgreSQL version: 8.4.4
> Operating system: =A0 Linux x86-64
> Description: =A0 =A0 =A0 =A0Poor performance - Index scan backwards not u=
sed for
> order by desc with partitioned tables
> Details:
>
> My problem is regarding ORDER BY / LIMIT query behavior when using
> partitioning.
>
> I have a large table (about 100 columns, several million rows) partitioned
> by a column called day (which is the date stored as yyyymmdd - say 201005=
02
> for May 2nd 2010 etc.). Say the main table =A0is called FACT_TABLE and ea=
ch
> child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502,
> FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created=
 on
> it to CHECK (day =3D yyyymmdd).
>
> The query pattern I am looking at is (I have tried to simplify the column
> names for readability):
>
> SELECT F1 from FACT_TABLE
> where day >=3D 20100502 and day <=3D 20100507 =A0# selecting for a week
> ORDER BY F2 desc
> LIMIT 100
>
>
> This is what is happening:
>
> When I query from the specific day's (child) table, I get what I expect -=
 a
> descending Index scan and good performance.
>
> # explain =A0select F1 from FACT_TABLE_20100502 where day =3D 20100502 or=
der by
> F2 desc limit 100;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY
> PLAN
>
> -------------------------------------------------------------------------=
---
> --------------------------------------------------------------------
> --
> =A0Limit =A0(cost=3D0.00..4.81 rows=3D100 width=3D41)
> =A0 -> =A0Index Scan Backward using F2_20100502 on FACT_TABLE_20100502
> (cost=3D0.00..90355.89 rows=3D1876985 width=3D41
> )
> =A0 =A0 =A0 =A0 Filter: (day =3D 20100502)
>
>
>
> BUT:
>
> When I do the same query against the parent table it is much slower - two
> things seem to happen - one is that the descending scan of the index is n=
ot
> done and secondly there seems to be a separate sort/limit at the end - i.=
e.
> all data from all partitions is retrieved and then sorted and limited - T=
his
> seems to be much less efficient than doing a descending scan on each
> partition and limiting the results and then combining and reapplying the
> limit at the end.
>
> explain =A0select F1 from FACT_TABLE where day =3D 20100502 order by F2 d=
esc
> limit 100;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY
> PLAN
>
> -------------------------------------------------------------------------=
---
> --------------------------------------------------------------------
> ---
> =A0Limit =A0(cost=3D20000084948.01..20000084948.01 rows=3D100 width=3D41)
> =A0 -> =A0Sort =A0(cost=3D20000084948.01..20000084994.93 rows=3D1876986 w=
idth=3D41)
> =A0 =A0 =A0 =A0 Sort Key: public.FACT_TABLE.F2
> =A0 =A0 =A0 =A0 -> =A0Result =A0(cost=3D10000000000.00..20000084230.64 ro=
ws=3D1876986
> width=3D41)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Append =A0(cost=3D10000000000.00..20000=
084230.64 rows=3D1876986
> width=3D41)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE
> (cost=3D10000000000.00..10000000010.02 rows=3D1 width=3D186)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010=
0502)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE_2010=
0502 FACT_TABLE
> (cost=3D10000000000.00..10000084220.62 rows=3D1876985 width=3D4
> 1)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010=
0502)
> (9 rows)

Does it help if you put a CHECK (false) constraint on the parent table?

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: failed to fetch tuple for EvalPlanQual recheck
Next
From: Tom Lane
Date:
Subject: Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables