Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables - Mailing list pgsql-bugs
From | Robert Haas |
---|---|
Subject | Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables |
Date | |
Msg-id | AANLkTi=R_bCQ7UKpKqcoKMcXQuNgtMfcHKVFmTbozBny@mail.gmail.com Whole thread Raw |
In response to | BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables ("Ranga Gopalan" <ranga_gopalan@hotmail.com>) |
Responses |
Re: BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables
|
List | pgsql-bugs |
On Tue, Jul 6, 2010 at 2:20 PM, Ranga Gopalan <ranga_gopalan@hotmail.com> w= rote: > > The following bug has been logged online: > > Bug reference: =A0 =A0 =A05543 > Logged by: =A0 =A0 =A0 =A0 =A0Ranga Gopalan > Email address: =A0 =A0 =A0ranga_gopalan@hotmail.com > PostgreSQL version: 8.4.4 > Operating system: =A0 Linux x86-64 > Description: =A0 =A0 =A0 =A0Poor performance - Index scan backwards not u= sed for > order by desc with partitioned tables > Details: > > My problem is regarding ORDER BY / LIMIT query behavior when using > partitioning. > > I have a large table (about 100 columns, several million rows) partitioned > by a column called day (which is the date stored as yyyymmdd - say 201005= 02 > for May 2nd 2010 etc.). Say the main table =A0is called FACT_TABLE and ea= ch > child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, > FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created= on > it to CHECK (day =3D yyyymmdd). > > The query pattern I am looking at is (I have tried to simplify the column > names for readability): > > SELECT F1 from FACT_TABLE > where day >=3D 20100502 and day <=3D 20100507 =A0# selecting for a week > ORDER BY F2 desc > LIMIT 100 > > > This is what is happening: > > When I query from the specific day's (child) table, I get what I expect -= a > descending Index scan and good performance. > > # explain =A0select F1 from FACT_TABLE_20100502 where day =3D 20100502 or= der by > F2 desc limit 100; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY > PLAN > > -------------------------------------------------------------------------= --- > -------------------------------------------------------------------- > -- > =A0Limit =A0(cost=3D0.00..4.81 rows=3D100 width=3D41) > =A0 -> =A0Index Scan Backward using F2_20100502 on FACT_TABLE_20100502 > (cost=3D0.00..90355.89 rows=3D1876985 width=3D41 > ) > =A0 =A0 =A0 =A0 Filter: (day =3D 20100502) > > > > BUT: > > When I do the same query against the parent table it is much slower - two > things seem to happen - one is that the descending scan of the index is n= ot > done and secondly there seems to be a separate sort/limit at the end - i.= e. > all data from all partitions is retrieved and then sorted and limited - T= his > seems to be much less efficient than doing a descending scan on each > partition and limiting the results and then combining and reapplying the > limit at the end. > > explain =A0select F1 from FACT_TABLE where day =3D 20100502 order by F2 d= esc > limit 100; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY > PLAN > > -------------------------------------------------------------------------= --- > -------------------------------------------------------------------- > --- > =A0Limit =A0(cost=3D20000084948.01..20000084948.01 rows=3D100 width=3D41) > =A0 -> =A0Sort =A0(cost=3D20000084948.01..20000084994.93 rows=3D1876986 w= idth=3D41) > =A0 =A0 =A0 =A0 Sort Key: public.FACT_TABLE.F2 > =A0 =A0 =A0 =A0 -> =A0Result =A0(cost=3D10000000000.00..20000084230.64 ro= ws=3D1876986 > width=3D41) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Append =A0(cost=3D10000000000.00..20000= 084230.64 rows=3D1876986 > width=3D41) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE > (cost=3D10000000000.00..10000000010.02 rows=3D1 width=3D186) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010= 0502) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 -> =A0Seq Scan on FACT_TABLE_2010= 0502 FACT_TABLE > (cost=3D10000000000.00..10000084220.62 rows=3D1876985 width=3D4 > 1) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 Filter: (day =3D 2010= 0502) > (9 rows) Does it help if you put a CHECK (false) constraint on the parent table? --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
pgsql-bugs by date: