Issue with partition elimination - Mailing list pgsql-performance

From Paul Pierce
Subject Issue with partition elimination
Date
Msg-id 505432.93484.qm@web120209.mail.ne1.yahoo.com
Whole thread Raw
Responses Re: Issue with partition elimination
List pgsql-performance

This is a good one :)


Here is a "brief" description of our issue(Postgres 9.0): 


Tables:

main fact table:

Table "public.parent_fact"

        Column        |            Type             |  

----------------------+-----------------------------+-----------

 etime                | date | not null

 pcamp_id             | integer                     | 

 location_id          | integer                     | 

 impressions          | bigint                      | 

 clicks              | int


this table partitioned by etime.


We are trying to build a report, which has last week numbers alongside with this week numbers. For example: if today is Wednesday, I want to compare daily numbers from last week 3 days (mon through wed) with this week 3 days(mon through wed).


To accomplish that, we've decided to build a transformation table, which has two columns:


 Table "public.trans_last_week"

  Column  |            Type             | Modifiers 

----------+-----------------------------+-----------

 etime    | date | 

 lw_etime | date |


So for each date(etime), we have lw_etime, which is essentially etime-7 days.


Here is the first query, which performs fine:


select    a11.location_id AS location_id,

    a11.pcamp_id AS  pcamp_id,

    sum(a11.clicks)

from    parent_fact    a11

where    a11.etime between '2011-14-18' and '2011-04-20'

group  by    a11.location_id,

    a11.pcamp_id


everything is good there -  it calculates numbers from the current week and goes to only 3 partitions to aggregate numbers. 


Here is the second query:


select    a11.location_id AS location_id,

    a11.pcamp_id AS  pcamp_id,

    sum(a11.clicks)

from    parent_fact    a11

    join    trans_last_week    a12

      on     (a11.etime = a12.lw_etime)

where    a12.etime between '2011-14-18' and '2011-04-20'

group  by    a11.location_id,

    a11.pcamp_id



Here it scans through all partitions in the parent_fact table and runs 3-4 times slower.


What was noticed, that the only case when Postgres is actually going to execute the query against the right partitions is query #1. 


Is that by design? Second query join, will also result in 3 days(3 partitions) 


This query (#3) also scans all partitions:


select    a11.location_id AS location_id,

    a11.pcamp_id AS  pcamp_id,

    sum(a11.clicks)

from    parent_fact    a11

where    a11.etime in (select a12.etime from trans_last_week    a12 where a11.etime = a12.lw_etime)

group  by    a11.location_id,

    a11.pcamp_id



Thank you!

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: REINDEX takes half a day (and still not complete!)
Next
From: Tory M Blue
Date:
Subject: Re: oom_killer