Thread: Question about partitioned query behavior

Question about partitioned query behavior

From
Ranga Gopalan
Date:
Hi,

My question is regarding ORDER BY / LIMIT query behavior when using partitioning.

I have a large table (about 100 columns, several million rows) partitioned by a column called day (which is the date stored as yyyymmdd - say 20100502 for May 2nd 2010 etc.). Say the main table  is called FACT_TABLE and each child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on it to CHECK (day = yyyymmdd).

Postgres Version:  PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit

The query pattern I am looking at is (I have tried to simplify the column names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507  # selecting for a week
ORDER BY F2 desc
LIMIT 100


This is what is happening:

When I query from the specific day's (child) table, I get what I expect - a descending Index scan and good performance.

# explain  select F1 from FACT_TABLE_20100502 where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
 
------------------------------------------------------------------------------------------------------------------------------------------------
--
 Limit  (cost=0.00..4.81 rows=100 width=41)
   ->  Index Scan Backward using F2_20100502 on FACT_TABLE_20100502  (cost=0.00..90355.89 rows=1876985 width=41
)
         Filter: (day = 20100502)



BUT:

When I do the same query against the parent table it is much slower - two things seem to happen - one is that the descending scan of the index is not done and secondly there seems to be a separate sort/limit at the end - i.e. all data from all partitions is retrieved and then sorted and limited - This seems to be much less efficient than doing a descending scan on each partition and limiting the results and then combining and reapplying the limit at the end.

explain  select F1 from FACT_TABLE where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
  
------------------------------------------------------------------------------------------------------------------------------------------------
---
 Limit  (cost=20000084948.01..20000084948.01 rows=100 width=41)
   ->  Sort  (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
         Sort Key: public.FACT_TABLE.F2
         ->  Result  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
               ->  Append  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
                     ->  Seq Scan on FACT_TABLE  (cost=10000000000.00..10000000010.02 rows=1 width=186)
                           Filter: (day = 20100502)
                     ->  Seq Scan on FACT_TABLE_20100502 FACT_TABLE  (cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
                           Filter: (day = 20100502)
(9 rows)


Could anyone please explain why this is happening and what I can do to get the query to perform well even when querying from the parent table?

Thanks,

Ranga






Hotmail is redefining busy with tools for the New Busy. Get more from your inbox. See how.

Re: Question about partitioned query behavior

From
"Benjamin Krajmalnik"
Date:

In postgresql.conf, what are your settings for constraint_exclusion?

There are 3 settings – on, off, or partition.

Mine are set as follows:

 

constraint_exclusion = on            # on, off, or partition

 

Under 8.4.4 I had it set to partition, but the behavior was not what I expected so I set it back to “on” and only the applicable partitions get processed.

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Ranga Gopalan
Sent: Friday, July 02, 2010 9:29 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question about partitioned query behavior

 

Hi,

My question is regarding ORDER BY / LIMIT query behavior when using partitioning.

I have a large table (about 100 columns, several million rows) partitioned by a column called day (which is the date stored as yyyymmdd - say 20100502 for May 2nd 2010 etc.). Say the main table  is called FACT_TABLE and each child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on it to CHECK (day = yyyymmdd).

Postgres Version:  PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit

The query pattern I am looking at is (I have tried to simplify the column names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507  # selecting for a week
ORDER BY F2 desc
LIMIT 100


This is what is happening:

When I query from the specific day's (child) table, I get what I expect - a descending Index scan and good performance.

# explain  select F1 from FACT_TABLE_20100502 where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
 
------------------------------------------------------------------------------------------------------------------------------------------------
--
 Limit  (cost=0.00..4.81 rows=100 width=41)
   ->  Index Scan Backward using F2_20100502 on FACT_TABLE_20100502  (cost=0.00..90355.89 rows=1876985 width=41
)
         Filter: (day = 20100502)



BUT:

When I do the same query against the parent table it is much slower - two things seem to happen - one is that the descending scan of the index is not done and secondly there seems to be a separate sort/limit at the end - i.e. all data from all partitions is retrieved and then sorted and limited - This seems to be much less efficient than doing a descending scan on each partition and limiting the results and then combining and reapplying the limit at the end.

explain  select F1 from FACT_TABLE where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
  
------------------------------------------------------------------------------------------------------------------------------------------------
---
 Limit  (cost=20000084948.01..20000084948.01 rows=100 width=41)
   ->  Sort  (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
         Sort Key: public.FACT_TABLE.F2
         ->  Result  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
               ->  Append  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
                     ->  Seq Scan on FACT_TABLE  (cost=10000000000.00..10000000010.02 rows=1 width=186)
                           Filter: (day = 20100502)
                     ->  Seq Scan on FACT_TABLE_20100502 FACT_TABLE  (cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
                           Filter: (day = 20100502)
(9 rows)


Could anyone please explain why this is happening and what I can do to get the query to perform well even when querying from the parent table?

Thanks,

Ranga





Hotmail is redefining busy with tools for the New Busy. Get more from your inbox. See how.

Re: Question about partitioned query behavior

From
Ranga Gopalan
Date:
Hi,

It seems that this is an issue faced by others as well - Please see this link: http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table

Is this a known bug? Is this something that someone is working on or is there a known work around?

Thanks,

Ranga



From: ranga_gopalan@hotmail.com
To: pgsql-performance@postgresql.org
Subject: Question about partitioned query behavior
Date: Fri, 2 Jul 2010 15:28:45 +0000

.ExternalClass .ecxhmmessage P {padding:0px;} .ExternalClass body.ecxhmmessage {font-size:10pt;font-family:Verdana;} Hi,

My question is regarding ORDER BY / LIMIT query behavior when using partitioning.

I have a large table (about 100 columns, several million rows) partitioned by a column called day (which is the date stored as yyyymmdd - say 20100502 for May 2nd 2010 etc.). Say the main table  is called FACT_TABLE and each child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on it to CHECK (day = yyyymmdd).

Postgres Version:  PostgreSQL 8.4.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit

The query pattern I am looking at is (I have tried to simplify the column names for readability):

SELECT F1 from FACT_TABLE
where day >= 20100502 and day <= 20100507  # selecting for a week
ORDER BY F2 desc
LIMIT 100


This is what is happening:

When I query from the specific day's (child) table, I get what I expect - a descending Index scan and good performance.

# explain  select F1 from FACT_TABLE_20100502 where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
 
------------------------------------------------------------------------------------------------------------------------------------------------
--
 Limit  (cost=0.00..4.81 rows=100 width=41)
   ->  Index Scan Backward using F2_20100502 on FACT_TABLE_20100502  (cost=0.00..90355.89 rows=1876985 width=41
)
         Filter: (day = 20100502)



BUT:

When I do the same query against the parent table it is much slower - two things seem to happen - one is that the descending scan of the index is not done and secondly there seems to be a separate sort/limit at the end - i.e. all data from all partitions is retrieved and then sorted and limited - This seems to be much less efficient than doing a descending scan on each partition and limiting the results and then combining and reapplying the limit at the end.

explain  select F1 from FACT_TABLE where day = 20100502 order by F2 desc limit 100;
                                                                    QUERY PLAN                                                                 
  
------------------------------------------------------------------------------------------------------------------------------------------------
---
 Limit  (cost=20000084948.01..20000084948.01 rows=100 width=41)
   ->  Sort  (cost=20000084948.01..20000084994.93 rows=1876986 width=41)
         Sort Key: public.FACT_TABLE.F2
         ->  Result  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
               ->  Append  (cost=10000000000.00..20000084230.64 rows=1876986 width=41)
                     ->  Seq Scan on FACT_TABLE  (cost=10000000000.00..10000000010.02 rows=1 width=186)
                           Filter: (day = 20100502)
                     ->  Seq Scan on FACT_TABLE_20100502 FACT_TABLE  (cost=10000000000.00..10000084220.62 rows=1876985 width=4
1)
                           Filter: (day = 20100502)
(9 rows)


Could anyone please explain why this is happening and what I can do to get the query to perform well even when querying from the parent table?

Thanks,

Ranga






Hotmail is redefining busy with tools for the New Busy. Get more from your inbox. See how.

Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox. Learn more.

Re: Question about partitioned query behavior

From
Robert Haas
Date:
On Tue, Jul 6, 2010 at 12:30 PM, Ranga Gopalan
<ranga_gopalan@hotmail.com> wrote:
> It seems that this is an issue faced by others as well - Please see this
> link:
> http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table
>
> Is this a known bug? Is this something that someone is working on or is
> there a known work around?

I think that we know this problem exists, but I'm not aware anyone is
working on fixing it.  There is a "Merge Append" patch floating around
out there that I think might help with this, but AFAICS it was last
updated on July 5, 2009, and still needed some more work at that time.

Since this is an all-volunteer effort, complicated problems like this
don't always get fixed as fast as we'd like; most of us have to spend
most of our time on whatever it is that our employer pays us to do.
Of course if you're in a position to sponsor a developer there are a
number of companies that will be happy to work with you.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Question about partitioned query behavior

From
Stephen Frost
Date:
Ranga,

* Ranga Gopalan (ranga_gopalan@hotmail.com) wrote:
> It seems that this is an issue faced by others as well - Please see this link:
http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table
>
> Is this a known bug? Is this something that someone is working on or is there a known work around?

Actually, if you look at that, the problem the original poster had was
that they didn't have constraint_exclusion turned on.  Then they were
complaining about having the (empty) master table and the needed
partition included (which, really, shouldn't be that big a deal).

Did you look at what the other reply suggested?  Do you have
constraint_exclusion = 'on' in your postgresql.conf?

    Thanks,

        Stephen

Attachment

Re: Question about partitioned query behavior

From
Ranga Gopalan
Date:
Hi Stephen,

Constraint exclusion was initially partition and I set it to "on" as suggested and tried that - the query planner in both cases was correctly identifying the specific partitions being queried - the problem seems to be a generic issue related to the way queries on partition tables are handled and how the order by / limit is applied in this scenario.

Thanks,

Ranga

> Date: Tue, 6 Jul 2010 16:26:23 -0400
> From: sfrost@snowman.net
> To: ranga_gopalan@hotmail.com
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Question about partitioned query behavior
>
> Ranga,
>
> * Ranga Gopalan (ranga_gopalan@hotmail.com) wrote:
> > It seems that this is an issue faced by others as well - Please see this link: http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table
> >
> > Is this a known bug? Is this something that someone is working on or is there a known work around?
>
> Actually, if you look at that, the problem the original poster had was
> that they didn't have constraint_exclusion turned on. Then they were
> complaining about having the (empty) master table and the needed
> partition included (which, really, shouldn't be that big a deal).
>
> Did you look at what the other reply suggested? Do you have
> constraint_exclusion = 'on' in your postgresql.conf?
>
> Thanks,
>
> Stephen


The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail. Get busy.