Thread: Partitioned tables in queries
I have a case where I am partitioning tables based on a date range in version 8.1.4. For example: table_with_millions_of_records interaction_id char(16) primary key start_date timestamp (without timezone) - indexed .. other columns child_1 start_date >= 2006-07-21 00:00:00 child_2 start_date >= 2006-07-20 00:00:00 and start_date < 2006-07-21 00:00:00 ... child_5 start_date >= 2006-07-17 00:00:00 and start_date < 2006-07-18 00:00:00 with rules on the parent and child tables that redirect the data to the appropriate child table based on the start_date. Because this table is going to grow very large (very quickly), and will need to be purged daily, I created partitions, or child tables to hold data for each day. I have done the same thing in Oracle in the past, and the PostgreSQL solution works great. The archival process is very simple - drop the expired child table. I am having one problem. If I run a query on the full table (there are 5 child tables with data for the last 5 days), and my where clause contains data for the current day only: where start_date > date_trunc('day', now()) all 5 child tables are scanned when I look at the output from explain analyze. My question is - can I force the planner to only scan the relevant child table - when the key related to the partitioned data it part of the where clause? Thanks, Kevin ...
On Jul 21, 2006, at 12:17 PM, Kevin Keith wrote: > I have a case where I am partitioning tables based on a date range > in version 8.1.4. For example: > > table_with_millions_of_records > interaction_id char(16) primary key > start_date timestamp (without timezone) - indexed > .. other columns > > child_1 start_date >= 2006-07-21 00:00:00 > child_2 start_date >= 2006-07-20 00:00:00 and start_date < > 2006-07-21 00:00:00 > ... > child_5 start_date >= 2006-07-17 00:00:00 and start_date < > 2006-07-18 00:00:00 > > with rules on the parent and child tables that redirect the data to > the appropriate child table based on the start_date. > > Because this table is going to grow very large (very quickly), and > will need to be purged daily, I created partitions, or child tables > to hold data for each day. I have done the same thing in Oracle in > the past, and the PostgreSQL solution works great. The archival > process is very simple - drop the expired child table. I am having > one problem. > > If I run a query on the full table (there are 5 child tables with > data for the last 5 days), and my where clause contains data for > the current day only: > where start_date > date_trunc('day', now()) > all 5 child tables are scanned when I look at the output from > explain analyze. > > My question is - can I force the planner to only scan the relevant > child table - when the key related to the partitioned data it part > of the where clause? Yes. You'll need non-overlapping check constraints in each child table and to set constraint_exclusion to "on" in postgresql.conf. See http://www.postgresql.org/docs/8.1/static/ddl-partitioning.html for the gory details. Cheers, Steve
My post might have been a little premature - and I apologize for that. I have figured out what was causing the problem: 1. Constraint exclusion was disabled. I re-enabled. 2. I found that using the now() function - and arbitrary interval will produce a different execution plan that using a specific date. For example: assuming the current time is 16:00: a) where start_date > now() - interval '4 hours' scans all child tables. b) where start_date > '2006-07-21 12:00:00' only scans the child table with today's data. So am I to assume that the value in the query must be a constant, and cannot be a result of a built-in function in order for constraint_exclusion to work correctly? Thanks, Kevin Kevin Keith wrote: > I have a case where I am partitioning tables based on a date range in > version 8.1.4. For example: > > table_with_millions_of_records > interaction_id char(16) primary key > start_date timestamp (without timezone) - indexed > .. other columns > > child_1 start_date >= 2006-07-21 00:00:00 > child_2 start_date >= 2006-07-20 00:00:00 and start_date < > 2006-07-21 00:00:00 > ... > child_5 start_date >= 2006-07-17 00:00:00 and start_date < > 2006-07-18 00:00:00 > > with rules on the parent and child tables that redirect the data to > the appropriate child table based on the start_date. > > Because this table is going to grow very large (very quickly), and > will need to be purged daily, I created partitions, or child tables to > hold data for each day. I have done the same thing in Oracle in the > past, and the PostgreSQL solution works great. The archival process is > very simple - drop the expired child table. I am having one problem. > > If I run a query on the full table (there are 5 child tables with data > for the last 5 days), and my where clause contains data for the > current day only: > where start_date > date_trunc('day', now()) > all 5 child tables are scanned when I look at the output from explain > analyze. > > My question is - can I force the planner to only scan the relevant > child table - when the key related to the partitioned data it part of > the where clause? > > Thanks, > > Kevin > > > ... >
> 2. I found that using the now() function - and arbitrary interval will > produce a different execution plan that using a specific date. For example: > assuming the current time is 16:00: > a) where start_date > now() - interval '4 hours' scans all child tables. > b) where start_date > '2006-07-21 12:00:00' only scans the child > table with today's data. > > So am I to assume that the value in the query must be a constant, and > cannot be a result of a built-in function in order for > constraint_exclusion to work correctly? Have you tried WHERE start_date > (SELECT now() - interval '4 hours')? Certainly using the constant will allow CBE to work. I think that a subquery might too.