Thread: Monthly table partitioning for fast purges?
We are moving an application from Oracle 8i to Postgres and I've run into a problem attempting to duplicate a feature wecurrently use. In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we storeseveral tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In otherwords, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is bya timestamp column, it also gives us a certain amount of automatic indexing. Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump everythinginto one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that thiswill be a slow and expensive operation. Does anyone have any advice for how best to handle this? Thanks in advance, -Roger
On Fri, Aug 01, 2003 at 01:46:54PM -0700, Roger Hand wrote: > We are moving an application from Oracle 8i to Postgres and I've run into > a problem attempting to duplicate a feature we currently use. > > In Oracle you can divide a table into partitions. We use this feature to > break up the data by month. Each month we store several tens of millions > of rows in a particular table, and each month we drop the partition that's > a year old. In other words, we always keep the last 12 months of data (12 > partitions). This is clean and fast. Since the partition is by a timestamp > column, it also gives us a certain amount of automatic indexing. > > Postgres doesn't support table partitions (correct me if I'm wrong!) so > the only option appears to be to dump everything into one big table. What > I'm worried about is the purging of the data from 12 months ago ... I'm > worried that this will be a slow and expensive operation. > > Does anyone have any advice for how best to handle this? I feel your pain! No, PortgreSQL doesn't support this. There were some proposals recently on -hackers but there didn't seem to be a great deal of interest. The best solution I've come up with is by creating base tables for each year by hand and using a view to combine them. You can create RULEs to automatically move new data to various tables. As long as you're not doing UPDATEs you can avoid a lot of the complexity. Similar effects can be acheived using inheritance. Good luck! -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "All that is needed for the forces of evil to triumph is for enough good > men to do nothing." - Edmond Burke > "The penalty good people pay for not being interested in politics is to be > governed by people worse than themselves." - Plato
Attachment
On 1 Aug 2003 at 13:46, Roger Hand wrote: > In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we storeseveral tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In otherwords, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is bya timestamp column, it also gives us a certain amount of automatic indexing. > > Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump everythinginto one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that thiswill be a slow and expensive operation. Well, you need to emulate this feature on postgresql, as pointed out already. It is possible in postgresql but is not provided out of the box. Here is what you can do. * Create a empty base table. * Create a trigger the directs the select on timestamps accordingly * Write a schedule to create a new table at the start of every month and update the trigger accordingly * Create and maintain rules to insert into base table depending upon the timestamp. This is just a rough idea. There might be fair amount of work to get this working but surely it is not imposible. HTH Bye Shridhar -- Justice, n.: A decision in your favor.
On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote: > On 1 Aug 2003 at 13:46, Roger Hand wrote: [snip] > Here is what you can do. > > * Create a empty base table. > * Create a trigger the directs the select on timestamps accordingly > * Write a schedule to create a new table at the start of every month and update > the trigger accordingly > * Create and maintain rules to insert into base table depending upon the > timestamp. > > This is just a rough idea. > > There might be fair amount of work to get this working but surely it is not > imposible. And you get a big performance hit when all those records are moved. Partitioning "should" be put on the TODO list soon after tablespaces (or DBA-defined directories) is implemented. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
I am looking at ways to speed up queries, the most common way by for queries to be constrianed is by date range. I have indexed the date column. Queries are still slower than i would like. Would there be any performance increase for these types of queries if the tables were split by month as described by Shridhar (method1) so only the required tables were loaded from disk? Will there be any performance increase if table partitioning is implemented? If method1 is likely to yield a performance increase, will there be a large hit for other types of queries which require all the data? I'd be happy to post the EXPLAIN ANALYZE output for a typical query if more info is needed. The date column is of type timestamp (and so goes right down to seconds) , most user queries are only concerned about whole days without the times, (but hte time data is required for other queries) can i do something with an index of the timestamps cast to dates? and then cast the queries to dates too? Thanks! Ron Johnson Wrote: > On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote: > > On 1 Aug 2003 at 13:46, Roger Hand wrote: > [snip] > > Here is what you can do. > > > > * Create a empty base table. > > * Create a trigger the directs the select on timestamps accordingly > > * Write a schedule to create a new table at the start of every month and update > > the trigger accordingly > > * Create and maintain rules to insert into base table depending upon the > > timestamp. > > > > This is just a rough idea. > > > > There might be fair amount of work to get this working but surely it is not > > imposible. > > And you get a big performance hit when all those records are moved. > > Partitioning "should" be put on the TODO list soon after tablespaces > (or DBA-defined directories) is implemented. > --
You could create an index on the function date(), which strips the time information. > -----Original Message----- > From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com] > Sent: 04 August 2003 14:01 > To: PgSQL General ML > Subject: Re: [GENERAL] Monthly table partitioning for fast purges? > > > I am looking at ways to speed up queries, the most common way by for > queries to be constrianed is by date range. I have indexed the date > column. Queries are still slower than i would like. ... > The date column is of type timestamp (and so goes right down > to seconds) > , most user queries are only concerned about whole days without the > times, (but hte time data is required for other queries) can i do > something with an index of the timestamps cast to dates? and > then cast > the queries to dates too?
Did you think about cluster on index ? psql-mail@freeuk.com wrote: >I am looking at ways to speed up queries, the most common way by for >queries to be constrianed is by date range. I have indexed the date >column. Queries are still slower than i would like. > >Would there be any performance increase for these types of queries if >the tables were split by month as described by Shridhar (method1) so >only the required tables were loaded from disk? > >Will there be any performance increase if table partitioning is >implemented? > >If method1 is likely to yield a performance increase, will there be a >large hit for other types >of queries which require all the data? > >I'd be happy to post the EXPLAIN ANALYZE output for a typical query if >more info is needed. > >The date column is of type timestamp (and so goes right down to seconds) >, most user queries are only concerned about whole days without the >times, (but hte time data is required for other queries) can i do >something with an index of the timestamps cast to dates? and then cast >the queries to dates too? > >Thanks! > >Ron Johnson Wrote: > > >>On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote: >> >> >>>On 1 Aug 2003 at 13:46, Roger Hand wrote: >>> >>> >>[snip] >> >> >>>Here is what you can do. >>> >>>* Create a empty base table. >>>* Create a trigger the directs the select on timestamps accordingly >>> >>> > > > >>>* Write a schedule to create a new table at the start of every >>> >>> >month and update > > >>>the trigger accordingly >>>* Create and maintain rules to insert into base table depending >>> >>> >upon the > > >>>timestamp. >>> >>>This is just a rough idea. >>> >>>There might be fair amount of work to get this working but surely >>> >>> >it is not > > >>>imposible. >>> >>> >>And you get a big performance hit when all those records are moved. >> >>Partitioning "should" be put on the TODO list soon after tablespaces >>(or DBA-defined directories) is implemented. >> >> >> > > >
On Fri, 1 Aug 2003, Roger Hand wrote: > We are moving an application from Oracle 8i to Postgres and I've run > into a problem attempting to duplicate a feature we currently use. > > In Oracle you can divide a table into partitions. We use this feature > to break up the data by month. Each month we store several tens of > millions of rows in a particular table, and each month we drop the > partition that's a year old. In other words, we always keep the last 12 > months of data (12 partitions). This is clean and fast. Since the > partition is by a timestamp column, it also gives us a certain amount of > automatic indexing. > > Postgres doesn't support table partitions (correct me if I'm wrong!) so > the only option appears to be to dump everything into one big table. > What I'm worried about is the purging of the data from 12 months ago > ... I'm worried that this will be a slow and expensive operation. There are two approaches I can think of, and you should test each one to see how it holds up to your usage. Approach the first: Using scripts, create a table for each month. Create a view that combines all of these months. When a month goes out of date, simply remove it from the view. Deleting the month can be done at your leisure, as it only saves disk space at this point, but since it isn't in the view, it doesn't slow you down to keep them. Approach the second: Use partial indexes to make it look like the table is partitioned. I.e. every month create a new partial index like: create index on bigoldtable (datefield) where datefield >=1stofmonth and datefield<=lastofmonth. Then include the "where date >=firstofmonth AND date <= lastofmonth This should then hit the partial index, which will be small compared to the master table with all the rows, or the main index, which will index all fields.
On Mon, 2003-08-04 at 09:02, Benjamin Jury wrote: > You could create an index on the function date(), which strips the time > information. How much of a hit would there be if he had separate "monthly tables" and UNIONed them? > > -----Original Message----- > > From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com] > > Sent: 04 August 2003 14:01 > > To: PgSQL General ML > > Subject: Re: [GENERAL] Monthly table partitioning for fast purges? > > > > > > I am looking at ways to speed up queries, the most common way by for > > queries to be constrianed is by date range. I have indexed the date > > column. Queries are still slower than i would like. > > ... > > > The date column is of type timestamp (and so goes right down > > to seconds) > > , most user queries are only concerned about whole days without the > > times, (but hte time data is required for other queries) can i do > > something with an index of the timestamps cast to dates? and > > then cast > > the queries to dates too? -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
I second that. I have discussed adding partitioning tables almost a year ago... No need to partition a functional index or anything like that. Just partition on a specific field. Ron Johnson wrote: >[...] > Partitioning "should" be put on the TODO list soon after tablespaces > (or DBA-defined directories) is implemented. > > -- > +-----------------------------------------------------------------+ > | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | > | Jefferson, LA USA | > | | > | "I'm not a vegetarian because I love animals, I'm a vegetarian | > | because I hate vegetables!" | > | unknown | > +-----------------------------------------------------------------+ > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Tue, 2003-08-05 at 10:28, Jean-Luc Lachance wrote: > I second that. > I have discussed adding partitioning tables almost a year ago... > No need to partition a functional index or anything like that. > Just partition on a specific field. Well..... If you're going to partition a *huge* table by, say, a date field, and there's also an index on that date field, partitioning the index on that same field may have some advantages. > Ron Johnson wrote: > >[...] > > Partitioning "should" be put on the TODO list soon after tablespaces > > (or DBA-defined directories) is implemented. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+