Thread: Monthly table partitioning for fast purges?

Monthly table partitioning for fast purges?

From
"Roger Hand"
Date:
We are moving an application from Oracle 8i to Postgres and I've run into a problem attempting to duplicate a feature
wecurrently use.  

In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we
storeseveral tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In
otherwords, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is
bya timestamp column, it also gives us a certain amount of automatic indexing.  

Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump
everythinginto one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that
thiswill be a slow and expensive operation.  

Does anyone have any advice for how best to handle this?

Thanks in advance,

-Roger

Re: Monthly table partitioning for fast purges?

From
Martijn van Oosterhout
Date:
On Fri, Aug 01, 2003 at 01:46:54PM -0700, Roger Hand wrote:
> We are moving an application from Oracle 8i to Postgres and I've run into
> a problem attempting to duplicate a feature we currently use.
>
> In Oracle you can divide a table into partitions. We use this feature to
> break up the data by month. Each month we store several tens of millions
> of rows in a particular table, and each month we drop the partition that's
> a year old. In other words, we always keep the last 12 months of data (12
> partitions). This is clean and fast. Since the partition is by a timestamp
> column, it also gives us a certain amount of automatic indexing.
>
> Postgres doesn't support table partitions (correct me if I'm wrong!) so
> the only option appears to be to dump everything into one big table. What
> I'm worried about is the purging of the data from 12 months ago ... I'm
> worried that this will be a slow and expensive operation.
>
> Does anyone have any advice for how best to handle this?

I feel your pain! No, PortgreSQL doesn't support this. There were some
proposals recently on -hackers but there didn't seem to be a great deal of
interest. The best solution I've come up with is by creating base tables for
each year by hand and using a view to combine them.

You can create RULEs to automatically move new data to various tables. As
long as you're not doing UPDATEs you can avoid a lot of the complexity.
Similar effects can be acheived using inheritance.

Good luck!

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "All that is needed for the forces of evil to triumph is for enough good
> men to do nothing." - Edmond Burke
> "The penalty good people pay for not being interested in politics is to be
> governed by people worse than themselves." - Plato

Attachment

Re: Monthly table partitioning for fast purges?

From
"Shridhar Daithankar"
Date:
On 1 Aug 2003 at 13:46, Roger Hand wrote:
> In Oracle you can divide a table into partitions. We use this feature to break up the data by month. Each month we
storeseveral tens of millions of rows in a particular table, and each month we drop the partition that's a year old. In
otherwords, we always keep the last 12 months of data (12 partitions). This is clean and fast. Since the partition is
bya timestamp column, it also gives us a certain amount of automatic indexing.  
>
> Postgres doesn't support table partitions (correct me if I'm wrong!) so the only option appears to be to dump
everythinginto one big table. What I'm worried about is the purging of the data from 12 months ago ... I'm worried that
thiswill be a slow and expensive operation.  

Well, you need to emulate this feature on postgresql, as pointed out already.
It is possible in postgresql but is not provided out of the box.

Here is what you can do.

* Create a empty base table.
* Create a trigger the directs the select on timestamps accordingly
* Write a schedule to create a new table at the start of every month and update
the trigger accordingly
* Create and maintain rules to insert into base table depending upon the
timestamp.

This is just a rough idea.

There might be fair amount of work to get this working but surely it is not
imposible.

HTH

Bye
 Shridhar

--
Justice, n.:    A decision in your favor.


Re: Monthly table partitioning for fast purges?

From
Ron Johnson
Date:
On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:
> On 1 Aug 2003 at 13:46, Roger Hand wrote:
[snip]
> Here is what you can do.
>
> * Create a empty base table.
> * Create a trigger the directs the select on timestamps accordingly
> * Write a schedule to create a new table at the start of every month and update
> the trigger accordingly
> * Create and maintain rules to insert into base table depending upon the
> timestamp.
>
> This is just a rough idea.
>
> There might be fair amount of work to get this working but surely it is not
> imposible.

And you get a big performance hit when all those records are moved.

Partitioning "should" be put on the TODO list soon after tablespaces
(or DBA-defined directories) is implemented.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Monthly table partitioning for fast purges?

From
psql-mail@freeuk.com
Date:
I am looking at ways to speed up queries, the most common way by for
queries to be constrianed is by date range. I have indexed the date
column. Queries are still slower than i would like.

Would there be any performance increase for these types of queries if
the tables were split by month as described by Shridhar (method1) so
only the required tables were loaded from disk?

Will there be any performance increase if table partitioning is
implemented?

If method1 is likely to yield a performance increase, will there be a
large hit for other types
of queries which require all the data?

I'd be happy to post the EXPLAIN ANALYZE output for a typical query if
more info is needed.

The date column is of type timestamp (and so goes right down to seconds)
, most user queries are only concerned about whole days without the
times, (but hte time data is required for other queries) can i do
something with an index of the timestamps cast to dates? and then cast
the queries to dates too?

Thanks!

Ron Johnson Wrote:
> On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:
> > On 1 Aug 2003 at 13:46, Roger Hand wrote:
> [snip]
> > Here is what you can do.
> >
> > * Create a empty base table.
> > * Create a trigger the directs the select on timestamps accordingly

> > * Write a schedule to create a new table at the start of every
month and update
> > the trigger accordingly
> > * Create and maintain rules to insert into base table depending
upon the
> > timestamp.
> >
> > This is just a rough idea.
> >
> > There might be fair amount of work to get this working but surely
it is not
> > imposible.
>
> And you get a big performance hit when all those records are moved.
>
> Partitioning "should" be put on the TODO list soon after tablespaces
> (or DBA-defined directories) is implemented.
>

--

Re: Monthly table partitioning for fast purges?

From
Benjamin Jury
Date:
You could create an index on the function date(), which strips the time
information.

> -----Original Message-----
> From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com]
> Sent: 04 August 2003 14:01
> To: PgSQL General ML
> Subject: Re: [GENERAL] Monthly table partitioning for fast purges?
>
>
> I am looking at ways to speed up queries, the most common way by for
> queries to be constrianed is by date range. I have indexed the date
> column. Queries are still slower than i would like.

...

> The date column is of type timestamp (and so goes right down
> to seconds)
> , most user queries are only concerned about whole days without the
> times, (but hte time data is required for other queries) can i do
> something with an index of the timestamps cast to dates? and
> then cast
> the queries to dates too?

Re: Monthly table partitioning for fast purges?

From
Jan Poslusny
Date:
Did you think about cluster on index ?

psql-mail@freeuk.com wrote:

>I am looking at ways to speed up queries, the most common way by for
>queries to be constrianed is by date range. I have indexed the date
>column. Queries are still slower than i would like.
>
>Would there be any performance increase for these types of queries if
>the tables were split by month as described by Shridhar (method1) so
>only the required tables were loaded from disk?
>
>Will there be any performance increase if table partitioning is
>implemented?
>
>If method1 is likely to yield a performance increase, will there be a
>large hit for other types
>of queries which require all the data?
>
>I'd be happy to post the EXPLAIN ANALYZE output for a typical query if
>more info is needed.
>
>The date column is of type timestamp (and so goes right down to seconds)
>, most user queries are only concerned about whole days without the
>times, (but hte time data is required for other queries) can i do
>something with an index of the timestamps cast to dates? and then cast
>the queries to dates too?
>
>Thanks!
>
>Ron Johnson Wrote:
>
>
>>On Sun, 2003-08-03 at 06:31, Shridhar Daithankar wrote:
>>
>>
>>>On 1 Aug 2003 at 13:46, Roger Hand wrote:
>>>
>>>
>>[snip]
>>
>>
>>>Here is what you can do.
>>>
>>>* Create a empty base table.
>>>* Create a trigger the directs the select on timestamps accordingly
>>>
>>>
>
>
>
>>>* Write a schedule to create a new table at the start of every
>>>
>>>
>month and update
>
>
>>>the trigger accordingly
>>>* Create and maintain rules to insert into base table depending
>>>
>>>
>upon the
>
>
>>>timestamp.
>>>
>>>This is just a rough idea.
>>>
>>>There might be fair amount of work to get this working but surely
>>>
>>>
>it is not
>
>
>>>imposible.
>>>
>>>
>>And you get a big performance hit when all those records are moved.
>>
>>Partitioning "should" be put on the TODO list soon after tablespaces
>>(or DBA-defined directories) is implemented.
>>
>>
>>
>
>
>


Re: Monthly table partitioning for fast purges?

From
"scott.marlowe"
Date:
On Fri, 1 Aug 2003, Roger Hand wrote:

> We are moving an application from Oracle 8i to Postgres and I've run
> into a problem attempting to duplicate a feature we currently use.
>
> In Oracle you can divide a table into partitions. We use this feature
> to break up the data by month. Each month we store several tens of
> millions of rows in a particular table, and each month we drop the
> partition that's a year old. In other words, we always keep the last 12
> months of data (12 partitions). This is clean and fast. Since the
> partition is by a timestamp column, it also gives us a certain amount of
> automatic indexing.
>
> Postgres doesn't support table partitions (correct me if I'm wrong!) so
> the only option appears to be to dump everything into one big table.
> What I'm worried about is the purging of the data from 12 months ago
> ... I'm worried that this will be a slow and expensive operation.

There are two approaches I can think of, and you should test each one to
see how it holds up to your usage.

Approach the first:  Using scripts, create a table for each month.  Create
a view that combines all of these months.  When a month goes out of date,
simply remove it from the view.  Deleting the month can be done at your
leisure, as it only saves disk space at this point, but since it isn't in
the view, it doesn't slow you down to keep them.

Approach the second:  Use partial indexes to make it look like the table
is partitioned.  I.e. every month create a new partial index like:

create index on bigoldtable (datefield) where datefield >=1stofmonth and
datefield<=lastofmonth.

Then include the "where date >=firstofmonth AND date <= lastofmonth

This should then hit the partial index, which will be small compared to
the master table with all the rows, or the main index, which will index
all fields.


Re: Monthly table partitioning for fast purges?

From
Ron Johnson
Date:
On Mon, 2003-08-04 at 09:02, Benjamin Jury wrote:
> You could create an index on the function date(), which strips the time
> information.

How much of a hit would there be if he had separate "monthly tables"
and UNIONed them?

> > -----Original Message-----
> > From: psql-mail@freeuk.com [mailto:psql-mail@freeuk.com]
> > Sent: 04 August 2003 14:01
> > To: PgSQL General ML
> > Subject: Re: [GENERAL] Monthly table partitioning for fast purges?
> >
> >
> > I am looking at ways to speed up queries, the most common way by for
> > queries to be constrianed is by date range. I have indexed the date
> > column. Queries are still slower than i would like.
>
> ...
>
> > The date column is of type timestamp (and so goes right down
> > to seconds)
> > , most user queries are only concerned about whole days without the
> > times, (but hte time data is required for other queries) can i do
> > something with an index of the timestamps cast to dates? and
> > then cast
> > the queries to dates too?

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+



Re: Monthly table partitioning for fast purges?

From
Jean-Luc Lachance
Date:
I second that.
I have discussed adding partitioning tables almost a year ago...
No need to partition a functional index or anything like that.
Just partition on a specific field.


Ron Johnson wrote:
>[...]
> Partitioning "should" be put on the TODO list soon after tablespaces
> (or DBA-defined directories) is implemented.
>
> --
> +-----------------------------------------------------------------+
> | Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
> | Jefferson, LA  USA                                              |
> |                                                                 |
> | "I'm not a vegetarian because I love animals, I'm a vegetarian  |
> |  because I hate vegetables!"                                    |
> |    unknown                                                      |
> +-----------------------------------------------------------------+
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: Monthly table partitioning for fast purges?

From
Ron Johnson
Date:
On Tue, 2003-08-05 at 10:28, Jean-Luc Lachance wrote:
> I second that.
> I have discussed adding partitioning tables almost a year ago...
> No need to partition a functional index or anything like that.
> Just partition on a specific field.

Well.....

If you're going to partition a *huge* table by, say, a date field,
and there's also an index on that date field, partitioning the
index on that same field may have some advantages.

> Ron Johnson wrote:
> >[...]
> > Partitioning "should" be put on the TODO list soon after tablespaces
> > (or DBA-defined directories) is implemented.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "I'm not a vegetarian because I love animals, I'm a vegetarian  |
|  because I hate vegetables!"                                    |
|    unknown                                                      |
+-----------------------------------------------------------------+