Home > mailing lists

Re: Inheritence versus delete from - Mailing list pgsql-performance

From	John Arbash Meinel
Subject	Re: Inheritence versus delete from
Date	March 1, 2005 16:41:50
Msg-id	42249B44.6020009@arbash-meinel.com Whole thread Raw
In response to	Re: Inheritence versus delete from (Sven Willenberger <sven@dmv.com>)
List	pgsql-performance

Tree view

Sven Willenberger wrote:

>On Tue, 2005-03-01 at 09:48 -0600, John Arbash Meinel wrote:
>
>
>>Sven Willenberger wrote:
>>
>>
>>
>>>Trying to determine the best overall approach for the following
>>>scenario:
>>>
>>>Each month our primary table accumulates some 30 million rows (which
>>>could very well hit 60+ million rows per month by year's end). Basically
>>>there will end up being a lot of historical data with little value
>>>beyond archival.
>>>
>>>The question arises then as the best approach of which I have enumerated
>>>three:
>>>
>>>
>>>
>>I just thought of another possibility. You could create each table
>>month-by-month, and then use a view to combine them, and possibly a rule
>>to keep things clean.
>>
>>So you would do something like:
>>
>>I will assume you already have the data in one big table to show the
>>easiest way to create the small tables.
>>
>>create table tblname-2005-01 as select * from orig_tbl where day >=
>>'2005-01-01' and day < '2005-02-01';
>>create table tblname-2005-02 as select * from orig_tbl where day >=
>>'2005-02-01' and day < '2005-03-01';
>>create table tblname-2005-03 as select * from orig_tbl where day >=
>>'2005-03-01' and day < '2005-04-01';
>>-- create appropriate indicies, rules, constraints on these tables
>>
>>Then you create a view which includes all of these tables.
>>
>>create or replace view tblname as
>>    select * from tblname-2005-01
>>    union all select * from tblname-2005-02
>>    union all select * from tblname-2005-03
>>;
>>
>>Then create insert and update rules which fixe which table gets the new
>>data.
>>
>>create rule up_tblname as on update to tblname do instead
>>    update tblname-2005-03 set
>>       col1 = NEW.col1,
>>       col2 = NEW.col2,
>>       ...
>>    where id = NEW.id;
>>-- This assumes that you have a unique id on your tables. This is just
>>whatever your
>>-- primary key is, so it should be a decent assumption.
>>
>>create rule ins_tblname as on insert to tblname do instead
>>    insert into tblname-2005-03 (col1, col2, ...)
>>    values (new.col1, new.col2, ...);
>>
>>Now the downside of this method, is that every month you need to create
>>a new table, and then update the views and the rules. The update rules
>>are pretty straightforward, though.
>>
>>The nice thing is that it keeps your data partitioned, and you don't
>>ever have a large select/delete step. You probably will want a small one
>>each month to keep the data exactly aligned by month. You don't really
>>have to have exact alignments, but as humans, we tend to like that stuff. :)
>>
>>Probably this is more overhead than you would like to do. Especially if
>>you know that you can get away with method 2 (keep 1 big table, and just
>>remove old rows out of it every month.)
>>
>>But this method means that all of your data stays live, but queries with
>>appropriate restrictions should stay fast. You also have the ability
>>(with v8.0) to move the individual tables onto separate disks.
>>
>>One more time, though, if you can get away with removing old data and
>>just archiving it, do so. But if you want to keep the data live, there
>>are a couple of alternatives.
>>
>>
>>
>
>Actually that was the thought behind my using inheritance; when querying
>the <bigtable>, it basically does a union all; also, I think it would be
>quicker to insert directly into the child table (simply by modifying my
>query once a month) rather than the overhead sustained by the rule.
>
>Since the children tables are individual tables, all the benefits you
>cite above still hold.
>
>Thanks for the input on this ... will have to try a couple things to see
>which is most manageable.\
>
>Sven
>
>

You're right, child tables to act like that. I just recall that at least
at one point, postgres didn't handle indexes with child tables very
well. That's more just what someone else ran into, so he could have been
doing something wrong.
I agree, if child tables end up doing a union all, then it is much
easier to maintain. A select against the master table should
automatically get all of the child tables.
It might just be that you need to create a new index on the child table
whenever you create it, and then postgres can use that new index to do
the filtering.

John
=:->

Attachment

signature.asc

pgsql-performance by date:

From: Markus Schaber
Date: 01 March 2005, 16:27:09
Subject: Re: multi billion row tables: possible or insane?

From: John Arbash Meinel
Date: 01 March 2005, 16:45:04
Subject: Re: multi billion row tables: possible or insane?

Re: Inheritence versus delete from - Mailing list pgsql-performance

Attachment

Previous

Next