Re: Data archiving/warehousing idea - Mailing list pgsql-hackers

From Gavin Sherry
Subject Re: Data archiving/warehousing idea
Date
Msg-id Pine.LNX.4.58.0702011307380.12097@linuxworld.com.au
Whole thread Raw
In response to Data archiving/warehousing idea  (Chris Dunlop <chris@onthe.net.au>)
Responses Re: Data archiving/warehousing idea  (Chris Dunlop <chris@onthe.net.au>)
A more general approach (Re: Data archiving/warehousing idea)  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On Thu, 1 Feb 2007, Chris Dunlop wrote:

> G'day hackers,

G'Day Chris,

> already - I couldn't find anything in the mail archives, but
> that doesn't mean it's not there...)

There has been a lot of discussion about this kind of thing over the
years.

> The main idea is that, there might be space utilisation and
> performance advantages if postgres had "hard" read-only tables,
> i.e. tables which were guaranteed (by postgres) to never have
> their data changed (insert/update/delete).
>
> This could potentially save disk space by allowing "book
> keeping" elements in the page/tuple headers to be removed, e.g.
> visibility information etc.  Also, some indexes could
> potentially be packed tighter if we know the data will never
> change (of course this is already available using the fillfactor
> control).

Well, there is also CPU overhead in doing the MVCC checks, but there are a
few fundamental problems that must be overcome. The most significant is
that no useful table is always read-only, otherwise you could never load
it. What do we do in the presence of a failure during the load, or a
user-issued ABORT? I guess we'd truncate the table... What about replay
after a crash?

Another way of looking at it is that we use the 'bookkeeping' information
in the tuple header for concurrency and for handling transaction aborts.

> The idea would be to introduce a statement something like:
>
>   ALTER TABLE foo SET ARCHIVE;

I'd not thought of that approach. There are two problems: first, some
archive tables are so large that loading them and then reprocessing them
isn't appealing; second, we'd be rewriting the binary structure of the
table, and this does not suit the non-overwriting nature of Postgres's
storage system.

A different approach discussed earlier involves greatly restricting the
way in which the table is used. This table could only be written to if an
exclusive lock is held; on error or ABORT, the table is truncated.
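To make that concrete, the user-visible write path under that scheme might
look something like the following. The locking and COPY syntax here is
standard Postgres; the table name and data path are just placeholders, and
the abort behaviour described in the comments is the hypothetical part:

```sql
-- Sketch of loading a restricted "archive" table. Writes are only
-- permitted while an ACCESS EXCLUSIVE lock is held on the table.
BEGIN;
LOCK TABLE sales_archive IN ACCESS EXCLUSIVE MODE;
COPY sales_archive FROM '/data/sales_2006.csv' WITH CSV;
COMMIT;
-- On error or ABORT before COMMIT, the table would be truncated back
-- to its pre-load state, rather than relying on per-tuple visibility
-- information to hide the aborted rows.
```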

The problem is that a lot of this looks like a hack, and I haven't seen a
very clean approach that has gone beyond a basic brain dump.

Thanks,

Gavin

