Re: A more general approach (Re: Data archiving/warehousing idea) - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: A more general approach (Re: Data archiving/warehousing idea)
Date
Msg-id 1170334311.3226.12.camel@localhost.localdomain
In response to A more general approach (Re: Data archiving/warehousing idea)  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On Thu, 2007-02-01 at 14:38, Hannu Krosing wrote:
> On Thu, 2007-02-01 at 13:24, Gavin Sherry wrote:
> 
> > A different approach discussed earlier involves greatly restricting the
> > way in which the table is used. This table could only be written to if an
> > exclusive lock is held; on error or ABORT, the table is truncated.
> > 
> > The problem is that a lot of this looks like a hack and I haven't seen a
> > very clean approach which has gone beyond basic brain dump.
> 
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
> 
> At first it seems to just add overhead, but for many (most?) use cases
> the separately stored visibility should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.
> 
> Also this could be used to speed up vacuums, as only the visibility
> table needs to be scanned during phase 1 of vacuum, and so tables with
> localised/moving hotspots can be vacuumed without scanning lots of
> static data.
> 
> Also, storing the whole visibility info, but in a separate heap, lifts
> all restrictions of the "restricted-use archive table" variant. 
> 
> And the compression of visibility info (mostly replacing per-tuple info
> with per-page info) can be carried out by a separate vacuum-like
> process.
> 
> And it has many of the benefits of static/RO tables, like space saving
> and index-only queries. Index-only queries will of course need to get
> the visibility info from the visibility heap, but if that is heavily
> compressed, it will be a lot cheaper than random access to the data heap.
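
To make the quoted idea concrete, here is a minimal sketch of a
separately stored visibility file with one bit per heap page, and of a
vacuum phase 1 that consults only that file. Everything in it
(VisibilityFile, pages_to_vacuum, the bitmap layout) is a hypothetical
illustration, not existing PostgreSQL code, and it assumes a bulk-loaded
table whose pages all start out all-visible.

```python
class VisibilityFile:
    """Hypothetical side file: one 'all tuples visible' bit per heap page."""

    def __init__(self, n_pages: int, all_visible: bool = True):
        self.n_pages = n_pages
        fill = 0xFF if all_visible else 0x00
        # One bit per page, packed eight pages to a byte.
        self.bits = bytearray([fill]) * ((n_pages + 7) // 8)

    def is_all_visible(self, page: int) -> bool:
        return bool(self.bits[page // 8] & (1 << (page % 8)))

    def clear(self, page: int) -> None:
        # Called when an update or delete touches the page (a hotspot).
        self.bits[page // 8] &= ~(1 << (page % 8)) & 0xFF

    def set(self, page: int) -> None:
        # Called by the vacuum-like compression pass once the page is clean.
        self.bits[page // 8] |= 1 << (page % 8)

    def pages_to_vacuum(self) -> list[int]:
        # Phase 1 reads only this bitmap; static pages are never visited.
        return [p for p in range(self.n_pages) if not self.is_all_visible(p)]


vis = VisibilityFile(1000)   # bulk-loaded table with 1000 pages
vis.clear(3)                 # two pages modified after the load
vis.clear(998)
hot_pages = vis.pages_to_vacuum()
```

In this sketch the whole table's visibility fits in 125 bytes, and
vacuum has to visit only the two modified pages.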

For tables with fixed-width tuples it can probably be extended to
support vertical fragmentation as well, to get DWH benefits similar to
http://monetdb.cwi.nl/ .
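
As a rough illustration of what vertical fragmentation could look like
for fixed-width tuples, the sketch below keeps each column in its own
array, so a value's byte offset follows directly from the row number and
a scan reads only the columns it needs. The ColumnStore class and its
methods are hypothetical names used for illustration only; they are not
taken from PostgreSQL or MonetDB.

```python
from array import array


class ColumnStore:
    """Hypothetical vertically fragmented table with fixed-width columns."""

    def __init__(self, typecodes: dict[str, str]):
        # One contiguous fixed-width array per column.
        self.columns = {name: array(code) for name, code in typecodes.items()}

    def append(self, **row) -> None:
        for name, col in self.columns.items():
            col.append(row[name])

    def offset(self, name: str, rowno: int) -> int:
        # Fixed width means the position is computable, no line pointers needed.
        return rowno * self.columns[name].itemsize

    def column_sum(self, name: str):
        # A DWH-style aggregate touches only the one column it needs.
        return sum(self.columns[name])


sales = ColumnStore({"id": "q", "amount": "d"})
for i, amount in enumerate([10.0, 20.5, 30.0]):
    sales.append(id=i, amount=amount)
total = sales.column_sum("amount")
```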

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com
