Re: A more general approach (Re: Dataarchiving/warehousing idea) - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: A more general approach (Re: Dataarchiving/warehousing idea)
Date
Msg-id 1170335357.3681.574.camel@silverbirch.site
Whole thread Raw
In response to A more general approach (Re: Data archiving/warehousing idea)  (Hannu Krosing <hannu@skype.net>)
List pgsql-hackers
On Thu, 2007-02-01 at 14:38 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-02-01 kell 13:24, kirjutas Gavin Sherry:
>
> > A different approach discussed earlier involves greatly restricting the
> > way in which the table is used. This table could only be written to if an
> > exclusive lock is held; on error or ABORT, the table is truncated.
> >
> > The problem is that a lot of this looks like a hack and I haven't seen a
> > very clean approach which has gone beyond basic brain dump.
>
> A more radical variation of the "restricted-use archive table" approach
> is storing all tuple visibility info in a separate file.
>
> At first it seems to just add overhead, but for lots (most ? ) usecases
> the separately stored visibility should be highly compressible, so for
> example for bulk-loaded tables you could end up with one bit per page
> saying that all tuples on this page are visible.
>
> Also this could be used to speed up vacuums, as only the visibility
> table needs to be scanned duting phase 1 of vacuum, and so tables with
> localised/moving hotspots can be vacuumed withoutd scanning lots of
> static data.
>
> Also, storing the whole visibility info, but in a separate heap, lifts
> all restrictions of the "restricted-use archive table" variant.
>
> And the compression of visibility info (mostly replacing per-tuple info
> with per-page info) can be carried out by a separate vacuum-like
> process.
>
> And it has many of the benefits of static/RO tables, like space saving
> and index-only queries. Index-only will of course need to get the
> visibility info from visibility heap, but if it is mostly heavily
> compressed, it will be a lot cheaper than random access to data heap.

I like that idea, as a non-default option, since in-line visibility is
important for OLTP applications.

This idea is sufficiently flexible to allow a range of use cases without
requiring a complete removal of functionality. Read-mostly is an
important use case for very large databases.

This is essentially the same thing as PhantomCommandId, just for the
whole tuple header. It's something that would go on pg_class easily,
just as WITH/WITHOUT OIDS has done.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: A more general approach (Re: Data archiving/warehousing idea)
Next
From: Bruce Momjian
Date:
Subject: Re: "May", "can", "might"