Re: Including Snapshot Info with Indexes - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Including Snapshot Info with Indexes
Date	October 8, 2007 08:51:37
Msg-id	1191844281.8919.14.camel@hannu-laptop
In response to	Re: Including Snapshot Info with Indexes ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List	pgsql-hackers

On a fine day, Mon, 2007-10-08 at 11:41, Heikki Linnakangas wrote:
> The dead space map holds
> visibility information in a condensed form. For index-only-scans, we
> need to know if all tuples on a page are visible to us. If the dead
> space map is designed with index-only-scans in mind, we can store a bit
> there indicating "all tuples on this page are visible to everyone".
> Pages that have that bit set don't need to be visited to check visibility.
> 
> What information exactly is going to be stored in the dead space map is
> still debated. For vacuuming, we need to know which pages contain dead
> tuples that are worth vacuuming, which isn't the same thing as "all
> tuples are visible to everyone", but it's quite close.

I would prefer a separate MVCC visibility heap (a.k.a. an extended "dead
space map") that would duplicate the whole visibility info from the heap
pages, just in compressed form. After a few releases with duplicated
visibility info, we could remove it from the data heap.

If the whole visibility info for tuples (cmin, cmax, tmin, tmax, flags,
plus size for DSM uses) were in a separate heap, it would allow for a lot
of on-the-fly compression. For example, we could:

* get rid of both tmin and tmax for all completed transactions
* reduce any deleted tuple to just flags
* reduce any tuple produced by aborted transaction to just flags
* reduce any tuple visible to all backends to just flags
* RLE compress (runs of) pages produced by the same transaction
* RLE compress (runs of) pages with all tuples visible
* RLE compress (runs of) pages with all tuples deleted
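
To make the run-length items in the list above concrete, here is a minimal
sketch in Python. It assumes each heap page has already been summarized
into a single state; the state names and both functions are hypothetical
and chosen only for illustration, they are not existing PostgreSQL code.

```python
from itertools import groupby

# Hypothetical per-page summary states (illustrative names only).
ALL_VISIBLE = "all_visible"
ALL_DELETED = "all_deleted"
MIXED = "mixed"


def rle_compress(page_states):
    """Collapse consecutive pages sharing a state into (state, run_length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(page_states)]


def rle_lookup(runs, page_no):
    """Return the state of page_no by walking the runs in order."""
    if page_no < 0:
        raise IndexError(f"invalid page number {page_no}")
    start = 0
    for state, length in runs:
        if page_no < start + length:
            return state
        start += length
    raise IndexError(f"page {page_no} is beyond the end of the map")


# A bulk-loaded table: 1000 fully visible pages, a few mixed ones,
# then 200 pages whose tuples were all deleted.
pages = [ALL_VISIBLE] * 1000 + [MIXED] * 3 + [ALL_DELETED] * 200
runs = rle_compress(pages)
print(runs)                    # [('all_visible', 1000), ('mixed', 3), ('all_deleted', 200)]
print(rle_lookup(runs, 1001))  # mixed
```

In this toy case 1203 page entries shrink to three runs, which is the kind
of saving bulk-inserted data would be expected to give.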

Depending on the distribution of inserts/updates/deletes, this will make
the visibility info a little or a lot smaller than it is currently, greatly
enhancing the chances that it stays in cache (even for OLAP loads, because
data for these is usually produced by bulk inserts and thus its
visibility info is highly compressible).

It also makes VACUUM more efficient, as its initial scan (find
vacuumable tuples) will need to do a lot less I/O.
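
As a rough sketch of why that first pass gets cheaper: assuming the
visibility heap keeps per-page summaries as (state, run_length) pairs,
VACUUM could pick its target page ranges from that small structure without
reading any data pages. The state names and the function below are
hypothetical, for illustration only.

```python
def vacuum_candidate_ranges(runs, vacuumable_states=("has_dead", "all_deleted")):
    """Return inclusive (first_page, last_page) ranges worth vacuuming.

    runs: list of (state, run_length) pairs covering the table in page order.
    Only the compressed summary is read here, never a data heap page.
    """
    ranges = []
    start = 0
    for state, length in runs:
        if state in vacuumable_states:
            ranges.append((start, start + length - 1))
        start += length
    return ranges


# Hypothetical summary of a 1203-page table.
runs = [("all_visible", 1000), ("has_dead", 3), ("all_deleted", 200)]
candidates = vacuum_candidate_ranges(runs)
print(candidates)  # [(1000, 1002), (1003, 1202)]
```

Here the 1000 all-visible pages are skipped entirely; only the last 203
pages would have to be visited.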

And it allows for more intelligent choices for new tuple placement,
especially if we want to preserve clustering.

And of course it gives you a kind of index-only scan (mostly read the
index + check in the visibility heap).
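
A minimal sketch of that read path, under stated assumptions: the
visibility heap is modelled as a dict from page number to either an
"all_visible" marker or per-tuple records, and the visibility rule is a
deliberately simplified stand-in for the real MVCC checks. All names are
hypothetical.

```python
def index_only_scan(index_entries, vis_heap, is_visible):
    """Return keys of visible tuples using only the index and the visibility heap.

    index_entries: iterable of (key, page_no, offset) read from the index.
    vis_heap: dict mapping page_no to "all_visible" or to a dict of
              offset -> per-tuple visibility record.
    is_visible: callable deciding whether one record is visible to us.
    """
    keys = []
    for key, page_no, offset in index_entries:
        entry = vis_heap[page_no]
        # Compressed pages answer immediately; others need a per-tuple check,
        # but still without touching the data heap.
        if entry == "all_visible" or is_visible(entry[offset]):
            keys.append(key)
    return keys


def toy_is_visible(record, snapshot_xid=100):
    """Toy rule (assumption): inserted before our snapshot and never deleted."""
    return record["tmin"] < snapshot_xid and record["tmax"] is None


vis_heap = {
    0: "all_visible",
    1: {0: {"tmin": 90, "tmax": None}, 1: {"tmin": 120, "tmax": None}},
}
index_entries = [("a", 0, 0), ("b", 1, 0), ("c", 1, 1)]
visible_keys = index_only_scan(index_entries, vis_heap, toy_is_visible)
print(visible_keys)  # ['a', 'b']
```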

-------------
Hannu
