Re: possible TODO: read-only tables, select from indexes - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: possible TODO: read-only tables, select from indexes
Date
Msg-id 1114426460.5848.1.camel@fuji.krosing.net
Whole thread Raw
In response to Re: possible TODO: read-only tables, select from indexes only.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I send it now the 3rd time because I'm not sure my mail works, as it has not appeared on pgsql-hackers


On L, 2005-04-23 at 18:27 -0400, Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> > Is this a fair summary of the potential benefits of READ-ONLY
> > tables? (from both this thread and the archives):
> 
> >   1.  Index-only scans are made possible fairly easily because
> >       you wouldn't need to check the heap for visibility.
> 
> >   2.  Simple tables can be much smaller since you don't need
> >       most of the HeapTupleHeaderData.
> 
> What you are talking about is not a "read only" table, it is a
> "non-MVCC" table.  This is a much greater assault on the fundamental
> semantics of Postgres than it's being painted to be in this thread.

That's why I proposed a much less ambitious, and much more "low-hanging-
fruit-first" compliant thing when I satrted the thread - a simple two-
flag system to mark a relation as safe to use for index-only queries.

Main use of that would be in partiotioned table setups in data
warehouses, where older partitions can be switched to read-only state.

and such all-or-nothing scheme would also make it much easier to
estimate cost of index(only)scan.

> In particular, how is such a table going to come into being?  You'd
> not be able to just flip the READ-ONLY flag on and off.

original idea (with HeapTupleHeader intact) was to first mark the table
as READ-ONLY, and then either run VACUUM, or preferrably VACUUM-FULL +
REINDEX TABLE on it, so that all index tuples point to valid and visible
tuples. 
A check must be made to make sure, that all transactions started before
setting the READ-ONLY flag have finished before starting VACUUM or
REINDEX.

> (The notion of having tuples in the system that don't have the standard
> HeapTupleHeader is not as easy to implement as you might think, either,
> because that data structure is *everywhere*.)

My impression was, that HeapTupleHeader is usually not carried with
fields after doing the initial visibility checks ? 

But if it is needed, then it should be added when generating tuples from
index scan, preferrably in such a way, that non-SELECT queries get these
tuples with Xids set in a way which prevent them from being modified.

> While I don't say it's impossible to do, I do think that the work and
> semantic ugliness involved would outweigh the possible benefits.  In
> particular, there are other, more transparent ways of doing #1.

If HeapTupleHeader is essential for pg, than #1 should generate fake
HeapTupleHeader with some constant values (xmin=FrozenTransactionId,
xmax=MAXINT).


BTW, do we really store tableoid column in heap tuples or is it added
somewhere on the way from heap ? 

-- 
Hannu Krosing <hannu@skype.net>



pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: How to make lazy VACUUM of one table run in several transactions ?
Next
From: "Dave Held"
Date:
Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?