Re: clustered indexes? - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: clustered indexes?
Date
Msg-id Pine.LNX.4.44.0206221241060.2135-100000@cm-lcon-46-187.cm.vtr.net
Whole thread Raw
In response to Re: clustered indexes?  (Curt Sampson <cjs@cynic.net>)
List pgsql-general
Curt Sampson dijo:

> BTW, clustered indexes would be a *really* cool feature to have in
> postgres. I recently saw a query go from 70 seconds down to .6 seconds
> when I clustered the table on the particular column I was selecting a
> value from.

I don't know if the storage manager allows it, but maybe it is possible
to use the free space map to allocate some free space on each page; that
way, anytime a new tuple is added, it is written on the page that
correspond to its clustered value (here you add the overhead of checking
whether a given relation is clustered or not to _every_ relation).

If there is a lot of tuples for a particular value, it should just
allocate free space on the last page that contains tuples of that value.
OTOH, if there are a lot of values with small number of tuples in one
page, it allocates space for all of them at the end of the page.

Another thing would be that if one value has tuples that fit in an
integer number of pages, a page next to it would have to be allocated
for free space also.

Of course, when running out of space for a particular value, some more
space would have to be allocated, moving all the tuples in the rest of
the table.  Looks quite inefficient.  Or if the clustered values do not
have also to be ordered, maybe just the tuples of the next value have to
be moved to the end of the table, freeing all the space they were using
(but this is a really bad idea if the naxt value has a lot of tuples).

Another way to do this would be having some dummy tuples after each
values' real tuples.  When something wants to write a new tuple, it uses
one of the dummy tuples.  Variable size tuples would be a problem, of
course.  Limiting the feature to only fixed-size-tuple-tables simplifies
the problem somewhat.

All this sounds pretty difficult to me, and I wonder whether it's worth
the trouble.  Maybe fixing the problem with the current CLUSTER
implementation (the dropping of indexes and grant permissions) helps
somewhat, but seeing the comments in CVS log from Tom Lane does not
help:

revision 1.63
date: 2001/01/12 01:22:21;  author: tgl;  state: Exp;  lines: +2 -2
Preserve constraints and column defaults during CLUSTER.
Wish they were all this easy ...

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)


pgsql-general by date:

Previous
From: "Werner Schalk"
Date:
Subject: Creating a user fails
Next
From: Tina Messmann
Date:
Subject: Re: NetBSD Live CD/PostgreSQL?