Re: ToDo List Item - System Table Index Clustering - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: ToDo List Item - System Table Index Clustering
Date
Msg-id 201101182136.p0ILaPa21058@momjian.us
Whole thread Raw
In response to Re: ToDo List Item - System Table Index Clustering  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: ToDo List Item - System Table Index Clustering  ("Simone Aiken" <saiken@quietlyCompetent.com>)
List pgsql-hackers
Robert Haas wrote:
> On Tue, Jan 18, 2011 at 8:35 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > Excerpts from Simone Aiken's message of dom ene 16 02:11:26 -0300 2011:
> >>
> >> Hello Postgres Hackers,
> >>
> >> In reference to this todo item about clustering system table indexes,
> >> ( http://archives.postgresql.org/pgsql-hackers/2004-05/msg00989.php )
> >> I have been studying the system tables to see which would benefit ?from
> >> clustering. ?I have some index suggestions and a question if you have a
> >> moment.
> >
> > Wow, this is really old stuff. ?I don't know if this is really of any
> > benefit, given that these catalogs are loaded into syscaches anyway.
> > Furthermore, if you cluster at initdb time, they will soon lose the
> > ordering, given that updates move tuples around and inserts put them
> > anywhere. ?So you'd need the catalogs to be re-clustered once in a
> > while, and I don't see how you'd do that (except by asking the user to
> > do it, which doesn't sound so great).
> 
> The idea of the TODO seems to have been to set the default clustering
> to something reasonable.  That doesn't necessarily seem like a bad
> idea even if we can't automatically maintain the cluster order, but
> there's some question in my mind whether we'd get any measurable
> benefit from the clustering.  Even on a database with a gigantic
> number of tables, it seems likely that the relevant system catalogs
> will stay fully cached and, as you point out, the system caches will
> further blunt the impact of any work in this area.  I think the first
> thing to do would be to try to come up with a reproducible test case
> where clustering the tables improves performance.  If we can't, that
> might mean it's time to remove this TODO.

I think CLUSTER is a win when you are looking up multiple rows in the
same table, either using a non-unique index or a range search.  What
places do such lookups?  Having them all in adjacent pages would be a
win --- single-row lookups are usually not.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: test_fsync label adjustments
Next
From: Tom Lane
Date:
Subject: Re: Fixing GIN for empty/null/full-scan cases