Re: questions about CLUSTER - Mailing list pgsql-performance

From Bill Moran
Subject Re: questions about CLUSTER
Date
Msg-id 20080227124057.9b3937ce.wmoran@collaborativefusion.com
Whole thread Raw
In response to questions about CLUSTER  (Douglas J Hunley <doug@hunley.homeip.net>)
Responses Re: questions about CLUSTER  (Douglas J Hunley <doug@hunley.homeip.net>)
List pgsql-performance
In response to Douglas J Hunley <doug@hunley.homeip.net>:

> After reviewing http://www.postgresql.org/docs/8.3/static/sql-cluster.html a
> couple of times, I have some questions:
> 1) it says to run analyze after doing a cluster. i'm assuming autovacuum will
> take care of this? or should i go ahead and do the analyze 'now' instead of
> waiting?

It's always a good idea to analyze after major DB operations.  Autovacuum
only runs so often.  Also, it won't hurt anything, so why risk not doing
it?

> 2) is there any internal data in the db that would allow me to
> programmatically determine which tables would benefit from being clustered?
> 3) for that matter, is there info to allow me to determine which index it
> should be clustered on in cases where the table has more than one index?

The pg_stat_user_indexes table keeps stats on how often the index is used.
Indexes that are used frequently are candidates for clustering.

> 4) for tables with >1 indexes, does clustering on one index negatively impact
> queries that use the other indexes?

Not likely.  Clustering only really helps performance if you have an index
that is used to gather ranges of data.  For example, if you frequently
do things like SELECT * FROM log WHERE logdate > 'somedate" and < 'somedate,
you might benefit from clustering on logdate.

But it doesn't really do much if you're only ever pulling one record at a
time.  It's the kind of thing that you really need to experiment with to
understand whether it will have a worthwhile performance impact on your
data and your workload.  I doubt if there's any pat answer.

> 5) is it better to cluster on a compound index (index on lastnamefirstname) or
> on the underlying index (index on lastname)?

If cluster helps you at all, it's going to help if you have an index that's
frequently used to fetch ranges of data.  Whether that index is compound or
not isn't likely to factor in.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

pgsql-performance by date:

Previous
From: "Laurent Raufaste"
Date:
Subject: Re: PG planning randomly ?
Next
From: Douglas J Hunley
Date:
Subject: Re: questions about CLUSTER