Re: questions about CLUSTER - Mailing list pgsql-performance

From Douglas J Hunley
Subject Re: questions about CLUSTER
Date
Msg-id 200802271335.16397.doug@hunley.homeip.net
Whole thread Raw
In response to Re: questions about CLUSTER  (Bill Moran <wmoran@collaborativefusion.com>)
Responses Re: questions about CLUSTER
List pgsql-performance
On Wednesday 27 February 2008 12:40:57 Bill Moran wrote:
> In response to Douglas J Hunley <doug@hunley.homeip.net>:
> > After reviewing
> > http://www.postgresql.org/docs/8.3/static/sql-cluster.html a couple of
> > times, I have some questions:
> > 1) it says to run analyze after doing a cluster. i'm assuming autovacuum
> > will take care of this? or should i go ahead and do the analyze 'now'
> > instead of waiting?

> It's always a good idea to analyze after major DB operations.  Autovacuum
> only runs so often.  Also, it won't hurt anything, so why risk not doing
> it?

being overly-cautious. i was concerned about both autovac and me doing
analyzes over each other

>
> > 2) is there any internal data in the db that would allow me to
> > programmatically determine which tables would benefit from being
> > clustered? 3) for that matter, is there info to allow me to determine
> > which index it should be clustered on in cases where the table has more
> > than one index?
>
> The pg_stat_user_indexes table keeps stats on how often the index is used.
> Indexes that are used frequently are candidates for clustering.

I had just started looking at this actually.

>
> > 4) for tables with >1 indexes, does clustering on one index negatively
> > impact queries that use the other indexes?
>
> Not likely.  Clustering only really helps performance if you have an index
> that is used to gather ranges of data.  For example, if you frequently
> do things like SELECT * FROM log WHERE logdate > 'somedate" and <
> 'somedate, you might benefit from clustering on logdate.
>
> But it doesn't really do much if you're only ever pulling one record at a
> time.  It's the kind of thing that you really need to experiment with to
> understand whether it will have a worthwhile performance impact on your
> data and your workload.  I doubt if there's any pat answer.

makes sense.

>
> > 5) is it better to cluster on a compound index (index on
> > lastnamefirstname) or on the underlying index (index on lastname)?
>
> If cluster helps you at all, it's going to help if you have an index that's
> frequently used to fetch ranges of data.  Whether that index is compound or
> not isn't likely to factor in.

understood. i didn't really think it would matter, but its easier to ask than
to screw up performance for existing customers :)

--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

If a turtle doesn't have a shell, is he homeless or naked?

pgsql-performance by date:

Previous
From: Bill Moran
Date:
Subject: Re: questions about CLUSTER
Next
From: Douglas J Hunley
Date:
Subject: Re: questions about CLUSTER