Thread: questions about CLUSTER

questions about CLUSTER

From
Douglas J Hunley
Date:
After reviewing http://www.postgresql.org/docs/8.3/static/sql-cluster.html a
couple of times, I have some questions:
1) it says to run analyze after doing a cluster. i'm assuming autovacuum will
take care of this? or should i go ahead and do the analyze 'now' instead of
waiting?
2) is there any internal data in the db that would allow me to
programmatically determine which tables would benefit from being clustered?
3) for that matter, is there info to allow me to determine which index it
should be clustered on in cases where the table has more than one index?
4) for tables with >1 indexes, does clustering on one index negatively impact
queries that use the other indexes?
5) is it better to cluster on a compound index (index on lastnamefirstname) or
on the underlying index (index on lastname)?

tia
--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

Everything takes twice as long as you plan for and produces results about half
as good as you hoped.

Re: questions about CLUSTER

From
Bill Moran
Date:
In response to Douglas J Hunley <doug@hunley.homeip.net>:

> After reviewing http://www.postgresql.org/docs/8.3/static/sql-cluster.html a
> couple of times, I have some questions:
> 1) it says to run analyze after doing a cluster. i'm assuming autovacuum will
> take care of this? or should i go ahead and do the analyze 'now' instead of
> waiting?

It's always a good idea to analyze after major DB operations.  Autovacuum
only runs so often.  Also, it won't hurt anything, so why risk not doing
it?

> 2) is there any internal data in the db that would allow me to
> programmatically determine which tables would benefit from being clustered?
> 3) for that matter, is there info to allow me to determine which index it
> should be clustered on in cases where the table has more than one index?

The pg_stat_user_indexes table keeps stats on how often the index is used.
Indexes that are used frequently are candidates for clustering.

> 4) for tables with >1 indexes, does clustering on one index negatively impact
> queries that use the other indexes?

Not likely.  Clustering only really helps performance if you have an index
that is used to gather ranges of data.  For example, if you frequently
do things like SELECT * FROM log WHERE logdate > 'somedate" and < 'somedate,
you might benefit from clustering on logdate.

But it doesn't really do much if you're only ever pulling one record at a
time.  It's the kind of thing that you really need to experiment with to
understand whether it will have a worthwhile performance impact on your
data and your workload.  I doubt if there's any pat answer.

> 5) is it better to cluster on a compound index (index on lastnamefirstname) or
> on the underlying index (index on lastname)?

If cluster helps you at all, it's going to help if you have an index that's
frequently used to fetch ranges of data.  Whether that index is compound or
not isn't likely to factor in.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: questions about CLUSTER

From
Douglas J Hunley
Date:
On Wednesday 27 February 2008 12:40:57 Bill Moran wrote:
> In response to Douglas J Hunley <doug@hunley.homeip.net>:
> > After reviewing
> > http://www.postgresql.org/docs/8.3/static/sql-cluster.html a couple of
> > times, I have some questions:
> > 1) it says to run analyze after doing a cluster. i'm assuming autovacuum
> > will take care of this? or should i go ahead and do the analyze 'now'
> > instead of waiting?

> It's always a good idea to analyze after major DB operations.  Autovacuum
> only runs so often.  Also, it won't hurt anything, so why risk not doing
> it?

being overly-cautious. i was concerned about both autovac and me doing
analyzes over each other

>
> > 2) is there any internal data in the db that would allow me to
> > programmatically determine which tables would benefit from being
> > clustered? 3) for that matter, is there info to allow me to determine
> > which index it should be clustered on in cases where the table has more
> > than one index?
>
> The pg_stat_user_indexes table keeps stats on how often the index is used.
> Indexes that are used frequently are candidates for clustering.

I had just started looking at this actually.

>
> > 4) for tables with >1 indexes, does clustering on one index negatively
> > impact queries that use the other indexes?
>
> Not likely.  Clustering only really helps performance if you have an index
> that is used to gather ranges of data.  For example, if you frequently
> do things like SELECT * FROM log WHERE logdate > 'somedate" and <
> 'somedate, you might benefit from clustering on logdate.
>
> But it doesn't really do much if you're only ever pulling one record at a
> time.  It's the kind of thing that you really need to experiment with to
> understand whether it will have a worthwhile performance impact on your
> data and your workload.  I doubt if there's any pat answer.

makes sense.

>
> > 5) is it better to cluster on a compound index (index on
> > lastnamefirstname) or on the underlying index (index on lastname)?
>
> If cluster helps you at all, it's going to help if you have an index that's
> frequently used to fetch ranges of data.  Whether that index is compound or
> not isn't likely to factor in.

understood. i didn't really think it would matter, but its easier to ask than
to screw up performance for existing customers :)

--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

If a turtle doesn't have a shell, is he homeless or naked?

Re: questions about CLUSTER

From
Douglas J Hunley
Date:
On Wednesday 27 February 2008 13:35:16 Douglas J Hunley wrote:
> > > 2) is there any internal data in the db that would allow me to
> > > programmatically determine which tables would benefit from being
> > > clustered? 3) for that matter, is there info to allow me to determine
> > > which index it should be clustered on in cases where the table has more
> > > than one index?
> >
> > The pg_stat_user_indexes table keeps stats on how often the index is
> > used. Indexes that are used frequently are candidates for clustering.
>
> I had just started looking at this actually.

ok, so for a follow-on, should I be more concerned with idx_scan,
idx_tup_read, or idx_tup_fetch when determining which indexes are 'good'
candidates?

again, tia. i feel like such a noob around here :)

--
Douglas J Hunley (doug at hunley.homeip.net) - Linux User #174778
http://doug.hunley.homeip.net

<kwall> I don't get paid. I just get a tip as the money passes through my
hands on its way from my employer to my debtors.

Re: questions about CLUSTER

From
Bill Moran
Date:
In response to Douglas J Hunley <doug@hunley.homeip.net>:

> On Wednesday 27 February 2008 13:35:16 Douglas J Hunley wrote:
> > > > 2) is there any internal data in the db that would allow me to
> > > > programmatically determine which tables would benefit from being
> > > > clustered? 3) for that matter, is there info to allow me to determine
> > > > which index it should be clustered on in cases where the table has more
> > > > than one index?
> > >
> > > The pg_stat_user_indexes table keeps stats on how often the index is
> > > used. Indexes that are used frequently are candidates for clustering.
> >
> > I had just started looking at this actually.
>
> ok, so for a follow-on, should I be more concerned with idx_scan,
> idx_tup_read, or idx_tup_fetch when determining which indexes are 'good'
> candidates?

Again, not an easy question to answer, as it's probably different for
different people.

idx_scan is the count of how many times the index was used.
idx_tup_read and idx_tup_fetch are counts of how much data has been
accessed by using the index.

This part of the docs has more:
http://www.postgresql.org/docs/8.2/static/monitoring-stats.html

So, you'll probably have to watch all of those if you want to determine
when to automate clustering operations.

Personally, if I were you, I'd set up a test box and make sure
clustering makes enough of a difference to be doing all of this work.

> again, tia. i feel like such a noob around here :)

Bah ... we all start out as noobs.  Just don't go googling for my posts
from years back, it's embarrassing ...

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023