Thread: What are the benefits of using a clustered index?

What are the benefits of using a clustered index?

From
Mike Christensen
Date:
Hi guys!  This'll hopefully be an easy question for one of the perf guys
like Tom..

I'm wondering what the performance advantage is to using a clustered
index.  Take the following example: I have a table of recipes with a
recipeid, and a table of comments that various users might have entered
about those recipes ("this tastes like crap!", etc) with a commentid and
a FK on recipeid.  Naturally I want to be able to say "give me all the
comments about recipe x".  To me, this seems like the perfect
opportunity to use a cluster.  Inserts are fairly rare and reads need to
be super fast.  If all the comments were clustered by recipeid, all the
data would be right next to each other on disk.  It seems like the query
planner could count on this and make a lot of cool optimizations.  It
could just find the first row with recipe x and traverse the rows until
it got to the next recipe.

However, the thing that really confuses me is I can insert as many rows
as I want, and they're not clustered.  In fact, the only time they'll
cluster is if I run the "CLUSTER RecipeComments" command.  Sure, I could
write a trigger to do this on insert, or I could write a script that
runs every midnight to do this, or what not.  However, if the query
planner can't assume the data is in a certain order on disk, what's the
point of having this at all?

I'm quite sure SQL Server doesn't work this way and I'm not sure about
Oracle.  Can someone enlighten me on the exact benefit of this?  Thanks!!

Mike

Re: What are the benefits of using a clustered index?

From
Craig Ringer
Date:
Mike Christensen wrote:

>  However, if the query
> planner can't assume the data is in a certain order on disk, what's the
> point of having this at all?

One benefit is that it reduces the number of pages that must be read
from disk to retrieve all tuples that have a particular value for the
indexed field. Because the tuples tend to be clustered by (in your case)
recipeid, you'll normally have quite a few relevant tuples on a page and
won't have as much uninteresting and irrelevent data getting read in
along with what you actually want. That'll also improve efficiency of
memory/cache use.

I think it also helps keep the index smaller, since it doesn't need to
refer to as many pages for a given value of interest.

If you set a non-default FILLFACTOR on the table (and proably index) Pg
will leave gaps in the table and I think it will try to insert tuples to
maintain clustering order for smaller indexes and faster index scans.
The downside is that there are holes in the table that may slow down a
sequential scan somewhat.

--
Craig Ringer

Re: What are the benefits of using a clustered index?

From
Alvaro Herrera
Date:
Mike Christensen wrote:

> I'm wondering what the performance advantage is to using a clustered
> index.

In Postgres you don't cluster indexes.  You cluster tables.  It only
means that the table is written from scratch, following the index order.
So what it gives you is locality of access for queries that follow
ranges of that index, nothing more.  It seems very obvious that in this
implementation a new tuple is not going to follow the index order; it'll
just be stored wherever there is free space.  If you run CLUSTER again,
they'll be put in the right place.

(There was a patch to enhance this so that a new insertion would instead
use space closer to where the tuple would be if it followed the order.
But it was only a hint; if there wasn't enough free space in the right
spot, it would be stored elsewhere.  Still, the patch was not
committed.)

> I'm quite sure SQL Server doesn't work this way and I'm not sure about
> Oracle.  Can someone enlighten me on the exact benefit of this?  Thanks!!

Yeah, they use a completely different definition of "clustered index"
from ours.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: What are the benefits of using a clustered index?

From
Simon Riggs
Date:
On Tue, 2009-03-17 at 09:16 -0400, Alvaro Herrera wrote:
> > I'm quite sure SQL Server doesn't work this way and I'm not sure
> about
> > Oracle.  Can someone enlighten me on the exact benefit of this?
> Thanks!!
>
> Yeah, they use a completely different definition of "clustered index"
> from ours.

Hopefully we regard it as a missing feature rather than as a separate
definition. We could cluster the index, we just don't, yet.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: What are the benefits of using a clustered index?

From
Mike Christensen
Date:
I would only like this as a feature if the optimizer can really take advantage of this.  Clustering on every insert or update just for the fun of it won't really give us anything but more expensive writes.

I kinda figured if SQL Server and Oracle have it, they probably take full advantage of it for reads..

Mike

Simon Riggs wrote:
On Tue, 2009-03-17 at 09:16 -0400, Alvaro Herrera wrote: 
I'm quite sure SQL Server doesn't work this way and I'm not sure     
about     
Oracle.  Can someone enlighten me on the exact benefit of this?     
Thanks!!

Yeah, they use a completely different definition of "clustered index"
from ours.   
Hopefully we regard it as a missing feature rather than as a separate
definition. We could cluster the index, we just don't, yet.
 

Re: What are the benefits of using a clustered index?

From
Simon Riggs
Date:
On Tue, 2009-03-17 at 15:26 -0700, Mike Christensen wrote:
> I would only like this as a feature if the optimizer can really take
> advantage of this.  Clustering on every insert or update just for the
> fun of it won't really give us anything but more expensive writes.
>
> I kinda figured if SQL Server and Oracle have it, they probably take
> full advantage of it for reads..

With SQLServer and Oracle the optimization is implicit in the physical
design of the index. The Postgres equivalent would be a grouped index,
not necessarily clustered in the same way.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: What are the benefits of using a clustered index?

From
Scott Ribe
Date:
> Hopefully we regard it as a missing feature rather than as a separate
> definition. We could cluster the index, we just don't, yet.

Wouldn't this require keeping around multiple versions of index pages for
MVCC? Which would create performance degradations elsewhere?

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: What are the benefits of using a clustered index?

From
Simon Riggs
Date:
On Fri, 2009-03-20 at 09:33 -0600, Scott Ribe wrote:
> > Hopefully we regard it as a missing feature rather than as a separate
> > definition. We could cluster the index, we just don't, yet.
>
> Wouldn't this require keeping around multiple versions of index pages for
> MVCC?

No, it wouldn't require that.

Just think one index tuple points to more than one heap row. We would
still need to check visibility on the rows returned to ensure MVCC.

Less index pointers, smaller index.

The trick is: How? But that's a secondary issue to getting it on the
TODO list, which is all I'm suggesting at present.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: What are the benefits of using a clustered index?

From
Scott Ribe
Date:
> Just think one index tuple points to more than one heap row. We would
> still need to check visibility on the rows returned to ensure MVCC.

So you wind up with the heap rows stored in their own tree-like structure
outside the index? OK.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: What are the benefits of using a clustered index?

From
Gregory Stark
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:

> Just think one index tuple points to more than one heap row.

Could you expand on that? Like, uh, I have no idea what you're saying.

> Less index pointers, smaller index.

Are you talking about Heikkie's grouped-items-index?

> The trick is: How? But that's a secondary issue to getting it on the
> TODO list, which is all I'm suggesting at present.

Well I think we need to be clear enough at least on the "what" if not the
"how". But there's a bit a of a fuzzy line between them I admit.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!