Home > mailing lists

Re: CLUSTER and clustered indices - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: CLUSTER and clustered indices
Date	November 17, 2005 17:48:48
Msg-id	1132264123.4959.283.camel@localhost.localdomain Whole thread Raw
In response to	Re: CLUSTER and clustered indices (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: CLUSTER and clustered indices
List	pgsql-hackers

Tree view

On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > When a table has been CLUSTERed on a particular index AND that index
> > values is monotonically increasing, then it would be a bad move to use
> > blocks from the FSM since this would tend to destroy the natural
> > clustering sequence.
> 
> By the time there are any blocks in FSM to take, the original clean
> index page sequence is doubtless history.  The pure-increasing-key
> scenario you are thinking of never will have any FSM entries, so it's
> moot.

Well, just because its a pure-increasing-key doesn't imply rows don't
get deleted. Look at the NewOrder table in DBT-2/TPC-C. Order numbers
increase, but mostly they get delivered in the end. 

In the case I describe, tuples are added always to the rightmost edge of
the index, so it seems worthwhile to always add tuples to the top of the
heap only so that the order is consistent.

On Thu, 2005-11-17 at 12:45 -0300, Alvaro Herrera wrote:

> That's a nice idea, but what's the cost?  You will have to check every
> insert: does the table has indexes?  Is any of them clustered?  Is the
> clustered index attached to a sequence?  It seems quite an expensive
> check to be making.

I'd make the check when the relcache is loaded and store it there as a
boolean. heap_insert already has a boolean on it for use_fsm, so it
would transfer very simply at insert time with almost no overhead.

The use case exists and the technique is low overhead, but the main
question is: Does anybody think this behaviour would be beneficial for
them? (I'm actually in two minds myself, but once the idea has arisen,
it seems sensible to discuss this for everybody's sake).

The trade-off is a table that keeps growing in size, even though you
VACUUM it, with the benefit that the clustering is maintained. 

So how would you maintain it? Looks like you'd still have to use regular
CLUSTER commands, but at least it would stay good in between.

Best Regards, Simon Riggs

pgsql-hackers by date:

From: Simon Riggs
Date: 17 November 2005, 17:47:06
Subject: Re: Improving count(*)

From: Martijn van Oosterhout
Date: 17 November 2005, 18:12:56
Subject: Re: Improving count(*)

Re: CLUSTER and clustered indices - Mailing list pgsql-hackers

Previous

Next