Re: CLUSTER and clustered indices - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: CLUSTER and clustered indices
Date
Msg-id 1132264123.4959.283.camel@localhost.localdomain
Whole thread Raw
In response to Re: CLUSTER and clustered indices  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: CLUSTER and clustered indices  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > When a table has been CLUSTERed on a particular index AND that index
> > values is monotonically increasing, then it would be a bad move to use
> > blocks from the FSM since this would tend to destroy the natural
> > clustering sequence.
> 
> By the time there are any blocks in FSM to take, the original clean
> index page sequence is doubtless history.  The pure-increasing-key
> scenario you are thinking of never will have any FSM entries, so it's
> moot.

Well, just because its a pure-increasing-key doesn't imply rows don't
get deleted. Look at the NewOrder table in DBT-2/TPC-C. Order numbers
increase, but mostly they get delivered in the end. 

In the case I describe, tuples are added always to the rightmost edge of
the index, so it seems worthwhile to always add tuples to the top of the
heap only so that the order is consistent.

On Thu, 2005-11-17 at 12:45 -0300, Alvaro Herrera wrote:

> That's a nice idea, but what's the cost?  You will have to check every
> insert: does the table has indexes?  Is any of them clustered?  Is the
> clustered index attached to a sequence?  It seems quite an expensive
> check to be making.

I'd make the check when the relcache is loaded and store it there as a
boolean. heap_insert already has a boolean on it for use_fsm, so it
would transfer very simply at insert time with almost no overhead.

The use case exists and the technique is low overhead, but the main
question is: Does anybody think this behaviour would be beneficial for
them? (I'm actually in two minds myself, but once the idea has arisen,
it seems sensible to discuss this for everybody's sake).

The trade-off is a table that keeps growing in size, even though you
VACUUM it, with the benefit that the clustering is maintained. 

So how would you maintain it? Looks like you'd still have to use regular
CLUSTER commands, but at least it would stay good in between.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Improving count(*)
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Improving count(*)