Thread: CLUSTER and clustered indices
When a table has been CLUSTERed on a particular index AND that index values is monotonically increasing, then it would be a bad move to use blocks from the FSM since this would tend to destroy the natural clustering sequence. The index values will be monotonically increasing if a datatype is defined as SERIAL or if the default value is defined as the nextval of a sequence. Does anybody think it would be a good idea to not use the FSM if - we have a CLUSTER defined on an index - for the indexed column we have default value set of nextval() Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > When a table has been CLUSTERed on a particular index AND that index > values is monotonically increasing, then it would be a bad move to use > blocks from the FSM since this would tend to destroy the natural > clustering sequence. By the time there are any blocks in FSM to take, the original clean index page sequence is doubtless history. The pure-increasing-key scenario you are thinking of never will have any FSM entries, so it's moot. regards, tom lane
Simon Riggs wrote: > When a table has been CLUSTERed on a particular index AND that index > values is monotonically increasing, then it would be a bad move to use > blocks from the FSM since this would tend to destroy the natural > clustering sequence. > > The index values will be monotonically increasing if a datatype is > defined as SERIAL or if the default value is defined as the nextval of a > sequence. > > Does anybody think it would be a good idea to not use the FSM if > - we have a CLUSTER defined on an index > - for the indexed column we have default value set of nextval() That's a nice idea, but what's the cost? You will have to check every insert: does the table has indexes? Is any of them clustered? Is the clustered index attached to a sequence? It seems quite an expensive check to be making. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > When a table has been CLUSTERed on a particular index AND that index > > values is monotonically increasing, then it would be a bad move to use > > blocks from the FSM since this would tend to destroy the natural > > clustering sequence. > > By the time there are any blocks in FSM to take, the original clean > index page sequence is doubtless history. The pure-increasing-key > scenario you are thinking of never will have any FSM entries, so it's > moot. Well, just because its a pure-increasing-key doesn't imply rows don't get deleted. Look at the NewOrder table in DBT-2/TPC-C. Order numbers increase, but mostly they get delivered in the end. In the case I describe, tuples are added always to the rightmost edge of the index, so it seems worthwhile to always add tuples to the top of the heap only so that the order is consistent. On Thu, 2005-11-17 at 12:45 -0300, Alvaro Herrera wrote: > That's a nice idea, but what's the cost? You will have to check every > insert: does the table has indexes? Is any of them clustered? Is the > clustered index attached to a sequence? It seems quite an expensive > check to be making. I'd make the check when the relcache is loaded and store it there as a boolean. heap_insert already has a boolean on it for use_fsm, so it would transfer very simply at insert time with almost no overhead. The use case exists and the technique is low overhead, but the main question is: Does anybody think this behaviour would be beneficial for them? (I'm actually in two minds myself, but once the idea has arisen, it seems sensible to discuss this for everybody's sake). The trade-off is a table that keeps growing in size, even though you VACUUM it, with the benefit that the clustering is maintained. So how would you maintain it? Looks like you'd still have to use regular CLUSTER commands, but at least it would stay good in between. Best Regards, Simon Riggs
Simon Riggs wrote: > On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote: > > Simon Riggs <simon@2ndquadrant.com> writes: > > The use case exists and the technique is low overhead, but the main > question is: Does anybody think this behaviour would be beneficial for > them? (I'm actually in two minds myself, but once the idea has arisen, > it seems sensible to discuss this for everybody's sake). I have no use for it but I see it would be beneficial in some cases. > The trade-off is a table that keeps growing in size, even though you > VACUUM it, with the benefit that the clustering is maintained. > > So how would you maintain it? Looks like you'd still have to use regular > CLUSTER commands, but at least it would stay good in between. Yeah, this is a problem. The growth is unbounded. Even if there's a completely empty page somewhere, it can't be used because all tuples will go to the last page. The problem with using CLUSTER for maintenance is that it takes an exclusive lock on the table, which is a thing we've been running away from. You are right in that it's much cheaper than CLUSTERing a table that isn't ordered, because there's much more locality. But I don't think it's a big enough win. Because of the drawbacks (unbounded growth being the most prominent one) this would have to be an optional thing. This means we would need an additional system catalog column to keep whether it's active or not. And a user command to activate it. So it's starting to be a more invasive thing. Not that these things matter a whole lot, but anyway. Personally I'd prefer to see index-ordered heaps, where the heap is itself an index, so the ordering it automatically kept. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
I agree, keeping it clustered would be very nice.
On 11/17/05, Alvaro Herrera <alvherre@commandprompt.com> wrote:
Simon Riggs wrote:
> On Thu, 2005-11-17 at 10:58 -0500, Tom Lane wrote:
> > Simon Riggs < simon@2ndquadrant.com> writes:
>
> The use case exists and the technique is low overhead, but the main
> question is: Does anybody think this behaviour would be beneficial for
> them? (I'm actually in two minds myself, but once the idea has arisen,
> it seems sensible to discuss this for everybody's sake).
I have no use for it but I see it would be beneficial in some cases.
> The trade-off is a table that keeps growing in size, even though you
> VACUUM it, with the benefit that the clustering is maintained.
>
> So how would you maintain it? Looks like you'd still have to use regular
> CLUSTER commands, but at least it would stay good in between.
Yeah, this is a problem. The growth is unbounded. Even if there's a
completely empty page somewhere, it can't be used because all tuples
will go to the last page. The problem with using CLUSTER for
maintenance is that it takes an exclusive lock on the table, which is a
thing we've been running away from. You are right in that it's much
cheaper than CLUSTERing a table that isn't ordered, because there's much
more locality. But I don't think it's a big enough win.
Because of the drawbacks (unbounded growth being the most prominent one)
this would have to be an optional thing. This means we would need an
additional system catalog column to keep whether it's active or not.
And a user command to activate it. So it's starting to be a more
invasive thing. Not that these things matter a whole lot, but anyway.
Personally I'd prefer to see index-ordered heaps, where the heap is
itself an index, so the ordering it automatically kept.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
On Thu, 2005-11-17 at 21:57 -0300, Alvaro Herrera wrote: > Personally I'd prefer to see index-ordered heaps, where the heap is > itself an index, so the ordering it automatically kept. Agreed. (I think thats case-closed on the previous proposal.) As an aside, Index Organized Tables (IOTs) isn't just an Oracle term. They first used the term, but the concept had already been implemented in both Tandem (value-ordered) and Teradata (hash-ordered) before this, as well as numerous OLAP systems. The concept doesn't look to be patented. If anybody is looking for a justification for IOTs, the reduction in table volume for large tables is very high. IOTs are the equivalent of removing all of the leaf blocks of the clustered index. Best Regards, Simon Riggs
That sounds very much like a CLUSTERED INDEX under Sybase ASE (or the derivative Microsoft SQL Server). In those products, when you create a clustered index, the data pages are sorted according to the index sequence, and are used as the leaf pages in the index. A clustered index does not have another leaf level. >>> Simon Riggs <simon@2ndquadrant.com> >>> As an aside, Index Organized Tables (IOTs) isn't just an Oracle term. They first used the term, but the concept had already been implemented in both Tandem (value-ordered) and Teradata (hash-ordered) before this, as well as numerous OLAP systems. The concept doesn't look to be patented. If anybody is looking for a justification for IOTs, the reduction in table volume for large tables is very high. IOTs are the equivalent of removing all of the leaf blocks of the clustered index.
+1, and I know Sybase had this in 11.0.3, which IIRC is over 10 years old now. BTW, http://archives.postgresql.org/pgsql-performance/2004-08/msg00492.php is one discussion about this from the past. I seem to recall that there was an objection to true Index Organized Tables because it would be too dificult to make that work with MVCC. If that's the case then what I laid out in that email might get some of the benefit without the difficulty. But hopefully it's easy to just store heap values in the leaf nodes of an index. FWIW, I know that Sybase required that an IOT be clustered on a unique index. I think Oracle has the same requirement as well. On Fri, Nov 18, 2005 at 08:30:14AM +0000, Simon Riggs wrote: > On Thu, 2005-11-17 at 21:57 -0300, Alvaro Herrera wrote: > > > Personally I'd prefer to see index-ordered heaps, where the heap is > > itself an index, so the ordering it automatically kept. > > Agreed. (I think thats case-closed on the previous proposal.) > > As an aside, Index Organized Tables (IOTs) isn't just an Oracle term. > They first used the term, but the concept had already been implemented > in both Tandem (value-ordered) and Teradata (hash-ordered) before this, > as well as numerous OLAP systems. The concept doesn't look to be > patented. > > If anybody is looking for a justification for IOTs, the reduction in > table volume for large tables is very high. IOTs are the equivalent of > removing all of the leaf blocks of the clustered index. > > Best Regards, Simon Riggs > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461