Re: Grouped Index Tuples / Clustered Indexes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Grouped Index Tuples / Clustered Indexes
Date
Msg-id 45F54D35.7090204@enterprisedb.com
Whole thread Raw
In response to Re: Grouped Index Tuples / Clustered Indexes  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> Better thought: say that CLUSTER requires an "order-defining index".
> That better explains the point that it is the table being clustered,
> using the index to define the physical order of the rows in the heap. We
> then use the word "clustered" to refer to what has happened to the
> table, and with this patch, for the index also.
> 
> That way we can have new syntax for CLUSTER
> 
>     CLUSTER table ORDER BY indexname
> 
> which is then the preferred syntax, rather than the perverse
> 
>     CLUSTER index ON table
> 
> which gives the wrong impression about what is happening, since it is
> the table that is changed, not the index.

I like that, "order-defining index" conveys the point pretty well.

> - Are you suggesting that we have an explicit new syntax
> 
> CREATE [UNIQUE] CLUSTERED INDEX [CONCURRENTLY] fooidx ON foo (....) ...
> 
> or just that we refer to this feature as Clustered Indexes?

I'm not proposing new syntax, just a WITH-parameter. Makes more sense to 
me that way, the clusteredness has no user-visible effects except 
performance, and it's b-tree specific (though I guess you could apply 
the same concept to other indexams as well).

> - Do you think that all Primary Keys should be clustered?

No. There's a significant CPU overhead when the index and table are in 
memory and you're doing simple one-row lookups. And there's no promise 
that a table is physically in primary key order anyway.

There might be some interesting cases where we could enable it 
automatically. I've been thinking that if you explicitly CLUSTER a 
table, the order-defining index would definitely benefit from being a 
clustered index. If it's small enough that it fits in memory, there's no 
point in running CLUSTER in the first place. And if you run CLUSTER, we 
know it's in order. That seems like a pretty safe bet.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Galy Lee
Date:
Subject: Re: autovacuum next steps, take 3
Next
From: "Simon Riggs"
Date:
Subject: Re: Synchronized Scan update