Thread: Grouped Index Tuples / Clustered Indexes
I've updated the GIT patch at http://community.enterprisedb.com/git/. Bitrot caused by the findinsertloc-patch has been fixed, making that part of the GIT patch a little bit smaller and cleaner. I also did some refactoring, and minor cleanup and commenting. Any comments on the design or patch? For your convenience, I copied the same text I added to access/nbtree/README to http://community.enterprisedb.com/git/git-readme.txt. Should we start playing the name game at this point? I've been thinking we should call this feature just Clustered Indexes, even though it's not exactly the same thing as clustered indexes in other DBMSs. From the user's point of view, they behave similarly enough that it may be best to use the existing term. As a next step, I'm hoping to get the indexam API changes from the bitmap index patch committed soon, and in a way that supports GIT as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
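To make the idea behind grouped index tuples concrete, here is a toy Python model. The thread doesn't spell out the on-disk format (that is in git-readme.txt), so this sketch rests on a simplifying assumption: one grouped entry stores the lowest key of a run, a single heap page, and the offsets of the member tuples on that page. `Group` and `build_grouped_index` are hypothetical names, not taken from the patch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Group:
    """Toy grouped index entry (hypothetical layout, not the patch's format)."""
    low_key: int                                      # lowest key in the run
    heap_page: int                                    # the one heap page the run lives on
    offsets: List[int] = field(default_factory=list)  # member tuples on that page


def build_grouped_index(entries: List[Tuple[int, int, int]]) -> List[Group]:
    """Build a grouped index from (key, heap_page, offset) triples.

    Entries are walked in key order; consecutive entries that point to the
    same heap page are merged into a single Group.
    """
    groups: List[Group] = []
    for key, page, offset in sorted(entries):
        if groups and groups[-1].heap_page == page:
            groups[-1].offsets.append(offset)          # extend the current run
        else:
            groups.append(Group(key, page, [offset]))  # start a new run
    return groups


if __name__ == "__main__":
    rows, per_page = 100, 10
    # Heap in key order (e.g. right after CLUSTER): keys 0-9 on page 0, etc.
    clustered = [(k, k // per_page, k % per_page) for k in range(rows)]
    # Worst case: consecutive keys always land on different pages.
    scattered = [(k, k % per_page, k // per_page) for k in range(rows)]
    print(len(build_grouped_index(clustered)))  # 10 entries instead of 100
    print(len(build_grouped_index(scattered)))  # still 100 entries
```

In this model the savings depend entirely on the heap being in key order, which is why the feature is so closely tied to CLUSTER in the discussion that follows.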
My only question would be: why isn't that in core already?
+1 On 3/7/07 6:53 AM, "Grzegorz Jaskiewicz" <gj@pointblue.com.pl> wrote: > my only question would be. > Why isn't that in core already ?
On Wed, 2007-03-07 at 10:32 +0000, Heikki Linnakangas wrote: > I've been thinking > we should call this feature just Clustered Indexes Works for me. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
> On Wed, 2007-03-07 at 10:32 +0000, Heikki Linnakangas wrote: >> I've been thinking >> we should call this feature just Clustered Indexes So we would have "clustered tables" which are tables whose heap is ordered according to an index and separately "clustered indexes" which are indexes optimized for such tables? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: >> On Wed, 2007-03-07 at 10:32 +0000, Heikki Linnakangas wrote: >>> I've been thinking >>> we should call this feature just Clustered Indexes > > So we would have "clustered tables" which are tables whose heap is ordered > according to an index and separately "clustered indexes" which are indexes > optimized for such tables? Yes, that's what I was thinking. There's a third related term in use as well. When you issue CLUSTER, the table will be clustered on an index. And that index is then the "index the table is clustered on". That's a bit cumbersome but that's the terminology we're using at the moment. Maybe we should come up with a new term for that to avoid confusion. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > There's a third related term in use as well. When you issue CLUSTER, the > table will be clustered on an index. And that index is then the "index > the table is clustered on". That's a bit cumbersome but that's the > terminology we're using at the moment. Maybe we should come up with a > new term for that to avoid confusion. This reminds me of something I've been wondering about for quite some time. Why is it that one has to write "cluster <index> on <table>", and not "cluster <table> on <index>"? To me, the second variant would seem more logical, but then I'm not a native English speaker... I'm not suggesting that this should be changed, I'm just wondering why it is the way it is. greetings, Florian Pflug
On Sun, 2007-03-11 at 11:22 +0000, Heikki Linnakangas wrote: > Gregory Stark wrote: > >> On Wed, 2007-03-07 at 10:32 +0000, Heikki Linnakangas wrote: > >>> I've been thinking > >>> we should call this feature just Clustered Indexes > > > > So we would have "clustered tables" which are tables whose heap is ordered > > according to an index and separately "clustered indexes" which are indexes > > optimized for such tables? > > Yes, that's what I was thinking. > > There's a third related term in use as well. When you issue CLUSTER, the > table will be clustered on an index. And that index is then the "index > the table is clustered on". That's a bit cumbersome but that's the > terminology we're using at the moment. Maybe we should come up with a > new term for that to avoid confusion. First thought: we can use the term "cluster*ing* index" for CLUSTER and use the term "clustered" to refer to what has happened to the table and the index. That will probably be confused with high availability clustering, so perhaps not. Better thought: say that CLUSTER requires an "order-defining index". That better explains the point that it is the table being clustered, using the index to define the physical order of the rows in the heap. We then use the word "clustered" to refer to what has happened to the table, and with this patch, for the index also. That way we can have new syntax for CLUSTER: CLUSTER table ORDER BY indexname which is then the preferred syntax, rather than the perverse CLUSTER index ON table which gives the wrong impression about what is happening, since it is the table that is changed, not the index. - - - - Are you suggesting that we have an explicit new syntax CREATE [UNIQUE] CLUSTERED INDEX [CONCURRENTLY] fooidx ON foo (....) ... or just that we refer to this feature as Clustered Indexes? - Do we still need the index WITH option, in either case? - Do you think that all Primary Keys should be clustered?
- Are you thinking of renaming the docs, catalog, etc. to reflect the new naming/meaning? My thinking would be: CLUSTERED, no, yes, yes, but I'd like to know what you think. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Sun, 2007-03-11 at 19:06 +0100, Florian G. Pflug wrote: > Heikki Linnakangas wrote: > > There's a third related term in use as well. When you issue CLUSTER, the > > table will be clustered on an index. And that index is then the "index > > the table is clustered on". That's a bit cumbersome but that's the > > terminology we're using at the moment. Maybe we should come up with a > > new term for that to avoid confusion. > > This reminds me of something I've been wondering about for quite some > time. Why is it that one has to write "cluster <index> on <table>", > and not "cluster <table> on <index>"? > > To me, the second variant would seem more logical, but then I'm > not a native English speaker... > > I'm not suggesting that this should be changed, I'm just wondering > why it is the way it is. No idea, but I agree it conveys exactly the opposite view of what happens when the command is issued. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote: > Better thought: say that CLUSTER requires an "order-defining index". > That better explains the point that it is the table being clustered, > using the index to define the physical order of the rows in the heap. We > then use the word "clustered" to refer to what has happened to the > table, and with this patch, for the index also. > > That way we can have new syntax for CLUSTER > > CLUSTER table ORDER BY indexname > > which is then the preferred syntax, rather than the perverse > > CLUSTER index ON table > > which gives the wrong impression about what is happening, since it is > the table that is changed, not the index. I like that; "order-defining index" conveys the point pretty well. > - Are you suggesting that we have an explicit new syntax > > CREATE [UNIQUE] CLUSTERED INDEX [CONCURRENTLY] fooidx ON foo (....) ... > > or just that we refer to this feature as Clustered Indexes? I'm not proposing new syntax, just a WITH-parameter. That makes more sense to me: clusteredness has no user-visible effect except on performance, and it's b-tree specific (though I guess you could apply the same concept to other indexams as well). > - Do you think that all Primary Keys should be clustered? No. There's a significant CPU overhead when the index and table are in memory and you're doing simple one-row lookups. And there's no promise that a table is physically in primary key order anyway. There might be some interesting cases where we could enable it automatically. I've been thinking that if you explicitly CLUSTER a table, the order-defining index would definitely benefit from being a clustered index. If the table were small enough to fit in memory, there would be no point in running CLUSTER in the first place. And if you run CLUSTER, we know it's in order. That seems like a pretty safe bet. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
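The CPU-overhead argument for in-memory one-row lookups can be illustrated with a toy model. Assume, as a simplification that is not necessarily what the patch does, that a grouped entry keeps only its lowest key. A lookup then has to find the group whose key range covers the search key and recheck every member tuple on that heap page. The sketch assumes unique keys, as in a primary key; `build_clustered_heap` and `grouped_lookup` are hypothetical helpers.

```python
import bisect
from typing import Dict, List, Tuple

# Toy grouped entry: (low_key, heap_page, offsets on that page).
GroupedEntry = Tuple[int, int, List[int]]
Heap = Dict[int, List[int]]  # heap page -> keys stored at offsets 0..n-1


def build_clustered_heap(n_rows: int, rows_per_page: int) -> Tuple[Heap, List[GroupedEntry]]:
    """Lay out keys 0..n_rows-1 in key order; one grouped entry per page."""
    heap: Heap = {}
    for k in range(n_rows):
        heap.setdefault(k // rows_per_page, []).append(k)
    groups = [(keys[0], page, list(range(len(keys))))
              for page, keys in sorted(heap.items())]
    return heap, groups


def grouped_lookup(groups: List[GroupedEntry], heap: Heap,
                   key: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return (matching (page, offset) pairs, number of heap tuples rechecked).

    With unique keys, the only candidate group is the last one whose
    low_key <= key; every member of that group is compared against the key.
    """
    # Rebuilding `lows` on each call keeps the sketch short; a real index
    # would descend a B-tree instead.
    lows = [g[0] for g in groups]
    i = bisect.bisect_right(lows, key) - 1
    if i < 0:
        return [], 0
    _, page, offsets = groups[i]
    matches = [(page, off) for off in offsets if heap[page][off] == key]
    return matches, len(offsets)


if __name__ == "__main__":
    heap, groups = build_clustered_heap(1000, 50)
    print(len(groups))                        # 20 index entries for 1000 rows
    print(grouped_lookup(groups, heap, 123))  # ([(2, 23)], 50)
```

A plain B-tree would lead straight to (2, 23) and examine one heap tuple. In this model the lookup examines all 50 tuples on page 2, in exchange for an index with 50 times fewer entries. When everything is already cached, that trade favors the plain index.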
Simon Riggs wrote: > On Sun, 2007-03-11 at 19:06 +0100, Florian G. Pflug wrote: > > Heikki Linnakangas wrote: > > > There's a third related term in use as well. When you issue CLUSTER, the > > > table will be clustered on an index. And that index is then the "index > > > the table is clustered on". That's a bit cumbersome but that's the > > > terminology we're using at the moment. Maybe we should come up with a > > > new term for that to avoid confusion. > > > > This reminds me of something I've been wondering about for quite some > > time. Why is it that one has to write "cluster <index> on <table>", > > and not "cluster <table> on <index>"? > > > > To me, the second variant would seem more logical, but then I'm > > not a native English speaker... > > > > I'm not suggesting that this should be changed, I'm just wondering > > why it is the way it is. > > No idea, but I agree it conveys exactly the opposite view of what > happens when the command is issued. We got the syntax from Berkeley, and it has always seemed backwards to me too. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Added to TODO: o Add more logical syntax CLUSTER table ORDER BY index; support current syntax for backward compatibility --------------------------------------------------------------------------- Simon Riggs wrote: > On Sun, 2007-03-11 at 11:22 +0000, Heikki Linnakangas wrote: > > Gregory Stark wrote: > > >> On Wed, 2007-03-07 at 10:32 +0000, Heikki Linnakangas wrote: > > >>> I've been thinking > > >>> we should call this feature just Clustered Indexes > > > > > > So we would have "clustered tables" which are tables whose heap is ordered > > > according to an index and separately "clustered indexes" which are indexes > > > optimized for such tables? > > > > Yes, that's what I was thinking. > > > > There's a third related term in use as well. When you issue CLUSTER, the > > table will be clustered on an index. And that index is then the "index > > the table is clustered on". That's a bit cumbersome but that's the > > terminology we're using at the moment. Maybe we should come up with a > > new term for that to avoid confusion. > > First thought: we can use the term "cluster*ing* index" for CLUSTER and > use the term "clustered" to refer to what has happened to the table and > the index. That will probably be confused with high availability > clustering, so perhaps not. > > Better thought: say that CLUSTER requires an "order-defining index". > That better explains the point that it is the table being clustered, > using the index to define the physical order of the rows in the heap. We > then use the word "clustered" to refer to what has happened to the > table, and with this patch, for the index also. > > That way we can have new syntax for CLUSTER > > CLUSTER table ORDER BY indexname > > which is then the preferred syntax, rather than the perverse > > CLUSTER index ON table > > which gives the wrong impression about what is happening, since it is > the table that is changed, not the index.
> > - - - > > - Are you suggesting that we have an explicit new syntax > > CREATE [UNIQUE] CLUSTERED INDEX [CONCURRENTLY] fooidx ON foo (....) ... > > or just that we refer to this feature as Clustered Indexes? > > - Do we still need the index WITH option, in either case? > > - Do you think that all Primary Keys should be clustered? > > - Are you thinking to rename docs, catalog etc to reflect the new > naming/meaning? > > My thinking would be: CLUSTERED, no, yes, yes > but I'd like to know what you think? > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com
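The backward-compatibility half of this TODO item amounts to accepting both spellings and mapping them to the same (table, index) pair. Below is a minimal regex-based sketch in Python. It is purely illustrative and is not how PostgreSQL's grammar is implemented. `parse_cluster` is a hypothetical name, and other forms such as a bare `CLUSTER table` are outside its scope; for those it returns None.

```python
import re
from typing import Optional, Tuple

# Proposed form: CLUSTER table ORDER BY index
_ORDER_BY_FORM = re.compile(
    r"^\s*CLUSTER\s+(\w+)\s+ORDER\s+BY\s+(\w+)\s*;?\s*$", re.IGNORECASE)
# Current form: CLUSTER index ON table (index named first)
_ON_FORM = re.compile(
    r"^\s*CLUSTER\s+(\w+)\s+ON\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def parse_cluster(stmt: str) -> Optional[Tuple[str, str]]:
    """Return (table, index) for either spelling, or None if unrecognized."""
    m = _ORDER_BY_FORM.match(stmt)
    if m:
        return m.group(1), m.group(2)
    m = _ON_FORM.match(stmt)
    if m:
        return m.group(2), m.group(1)  # swap: legacy syntax names the index first
    return None


if __name__ == "__main__":
    print(parse_cluster("CLUSTER foo ORDER BY fooidx;"))  # ('foo', 'fooidx')
    print(parse_cluster("cluster fooidx on foo"))         # ('foo', 'fooidx')
```

Normalizing both forms to the same pair is what would let the old spelling keep working while the new one becomes the preferred syntax.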
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --------------------------------------------------------------------------- Heikki Linnakangas wrote: > I've updated the GIT patch at http://community.enterprisedb.com/git/. > Bitrot caused by the findinsertloc-patch has been fixed, making that > part of the GIT patch a little bit smaller and cleaner. I also did some > refactoring, and minor cleanup and commenting. > > Any comments on the design or patch? For your convenience, I copied the > same text I added to access/nbtree/README to > http://community.enterprisedb.com/git/git-readme.txt > > Should we start playing the name game at this point? I've been thinking > we should call this feature just Clustered Indexes, even though it's not > exactly the same thing as clustered indexes in other DBMSs. From user > point of view, they behave similarly enough that it may be best to use > the existing term. > > As a next step, I'm hoping to get the indexam API changes from the > bitmap index patch committed soon, and in a way that supports GIT as well. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com