Re: Index AM change proposals, redux - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Index AM change proposals, redux |
Date | |
Msg-id | 1208966672.4259.1397.camel@ebony.site |
In response to | Index AM change proposals, redux (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Index AM change proposals, redux
Re: Index AM change proposals, redux |
List | pgsql-hackers |
On Wed, 2008-04-09 at 20:30 -0400, Tom Lane wrote:

> * GIT (Grouped Index Tuple) indexes, which achieve index space savings
> in btrees by having a single index tuple represent multiple heap tuples
> (on a single heap page) containing a range of key values. I am not sure
> what the development status is --- Heikki had submitted a completed
> patch but there seemed to be agreement on making changes, and that's not
> been done AFAIK. The really serious problem I've got with it is that
> it'd foreclose the possibility of returning actual index keys from btree
> indexes, thus basically killing the usefulness of that idea. I'm not
> convinced it would offer enough gain to be worth paying that price.
> Another issue is that we'd need to check how much of the use-case for
> GIT has been taken over by HOT.

That seems to be a misunderstanding about HOT and GIT. HOT is an important requirement for GIT, but other than that they are unrelated. Testing in 2006/2007 showed that HOT stabilised the effects of repeated updates, which then showed as a "gain" in performance. But GIT did show considerable actual performance gains in its target use case.

GIT significantly reduces the size of clustered indexes, greatly improving the number of index pointers that can be held in memory for very large indexes. That translates directly into a reduction in I/O for large databases on typical hardware, for primary operations, file backups and recovery (and thus log replication). Test results validated that and showed increased performance, over and above that experienced with HOT, when tested together.

Now there may be problems with the GIT code as it stands, but we should acknowledge that the general technique has been proven to improve performance on a recent PostgreSQL codebase. This is an unsurprising result, since SQLServer, Sybase, DB2, Oracle and Teradata (at least) all use indexes of this category to improve real-world performance. The idea is definitely not a benchmark-only feature.
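
As a rough illustration of the grouping idea described in the quoted text, here is a toy model in Python. It is not the GIT patch and does not reflect PostgreSQL's btree code: the function names, the 100-tuples-per-page figure and the assumption of unique keys on a perfectly clustered table are all hypothetical, chosen only to show why one index entry per heap page shrinks the index.

```python
from bisect import bisect_right

TUPLES_PER_PAGE = 100  # assumed heap page capacity, for illustration only


def build_heap(keys, tuples_per_page=TUPLES_PER_PAGE):
    """Lay out sorted keys on fixed-size heap pages (a perfectly clustered table)."""
    ordered = sorted(keys)
    return [ordered[i:i + tuples_per_page]
            for i in range(0, len(ordered), tuples_per_page)]


def build_plain_index(heap):
    """Conventional index: one entry (key, page_no, offset) per heap tuple."""
    return [(key, page_no, offset)
            for page_no, page in enumerate(heap)
            for offset, key in enumerate(page)]


def build_grouped_index(heap):
    """Grouped index: one entry (lowest key on page, page_no) per heap page."""
    return [(page[0], page_no) for page_no, page in enumerate(heap)]


def grouped_lookup(grouped, heap, key):
    """Pick the page whose key range could hold `key`, then scan that page.

    Assumes unique keys, so a key can live on only one page.
    """
    lows = [low for low, _ in grouped]
    pos = bisect_right(lows, key) - 1  # last page whose lowest key <= key
    if pos < 0:
        return None
    page_no = grouped[pos][1]
    page = heap[page_no]
    if key in page:
        return (page_no, page.index(key))
    return None


heap = build_heap(range(10_000))
plain = build_plain_index(heap)
grouped = build_grouped_index(heap)
print(len(plain), len(grouped))             # 10000 100
print(grouped_lookup(grouped, heap, 4242))  # (42, 42)
```

In this toy, the grouped index holds 100 entries where the plain one holds 10,000, which is the space saving under discussion. It also shows the trade-off raised in the quoted text: a grouped entry stores only the lowest key of its page, so the individual key values have to be read from the heap page.
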
Many users would be very interested if we could significantly reduce the size of the main index on their largest tables. I would at least like to see clustered indexes acknowledged as a TODO item, so we keep the door open for a future implementation based around the basic concept of GIT.

Nobody is going to waste their time flogging a dead horse, which is why the patch isn't ready. Maybe *that* horse is dead, not really for me to say, but if we can at least agree on a basic statement that equine animals are fast we may find a rider willing to invest time in them.

I don't see the "returns index keys" idea as being killed by or killing this concept. Returning keys is valid and useful when we can, but there are other considerations that, in some use cases, will be a dominant factor.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com