Re: Minmax indexes - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: Minmax indexes |
Date | |
Msg-id | 20140709211619.GQ6390@eldon.alvh.no-ip.org Whole thread Raw |
In response to | Re: Minmax indexes (Claudio Freire <klaussfreire@gmail.com>) |
Responses |
Re: Minmax indexes
Re: Minmax indexes Re: Minmax indexes Re: Minmax indexes Re: Minmax indexes Re: Minmax indexes |
List | pgsql-hackers |
Claudio Freire wrote: > An aggregate to generate a "compressed set" from several values > A function which adds a new value to the "compressed set" and returns > the new "compressed set" > A function which tests if a value is in a "compressed set" > A function which tests if a "compressed set" overlaps another > "compressed set" of equal type > > If you can define different compressed sets, you can use this to > generate both min/max indexes as well as bloom filter indexes. Whether > we'd want to have both is perhaps questionable, but having the ability > to is probably desirable. Here's a new version of this patch, which is more generic the original versions, and similar to what you describe. The way it works now, each opclass needs to have three support procedures; I've called them getOpers, maybeUpdateValues, and compare. (I realize these names are pretty bad, and will be changing them.) getOpers is used to obtain information about what is stored for that data type; it says how many datum values are stored for a column of that type (two for sortable: min and max), and how many operators it needs setup. Then, the generic code fills in a MinmaxDesc(riptor) and creates an initial DeformedMMTuple (which is a rather ugly name for a minmax tuple held in memory). The maybeUpdateValues amproc can then be called when there's a new heap tuple, which updates the DeformedMMTuple to account for the new tuple (in essence, it's a union of the original values and the new tuple). This can be done repeatedly (when a new index is being created) or only once (when a new heap tuple is inserted into an existing index). There is no need for an "aggregate". This DeformedMMTuple can easily be turned into the on-disk representation; there is no hardcoded assumption on the number of index values stored per heap column, so it is possible to build an opclass that stores a bounding box column for a geometry heap column, for instance. Then we have the "compare" amproc. This is used during index scans; after extracting an index tuple, it is turned into DeformedMMTuple, and the "compare" amproc for each column is called with the values of scan keys. (Now that I think about this, it seems pretty much what "consistent" is for GiST opclasses). A true return value indicates that the scan key matches the page range boundaries and thus all pages in the range are added to the output TID bitmap. Of course, you can have multicolumn indexes, and (as would be expected) each column can have totally different opclasses; so for instance you could have an integer column and a geometric column in the same index, and it should work fine. In a query that constrained both columns, only those page ranges that satisfied the scan keys for both columns would be returned. I think this level of abstraction is good --- AFAICS it is easy to implement opclasses for datatypes not suitable for btree opclasses such as geometric ones, etc. This answers the concerns of those who wanted to see this support datatypes that don't have the concept of min/max at all. I'm not sure about bloom filters, as I've forgotten how those work. Of course, the implementation of min/max is there: that logic has been abstracted out into what I've called "sortable opfamilies" for now (name suggestions welcome) --- it can be used for any datatype with a btree opclass. I think design-wise it ended up making a lot of sense, because all the opclass-specific assumptions about usage of "min" and "max" values and comparisons using the less-than etc operators are contained in the mmsortable.c file, and the basic minmax.c file only needs to know to call the right opclass-specific procedures. The basic code might need some tweaks to ensure that we're not assuming anything about the datatypes of the stored values (as opposed to the datatypes of the indexed columns), but this is a matter of tweaking the MinmaxDesc and the getOpers amprocs; it wouldn't require changing the on-disk representation, and thus is upgrade-compatible. There's a bit of boilerplate code in the amproc routines which would be nice to be able to remove (mainly involving null values), but I think to do that we would need to split those three amprocs into maybe four or five, which is not as nice, so I'm not real sure about doing it. All this being said, I'm sticking to the name "Minmax indexes". There was a poll in pgsql-advocacy http://www.postgresql.org/message-id/53A0B4F8.8080803@agliodbs.com about a new name, but there were no suggestions supported by more than one person. If a brilliant new name comes up, I'm open to changing it. Another thing I noticed is that version 8 of the patch blindly believed the "pages_per_range" declared in catalogs. This meant that if somebody did "alter index foo set pages_per_range=123" the index would immediately break (i.e. return corrupted results when queried). I have fixed this by storing the pages_per_range value used to construct the index in the metapage. Now if you do the ALTER INDEX thing, the new value is only used when the index is recreated by REINDEX. There are still things to go over before this is committable, particularly regarding vacuuming the index, but as far as index creation and scanning it should be good to test. (Vacuuming should work just fine most of the time also, but there are a few wrinkles pointed out by Heikki.) One thing I've disabled for now is the pageinspect code that displays index items. I need to rewrite that. Closing thought: thinking more about it, the bit about returning function OIDs by getOpers when creating a MinmaxDesc seems unnecessary. I think we could go by with just returning the number of values stored in the column, and have the operators be part of an opaque struct that's initialized and only touched by the opclass amprocs, not by the generic code. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: