Re: [HACKERS] Re: Multi field hash indexes - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: [HACKERS] Re: Multi field hash indexes |
Date | |
Msg-id | 350F708D.51862F26@sid.trust.ee Whole thread Raw |
In response to | Re: [HACKERS] Re: Multi field hash indexes (ocie@paracel.com) |
List | pgsql-hackers |
ocie@paracel.com wrote: > Hannu Krosing wrote: > > > > ocie@paracel.com wrote: > > > > > Hannu Krosing wrote: > > > > > > > > The trick is to hash each field separately and then have a concatenation > > > > > > > > of the hash values. > > > > > > > > so assuming that for fields (a,b,c) values (120, 'friday', 3.1415927) > > > > hash > > > > to 'aa', 'bb', 'cc' the hash value of the whole tuple will be 'aabbcc' > > > > > > > > then the index can be used not only when selecting = a,b,c but also > > > > when > > > > selecting on _any_ of the fields in this index. For example when > > > > selecting > > > > for b='friday' one would examine only buckets where the middle is 'bb' > > > > > > > > > > HMM, this doesn't feel right. If I have an index on four int4s > > > a,b,c,d and I only know d, then it seems like searching for these in > > > the hash table could be as much work, or more work than a table scan. > > > > I was assuming that the hash table scan would be cheaper than the table scan > > I think I see what you're getting at. The different parts are > separately hashed and then concatenated. I was thinking the other way > concatenate and THEN hash, which would spread the values all over the > table. > > One problem with this (don't know how severe) is that a hash index on > a can answer 'select blah from t where a=10' by computing one hash > value and scaning the list at that point, whereas with an index on > (a,b), several hash values must be computed (one for each possible > resulting hash value of b). The idea is that after computing the hash value you still have to find that value and that values are arranged in some way that makes finding consecutive values cheap (like ordered or tree or matrix (using sparse files)), at least for values with the same start or possibly for all other kinds of same parts. > > > > > indexes. Has this come up before? Are there any reasons not to > > > > > support this? As far as I understand, specifying an index method is > > > > > non standard (above and beyond standard) to begin with. > > > > > > > > While you are at it could you please comment if the GIST indexes are > > > > used or > > > > at least easily usable? > > > > > > What are GIST indexes? > > > > Some kind of generalised binary tree indexes that should make it easy to > > define additional indexing strategies. > > > > There is a directory access/gist in src/backend, and they are briefly > > mentioned in the PostgreSQL programmers guide. > > > > They seem to be offspring of some independent (of postgres) Berkeley project. > > Interesting. I saw something in Dr Dobbs Journal (the last bastion of > programming magazines) about trinary trees. The idea is that there is > a less than branch for items less than the given node and one for > those greater and another for those equal. This can be used to index > items by using the less than/ greater than links to find the first > letter, then taking the 'equal' link and looking for the second > letter, etc. The problem is that these are not self balancing (like > red black trees or btrees), but maybe there is a way to make them so. The information about GIST is at: http://GiST.CS.Berkeley.EDU:8000/gist/ with more on different indexing and sorting schemes at: http://s2k-ftp.CS.Berkeley.EDU:8000/personal/jmh/ And there is more interesting reading at the Berkely databese site at : http://epoch.cs.berkeley.edu:8000/ Hannu
pgsql-hackers by date: