Re: [HACKERS] Re: Multi field hash indexes - Mailing list pgsql-hackers
From | ocie@paracel.com |
---|---|
Subject | Re: [HACKERS] Re: Multi field hash indexes |
Date | |
Msg-id | 9803171840.AA02081@dolomite.paracel.com Whole thread Raw |
In response to | Re: Multi field hash indexes (Hannu Krosing <hannu@trust.ee>) |
Responses |
Re: [HACKERS] Re: Multi field hash indexes
|
List | pgsql-hackers |
Hannu Krosing wrote: > > Ocie Mitchell wrote: > > > I was playing around with my latest compile and tried to make a unique > > > index on two columns, using a hash method. Both of these (more than > > one column and unique) are currently not allowed for hash indexes. > > > > I thought about this for a bit and realized that making a NESTED hash > > index (index on a and b also serves as an index on a) would be a > > trick, but allowing the unique clause should not be a problem. > > It can be complicated (especially for extensible hashing) but the > theory > for this seems to be on in the database handbooks under the name of > 'segmented hash' or some like fancy name. > > The trick is to hash each field separately and then have a concatenation > > of the hash values. > > so assuming that for fields (a,b,c) values (120, 'friday', 3.1415927) > hash > to 'aa', 'bb', 'cc' the hash value of the whole tuple will be 'aabbcc' > > then the index can be used not only when selecting = a,b,c but also > when > selecting on _any_ of the fields in this index. For example when > selecting > for b='friday' one would examine only buckets where the middle is 'bb' > HMM, this doesn't feel right. If I have an index on four int4s a,b,c,d and I only know d, then it seems like searching for these in the hash table could be as much work, or more work than a table scan. Now if a hash on two columns implicitly make a seperate hash on each, and then took their intersection, that might be fast. The kind of thing I am thinking that these two-or-more-column hash indexes would be use for supporting intersection tests, where the order of the items doesn't matter, but it is good to be able to come up with a quick answer to the questions: select x from t where y=5; select y from t where x in (1, 4, 6, 7); select count(*) from t where x=1 and y=10; I was originally thinking that this would be supported like the btree indexes are now -- an index on (a,b,c,d) serves as in index on a, (a,b), (a,b,c) and (a,b,c,d), but it doesn't serve as an index on b, or (b,c), etc. My original idea was that the first item in the index would define a hash table whose entries were hash tables on the second item, etc. I now think that this would waste quite a bit of space, and would have the same restriction as btrees, which is unnatural. > > Therefore, I would like to try implementing unique constraints on hash > > > indexes. Has this come up before? Are there any reasons not to > > support this? As far as I understand, specifying an index method is > > non standard (above and beyond standard) to begin with. > > While you are at it could you please comment if the GIST indexes are > used or > at least easily usable? What are GIST indexes? Ocie Mitchell
pgsql-hackers by date: