Re: [HACKERS] Re: Multi field hash indexes - Mailing list pgsql-hackers

From ocie@paracel.com
Subject Re: [HACKERS] Re: Multi field hash indexes
Date
Msg-id 9803171840.AA02081@dolomite.paracel.com
Whole thread Raw
In response to Re: Multi field hash indexes  (Hannu Krosing <hannu@trust.ee>)
Responses Re: [HACKERS] Re: Multi field hash indexes
List pgsql-hackers
Hannu Krosing wrote:
>
> Ocie Mitchell wrote:
>
> > I was playing around with my latest compile and tried to make a unique
>
> > index on two columns, using a hash method.  Both of these (more than
> > one column and unique) are currently not allowed for hash indexes.
> >
> > I thought about this for a bit and realized that making a NESTED hash
> > index (index on a and b also serves as an index on a) would be a
> > trick, but allowing the unique clause should not be a problem.
>
> It can be complicated (especially for extensible hashing) but the
> theory
> for this seems to be on in the database handbooks under the name of
> 'segmented hash' or some like fancy name.
>
> The trick is to hash each field separately and then have a concatenation
>
> of the hash values.
>
> so assuming that for fields (a,b,c) values (120, 'friday', 3.1415927)
> hash
> to 'aa', 'bb', 'cc' the hash value of the whole tuple will be 'aabbcc'
>
> then the index can be used not only when selecting  = a,b,c but also
> when
> selecting on _any_ of the fields in this index. For example when
> selecting
> for b='friday' one would examine only buckets where the middle is 'bb'
>

HMM, this doesn't feel right.  If I have an index on four int4s
a,b,c,d and I only know d, then it seems like searching for these in
the hash table could be as much work, or more work than a table scan.
Now if a hash on two columns implicitly make a seperate hash on each,
and then took their intersection, that might be fast.

The kind of thing I am thinking that these two-or-more-column hash
indexes would be use for supporting intersection tests, where the
order of the items doesn't matter, but it is good to be able to come
up with a quick answer to the questions:

select x from t where y=5;
select y from t where x in (1, 4, 6, 7);
select count(*) from t where x=1 and y=10;


I was originally thinking that this would be supported like the btree
indexes are now -- an index on (a,b,c,d) serves as in index on a,
(a,b), (a,b,c) and (a,b,c,d), but it doesn't serve as an index on b,
or (b,c), etc.  My original idea was that the first item in the index
would define a hash table whose entries were hash tables on the second
item, etc.  I now think that this would waste quite a bit of space,
and would have the same restriction as btrees, which is unnatural.


> > Therefore, I would like to try implementing unique constraints on hash
>
> > indexes.  Has this come up before?  Are there any reasons not to
> > support this?  As far as I understand, specifying an index method is
> > non standard (above and beyond standard) to begin with.
>
> While you are at it could you please comment if the GIST indexes are
> used or
> at least easily usable?

What are GIST indexes?


Ocie Mitchell

pgsql-hackers by date:

Previous
From: "Jackson, DeJuan"
Date:
Subject: RE: [HACKERS] First mega-patch...
Next
From: The Hermit Hacker
Date:
Subject: RE: [HACKERS] First mega-patch...