Re: Hash indexes (was: On-disk bitmap index patch) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Hash indexes (was: On-disk bitmap index patch)
Date
Msg-id 29998.1154114828@sss.pgh.pa.us
Whole thread Raw
In response to Re: Hash indexes (was: On-disk bitmap index patch)  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Hash indexes (was: On-disk bitmap index patch)  (Gregory Stark <gsstark@mit.edu>)
List pgsql-hackers
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored.  The hash index can
> get at the "leaf" node in --supposedly-- one fetch.  Btree is O(logN) to
> get a single key, while hash is O(1).  Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).

> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).

I think the problem may well be that we use hash buckets that are too
large (ie, whole pages).  After we fetch the page, we have to grovel
through every tuple on it to find the one(s) that really match the
query, whereas btree has a much more intelligent strategy (viz binary
search) to do its intrapage searches.  Smaller buckets would help make
up for this.

Another issue is that we don't store the raw hashcode in the index
tuples, so the only way to test a tuple is to actually invoke the
datatype equality function.  If we stored the whole 32-bit hashcode
we could eliminate non-matching hashcodes cheaply.  I'm not sure how
painful it'd be to do this though ... hash uses the same index tuple
layout as everybody else, and so there's no convenient place to put
the hashcode.

Anyway the bottom line here is that no one's tried hard to fix it,
but there are certainly things that might help.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: On-disk bitmap index patch
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Hash indexes (was: On-disk bitmap index patch)