Re: hash index work - Mailing list pgsql-patches

From Tom Lane
Subject Re: hash index work
Date
Msg-id 9723.1117296218@sss.pgh.pa.us
Whole thread Raw
In response to Re: hash index work  (Neil Conway <neilc@samurai.com>)
Responses Re: hash index work
List pgsql-patches
Neil Conway <neilc@samurai.com> writes:
> The patch does two things: (1) change hash indexes to only store the
> key's hash value, not the entire key (2) store index elements within a
> hash bucket in order of hash key and search for matches via binary
> search. #1 is definitely a win in some in some circumstances (e.g.
> indexing large fields or types with expensive equality operators), but
> those aren't the common case. I'm surprised that #2 is not a more
> noticeable improvement...

It occurs to me that change #1 makes it cheaper to skip over index
entries that are in the same bucket but don't have the exact same hash
code; but it makes it significantly more expensive to skip over entries
that have the same hash code but aren't really equal to the key being
sought (since you have to visit the heap to find out they aren't equal).
Maybe your test workload had enough occurrences of the latter case to
balance out the win from the former.

I think it would be worth investigating a variant in which the index
stores both the hash code and the key value.  This allows cheap
elimination of non-matching hash codes, and binary sorting of index
entries, without adding any extra trips to the heap.  The downside is
that it makes the index larger so you incur more I/O there --- so this
might not be a win either.

> One possibility would be to provide an optional implementation of #1,
> perhaps via an alternate index operator class. That way people could
> select it in those situations in which it is worth using.

I think it would definitely be a good idea to make the lossy behavior
optional.

            regards, tom lane

pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: psql backslash consistency
Next
From: Robert Treat
Date:
Subject: Re: psql backslash consistency