I have added these TODO items based on Neil's ideas:
* Consider sorting hash buckets so entries can be found using a binary
search, rather than a linear scan
* In hash indexes, consider storing the hash value with or instead
of the key itself
However, Neil's tests don't show much of a win. Should I keep these
items on the TODO list?
---------------------------------------------------------------------------
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > The patch does two things: (1) change hash indexes to only store the
> > key's hash value, not the entire key (2) store index elements within a
> > hash bucket in order of hash key and search for matches via binary
> > search. #1 is definitely a win in some in some circumstances (e.g.
> > indexing large fields or types with expensive equality operators), but
> > those aren't the common case. I'm surprised that #2 is not a more
> > noticeable improvement...
>
> It occurs to me that change #1 makes it cheaper to skip over index
> entries that are in the same bucket but don't have the exact same hash
> code; but it makes it significantly more expensive to skip over entries
> that have the same hash code but aren't really equal to the key being
> sought (since you have to visit the heap to find out they aren't equal).
> Maybe your test workload had enough occurrences of the latter case to
> balance out the win from the former.
>
> I think it would be worth investigating a variant in which the index
> stores both the hash code and the key value. This allows cheap
> elimination of non-matching hash codes, and binary sorting of index
> entries, without adding any extra trips to the heap. The downside is
> that it makes the index larger so you incur more I/O there --- so this
> might not be a win either.
>
> > One possibility would be to provide an optional implementation of #1,
> > perhaps via an alternate index operator class. That way people could
> > select it in those situations in which it is worth using.
>
> I think it would definitely be a good idea to make the lossy behavior
> optional.
>
> regards, tom lane
>
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073