Re: hash index work - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: hash index work
Date
Msg-id 200506250003.j5P038621441@candle.pha.pa.us
In response to Re: hash index work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
I have added these TODO items based on Neil's ideas:

    * Consider sorting hash buckets so entries can be found using a binary
      search, rather than a linear scan
    * In hash indexes, consider storing the hash value with or instead
      of the key itself

However, Neil's tests don't show much of a win.  Should I keep these
items on the TODO list?
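
For reference, a rough sketch of what the two items would mean in practice.
This is illustrative Python, not PostgreSQL code: key_hash, build_bucket,
lower_bound and search_bucket are hypothetical names, the hash function is a
toy with a deliberately small range so that collisions occur, and each entry
is assumed to store the hash value together with the key.

```python
def key_hash(key: str) -> int:
    # Toy hash (assumption): small range so different keys collide.
    return sum(key.encode()) % 8


def build_bucket(keys_with_tids):
    # Each entry is (hash value, key, tuple id), kept sorted by hash value.
    return sorted((key_hash(k), k, tid) for k, tid in keys_with_tids)


def lower_bound(bucket, h):
    # Binary search for the first entry whose hash value is >= h.
    lo, hi = 0, len(bucket)
    while lo < hi:
        mid = (lo + hi) // 2
        if bucket[mid][0] < h:
            lo = mid + 1
        else:
            hi = mid
    return lo


def search_bucket(bucket, key):
    # Jump to the run of entries with the probe's hash value, then compare
    # the stored key only within that run.
    h = key_hash(key)
    i = lower_bound(bucket, h)
    matches = []
    while i < len(bucket) and bucket[i][0] == h:
        if bucket[i][1] == key:
            matches.append(bucket[i][2])
        i += 1
    return matches


bucket = build_bucket([("ab", 1), ("ba", 2), ("cd", 3), ("zz", 4)])
ab_matches = search_bucket(bucket, "ab")   # "ab" and "ba" share a hash value
missing = search_bucket(bucket, "xx")      # no entry has this hash value
```

If the hash value were stored instead of the key, the key comparison in
search_bucket could not be done in the index and would have to be replaced
by a recheck against the heap tuple.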

---------------------------------------------------------------------------

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > The patch does two things: (1) change hash indexes to only store the
> > key's hash value, not the entire key (2) store index elements within a
> > hash bucket in order of hash key and search for matches via binary
> > search. #1 is definitely a win in some circumstances (e.g.
> > indexing large fields or types with expensive equality operators), but
> > those aren't the common case. I'm surprised that #2 is not a more
> > noticeable improvement...
>
> It occurs to me that change #1 makes it cheaper to skip over index
> entries that are in the same bucket but don't have the exact same hash
> code; but it makes it significantly more expensive to skip over entries
> that have the same hash code but aren't really equal to the key being
> sought (since you have to visit the heap to find out they aren't equal).
> Maybe your test workload had enough occurrences of the latter case to
> balance out the win from the former.
>
> I think it would be worth investigating a variant in which the index
> stores both the hash code and the key value.  This allows cheap
> elimination of non-matching hash codes, and binary sorting of index
> entries, without adding any extra trips to the heap.  The downside is
> that it makes the index larger so you incur more I/O there --- so this
> might not be a win either.
>
> > One possibility would be to provide an optional implementation of #1,
> > perhaps via an alternate index operator class. That way people could
> > select it in those situations in which it is worth using.
>
> I think it would definitely be a good idea to make the lossy behavior
> optional.
>
>             regards, tom lane
>
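
The trade-off Tom describes can be made concrete with a simple counting
model. This is only a sketch under stated assumptions: Cost and probe_cost
are hypothetical names, the numbers are counts from a made-up bucket rather
than measurements, and the extra index I/O of the larger hash+key layout is
not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    key_comparisons: int = 0   # full key comparisons done inside the index
    heap_visits: int = 0       # heap tuples fetched for this probe


def probe_cost(bucket, probe_hash, probe_key, layout):
    # bucket: (hash value, key) pairs that landed in the same bucket.
    # layout: "key" (key only), "hash" (hash value only, lossy),
    #         or "hash+key" (both stored).
    cost = Cost()
    for h, k in bucket:
        if layout == "key":
            # Every entry in the bucket needs a key comparison.
            cost.key_comparisons += 1
            if k == probe_key:
                cost.heap_visits += 1
        elif layout == "hash":
            # Non-matching hash codes are skipped cheaply, but every
            # matching hash code costs a heap visit to recheck the key.
            if h == probe_hash:
                cost.heap_visits += 1
        elif layout == "hash+key":
            # Cheap hash filter first, key comparison only on a hash match.
            if h == probe_hash:
                cost.key_comparisons += 1
                if k == probe_key:
                    cost.heap_visits += 1
        else:
            raise ValueError(f"unknown layout: {layout}")
    return cost


# Two entries share hash code 3 but only one of them is the key sought.
bucket = [(3, "ab"), (3, "ba"), (11, "zz"), (19, "cd")]
key_only = probe_cost(bucket, 3, "ab", "key")
hash_only = probe_cost(bucket, 3, "ab", "hash")
hash_and_key = probe_cost(bucket, 3, "ab", "hash+key")
```

In this toy bucket the hash-only layout avoids all key comparisons but pays
one extra heap visit for the colliding entry, while the hash+key layout
keeps the heap visits of the key-only layout with fewer key comparisons.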

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
