Re: What is wrong with hashed index usage? - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: What is wrong with hashed index usage?
Date
Msg-id D90A5A6C612A39408103E6ECDD77B82906F47C@voyager.corporate.connx.com
Whole thread Raw
In response to What is wrong with hashed index usage?  ("Dann Corbit" <DCorbit@connx.com>)
List pgsql-hackers
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Friday, June 21, 2002 9:52 AM
> To: Tom Lane
> Cc: Neil Conway; mloftis@wgops.com; Dann Corbit;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] What is wrong with hashed index usage?
>
>
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I remember three problems:  build time, index size, and
> concurrency
> > > problems.  I was wondering about the equal key case
> myself, and assumed
> > > hash may be a win there, but with the concurrency
> problems, is that even
> > > possible?
> >
> > Sure.  Many-equal-keys are a problem for btree whether you have any
> > concurrency or not.
> >
> > > OK, I have reworded it.  Is that better?
> >
> > It's better, but you've still discarded the original's
> explicit mention
> > of concurrency problems.  Why do you want to remove information?
>
> OK, concurrency added.  How is that?
>
> >
> > > How about an elog(NOTICE) for hash use?
> >
> > I don't think that's appropriate.
>
> I was thinking of this during CREATE INDEX ... hash:
>
>     NOTICE:  Hash index use is discouraged.  See the CREATE INDEX
>     reference page for more information.
>
> Does anyone else like/dislike that?

I think it might be OK temporarily, to show that there is some work that
needs done.  When hashed indexes are fixed, the notice should be
removed.

I have not looked at the hash code.  Here is a strategy (off the top of
my head) that seems that it should work:

Use Bob Jenkins' 64 bit generic hash from here (totally free for use and
fast as blazes):
http://burtleburtle.net/bob/hash/index.html

Specifically:
http://burtleburtle.net/bob/c/lookup8.c and routine: hash( k, length,
level)

Now, with a 64 bit hash, there is very tiny probability of a collision
(but you could have duplicate data).
The hash index would consist of nothing more than this:
[long long hash=64 bit hash code][unsigned nmatches=count of matching
hashes][array of {nmatches} pointers directly to the records with that
hash]

This is probably grotesqely oversimplified.  But maybe it will spur an
indea in the person who writes the indexing code.


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: What is wrong with hashed index usage?
Next
From: Larry Rosenman
Date:
Subject: Re: What is wrong with hashed index usage?