Re: on hash indexes - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: on hash indexes
Date
Msg-id 20090204234849.GE2995@it.is.rice.edu
Whole thread Raw
In response to Re: on hash indexes  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
On Wed, Feb 04, 2009 at 11:22:44PM +0100, Zdenek Kotala wrote:
> The main speed improvement is for varchar datatype. I think It should be
> mention here as well. IIRC, times are similar with B-Tree for integer
> datatype.
> 
>     Zdenek
> 
> Kenneth Marshall p????e v st 04. 02. 2009 v 13:57 -0600:
> > I had submitted the documentation change as part of my
> > hash function patch but it was removed as not relevant.
> > (It wasn't really.) I would basically remove the first
> > sentence:
> > 
> >         Note: Hash index operations are not presently WAL-logged,
> >   so hash indexes might need to be rebuilt with REINDEX  after a
> >   database crash. For this reason, hash index use is presently
> >   discouraged.
> > 
> > Ken
> > 
> > 
> > On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote:
> > > Hi,
> > > 
> > > indices.sgml contains this paragraph about hash indexes:
> > > 
> > >     Note:  Testing has shown PostgreSQL's hash indexes to perform no
> > > better than B-tree indexes, and the index size and build time for hash
> > > indexes is much worse. Furthermore, hash index operations are not
> > > presently WAL-logged, so hash indexes might need to be rebuilt with
> > > REINDEX  after a database crash. For these reasons, hash index use is
> > > presently discouraged. 
> > > 
> > > 
> > > However, it seems to me that hash indexes are much improved in 8.4, so
> > > maybe this needs to be reworded.  I'm not sure to what point they have
> > > been improved though.
> > > 
> > > -- 
> > > Alvaro Herrera                                http://www.CommandPrompt.com/
> > > PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> > > 

The speed improvement applies particularly to any datatype that is
larger than an integer (typically 4 bytes). Also the size of fields that
can be indexed efficiently is much, much larger than the 2K for Btree.
And even 32-bit quantities may be indexed more efficiently than Btrees
for large indexes due to the O(1) probe behavior. Btrees typically need
to cache/probe the upper levels of the tree to locate the tuple. I have
held off on extensive benchmarking until WAL has been implemented.

Regards,
Ken


pgsql-hackers by date:

Previous
From: "Dann Corbit"
Date:
Subject: Re: Is a plan for lmza commpression in pg_dump
Next
From: Fujii Masao
Date:
Subject: Re: Hot standby, recovery infra