Thread: on hash indexes

on hash indexes

From
Alvaro Herrera
Date:
Hi,

indices.sgml contains this paragraph about hash indexes:
Note:  Testing has shown PostgreSQL's hash indexes to perform no
better than B-tree indexes, and the index size and build time for hash
indexes is much worse. Furthermore, hash index operations are not
presently WAL-logged, so hash indexes might need to be rebuilt with
REINDEX  after a database crash. For these reasons, hash index use is
presently discouraged. 


However, it seems to me that hash indexes are much improved in 8.4, so
maybe this needs to be reworded.  I'm not sure to what point they have
been improved though.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: on hash indexes

From
Kenneth Marshall
Date:
I had submitted the documentation change as part of my
hash function patch but it was removed as not relevant.
(It wasn't really.) I would basically remove the first
sentence:
       Note: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX
aftera database crash. For this reason, hash index use is presently discouraged.
 

Ken


On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote:
> Hi,
> 
> indices.sgml contains this paragraph about hash indexes:
> 
>     Note:  Testing has shown PostgreSQL's hash indexes to perform no
> better than B-tree indexes, and the index size and build time for hash
> indexes is much worse. Furthermore, hash index operations are not
> presently WAL-logged, so hash indexes might need to be rebuilt with
> REINDEX  after a database crash. For these reasons, hash index use is
> presently discouraged. 
> 
> 
> However, it seems to me that hash indexes are much improved in 8.4, so
> maybe this needs to be reworded.  I'm not sure to what point they have
> been improved though.
> 
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 


Re: on hash indexes

From
Zdenek Kotala
Date:
The main speed improvement is for varchar datatype. I think It should be
mention here as well. IIRC, times are similar with B-Tree for integer
datatype.
Zdenek

Kenneth Marshall píše v st 04. 02. 2009 v 13:57 -0600:
> I had submitted the documentation change as part of my
> hash function patch but it was removed as not relevant.
> (It wasn't really.) I would basically remove the first
> sentence:
> 
>         Note: Hash index operations are not presently WAL-logged,
>   so hash indexes might need to be rebuilt with REINDEX  after a
>   database crash. For this reason, hash index use is presently
>   discouraged.
> 
> Ken
> 
> 
> On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote:
> > Hi,
> > 
> > indices.sgml contains this paragraph about hash indexes:
> > 
> >     Note:  Testing has shown PostgreSQL's hash indexes to perform no
> > better than B-tree indexes, and the index size and build time for hash
> > indexes is much worse. Furthermore, hash index operations are not
> > presently WAL-logged, so hash indexes might need to be rebuilt with
> > REINDEX  after a database crash. For these reasons, hash index use is
> > presently discouraged. 
> > 
> > 
> > However, it seems to me that hash indexes are much improved in 8.4, so
> > maybe this needs to be reworded.  I'm not sure to what point they have
> > been improved though.
> > 
> > -- 
> > Alvaro Herrera                                http://www.CommandPrompt.com/
> > PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> > 
> > -- 
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> > 
> 



Re: on hash indexes

From
Kenneth Marshall
Date:
On Wed, Feb 04, 2009 at 11:22:44PM +0100, Zdenek Kotala wrote:
> The main speed improvement is for varchar datatype. I think It should be
> mention here as well. IIRC, times are similar with B-Tree for integer
> datatype.
> 
>     Zdenek
> 
> Kenneth Marshall p????e v st 04. 02. 2009 v 13:57 -0600:
> > I had submitted the documentation change as part of my
> > hash function patch but it was removed as not relevant.
> > (It wasn't really.) I would basically remove the first
> > sentence:
> > 
> >         Note: Hash index operations are not presently WAL-logged,
> >   so hash indexes might need to be rebuilt with REINDEX  after a
> >   database crash. For this reason, hash index use is presently
> >   discouraged.
> > 
> > Ken
> > 
> > 
> > On Wed, Feb 04, 2009 at 01:22:23PM -0300, Alvaro Herrera wrote:
> > > Hi,
> > > 
> > > indices.sgml contains this paragraph about hash indexes:
> > > 
> > >     Note:  Testing has shown PostgreSQL's hash indexes to perform no
> > > better than B-tree indexes, and the index size and build time for hash
> > > indexes is much worse. Furthermore, hash index operations are not
> > > presently WAL-logged, so hash indexes might need to be rebuilt with
> > > REINDEX  after a database crash. For these reasons, hash index use is
> > > presently discouraged. 
> > > 
> > > 
> > > However, it seems to me that hash indexes are much improved in 8.4, so
> > > maybe this needs to be reworded.  I'm not sure to what point they have
> > > been improved though.
> > > 
> > > -- 
> > > Alvaro Herrera                                http://www.CommandPrompt.com/
> > > PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> > > 

The speed improvement applies particularly to any datatype that is
larger than an integer (typically 4 bytes). Also the size of fields that
can be indexed efficiently is much, much larger than the 2K for Btree.
And even 32-bit quantities may be indexed more efficiently than Btrees
for large indexes due to the O(1) probe behavior. Btrees typically need
to cache/probe the upper levels of the tree to locate the tuple. I have
held off on extensive benchmarking until WAL has been implemented.

Regards,
Ken


Re: on hash indexes

From
Bruce Momjian
Date:
Kenneth Marshall wrote:
> I had submitted the documentation change as part of my
> hash function patch but it was removed as not relevant.
> (It wasn't really.) I would basically remove the first
> sentence:
>
>         Note: Hash index operations are not presently WAL-logged,
>   so hash indexes might need to be rebuilt with REINDEX  after a
>   database crash. For this reason, hash index use is presently
>   discouraged.

Change made and attached;  thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/indices.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v
retrieving revision 1.75
diff -c -c -r1.75 indices.sgml
*** doc/src/sgml/indices.sgml    23 Sep 2008 09:20:34 -0000    1.75
--- doc/src/sgml/indices.sgml    7 Feb 2009 20:03:51 -0000
***************
*** 190,202 ****

    <note>
     <para>
!     Testing has shown <productname>PostgreSQL</productname>'s hash
!     indexes to perform no better than B-tree indexes, and the
!     index size and build time for hash indexes is much worse.
!     Furthermore, hash index operations are not presently WAL-logged,
      so hash indexes might need to be rebuilt with <command>REINDEX</>
      after a database crash.
!     For these reasons, hash index use is presently discouraged.
     </para>
    </note>

--- 190,199 ----

    <note>
     <para>
!     Hash index operations are not presently WAL-logged,
      so hash indexes might need to be rebuilt with <command>REINDEX</>
      after a database crash.
!     For this reason, hash index use is presently discouraged.
     </para>
    </note>