Re: [PATCH]-hash index improving - Mailing list pgsql-hackers

From Dann Corbit
Subject Re: [PATCH]-hash index improving
Date
Msg-id D425483C2C5C9F49B5B7A41F8944154701000F81@postal.corporate.connx.com
Whole thread Raw
In response to Re: [PATCH]-hash index improving  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [PATCH]-hash index improving  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Re: [PATCH]-hash index improving  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
-----Original Message-----

> From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
> Sent: Thursday, July 17, 2008 4:10 PM

> To: Jonah H. Harris

> Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall

> Subject: Re: [HACKERS] [PATCH]-hash index improving

>

>

> On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:

> > I'm not really seeing any performance improvements over b-tree.

>

> I'd like to see a theoretical analysis on the benefit case before we

> spend too many teraflops on investigation.

>

> In which cases do we expect that hash indexes will beat btrees?

Large table unique index equality search should be very fast with hashed
index (and the only place where any advantage will be seen).  Hashed
indexes are useless for any search besides equality and gain more and
more when the levels of the b-tree index increase.

Here is a hash index lookup that is frightfully fast:
http://www.corpit.ru/mjt/tinycdb.html

It's a constant database, but the file format might be worth
examination.  Here is a quickie definition of the CDB format:
http://cr.yp.to/cdb/cdb.txt

I think that the problem with hashed indexes tends to be the blocking of
index pages on disk.  For instance, the FastDB/GigaBase author was able
to make FastDB memory based hash indexes that are faster than the memory
based b-tree versions, but not for the disk based GigaBase database.
The only way to get better performance from hash based indexes is to
read fewer index pages than if a tree-based index were used.  So I think
that the scheme used to create the index pages is the focus to make them
worthwhile.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: TABLE-function patch vs plpgsql
Next
From: David Fetter
Date:
Subject: Re: [PATCH]-hash index improving