Re: [PATCH]-hash index improving - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: [PATCH]-hash index improving
Date
Msg-id 20080717210135.GN337@it.is.rice.edu
Whole thread Raw
In response to Re: [PATCH]-hash index improving  ("Jonah H. Harris" <jonah.harris@gmail.com>)
List pgsql-hackers
On Thu, Jul 17, 2008 at 04:24:28PM -0400, Jonah H. Harris wrote:
> On Thu, Jul 17, 2008 at 5:26 AM, Xiao Meng <mx.cogito@gmail.com> wrote:
> > The patch store hash code only in the index tuple.
> > It based on  Neil Conway's patch with an old version of PostgreSQL.
> > It passes the regression test but I didn't test the performance yet.
> > Anyone interested can make a performance test;-)
> > You can undefine the macro HASHVALUE_ONLY in hash.h to get the
> > original implementation.
> > It's a preliminary implementation and  I'm looking for input here.
> > Hope to hear from you.
> 
> I've spent some time today performing tests similar to those mentioned
> here (http://archives.postgresql.org/pgsql-hackers/2007-09/msg00208.php)
> 
> Using a word list of 2650024 unique words (maximum length is 118
> bytes), build times are still high, but I'm not really seeing any
> performance improvements over b-tree.  I haven't profiled it yet, but
> my test is as follows:
> 
> - Created the dict table
> - Loaded the dict table
> - Counted the records in the dict table
> - Created the index
> - Shutdown the database
> - Randomly selected 200 entries from the word list and built a file
> full of (SELECT * FROM dict WHERE word = '<word>') queries using them.
> - Cleared out the kernel cache
> - Started the database
> - Ran the query file
> 
> The result of this is between 5-10ms improvement in the overall
> execution time of all 200 queries.  The time-per-query is practically
> unnoticeable.  As this is in the range of noise, methinks there's a
> larger problem with hash indexes.  I haven't looked heavily into their
> implementation, but do you any of you know of any major design flaws?
> 
Jonah,

Thank you for running these tests. I was trying to reproduce my initial
tests on the original system to make it more apples to apples, but the
latest release needs more resources semaphore-wise than the 8.2 and
to fix it on Solaris 8 I will need a reboot. Would you mind posting
the timings for the hash_only index versus the hash_value versus the
btree index for the same test. Also, what is the on-disk size of all
three indexes? This will allow us to figure out the bucket/page load
or fill-factor for each scenario.

The basic implementation looked reasonable. I will take a look at
the patch this evening.

Regards,
Ken


pgsql-hackers by date:

Previous
From: "Jonah H. Harris"
Date:
Subject: Re: [PATCH]-hash index improving
Next
From: Alvaro Herrera
Date:
Subject: Re: autovacuum crash due to null pointer