Re: Hash index todo list item - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: Hash index todo list item
Date
Msg-id 20070907132928.GC19403@it.is.rice.edu
Whole thread Raw
In response to Re: Hash index todo list item  (Neil Conway <neilc@samurai.com>)
Responses Re: Hash index todo list item
List pgsql-hackers
On Thu, Sep 06, 2007 at 11:56:25PM -0700, Neil Conway wrote:
> On Sun, 2007-02-09 at 13:04 -0500, Kenneth Marshall wrote:
> > 2. Evaluate the performance of different hash index implementations
> >    and/or changes to the current implementation. My current plan is
> >    to keep the implementation as simple as possible and still provide
> >    the desired performance. Several hash index suggestions deal with
> >    changing the layout of the keys on a page to improve lookup
> >    performance, including reducing the bucket size to a fraction of
> >    a page or only storing the hash value on the page, instead of
> >    the index value itself.
> 
> You might find this patch useful:
> 
>     http://archives.postgresql.org/pgsql-patches/2005-05/msg00164.php
> 
> It implements the "just store the hash in the index" idea; it also sorts
> the entries in a bucket by the hash value, which allows binary search to
> be used to locate candidate matches.
> 
> I was surprised that this didn't result in a performance improvement for
> the benchmarks that I ran, but I never got around to investigating
> further (either running more benchmarks or checking whether there was a
> bug in the implementation).
> 
> Unfortunately, the patch doesn't apply cleanly to HEAD, but I can merge
> it up to HEAD if you'd like.
> 
> -Neil
> 
This is a great starting point. I would appreciate it if you have the
time and could make it apply cleanly to HEAD. I remember when you first
posted it but had forgotten, probably because of the lack-luster results.
Just a quick glance at the patch and from what I can tell, tagging the
index as lossy causes a lot more work to be done than should be needed
in theory. Currently the index-scan machinery will recheck the value
against the original value for lossy indexes. However, given that we
are using a good hash function with a low chance of collision, if we
mark the unique items in the index then they do not actually have to
be rechecked during the scan. Do you have any suggestions for implementing
that optimization or is there any option to tell the scan machinery to
only re-check a certain list of tuples? Thank you again for pointing
this patch out and please let me know when you have a version against
HEAD.

Regards,
Ken
> 
> 


pgsql-hackers by date:

Previous
From: Markus Schiltknecht
Date:
Subject: Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)
Next
From: Kenneth Marshall
Date:
Subject: Re: Hash index todo list item