Re: [PATCHES] updated hash functions for postgresql v1 - Mailing list pgsql-hackers

From Kenneth Marshall
Subject Re: [PATCHES] updated hash functions for postgresql v1
Date
Msg-id 20090109202914.GA4930@it.is.rice.edu
Whole thread Raw
In response to Re: [PATCHES] updated hash functions for postgresql v1  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: [PATCHES] updated hash functions for postgresql v1
List pgsql-hackers
On Fri, Jan 09, 2009 at 12:04:15PM -0800, Jeff Davis wrote:
> On Mon, 2008-12-22 at 13:47 -0600, Kenneth Marshall wrote: 
> > Dear PostgreSQL developers,
> > 
> > I am re-sending this to keep this last change to the
> > internal hash function on the radar.
> > 
> 
> Hi Ken,
> 
> A few comments:
> 
> 1. New patch with very minor changes attached.
> 
> 2. I reverted the change you made to indices.sgml. We still don't use
> WAL for hash indexes, and in my opinion we should continue to discourage
> their use until we do use WAL. We can add back in the comment that hash
> indexes are suitable for large keys if we have some results to show
> that.
> 
> 3. There was a regression test failure in union.sql because the ordering
> of the results was different. I updated the regression test.
> 
> 4. Hash functions affect a lot more than hash indexes, so I ran through
> a variety of tests that use a HashAggregate plan. Test setup and results
> are attached. These results show no difference between the old and the
> new code (about 0.1% better).
> 
> 5. The hash index build time shows some improvement. The new code won in
> every instance in which a there were a lot of duplicates in the table
> (100 distinct values, 50K of each) by around 5%.
> 
> The new code appeared to be the same or slightly worse in the case of
> hash index builds with few duplicates (1000000 distinct values, 5 of
> each). The difference was about 1% worse, which is probably just noise.
> 
> Note: I'm no expert on hash functions. Take all of my tests with a grain
> of salt.
> 
> I would feel a little better if I saw at least one test that showed
> better performance of the new code on a reasonable-looking distribution
> of data. The hash index build that you showed only took a second or two
> -- it would be nice to see a test that lasted at least a minute.
> 
> Regards,
>     Jeff Davis
> 
> 

Jeff,

Thanks for the review. I would not really expect any differences in hash
index build times other than normal noise variances. The most definitive
benchmark that I have seen was done with my original patch submission
in March posted by Luke of Greenplum:
 "We just applied this and saw a 5 percent speedup on a hash aggregation  query with four columns in a 'group by'
clauserun against a single  TPC-H table."
 

I wonder if they would be willing to re-run their test? Thanks again.

Regards,
Ken



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: [PATCHES] updated hash functions for postgresql v1
Next
From: Greg Smith
Date:
Subject: Re: Improving compressibility of WAL files