Re: updated hash functions for postgresql v1 - Mailing list pgsql-patches

From Kenneth Marshall
Subject Re: updated hash functions for postgresql v1
Date
Msg-id 20071029033626.GA27949@it.is.rice.edu
Whole thread Raw
In response to Re: updated hash functions for postgresql v1  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
On Sun, Oct 28, 2007 at 08:06:58PM +0000, Simon Riggs wrote:
> On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote:
> > On Sun, Oct 28, 2007 at 05:27:38PM +0000, Simon Riggs wrote:
> > > On Sat, 2007-10-27 at 15:15 -0500, Kenneth Marshall wrote:
> > > > Its features include a better and faster hash function.
> > >
> > > Looks very promising. Do you have any performance test results to show
> > > it really is faster, when compiled into Postgres? Better probably needs
> > > some definition also; in what way are the hash functions better?
> > >
> > > --
> > >   Simon Riggs
> > >   2ndQuadrant  http://www.2ndQuadrant.com
> > >
> > The new hash function is roughly twice as fast as the old function in
> > terms of straight CPU time. It uses the same design as the current
> > hash but provides code paths for aligned and unaligned access as well
> > as separate mixing functions for different blocks in the hash run
> > instead of having one general purpose block. I think the speed will
> > not be an obvious win with smaller items, but will be very important
> > when hashing larger items (up to 32kb).
> >
> > Better in this case means that the new hash mixes more thoroughly
> > which results in less collisions and more even bucket distribution.
> > There is also a 64-bit varient which is still faster since it can
> > take advantage of the 64-bit processor instruction set.
>
> Ken, I was really looking for some tests that show both of the above
> were true. We've had some trouble proving the claims of other algorithms
> before, so I'm less inclined to take those things at face value.
>
> I'd suggest tests with Integers, BigInts, UUID, CHAR(20) and CHAR(100).
> Others may have different concerns.
>

Simon,

I agree, that we should not take claims withoug testing them ourselves.
My main motivation for posting the patch was to get feedback on how to
add support for 64-bit hashes that will work with all of our supported
platforms. I am trying to avoid the "work on a feature in isolation...
and submit a giant patch with many problems" problem. I intend to do
more extensive testing, but I am trying to reach a basic implementation
level before I start the testing. I am pretty good with theory, but my
coding skills are out of practice. It will take me longer to generate
the tests now and without any clear benefit to the hash index implementation.
I am willing to test further, but I would like to have my testing benefit
the hash index implementation and not just the effectiveness and efficiency
of the hashing algorithm.

Regards,
Ken
> --
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com
>
>

pgsql-patches by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: updated hash functions for postgresql v1
Next
From: Magnus Hagander
Date:
Subject: mingw autoconf again