Re: hash index improving v3 - Mailing list pgsql-patches

From Alex Hunsaker
Subject Re: hash index improving v3
Date
Msg-id 34d269d40809102117g470c4d25l55be7898fc43fc40@mail.gmail.com
In response to Re: hash index improving v3  ("Alex Hunsaker" <badalex@gmail.com>)
Responses Re: hash index improving v3
List pgsql-patches
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <badalex@gmail.com> wrote:
> On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <ktm@rice.edu> wrote:
>> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
>>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <ktm@rice.edu> wrote:
>>> > I think that the glacial speed for generating a big hash index is
>>> > the same problem that the original code faced.
>>>
>>> Yeah, sorry, I was not saying it was a new problem with the patch.  Err,
>>> at least not trying to :) *Both* of them had been running for 18+ hours
>>> (I finally killed them sometime Sunday, at around the 32-hour mark...)
>>>
>>> > It would be useful to have an equivalent test for the hash-only
>>> > index without the modified int8 hash function, since that would
>>> > be more representative of its performance. The collision rates
>>> > that I was observing in my tests of the old and new mix() functions
>>> > were about 2 * (1/10000) of what your test generated. You could just
>>> > test against the integers between 1 and 2000000.
>>>
>>> Sure, but then it's pretty much just a general test of patch vs. no
>>> patch.  i.e. How do we measure how much longer collisions take when
>>> the new patch makes things faster?  That's what I was trying to
>>> measure... though I apologize, I don't think that was clearly stated
>>> anywhere...
>>
>> Right, I agree that we need to benchmark the collision processing
>> time difference. I am not certain that two data points are useful
>> information. There are 469 collisions with our current hash function
>> on the integers from 1 to 2000000. What about testing the performance
>> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
>> Unless you adjust the fill calculation for the CREATE INDEX, I would
>> stop once the time to create the index spikes. It might also be useful
>> to see if a CLUSTER affects the performance. What do you think
>> of that strategy?
>
> Not sure it will be a good benchmark of collision processing.  Then
> again, you seem to have studied the hash algo more closely than I have.
> I'll go see about doing this.  Stay tuned.

Assuming I understood you correctly (and I probably didn't), this does
not work very well because you max out at 27,006 values before you get
this error:
ERROR:  index row size 8152 exceeds hash maximum 8144
HINT:  Values larger than a buffer page cannot be indexed.

So aren't the power-of-2 multiples of 500 simply:

#include <stdio.h>

int main(void)
{
    int x = 500;
    while (1)
    {
        printf("%d\n", x);  /* 500, 1000, 2000, 4000, ... */
        x *= 2;
    }
}

?
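
For reference, a minimal psql sketch of the power-of-2 sweep described
above might look something like this (the table and index names are made
up, and this is only the shape of the test, not an exact script):

-- repeat for each size: 500, 1000, 2000, 4000, ... and note the
-- CREATE INDEX time, stopping once it spikes
\timing
DROP TABLE IF EXISTS hash_bench;
CREATE TABLE hash_bench (v int8);
INSERT INTO hash_bench SELECT generate_series(1, 500);  -- swap 500 for the size under test
CREATE INDEX hash_bench_idx ON hash_bench USING hash (v);

Running that with and without the patch (and optionally after a CLUSTER,
which needs a separate btree index since a hash index can't be clustered
on) should give comparable CREATE INDEX timings at each size.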
