Re: hash index improving v3 - Mailing list pgsql-patches
From | Alex Hunsaker |
---|---|
Subject | Re: hash index improving v3 |
Date | |
Msg-id | 34d269d40809102117g470c4d25l55be7898fc43fc40@mail.gmail.com |
In response to | Re: hash index improving v3 ("Alex Hunsaker" <badalex@gmail.com>) |
Responses | Re: hash index improving v3 |
List | pgsql-patches |
On Wed, Sep 10, 2008 at 9:49 PM, Alex Hunsaker <badalex@gmail.com> wrote:
> On Wed, Sep 10, 2008 at 7:04 AM, Kenneth Marshall <ktm@rice.edu> wrote:
>> On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
>>> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <ktm@rice.edu> wrote:
>>> > I think that the glacial speed for generating a big hash index is
>>> > the same problem that the original code faced.
>>>
>>> Yeah sorry, I was not saying it was a new problem with the patch. Err,
>>> at least not trying to :) *Both* of them had been running at 18+ (I
>>> finally killed them sometime Sunday, at around +32 hours...)
>>>
>>> > It would be useful to have an equivalent test for the hash-only
>>> > index without the modified int8 hash function, since that would
>>> > be more representative of its performance. The collision rates
>>> > that I was observing in my tests of the old and new mix() functions
>>> > were about 2 * (1/10000) of what your test generated. You could just
>>> > test against the integers between 1 and 2000000.
>>>
>>> Sure, but then it's pretty much just a general test of patch vs. no
>>> patch, i.e. how do we measure how much longer collisions take when
>>> the new patch makes things faster? That's what I was trying to
>>> measure... though I apologize, I don't think that was clearly stated
>>> anywhere...
>>
>> Right, I agree that we need to benchmark the collision processing
>> time difference. I am not certain that two data points are useful
>> information. There are 469 collisions with our current hash function
>> on the integers from 1 to 2000000. What about testing the performance
>> at power-of-2 multiples of 500, i.e. 500, 1000, 2000, 4000, 8000,...
>> Unless you adjust the fill calculation for the CREATE INDEX, I would
>> stop once the time to create the index spikes. It might also be useful
>> to see if a CLUSTER affects the performance as well. What do you think
>> of that strategy?
>
> Not sure it will be a good benchmark of collision processing. Then
> again, you seem to have studied the hash algo more closely than I have.
> I'll go see about doing this. Stay tuned.

Assuming I understood you correctly (and I probably didn't), this does not work very well, because you max out at 27,006 values before you get this error:

    ERROR: index row size 8152 exceeds hash maximum 8144
    HINT: Values larger than a buffer page cannot be indexed.

So is a power-of-2 multiple of 500 not simply:

    my $x = 500; while (1) { print "$x\n"; $x *= 2; }

?
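The doubling schedule and the 469-collision figure discussed in this thread can be sketched in Python. This is a minimal illustration under stated assumptions, not anyone's actual benchmark: the function names are hypothetical, the 27,006 cap is simply the ceiling reported above reused as a loop bound (so the loop terminates, unlike the `while (1)` version), and the collision estimate assumes an ideal hash that spreads keys uniformly over 32-bit hash codes rather than modelling the actual mix() functions under discussion.

```python
def doubling_sizes(start: int = 500, limit: int = 27_006) -> list[int]:
    """Hypothetical helper: power-of-2 multiples of `start` (start * 2**k) up to `limit`.

    Assumption: the default limit reuses the 27,006-value ceiling reported
    in this thread as a stopping point; it is not a PostgreSQL constant.
    """
    sizes = []
    x = start
    while x <= limit:  # bounded version of the infinite doubling loop
        sizes.append(x)
        x *= 2
    return sizes


def expected_collisions(n: int, hash_bits: int = 32) -> float:
    """Hypothetical helper: birthday-bound expected number of colliding pairs.

    Assumes n distinct keys hashed independently and uniformly into
    2**hash_bits codes: E[pairs] = n * (n - 1) / (2 * 2**hash_bits).
    """
    return n * (n - 1) / (2 * 2 ** hash_bits)


if __name__ == "__main__":
    print(doubling_sizes())                        # [500, 1000, 2000, 4000, 8000, 16000]
    print(round(expected_collisions(2_000_000)))   # 466
```

Under these assumptions an ideal 32-bit hash would produce roughly 466 colliding pairs on the integers 1 to 2,000,000, in the same range as the 469 reported for the current hash function. Passing `limit=2_000_000` instead yields the twelve sizes from 500 through 1,024,000.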