Re: hash index improving v3 - Mailing list pgsql-patches

From Kenneth Marshall
Subject Re: hash index improving v3
Msg-id 20080910130401.GG6714@it.is.rice.edu
In response to Re: hash index improving v3  ("Alex Hunsaker" <badalex@gmail.com>)
Responses Re: hash index improving v3
List pgsql-patches
On Tue, Sep 09, 2008 at 07:23:03PM -0600, Alex Hunsaker wrote:
> On Tue, Sep 9, 2008 at 7:48 AM, Kenneth Marshall <ktm@rice.edu> wrote:
> > I think that the glacial speed for generating a big hash index is
> > the same problem that the original code faced.
>
> Yeah sorry, I was not saying it was a new problem with the patch.  Err
> at least not trying to :) *Both* of them had been running at 18+ (I
> finally killed them sometime Sunday or around +32 hours...)
>
> > It would be useful to have an equivalent test for the hash-only
> > index without the modified int8 hash function, since that would
> > be more representative of its performance. The collision rates
> > that I was observing in my tests of the old and new mix() functions
> > were about 2 * (1/10000) of what your test generated. You could just
> > test against the integers between 1 and 2000000.
>
> Sure, but then it's pretty much just a general test of patch vs no
> patch.  i.e. How do we measure how much longer collisions take when
> the new patch makes things faster?  That's what I was trying to
> measure... Though I apologize I don't think that was clearly stated
> anywhere...

Right, I agree that we need to benchmark the difference in collision
processing time. I am not certain that two data points are enough to
be useful, though. There are 469 collisions with our current hash
function on the integers from 1 to 2000000. What about testing the
performance at power-of-2 multiples of 500, i.e. 500, 1000, 2000,
4000, 8000,...? Unless you adjust the fill calculation for the CREATE
INDEX, I would stop once the time to create the index spikes. It
might also be useful to see whether a CLUSTER affects the performance.
What do you think of that strategy?
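
To make the proposal concrete, here is a rough sketch of the test
schedule in Python. It is illustrative only: the helper names are
hypothetical, reading 500, 1000, 2000,... as target collision counts
is an assumption, the hash function is passed in by the caller rather
than being the real PostgreSQL one, and the factor of 4 used to call
a build time a "spike" is arbitrary.

```python
def collision_targets(base=500, steps=8):
    """Power-of-2 multiples of base: 500, 1000, 2000, 4000, ..."""
    return [base * 2 ** k for k in range(steps)]


def count_collisions(keys, hash_fn):
    """Count keys whose hash value was already produced by an earlier key."""
    seen = set()
    collisions = 0
    for key in keys:
        h = hash_fn(key)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions


def first_spike(build_times, factor=4.0):
    """Index of the first build time exceeding factor * the previous one.

    Returns None if no such jump occurs. The threshold is an assumption,
    not something measured in this thread.
    """
    for i in range(1, len(build_times)):
        if build_times[i] > factor * build_times[i - 1]:
            return i
    return None
```

The idea would be to build one data set per entry of
collision_targets(), record the CREATE INDEX time for each, and stop
at the index returned by first_spike().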

Regards,
Ken

> Now of course it still would be interesting...  And if it's only to
> 2,000,000 I can still use the modified int8 or just use the int4
> one...
>
> Anyway Here are the numbers:
> create table test_hash(num int8);
> insert into test_hash (num) select generate_series(1, 2000000);
> create index test_hash_num_idx on test_hash (num);
>
> pgbench -c1 -n -t10000 -f bench_index.sql
> cvs head: tps = 3161.500511
> v5:           tps = 7248.808839
>
> BTW I'm still planning on doing a wide vs narrow test... sometime... :)
>
