Re: Hash Indexes - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Hash Indexes
Date
Msg-id CAMkU=1xcDMWCOLV6z9R7YTCHGousLu85M5Vw=Vbv9RBvLL_Yfw@mail.gmail.com
Whole thread Raw
In response to Re: Hash Indexes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Hash Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Sep 29, 2016 at 5:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-09-28 15:04:30 -0400, Robert Haas wrote:
>>> Andres already
>>> stated that he things working on btree-over-hash would be more
>>> beneficial than fixing hash, but at this point it seems like he's the
>>> only one who takes that position.
>>
>> Note that I did *NOT* take that position. I was saying that I think we
>> should evaluate whether that's not a better approach, doing some simple
>> performance comparisons.
>
> I, for one, agree with this position.

Well, I, for one, find it frustrating.  It seems pretty unhelpful to
bring this up only after the code has already been written.  The first
post on this thread was on May 10th.  The first version of the patch
was posted on June 16th.  This position was first articulated on
September 15th.

But, by all means, please feel free to do the performance comparison
and post the results.  I'd be curious to see them myself.


I've done a simple comparison using pgbench's default transaction, in which all the primary keys have been dropped and replaced with indexes of either hash or btree type, alternating over many rounds.

I run 'pgbench -c16 -j16 -T 900 -M prepared' on an 8 core machine with a scale of 40.  All the data fits in RAM, but not in shared_buffers (128MB).

I find a 4% improvement for hash indexes over btree indexes, 9324.744 vs 9727.766.  The difference is significant at p-value of 1.9e-9.

The four versions of hash indexes (HEAD, concurrent, wal, cache, applied cumulatively) have no statistically significant difference in performance from each other.

I certainly don't see how btree-over-hash-over-integer could be better than direct btree-over-integer.

I think I don't see improvement in hash performance with the concurrent and cache patches because I don't have enough cores to get to the contention that those patches are targeted at.  But since the concurrent patch is a prerequisite to the wal patch, that is enough to justify it even without a demonstrated performance boost.  A 4% gain is not astonishing, but is nice to have provided we can get it without giving up crash safety.

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Tracking wait event for latches
Next
From: Tom Lane
Date:
Subject: Re: Hash Indexes