Re: Hash index use presently(?) discouraged since 2005: revive or bury it? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date
Msg-id CAMkU=1xYy0rH9E7JfgMJEzDuESTs=diRkuiRfh7VF8RA+Mg4PA@mail.gmail.com
Whole thread Raw
In response to Re: Hash index use presently(?) discouraged since 2005: revive or bury it?  (Stefan Keller <sfkeller@gmail.com>)
List pgsql-performance
On Sun, Sep 18, 2011 at 7:59 AM, Stefan Keller <sfkeller@gmail.com> wrote:
> Merlin and Jeff,
>
> General remark again:It's hard for me to imagine that btree is
> superior for all the issues mentioned before. I still believe in hash
> index for primary keys and certain unique constraints where you need
> equality search and don't need ordering or range search.

I certainly agree that hash indexes as implemented in PG
could be improved on.

>
> 2011/9/17 Jeff Janes <jeff.janes@gmail.com>:
> (...)
>> Also, that link doesn't address concurrency of selects at all, only of inserts.
>
> How would (or did) you test and benchmark concurrency of inserts and selects?
> Use pgbench with own config for a blackbox test?

I used pgbench -S -M prepared with a scale that fits in
shared_buffers, at various concurrencies.  drop the pgbench_accounts
primary key and build alternatingly a regular index and a hash index
between runs.  (If the scale doesn't fit in memory, that should
advantage the hash, but I haven't seen a large one--I've never tested
a size at which the branch blocks don't fit in memory)

It is hard to see real differences here because the index is not the
main bottleneck, regardless of which index is in use (at least on only
8 CPUs, with enough CPUs you might be able to drive the hash index
over the edge)

I also used a custom pgbench option -P, (a patch adding which feature
I was supposed to submit to this commitfest, but missed).  Cuts down
on a lot of the network chatter, locking, and other overhead and so
simulates an index look up occurring on the inside of a nested loop.

The performance at -c 1 was roughly equal, but at -c 8 the hash was
three times slower.

I don't recall concurrent testing inserts (not for speed, anyway).

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Subject: Re: How to make hash indexes fast
Next
From: MirrorX
Date:
Subject: cannot use multicolumn index