Re: Hash index use presently(?) discouraged since 2005: revive or bury it? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Date
Msg-id CAMkU=1y82n_mhy-YA8-jC7vR_Nwsajcn3s8i5A_Q=CmwQaMZcQ@mail.gmail.com
Whole thread Raw
In response to Re: Hash index use presently(?) discouraged since 2005: revive or bury it?  (Stefan Keller <sfkeller@gmail.com>)
List pgsql-performance
On Wed, Sep 14, 2011 at 4:03 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> 2011/9/14 Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (...) I think that
>> the current state of affairs is still what depesz said, namely that
>> there might be cases where they'd be a win to use, except the lack of
>> WAL support is a killer.  I imagine somebody will step up and do that
>> eventually.


I think that adding WAL to hash indexes without first
addressing the heavy-weight locking issue would be a mistake.
Even if the WAL was fixed, the bad performance under
concurrent selects would still make it at best a narrow
niche thing.  And fixing the locking *after* WAL is in place would
probably be very much harder than the other order.

> How much of work (in man days) do you estimate would this mean for
> someone who can program but has to learn PG internals first?

Are these 8 hour days? :)

I think it could be several months at least and a high likelihood of not
getting done at all.  (depending on how good the person is, of course).

They would first have to become familiar with the WAL log and replay system.
This is quite hairy.

Also, I think that adding WAL to hash indexes would be even harder than for
other indexes, because of bucket-splits, which can touch an arbitrarily high
number of pages.  At least, that is what lead me to give up on this last time
I looked into it seriously.

I think that if it were not for those bucket-splits, it would be
relatively easy
to get rid of both the heavy-weight locks, and to add WAL logging.  I had
considered proposing making hash indexes have a fixed number of buckets
specified at creation time.  That would be an unfortunate limitation, but I
think it would be a net win over non-WAL, non-highly-concurrent hash indexes
that currently exist.  Especially if the number of buckets could be enlarged
by concurrently making a new, larger, index and then dropping the old one.
I've only thought about proposing it, because currently I don't have time
to do anything on it if the proposal was well received.


Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
Next
From: Merlin Moncure
Date:
Subject: Re: Hash index use presently(?) discouraged since 2005: revive or bury it?