Re: WIP: dynahash replacement for buffer table - Mailing list pgsql-hackers

From Robert Haas
Subject Re: WIP: dynahash replacement for buffer table
Date
Msg-id CA+TgmobKOF=gV2ue5p14iAJj0QuQdL+eBLXsM8+rogMjABmG8A@mail.gmail.com
Whole thread Raw
In response to Re: WIP: dynahash replacement for buffer table  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: WIP: dynahash replacement for buffer table
List pgsql-hackers
On Tue, Oct 14, 2014 at 10:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-10-14 09:30:58 -0400, Robert Haas wrote:
>> I took the basic infrastructure from before and used it to replace the
>> buffer table.  Patch is attached.
>
> Hm. Unless I missed something you pretty much completely eradicated
> locks from the buffer mapping code? That'd be pretty cool, but it's also
> scary.
>
> How confident are you that your conversion is correct? Not in the sense
> that there aren't any buglets, but fundamentally.

It doesn't look particularly dangerous to me.  Famous last words.

Basically, I think what we're doing right now is holding the buffer
mapping lock so that the buffer can't be renamed under us while we're
pinning it.  If we don't do that, I think the only consequence is
that, by the time we get the buffer pin, the buffer might no longer be
the one we looked up.  So we need to recheck.  But assuming we do
that, I don't see an issue.  In fact, it occurred to me while I was
cobbling this together that we might want to experiment with that
change independently of chash.  We already know (from the
StrategyGetBuffer locking changes) that holding system-wide locks to
prevent people from twaddling the state of individual buffers can be a
loser.  This case isn't nearly as bad, because we're only pinning one
buffer, rather than potentially all of them multiple times, and the
lock we're holding only affects 1/128th of the buffers, not all of
them.

The other application I had in mind for chash was SLRU stuff.  I
haven't tried to write the code yet, but fundamentally, the same
considerations apply there.  You do the lookup locklessly to find out
which buffer is thought to contain the SLRU page you want, but by the
time you lock the page the answer might be out of date.  Assuming that
this doesn't happen many times in a row and that lookups are
relatively cheap, you could get rid of having any centralized lock for
SLRU, and just have per-page locks.

I'm not quite sure what fundamental dangers you're thinking about
here, but from what I understand from reading the literature, doing
lookups in a lock-free hash table needs to be thought about in a sort
of relativistic way.  The different CPUs have slightly different views
of the world, so any answer you get may be obsolete by the time you
get it, and may according to some other CPU's view of the world have
been obsolete even at the time that it was copied.  That's OK, because
it's just equivalent to some other CPU doing its hash table update
slightly sooner or later than it actually did it, within the bounds
imposed by memory barriers, which it could have done anyway.  There is
no one source of truth.  The result of all this is that some of the
synchronization responsibility is transferred to the caller.  That's
obviously a bit more complex to reason about, but it hasn't stopped
lock-free hash tables from being wildly popular data structures.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: WIP: dynahash replacement for buffer table
Next
From: Amit Kapila
Date:
Subject: Re: WIP: dynahash replacement for buffer table