Re: Lockless StrategyGetBuffer() clock sweep - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Lockless StrategyGetBuffer() clock sweep
Date
Msg-id CA+TgmoaZS7Dx2EqcXjxc2PMrFq=ciJibi6rbwcRO-Hdn64H0tg@mail.gmail.com
Whole thread Raw
In response to Lockless StrategyGetBuffer() clock sweep  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Lockless StrategyGetBuffer() clock sweep
List pgsql-hackers
On Mon, Oct 27, 2014 at 9:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I've previously posted a patch at
> http://archives.postgresql.org/message-id/20141010160020.GG6670%40alap3.anarazel.de
> that reduces contention in StrategyGetBuffer() by making the clock sweep
> lockless.  Robert asked me to post it to a new thread; I originally
> wrote it to see some other contention in more detail, that's why it
> ended up in that thread...

Does this LATCHPTR_ACCESS_ONCE crap really do anything?  Why do we
need that?  I'm not convinced that it's actually solving any problem
we would otherwise have with this code.  But if it is, then how about
adding a flag that is 4 bytes wide or less alongside bgwriterLatch,
and just checking the flag instead of checking bgwriterLatch itself?

Actually, the same approach would work for INT_ACCESS_ONCE.  So I
propose this approach instead: Add a new atomic flag to
StrategyControl, useSlowPath.  If StrategyGetBuffer() finds
useSlowPath set, call StrategyGetBufferSlow(), which will acquire the
spinlock, check whether bgwriterLatch is set and/or the freelist is
non-empty and return a buffer ID if able to allocate from the
freelist; otherwise -1.  If StrategyGetBufferSlow() returns -1, or we
decide not to call it in the first place because useSlowPath isn't
set, then fall through to your clock-sweep logic.  We can set
useSlowPath at stratup and whenever we put buffers on the free list.

The lack of memory barriers while testing useSlowPath (or, in your
version, when testing the ACCESS_ONCE conditions) means that we could
fail to set the bgwriter latch or pop from the freelist if another
backend has very recently updated those things.  But that's not
actually a problem; it's fine for any individual request to skip those
things, as they are more like hints than correctness requirements.

The interaction between this and the bgreclaimer idea is interesting.
We can't making popping the freelist lockless without somehow dealing
with the resulting A-B-A problem (namely, that between the time we
read &head->next and the time we CAS the list-head to that value, the
head might have been popped, another item pushed, and the original
list head pushed again).  So even if bgreclaimer saves some work for
individual backends - avoiding the need for them to clock-sweep across
many buffers - it may not be worth it if it means taking a spinlock to
pop the freelist instead of using an atomics-driven clock sweep.
Considering that there may be a million plus buffers to scan through,
that's a surprising conclusion, but it seems to be where the data is
pointing us.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Directory/File Access Permissions for COPY and Generic File Access Functions
Next
From: Robert Haas
Date:
Subject: Re: Materialized views don't show up in information_schema