
From Andres Freund
Subject Re: Lockless StrategyGetBuffer() clock sweep
Date
Msg-id 20141029190927.GF17724@awork2.anarazel.de
In response to Re: Lockless StrategyGetBuffer() clock sweep  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-10-29 14:18:33 -0400, Robert Haas wrote:
> On Mon, Oct 27, 2014 at 9:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I've previously posted a patch at
> > http://archives.postgresql.org/message-id/20141010160020.GG6670%40alap3.anarazel.de
> > that reduces contention in StrategyGetBuffer() by making the clock sweep
> > lockless.  Robert asked me to post it to a new thread; I originally
> > wrote it to see some other contention in more detail, that's why it
> > ended up in that thread...
> 
> Does this LATCHPTR_ACCESS_ONCE crap really do anything?

Well, it forces the variable to be read from memory, instead of allowing
it to be read from a register... I'd much rather have a generic,
type-independent ACCESS_ONCE macro, but I'm not aware of a way to do that
that works across compilers. It also prevents the compiler from doing
speculative writes.
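
For the archives, roughly what I mean - a sketch, not the patch's actual
definitions. The typed variants need nothing but a volatile cast, which
every compiler we support understands; the generic form needs typeof(),
which is exactly the portability problem:

/*
 * Typed variants in the spirit of the patch (illustrative only).  The
 * volatile cast forces a real load from memory instead of letting the
 * compiler reuse a value cached in a register.
 */
#define INT_ACCESS_ONCE(var)      ((int) *((volatile int *) &(var)))
#define LATCHPTR_ACCESS_ONCE(var) ((Latch *) *((volatile Latch **) &(var)))

/*
 * A generic, type-independent version needs typeof(), which gcc and
 * clang have, but not every compiler we support - hence no generic
 * ACCESS_ONCE so far.
 */
#define ACCESS_ONCE(x) (*((volatile typeof(x) *) &(x)))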

> Why do we need that?  I'm not convinced that it's actually solving any
> problem we would otherwise have with this code.

I agree that in this case we can get rid of it.

> But if it is, then how about
> adding a flag that is 4 bytes wide or less alongside bgwriterLatch,
> and just checking the flag instead of checking bgwriterLatch itself?

Yea, that'd be nicer. I didn't do it because it made the patch slightly
more invasive... The complexity really is only needed because we're not
guaranteed that 64bit reads are atomic. Although in practice we can be
sure, because there's no platform with non-atomic intptr_t reads - so
64bit platforms actually *do* have atomic 64bit reads/writes.

So if we just have an integer 'setBgwriterLatch' somewhere we're good. We
don't even need to take a lock to set it. Afaics the worst that can
happen is that several processes set the latch...
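
Something along these lines - just a sketch, with made-up struct, field
and function names rather than anything from the patch:

#include "storage/latch.h"

/*
 * Sketch of the scheme described above.  An aligned int is read and
 * written atomically on every platform we support, so the flag can be
 * tested without holding the buffer strategy spinlock; the worst case
 * is that several backends see it set and the latch gets set more than
 * once, which is harmless.
 */
typedef struct
{
    int     setBgwriterLatch;   /* nonzero: bgwriter asked to be woken */
    Latch  *bgwriterLatch;
    /* ... remaining StrategyControl fields ... */
} StrategyControlSketch;

static inline void
wake_bgwriter_if_requested(StrategyControlSketch *sc)
{
    if (sc->setBgwriterLatch)
    {
        sc->setBgwriterLatch = 0;   /* plain store; duplicate wakeups are fine */
        SetLatch(sc->bgwriterLatch);
    }
}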

> Actually, the same approach would work for INT_ACCESS_ONCE.  So I
> propose this approach instead: Add a new atomic flag to
> StrategyControl, useSlowPath.  If StrategyGetBuffer() finds
> useSlowPath set, call StrategyGetBufferSlow(), which will acquire the
> spinlock, check whether bgwriterLatch is set and/or the freelist is
> non-empty and return a buffer ID if able to allocate from the
> freelist; otherwise -1.  If StrategyGetBufferSlow() returns -1, or we
> decide not to call it in the first place because useSlowPath isn't
> set, then fall through to your clock-sweep logic.  We can set
> useSlowPath at startup and whenever we put buffers on the free list.

I don't like the idea of conflating the slow path for the bgwriter latch
with the freelist slow path. And I don't really see what that'd buy us
over what's in the patch right now?

> The lack of memory barriers while testing useSlowPath (or, in your
> version, when testing the ACCESS_ONCE conditions) means that we could
> fail to set the bgwriter latch or pop from the freelist if another
> backend has very recently updated those things.  But that's not
> actually a problem; it's fine for any individual request to skip those
> things, as they are more like hints than correctness requirements.

Right.

> The interaction between this and the bgreclaimer idea is interesting.
> We can't make popping the freelist lockless without somehow dealing
> with the resulting A-B-A problem (namely, that between the time we
> read &head->next and the time we CAS the list-head to that value, the
> head might have been popped, another item pushed, and the original
> list head pushed again).

I think if we really feel the need to, we can circumvent the ABA problem
here. But I'm not yet convinced that there's the need to do so.  I'm
unsure that a single process that touches all the buffers at some
frequency is actually a good idea on modern NUMA systems...
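
To spell out the hazard Robert describes: a naive compare-and-swap pop
looks like the sketch below (plain C11 atomics rather than our own
atomics layer, and freeNext_of() is a made-up accessor), and the window
between reading the next link and the CAS is exactly where A-B-A bites:

#include <stdatomic.h>

typedef struct
{
    atomic_int  firstFreeBuffer;    /* head of the freelist, -1 if empty */
} FreeListHead;

extern int  freeNext_of(int buf);   /* made-up accessor for the freeNext link */

static int
naive_pop(FreeListHead *fl)
{
    int     head = atomic_load(&fl->firstFreeBuffer);

    while (head >= 0)
    {
        int     next = freeNext_of(head);

        /*
         * A-B-A window: between reading freeNext above and the CAS below,
         * another backend can pop 'head', push a different buffer, and
         * push 'head' again.  The CAS still succeeds because the head
         * value matches, but 'next' is stale and the list gets corrupted.
         */
        if (atomic_compare_exchange_weak(&fl->firstFreeBuffer, &head, next))
            return head;
        /* CAS failure refreshed 'head'; retry */
    }
    return -1;                      /* freelist empty */
}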

I wonder if we could make a trylock for spinlocks work - then we could
look at the freelist if the lock is free and just fall back to the clock
sweep otherwise. My guess is that that'd be quite performant.  IIRC it's
just the spinlock semaphore fallback that doesn't know how to do
trylock...
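
Roughly like this - SpinLockTryAcquire() doesn't exist today (that's the
missing piece in the semaphore fallback), and the spinlock field name and
helpers are made up for the sketch:

static int
get_buffer_sketch(void)
{
    int     buf = -1;

    /* Only look at the freelist if nobody else holds the spinlock. */
    if (SpinLockTryAcquire(&StrategyControl->buffer_strategy_lock))
    {
        buf = pop_freelist_locked();    /* made-up helper */
        SpinLockRelease(&StrategyControl->buffer_strategy_lock);
    }

    /* Lock was busy or freelist empty: do the lockless clock sweep. */
    if (buf < 0)
        buf = run_clock_sweep();        /* made-up helper */

    return buf;
}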


> So even if bgreclaimer saves some work for
> individual backends - avoiding the need for them to clock-sweep across
> many buffers - it may not be worth it if it means taking a spinlock to
> pop the freelist instead of using an atomics-driven clock sweep.
> Considering that there may be a million plus buffers to scan through,
> that's a surprising conclusion, but it seems to be where the data is
> pointing us.

I'm not really convinced of this yet either. It might just be that the
bgreclaimer implementation isn't good enough. But to me that doesn't
really change the fact that there's clear benefit in this patch - even
if we can make bgreclaimer beneficial, the lockless scan will still be
good.

My feeling is that making bg*writer* more efficient is more likely to be
beneficial overall than introducing bgreclaimer. I've seen the clock
sweep really hurt in cases where many, many buffers are dirty.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


