Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date
Msg-id 20140214182141.GA11943@awork2.anarazel.de
Whole thread Raw
In response to Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Florian Pflug <fgp@phlo.org>)
Responses Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Florian Pflug <fgp@phlo.org>)
List pgsql-hackers
On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
> > I wouldn't consider it a major architecture... And I am not sure how
> > much out of order a CPU has to be to be affected by this, the read side
> > uses spinlocks, which in most of the spinlock implementations implies a
> > full memory barrier which in many cache coherency designs will cause
> > other CPUs to flush writes. And I think the control dependency in the
> > PGSemaphoreUnlock() loop will actually cause a flush on ppc...
> 
> I guess it's sufficient for the store to lwWaitLink to be delayed.
> As long as the CPU on which that store is executing hasn't managed to gain
> exclusive access to the relevant cache line, memory barriers on the read
> side won't help because the store won't be visible to other CPUs.

The whole lwlock actually should be on the same cacheline on anything
with cachelines >= 32. As the woken up side is doing an atomic op on it
*before* modifying lwWaitLink I think we are actually guaranteed that
any incomplete store on the writer will have completed unless the compiler
reordered. So we are safe against out of order stores, but not out of
order execution. That might have been what prevented the issue from
being noticable on some platforms.

> Well, the assumption isn't all that new. We already have the situation that
> a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
> Currently, the process who took it off the queue is responsible for rectifying
> that eventually, with the changed proposed below it'll be the backend who
> owns the PGPROC. From the point of view of other backends, nothing really
> changes.

That window is absolutely tiny tho.

> >> b) resetting lwWaitLink introduces a race condition (however unlikely)
> >> 
> >> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
> >> without protecting against the race. That's a bit of a weird trade-off to make.
> > 
> > It's not just cleanliness, it's being able to actually debug crashes.
> 
> We could still arrange for the stray lwWaitLink from being visible only
> momentarily. If a backend is woken up and detects that lwWaiting is false,
> it knows that it cannot be on any wait queue - it was just removed, and
> hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
> back to NULL. The stray value would thus only be visible while a backend executes
> the PGSemaphoreLock() loop, and whether or not this is the case can be deduced
> from a stack trace. So I'm not convinced that this makes debugging any harder.

That's not actually safe on an out of order architecture afaics. Without
barriers the store to lwWaitLink in the woken up backend can preempt the
read for the next element in the PGSemaphoreUnlock() loop.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Release schedule for 9.3.3?
Next
From: Josh Berkus
Date:
Subject: Re: Release schedule for 9.3.3?