Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Date
Msg-id 974C503F-3B73-4447-82DE-434E266E2327@phlo.org
Whole thread Raw
In response to Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Feb14, 2014, at 16:51 , Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
>> On Feb14, 2014, at 14:07 , Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
>>>>> I agree we should do that, but imo not in the backbranches. Anything
>>>>> more than than the minimal fix in that code should be avoided in the
>>>>> stable branches, this stuff is friggin performance sensitive, and the
>>>>> spinlock already is a *major* contention point in many workloads.
>>>>
>>>> No argument there. But unless I missed something, there actually is a bug
>>>> there, and using volatile won't fix it. A barrier would, but what about the
>>>> back branches that don't have barrier.h?
>>>
>>> Yea, but I don't see a better alternative. Realistically the likelihood
>>> of a problem without the compiler reordering stuff is miniscule on any
>>> relevant platform that's supported by the !barrier.h branches. Newer
>>> ARMs are the only realistic suspect, and the support for in older
>>> releases wasn't so good...
>>
>> Isn't POWER/PowerPC potentially affected by this?
>
> I wouldn't consider it a major architecture... And I am not sure how
> much out of order a CPU has to be to be affected by this, the read side
> uses spinlocks, which in most of the spinlock implementations implies a
> full memory barrier which in many cache coherency designs will cause
> other CPUs to flush writes. And I think the control dependency in the
> PGSemaphoreUnlock() loop will actually cause a flush on ppc...

I guess it's sufficient for the store to lwWaitLink to be delayed.
As long as the CPU on which that store is executing hasn't managed to gain
exclusive access to the relevant cache line, memory barriers on the read
side won't help because the store won't be visible to other CPUs.

>> Also, there is still the alternative of not resetting lwWaitLink in LWLockRelease.
>> I can understand why you oppose that - it's certainly nicer to not have stray
>> pointer lying around. But since (as least as far as we know)
>>
>> a) resetting lwWaitLink is not strictly necessary
>
> I don't want to rely on that in the backbranches, that's an entirely new
> assumption. I think anything more than minimal changes are hard to
> justify for a theoretical issue that hasn't been observed in the field.
>
> I think the biggest danger here is that writes are reordered by the
> compiler, that we definitely need to protect against. Making a variable
> volatile or introducing a memory barrier is pretty simple to analyze.

Well, the assumption isn't all that new. We already have the situation that
a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
Currently, the process who took it off the queue is responsible for rectifying
that eventually, with the changed proposed below it'll be the backend who
owns the PGPROC. From the point of view of other backends, nothing really
changes.

>
>> b) resetting lwWaitLink introduces a race condition (however unlikely)
>>
>> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
>> without protecting against the race. That's a bit of a weird trade-off to make.
>
> It's not just cleanliness, it's being able to actually debug crashes.

We could still arrange for the stray lwWaitLink from being visible only
momentarily. If a backend is woken up and detects that lwWaiting is false,
it knows that it cannot be on any wait queue - it was just removed, and
hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
back to NULL. The stray value would thus only be visible while a backend executes
the PGSemaphoreLock() loop, and whether or not this is the case can be deduced
from a stack trace. So I'm not convinced that this makes debugging any harder.

(knizhnik has just suggested the same)

It's interesting, BTW, that updating lwWaitLink after lwWaiting is OK here -
the crucial property is not that it's updated before lwWaiting, but rather that
it is reset before enqueuing the backend again. Which is something that backend
itself can guarantee far more easily than whoever happens to de-queue it due to
spurious wakeups.

best regards,
Florian Pflug




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: narwhal and PGDLLIMPORT
Next
From: Peter C Rigby
Date:
Subject: Re: git-review: linking commits to review discussion in git