andres@anarazel.de (Andres Freund) writes:
> On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
>> (If Justin saw that while still on 9.6, then it'd be worth looking
>> closer.)
> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things. There've been
> various forks/ports of pg around that had hand-coded replacements with
> futex usage, and there were definitely buggy versions going around a few
> years back.

Poking around in the archives reminded me of this thread:
https://www.postgresql.org/message-id/flat/14947.1475690465@sss.pgh.pa.us
which describes symptoms uncomfortably close to what Justin is showing.
I remember speculating that the SysV-sema implementation, because it'd
always enter the kernel, would provide some memory barrier behavior
that POSIX-sema code based on futexes might miss when taking the no-wait
path. I'd figured that any real problems of that sort would show up
pretty quickly, but that could've been overoptimistic.  Maybe we need
to take a closer look at where LWLocks devolve to blocking on the process
semaphore and see if there are any implicit assumptions about barriers there.
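
To make the worry concrete, here is a minimal sketch of a futex-style
no-wait path using C11 atomics.  This is purely illustrative: ToySema and
the toy_sema_* functions are made-up names, not the PGSemaphore code and
not glibc's sem_wait implementation.  The point is only that when the
count is positive, the acquirer never enters the kernel, so whatever
ordering it gets comes solely from the atomic operation itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy semaphore, for illustration only. */
typedef struct ToySema
{
    atomic_int  count;
} ToySema;

static void
toy_sema_init(ToySema *s, int initial)
{
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/*
 * No-wait path: try to decrement the count without a system call.
 * Returns true if the semaphore was taken, false if the caller would
 * have to block (a real implementation would then do a futex wait).
 *
 * The successful CAS has acquire semantics only.  That is weaker than a
 * full barrier, so a caller that silently relied on full-barrier
 * behavior would need an explicit
 * atomic_thread_fence(memory_order_seq_cst) in addition.
 */
static bool
toy_sema_trylock(ToySema *s)
{
    int         cur = atomic_load_explicit(&s->count, memory_order_relaxed);

    while (cur > 0)
    {
        /* On failure, cur is reloaded with the current count. */
        if (atomic_compare_exchange_weak_explicit(&s->count, &cur, cur - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/*
 * Release: increment the count with release semantics.  A real
 * implementation would also wake a waiter here if there is one.
 */
static void
toy_sema_unlock(ToySema *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

If the acquire in toy_sema_trylock were relaxed instead, nothing in this
path would order the caller's later loads after the unlocker's earlier
stores, which is the sort of gap that a semaphore which always enters the
kernel would presumably paper over.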
regards, tom lane