Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Msg-id 20140211130757.GE31598@awork2.anarazel.de
In response to Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  ("MauMau" <maumau307@gmail.com>)
Responses Re: Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease  ("MauMau" <maumau307@gmail.com>)
List pgsql-hackers
On 2014-02-11 21:46:04 +0900, MauMau wrote:
> From: "Andres Freund" <andres@2ndquadrant.com>
> >which means they manipulate the lwWaitLink queue without
> >protection. That's done intentionally. The code tries to protect against
> >corruption of the list due to a woken up backend acquiring a lock (this
> >or an independent one) by only continuing when the lwWaiting flag is set
> >to false. Unfortunately there's absolutely no guarantee that a) the
> >assignments to lwWaitLink and lwWaiting are done in that order, or b)
> >the stores are done in order from the POV of other backends.
> >So what we need to do is to add a write barrier between the
> >assignments to lwWaitLink and lwWaiting, i.e.
> >       proc->lwWaitLink = NULL;
> >       pg_write_barrier();
> >       proc->lwWaiting = false;
> >The reader side already uses an implicit barrier by using spinlocks.
> 
> I've got a report from one customer that they encountered a hang during
> performance benchmarking.  They were using PostgreSQL 9.2.4.  I remember
> that the stack trace showed many backends blocked forever in LWLockAcquire()
> during btree insert operations.  I'm not sure this has anything to do with
> what you are raising, but the release notes for 9.2.5/6 don't mention any
> fixes for this.  So I suspected there is something wrong with lwlocks.
> 
> Do you think the issue you raised could cause my customer's problem,
> i.e. backends blocking on an lwlock forever?

It's x86, right? Then it's unlikely to be actual unordered memory
accesses, but if the compiler reordered:

    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    proc->lwWaiting = false;
    PGSemaphoreUnlock(&proc->sem);

to

    LOG_LWDEBUG("LWLockRelease", T_NAME(l), T_ID(l), "release waiter");
    proc = head;
    proc->lwWaiting = false;
    head = proc->lwWaitLink;
    proc->lwWaitLink = NULL;
    PGSemaphoreUnlock(&proc->sem);

which it is permitted to do, yes, that could cause symptoms like you
describe.

Do you by any chance still have the binaries the customer was running
back then? Disassembling that piece of code might give you a hint
whether that's a possible cause.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


