Re: [HACKERS] Deadlock in XLogInsert at AIX - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [HACKERS] Deadlock in XLogInsert at AIX
Date
Msg-id 0184657d-cf64-10f6-9706-7329f23f5f89@iki.fi
Whole thread Raw
In response to [HACKERS] Deadlock in XLogInsert at AIX  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: [HACKERS] Deadlock in XLogInsert at AIX
List pgsql-hackers
On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote:
> As I already mentioned, we built Postgres with LOCK_DEBUG , so we can
> inspect lock owner. Backend is waiting for itself!
> Now please look at two frames in this stack trace marked with red.
> XLogInsertRecord is setting WALInsert locks at the beginning of the
> function:
>
>      if (isLogSwitch)
>          WALInsertLockAcquireExclusive();
>      else
>          WALInsertLockAcquire();
>
> WALInsertLockAcquire just selects random item from WALInsertLocks array
> and exclusively locks:
>
>      if (lockToTry == -1)
>          lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
>      MyLockNo = lockToTry;
>      immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
>
> Then, following the stack trace, AdvanceXLInsertBuffer calls
> WaitXLogInsertionsToFinish:
>
>              /*
>               * Now that we have an up-to-date LogwrtResult value, see if we
>               * still need to write it or if someone else already did.
>               */
>              if (LogwrtResult.Write < OldPageRqstPtr)
>              {
>                  /*
>                   * Must acquire write lock. Release WALBufMappingLock
> first,
>                   * to make sure that all insertions that we need to
> wait for
>                   * can finish (up to this same position). Otherwise we risk
>                   * deadlock.
>                   */
>                  LWLockRelease(WALBufMappingLock);
>
> WaitXLogInsertionsToFinish(OldPageRqstPtr);
>
>                  LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
>
>
> It releases WALBufMappingLock but not WAL insert locks!
> Finally in WaitXLogInsertionsToFinish tries to wait for all locks:
>
>      for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
>      {
>          XLogRecPtr    insertingat = InvalidXLogRecPtr;
>
>          do
>          {
>              /*
>               * See if this insertion is in progress. LWLockWait will
> wait for
>               * the lock to be released, or for the 'value' to be set by a
>               * LWLockUpdateVar call.  When a lock is initially
> acquired, its
>               * value is 0 (InvalidXLogRecPtr), which means that we
> don't know
>               * where it's inserting yet.  We will have to wait for it.  If
>               * it's a small insertion, the record will most likely fit
> on the
>               * same page and the inserter will release the lock without
> ever
>               * calling LWLockUpdateVar.  But if it has to sleep, it will
>               * advertise the insertion point with LWLockUpdateVar before
>               * sleeping.
>               */
>              if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
>   &WALInsertLocks[i].l.insertingAt,
>                                   insertingat, &insertingat))
>
> And here we stuck!

Interesting.. What should happen here is that for the backend's own 
insertion slot, the "insertingat" value should be greater than the 
requested flush point ('upto' variable). That's because before 
GetXLogBuffer() calls AdvanceXLInsertBuffer(), it updates the backend's 
insertingat value, to the position that it wants to insert to. And 
AdvanceXLInsertBuffer() only calls WaitXLogInsertionsToFinish() with 
value smaller than what was passed as the 'upto' argument.

> The comment to WaitXLogInsertionsToFinish says:
>
>   * Note: When you are about to write out WAL, you must call this function
>   * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
>   * need to wait for an insertion to finish (or at least advance to next
>   * uninitialized page), and the inserter might need to evict an old WAL
> buffer
>   * to make room for a new one, which in turn requires WALWriteLock.
>
> Which contradicts to the observed stack trace.

Not AFAICS. In the stack trace you showed, the backend is not holding 
WALWriteLock. It would only acquire it after the 
WaitXLogInsertionsToFinish() call finished.

> I wonder if it is really synchronization bug in xlog.c or there is
> something wrong in this stack trace and it can not happen in case of
> normal work?

Yeah, hard to tell. Something is clearly wrong..

This line in the stack trace is suspicious:

> WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"

AdvanceXLInsertBuffer() should only ever call 
WaitXLogInsertionsToFinish() with an xlog position that points to a page 
bounary, but that upto value points to the middle of a page.

Perhaps the value stored in the stack trace is not what the caller 
passed, but it was updated because it was past the 'reserveUpto' value? 
That would explain the "request to flush past end
of generated WAL" notices you saw in the log. Now, why would that 
happen, I have no idea.

If you can and want to provide me access to the system, I could have a 
look myself. I'd also like to see if the attached additional Assertions 
will fire.

- Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] multivariate statistics (v19)
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] privileges for changing schema owner