Re: "multiple backends attempting to wait for pincount 1" - Mailing list pgsql-hackers

From Tom Lane
Subject Re: "multiple backends attempting to wait for pincount 1"
Date
Msg-id 9277.1423941053@sss.pgh.pa.us
Whole thread Raw
In response to Re: "multiple backends attempting to wait for pincount 1"  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: "multiple backends attempting to wait for pincount 1"  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much easier be
> hit with the new interrupt code.

> Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> returns spuriously - something it's documented to possibly do (and which
> got more likely with the new patches). In the normal case UnpinBuffer()
> will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> still be set and LockBufferForCleanup() will see it still set.

Yeah, you're right: LockBufferForCleanup has never coped with the
possibility that ProcWaitForSignal returns prematurely.  I'm not sure
if that was possible when this code was written, but we've got it
documented as being possible at least back to 8.2.  So this needs to
be fixed in all branches.

> If you just gdb into the VACUUM process with 6647248e370884 checked out,
> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> NULL. Afaics, that should do the trick.

If we're moving the responsibility for clearing that flag from the waker
to the wakee, I think it would be smarter to duplicate all the logic
that's currently in UnlockBuffers(), just to make real sure we don't
drop somebody else's waiter flag.  So the bottom of the loop would
look more like this:             LockBufHdr(bufHdr);       if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
bufHdr->wait_backend_pid== MyProcPid)       {           /* Release hold on the BM_PIN_COUNT_WAITER bit */
bufHdr->flags&= ~BM_PIN_COUNT_WAITER;           PinCountWaitBuf = NULL;           // optionally, we could check for pin
count1 here ...       }       UnlockBufHdr(bufHdr);       /* Loop back and try again */
 


Also we should rethink at least the comment in UnlockBuffers().
I'm not sure what the failure conditions are with this reassignment
of responsibility, but the described case couldn't occur anymore.
        regards, tom lane



pgsql-hackers by date:

Previous
From: "Henry B (Hank) Hotz, CISSP"
Date:
Subject: Re: reducing our reliance on MD5
Next
From: Kevin Grittner
Date:
Subject: Re: "multiple backends attempting to wait for pincount 1"