Home > mailing lists

Re: "multiple backends attempting to wait for pincount 1" - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: "multiple backends attempting to wait for pincount 1"
Date	February 14, 2015 22:11:13
Msg-id	9277.1423941053@sss.pgh.pa.us Whole thread Raw
In response to	Re: "multiple backends attempting to wait for pincount 1" (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: "multiple backends attempting to wait for pincount 1"
List	pgsql-hackers

Tree view

Andres Freund <andres@2ndquadrant.com> writes:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much easier be
> hit with the new interrupt code.

> Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> returns spuriously - something it's documented to possibly do (and which
> got more likely with the new patches). In the normal case UnpinBuffer()
> will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> still be set and LockBufferForCleanup() will see it still set.

Yeah, you're right: LockBufferForCleanup has never coped with the
possibility that ProcWaitForSignal returns prematurely.  I'm not sure
if that was possible when this code was written, but we've got it
documented as being possible at least back to 8.2.  So this needs to
be fixed in all branches.

> If you just gdb into the VACUUM process with 6647248e370884 checked out,
> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> NULL. Afaics, that should do the trick.

If we're moving the responsibility for clearing that flag from the waker
to the wakee, I think it would be smarter to duplicate all the logic
that's currently in UnlockBuffers(), just to make real sure we don't
drop somebody else's waiter flag.  So the bottom of the loop would
look more like this:             LockBufHdr(bufHdr);       if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
bufHdr->wait_backend_pid== MyProcPid)       {           /* Release hold on the BM_PIN_COUNT_WAITER bit */
bufHdr->flags&= ~BM_PIN_COUNT_WAITER;           PinCountWaitBuf = NULL;           // optionally, we could check for pin
count1 here ...       }       UnlockBufHdr(bufHdr);       /* Loop back and try again */

Also we should rethink at least the comment in UnlockBuffers().
I'm not sure what the failure conditions are with this reassignment
of responsibility, but the described case couldn't occur anymore.
        regards, tom lane

pgsql-hackers by date:

From: "Henry B (Hank) Hotz, CISSP"
Date: 14 February 2015, 21:12:15
Subject: Re: reducing our reliance on MD5

From: Kevin Grittner
Date: 14 February 2015, 22:14:41
Subject: Re: "multiple backends attempting to wait for pincount 1"

Re: "multiple backends attempting to wait for pincount 1" - Mailing list pgsql-hackers

Previous

Next