Andres Freund <andres@2ndquadrant.com> wrote:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much
> easier be hit with the new interrupt code.

The patches I'll be posting soon make it even easier to hit, which
is why I was trying to sort this out when Tom noticed the buildfarm
issues.

> Imagine what happens in LockBufferForCleanup() when
> ProcWaitForSignal() returns spuriously - something it's
> documented to possibly do (and which got more likely with the new
> patches). In the normal case UnpinBuffer() will have unset
> BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> and LockBufferForCleanup() will see it still set.

That analysis makes sense to me.

> I think we should simply move the
> buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)

I think you meant inside UnpinBuffer?

> to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> Afaics, that should do the trick.

I tried that on the master branch (33e879c); the patch is attached,
and it passes `make check-world` with no problems. I'm reviewing the
places where BM_PIN_COUNT_WAITER appears, to see if I can spot any
flaw in this. Does anyone else see a problem with it? Even though
it appears to be a long-standing bug, there don't appear to have
been any field reports, so it doesn't seem like something to
back-patch.
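
For anyone following along, here is a toy model of the failure mode and
of the proposed change. It is only a sketch, not the bufmgr.c code:
ToyBuf, toy_lock_for_cleanup(), the flag value, and the way wakeups are
simulated are all made up for illustration, and there is no locking or
real signaling in it. The only thing it tries to show is that a waiter
which loops after a spurious wakeup trips over its own flag unless it
clears that flag itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the real flag bit; the value is arbitrary. */
#define BM_PIN_COUNT_WAITER 0x1u

/* Hypothetical, heavily simplified buffer header. */
typedef struct ToyBuf
{
    unsigned flags;     /* only BM_PIN_COUNT_WAITER is modeled */
    int      refcount;  /* number of pins, including our own */
} ToyBuf;

/*
 * Model of the wait loop.  Returns true once we hold the only pin, or
 * false where the real code would complain that another backend is
 * already waiting for the pin count to drop.
 *
 * clear_flag_after_wait: model the proposed fix (the waiter resets the
 *     flag itself after every wakeup).
 * spurious_wakeups: how many times the simulated wait returns without
 *     the other pin having been released.
 */
static bool
toy_lock_for_cleanup(ToyBuf *buf, bool clear_flag_after_wait,
                     int spurious_wakeups)
{
    assert(buf->refcount >= 1);     /* caller must already hold a pin */

    for (;;)
    {
        if (buf->refcount == 1)
            return true;            /* we are the only pinner */

        if (buf->flags & BM_PIN_COUNT_WAITER)
            return false;           /* looks like a second waiter */

        buf->flags |= BM_PIN_COUNT_WAITER;

        /* Simulated wait. */
        if (spurious_wakeups > 0)
        {
            /* Woke up for no reason: nobody unpinned, flag untouched. */
            spurious_wakeups--;
        }
        else
        {
            /* Real wakeup: the other pinner unpins and clears the flag. */
            buf->refcount--;
            buf->flags &= ~BM_PIN_COUNT_WAITER;
        }

        if (clear_flag_after_wait)
            buf->flags &= ~BM_PIN_COUNT_WAITER;
    }
}
```

With one other pinner and one spurious wakeup, the model fails when the
flag is cleared only on the unpin side and succeeds when the waiter
clears it too, which matches the analysis quoted above.
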
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company