Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)" - Mailing list pgsql-hackers

From Tom Lane
Subject Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
Date
Msg-id 20436.1531751966@sss.pgh.pa.us
Whole thread Raw
In response to Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"  (Andres Freund <andres@anarazel.de>)
Responses Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> On 2018-07-15 18:48:43 -0400, Tom Lane wrote:
>> So basically, WAL replay hits an error while holding a buffer pin, and
>> nothing is done to release the buffer pin, but AtProcExit_Buffers thinks
>> something should have been done.

> I think there's a few other cases where we hit this. I've seen something
> similar from inside checkpointer / BufferSync(). I'd be surprised if
> bgwriter couldn't be triggered into the same.

Hm, yeah, on reflection it's pretty obvious that those are hazard cases.

> I'm pretty sure that we do *not* force a panic on all nonzero-exit-code
> cases for other subprocesses.

That's my recollection as well -- mostly, we just start a new one.

So I said I didn't want to do extra work on this, but I am looking into
fixing it by having these aux process types run a ResourceOwner that can
be told to clean up any open buffer pins at exit.  We could be sure the
coverage is complete by dint of removing the special-case code in
resowner.c that allows buffer pins to be taken with no active resowner.
Then CheckForBufferLeaks can be left as-is, ie something we do only
in assert builds.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: patch to allow disable of WAL recycling
Next
From: Laurenz Albe
Date:
Subject: Re: Libpq support to connect to standby server as priority