Thread: "multiple backends attempting to wait for pincount 1"

"multiple backends attempting to wait for pincount 1"

From: Tom Lane

Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
the same failure pattern on HEAD:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

I'd say we have a problem.  I'd even go so far as to say that somebody has
completely broken locking, because this looks like autovacuum and manual
vacuuming are hitting the same table at the same time.
        regards, tom lane



Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

Those are rather strange, yea.

Unfortunately both report a relatively large number of changes since the
last run...

> I'd say we have a problem.  I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

Hm. It seems likely that that would show up more widely.

Oddly enough, other CLOBBER_CACHE animals that run more frequently, like
http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2015-02-12%2013%3A03%3A00
have not reported a problem. Neither has leech, which IIRC runs on the same
system...

One avenue to look at is my changes around both buffer pinning and
interrupt handling...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: "multiple backends attempting to wait for pincount 1"

From: Kevin Grittner

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-13 00:27:04 -0500, Tom Lane wrote:

>> I'd say we have a problem.  I'd even go so far as to say that
>> somebody has completely broken locking, because this looks like
>> autovacuum and manual vacuuming are hitting the same table at
>> the same time.

> One avenue to look at is my changes around both buffer pinning and
> interrupt handling...

I found a way to cause this reliably on my machine and did a
bisect.  That pointed to commit 6753333f55e1d9bcb9da4323556b456583624a07

For the record, I would build and start the cluster, start two psql
sessions, and paste this into the first session:

drop table if exists m;
create table m (id int primary key);
insert into m select generate_series(1, 1000000) x;
checkpoint;
vacuum analyze;
checkpoint;
delete from m where id between 50 and 100;
begin;
declare c cursor for select * from m;
fetch c;
fetch c;
fetch c;

As soon as I saw the fetches execute I hit Enter on this in the
other psql session:

vacuum freeze m;

It would block, and then within a minute (i.e., autovacuum_naptime)
I would get the error.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-13 22:33:35 +0000, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> 
> >> I'd say we have a problem.  I'd even go so far as to say that
> >> somebody has completely broken locking, because this looks like
> >> autovacuum and manual vacuuming are hitting the same table at
> >> the same time.
> 
> > One avenue to look at is my changes around both buffer pinning and
> > interrupt handling...
> 
> I found a way to cause this reliably on my machine and did a
> bisect.  That pointed to commit 6753333f55e1d9bcb9da4323556b456583624a07
> 
> For the record, I would build and start the cluster, start two psql
> sessions, and paste this into the first session:

> drop table if exists m;
> create table m (id int primary key);
> insert into m select generate_series(1, 1000000) x;
> checkpoint;
> vacuum analyze;
> checkpoint;
> delete from m where id between 50 and 100;
> begin;
> declare c cursor for select * from m;
> fetch c;
> fetch c;
> fetch c;
> 
> As soon as I saw the fetches execute I hit Enter on this in the
> other psql session:

> vacuum freeze m;
> 
> It would block, and then within a minute (i.e., autovacuum_naptime)
> I would get the error.

Great! Thanks for that piece of detective work. I've been travelling
until an hour ago and haven't looked yet. How did you get to that recipe?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: "multiple backends attempting to wait for pincount 1"

From: Kevin Grittner

Andres Freund <andres@2ndquadrant.com> wrote:

> How did you get to that recipe?

I have been working on some patches to allow vacuum to function in
the face of long-held snapshots.  (I'm struggling to get them into
presentable shape for the upcoming CF.)  I was devising the most
diabolical cases I could to try to break my patched code and
started seeing this error.  I was panicked that I had introduced
the bug, but on comparing to the master branch I found I was able
to cause it there, too.  So I saw this a couple days before the
report on list, and had some cases that *sometimes* caused the
error.  I tweaked until it seemed to be pretty reliable, and then
used that for the bisect.

I still consider you to be the uncontested champion of diabolical 
test cases, but I'm happy to have hit upon one that was useful 
here.  ;-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-13 23:05:16 +0000, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> 
> > How did you get to that recipe?
> 
> I have been working on some patches to allow vacuum to function in
> the face of long-held snapshots.  (I'm struggling to get them into
> presentable shape for the upcoming CF.)  I was devising the most
> diabolical cases I could to try to break my patched code and
> started seeing this error.  I was panicked that I had introduced
> the bug, but on comparing to the master branch I found I was able
> to cause it there, too.  So I saw this a couple days before the
> report on list, and had some cases that *sometimes* caused the
> error.  I tweaked until it seemed to be pretty reliable, and then
> used that for the bisect.
> 
> I still consider you to be the uncontested champion of diabolical 
> test cases, but I'm happy to have hit upon one that was useful 
> here.  ;-)

Hah. Not sure if that's something to be proud of :P

I don't think it's actually 675333 at fault here. I think it's a
long-standing bug in LockBufferForCleanup() that is just much easier to
hit with the new interrupt code.

Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
returns spuriously - something it's documented to possibly do (and which
got more likely with the new patches). In the normal case UnpinBuffer()
will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
still be set and LockBufferForCleanup() will see it still set.

If you just gdb into the VACUUM process with 6647248e370884 checked out,
and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
NULL. Afaics, that should do the trick.
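
To make that concrete, here is a condensed sketch of the LockBufferForCleanup()
wait loop (simplified and paraphrased, not the exact bufmgr.c source). On a
spurious ProcWaitForSignal() return the backend loops back, finds its own
BM_PIN_COUNT_WAITER flag still set - nobody cleared it, because no UnpinBuffer()
ran - and trips the error in $subject:

    for (;;)
    {
        /* Acquire the exclusive content lock, then check the pin count. */
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        LockBufHdr(bufHdr);
        if (bufHdr->refcount == 1)
        {
            /* We hold the only pin: cleanup lock acquired. */
            UnlockBufHdr(bufHdr);
            return;
        }

        /* Somebody else still has the buffer pinned; register as waiter. */
        if (bufHdr->flags & BM_PIN_COUNT_WAITER)
        {
            /* After a spurious wakeup this is our *own* stale flag. */
            UnlockBufHdr(bufHdr);
            LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
            elog(ERROR, "multiple backends attempting to wait for pincount 1");
        }
        bufHdr->wait_backend_pid = MyProcPid;
        bufHdr->flags |= BM_PIN_COUNT_WAITER;
        PinCountWaitBuf = bufHdr;
        UnlockBufHdr(bufHdr);
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

        ProcWaitForSignal();    /* may return without UnpinBuffer() firing */

        PinCountWaitBuf = NULL;
        /* Loop back and try again */
    }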

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: "multiple backends attempting to wait for pincount 1"

From: Kevin Grittner

Andres Freund <andres@2ndquadrant.com> wrote:

> I don't think it's actually 675333 at fault here. I think it's a
> long-standing bug in LockBufferForCleanup() that is just much
> easier to hit with the new interrupt code.

The patches I'll be posting soon make it even easier to hit, which
is why I was trying to sort this out when Tom noticed the buildfarm
issues.

> Imagine what happens in LockBufferForCleanup() when
> ProcWaitForSignal() returns spuriously - something it's
> documented to possibly do (and which got more likely with the new
> patches). In the normal case UnpinBuffer() will have unset
> BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> and LockBufferForCleanup() will see it still set.

That analysis makes sense to me.

> I think we should simply move the
>   buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)

I think you meant inside UnpinBuffer?

> to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> Afaics, that should do the trick.

I tried that on the master branch (33e879c) (attached) and it
passes `make check-world` with no problems.  I'm reviewing the
places that BM_PIN_COUNT_WAITER appears, to see if I can spot any
flaw in this.  Does anyone else see a problem with it?  Even though
it appears to be a long-standing bug, there don't appear to have
been any field reports, so it doesn't seem like something to
back-patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > Imagine what happens in LockBufferForCleanup() when
> > ProcWaitForSignal() returns spuriously - something it's
> > documented to possibly do (and which got more likely with the new
> > patches). In the normal case UnpinBuffer() will have unset
> > BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> > and LockBufferForCleanup() will see it still set.
> 
> That analysis makes sense to me.
> 
> > I think we should simply move the
> >   buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
> 
> I think you meant inside UnpinBuffer?

No, LockBufferHdr. What I meant was that the pincount can only be
manipulated while the buffer header spinlock is held.

> > to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> > Afaics, that should do the trick.
> 
> I tried that on the master branch (33e879c) (attached) and it
> passes `make check-world` with no problems.  I'm reviewing the
> places that BM_PIN_COUNT_WAITER appears, to see if I can spot any
> flaw in this.  Does anyone else see a problem with it?  Even though
> it appears to be a long-standing bug, there don't appear to have
> been any field reports, so it doesn't seem like something to
> back-patch.

I was wondering about that as well. But I don't think I agree. The most
likely scenario for this to fail is in full table vacuums that have to
freeze rows - those are primarily triggered by autovacuum. I don't think
it's likely that such an error message would be discovered in the logs
unless it happens very regularly.

> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index e1e6240..40b2194 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
>              /* we just released the last pin other than the waiter's */
>              int            wait_backend_pid = buf->wait_backend_pid;
>  
> -            buf->flags &= ~BM_PIN_COUNT_WAITER;
>              UnlockBufHdr(buf);
>              ProcSendSignal(wait_backend_pid);
>          }
> @@ -3273,6 +3272,7 @@ LockBufferForCleanup(Buffer buffer)
>          else
>              ProcWaitForSignal();
>  
> +        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
>          PinCountWaitBuf = NULL;
>          /* Loop back and try again */
>      }

You can't manipulate flags without holding the spinlock. Otherwise you
(or the other writer) can easily cancel the other side's effects.
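
At a minimum, the flag clear in LockBufferForCleanup() would have to be wrapped
in the buffer-header spinlock, roughly like this (an illustrative sketch only;
whose flag it actually is becomes the next question in this thread):

        /* Clear the waiter flag only while holding the header spinlock. */
        LockBufHdr(bufHdr);
        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
        UnlockBufHdr(bufHdr);

        PinCountWaitBuf = NULL;
        /* Loop back and try again */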

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: "multiple backends attempting to wait for pincount 1"

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> I don't think it's actually 675333 at fault here. I think it's a
> long-standing bug in LockBufferForCleanup() that is just much easier to
> hit with the new interrupt code.

> Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> returns spuriously - something it's documented to possibly do (and which
> got more likely with the new patches). In the normal case UnpinBuffer()
> will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> still be set and LockBufferForCleanup() will see it still set.

Yeah, you're right: LockBufferForCleanup has never coped with the
possibility that ProcWaitForSignal returns prematurely.  I'm not sure
if that was possible when this code was written, but we've got it
documented as being possible at least back to 8.2.  So this needs to
be fixed in all branches.

> If you just gdb into the VACUUM process with 6647248e370884 checked out,
> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> NULL. Afaics, that should do the trick.

If we're moving the responsibility for clearing that flag from the waker
to the wakee, I think it would be smarter to duplicate all the logic
that's currently in UnlockBuffers(), just to make real sure we don't
drop somebody else's waiter flag.  So the bottom of the loop would
look more like this:

        LockBufHdr(bufHdr);
        if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
            bufHdr->wait_backend_pid == MyProcPid)
        {
            /* Release hold on the BM_PIN_COUNT_WAITER bit */
            bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
            PinCountWaitBuf = NULL;
            // optionally, we could check for pin count 1 here ...
        }
        UnlockBufHdr(bufHdr);
        /* Loop back and try again */
 


Also we should rethink at least the comment in UnlockBuffers().
I'm not sure what the failure conditions are with this reassignment
of responsibility, but the described case couldn't occur anymore.
        regards, tom lane



Re: "multiple backends attempting to wait for pincount 1"

From: Kevin Grittner

Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:

>>> I think we should simply move the
>>>  buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
>>
>> I think you meant inside UnpinBuffer?
>
> No, LockBufferHdr. What I meant was that the pincount can only be
> manipulated while the buffer header spinlock is held.

Oh, I see what you were saying -- I had read that a different way
entirely.  Got it.

>> Even though it appears to be a long-standing bug, there don't
>> appear to have been any field reports, so it doesn't seem like
>> something to back-patch.
>
> I was wondering about that as well. But I don't think I agree.
> The most likely scenario for this to fail is in full table
> vacuums that have to freeze rows - those are primarily triggered
> by autovacuum. I don't think it's likely that such an error
> message would be discovered in the logs unless it happens very
> regularly.

I guess we have some time before the next minor release to find any
problems with this; perhaps the benefit would outweigh the risk.
Anyone else want to weigh in on that?

> You can't manipulate flags without holding the spinlock.
> Otherwise you (or the other writer) can easily cancel the other
> side's effects.

So is the attached more like what you had in mind?  If not, feel
free to post a patch.  :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > I don't think it's actually 675333 at fault here. I think it's a
> > long-standing bug in LockBufferForCleanup() that is just much easier to
> > hit with the new interrupt code.
> 
> > Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> > returns spuriously - something it's documented to possibly do (and which
> > got more likely with the new patches). In the normal case UnpinBuffer()
> > will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> > still be set and LockBufferForCleanup() will see it still set.
> 
> Yeah, you're right: LockBufferForCleanup has never coped with the
> possibility that ProcWaitForSignal returns prematurely.  I'm not sure
> if that was possible when this code was written, but we've got it
> documented as being possible at least back to 8.2.  So this needs to
> be fixed in all branches.

Agreed.


> I think it would be smarter to duplicate all the logic
> that's currently in UnlockBuffers(), just to make real sure we don't
> drop somebody else's waiter flag.

ISTM that in LockBufferForCleanup() such a state shouldn't be accepted -
it'd be a sign of something going rather bad. I think asserting that
it's "our" flag is a good idea, but silently ignoring the fact sounds
like a bad plan.  As LockBufferForCleanup() really is only safe when
holding a SUE lock or heavier (otherwise one wait_backend_pid field
obviously would not be sufficient), there should never ever be another
waiter.

> > If you just gdb into the VACUUM process with 6647248e370884 checked out,
> > and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> > we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> > LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> > NULL. Afaics, that should do the trick.
> 
> If we're moving the responsibility for clearing that flag from the waker
> to the wakee,

I'm not sure if that's the best plan.  Some buffers are pinned at an
incredible rate; sending a signal every time might actually delay the
pincount waiter from actually getting through the loop. Unless we block
further buffer pins by any backend while the flag is set, which I think
would likely not be a good idea, there seems to be little benefit in
moving the responsibility.

The least invasive fix would be to weaken the error check to not
trigger if it's not the first iteration through the loop... But that's
not particularly pretty.

I think just adding something like

...
        /*
         * Make sure waiter flag is reset - it might not be if
         * ProcWaitForSignal() returned for another reason than UnpinBuffer()
         * signalling us.
         */
        LockBufHdr(bufHdr);
        buf->flags &= ~BM_PIN_COUNT_WAITER;
        Assert(bufHdr->wait_backend_pid == MyProcPid);
        UnlockBufHdr(bufHdr);

        PinCountWaitBuf = NULL;
        /* Loop back and try again */
    }

to the bottom of the loop would suffice. I can't see an extra buffer
spinlock cycle mattering in comparison to all the other costs (like
ping-ponging around between processes).

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: "multiple backends attempting to wait for pincount 1"

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>>> If you just gdb into the VACUUM process with 6647248e370884 checked out,
>>> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
>>> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
>>> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
>>> NULL. Afaics, that should do the trick.

>> If we're moving the responsibility for clearing that flag from the waker
>> to the wakee,

> I'm not sure if that's the best plan.  Some buffers are pinned at an
> incredible rate; sending a signal every time might actually delay the
> pincount waiter from actually getting through the loop.

Hm, good point.  On the other hand, should we worry about the possibility
of a lost signal?  Moving the flag-clearing would guard against that,
which the current code does not.  But we've not seen field reports of such
issues AFAIR, so this might not be an important consideration.

> Unless we block
> further buffer pins by any backend while the flag is set, which I think
> would likely not be a good idea, there seems to be little benefit in
> moving the responsibility.

I concur that we don't want the flag to block other backends from
acquiring pins.  The whole point here is for VACUUM to lurk in the
background until it can proceed with deletion; we don't want it to take
priority over foreground queries.

> I think just adding something like

> ...
>         /*
>          * Make sure waiter flag is reset - it might not be if
>          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
>          * signalling us.
>          */
>         LockBufHdr(bufHdr);
>         buf->flags &= ~BM_PIN_COUNT_WAITER;
>         Assert(bufHdr->wait_backend_pid == MyProcPid);
>         UnlockBufHdr(bufHdr);

>         PinCountWaitBuf = NULL;
>         /* Loop back and try again */
>     }

> to the bottom of the loop would suffice.

No, I disagree.  If we maintain the rule that the signaler clears
BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a
third party from trying to LockBufferForCleanup on the same buffer (except
for table-level locking conventions, which IMO this mechanism shouldn't be
dependent on).  So this coding would potentially clear the
BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the
Assert --- but only in debug builds, not in production, where it would
just silently lock up the third-party waiter.  So I think having a test to
verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.
        regards, tom lane



Re: "multiple backends attempting to wait for pincount 1"

From: Stefan Kaltenbrunner

On 02/13/2015 06:27 AM, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
> 
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57
> 
> I'd say we have a problem.  I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

fwiw - looks like spoonbill (not doing CLOBBER_CACHE_ALWAYS) managed to
trigger that one as well:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2015-02-23%2000%3A00%3A06

there are also some failures from the BETWEEN changes in that
regression.diff, but those might be fallout from the above problem.


Stefan



Re: "multiple backends attempting to wait for pincount 1"

From: Andres Freund

On 2015-02-17 13:14:00 -0500, Tom Lane wrote:
> Hm, good point.  On the other hand, should we worry about the possibility
> of a lost signal?  Moving the flag-clearing would guard against that,
> which the current code does not.  But we've not seen field reports of such
> issues AFAIR, so this might not be an important consideration.

I think if there were lost signals there'd be much bigger problems, given
that the same (or, in master, similar) mechanics are used for a lot of other
things, including the heavyweight and lightweight lock wait queues.

> > ...
> >         /*
> >          * Make sure waiter flag is reset - it might not be if
> >          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
> >          * signalling us.
> >          */
> >         LockBufHdr(bufHdr);
> >         buf->flags &= ~BM_PIN_COUNT_WAITER;
> >         Assert(bufHdr->wait_backend_pid == MyProcPid);
> >         UnlockBufHdr(bufHdr);
> 
> >         PinCountWaitBuf = NULL;
> >         /* Loop back and try again */
> >     }
> 
> > to the bottom of the loop would suffice.
> 
> No, I disagree.  If we maintain the rule that the signaler clears
> BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a
> third party from trying to LockBufferForCleanup on the same buffer (except
> for table-level locking conventions, which IMO this mechanism shouldn't be
> dependent on).  So this coding would potentially clear the
> BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the
> Assert --- but only in debug builds, not in production, where it would
> just silently lock up the third-party waiter.  So I think having a test to
> verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.

Pushed with a test guarding against that. I still think it might be
slightly better to error out if somebody else waits, but I guess it's
unlikely that we'd mistakenly add code doing that.
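
For the archives, the guarded reset at the bottom of the wait loop ends up
looking roughly like this (a sketch following the discussion above, not a
verbatim copy of the committed change):

        /*
         * Reset the waiter flag, but only if it is still ours.  Normally
         * UnpinBuffer() has already cleared it, but ProcWaitForSignal() can
         * return for other reasons, and in theory another backend could have
         * become the waiter in the meantime.
         */
        LockBufHdr(bufHdr);
        if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
            bufHdr->wait_backend_pid == MyProcPid)
            bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
        UnlockBufHdr(bufHdr);

        PinCountWaitBuf = NULL;
        /* Loop back and try again */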

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services