Re: buffer assertion tripping under repeat pgbench load - Mailing list pgsql-hackers

From Greg Smith
Subject Re: buffer assertion tripping under repeat pgbench load
Date
Msg-id 50EDC6B6.2080103@2ndQuadrant.com
Whole thread Raw
In response to Re: buffer assertion tripping under repeat pgbench load  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 12/23/12 3:17 PM, Simon Riggs wrote:
> We already have PrintBufferLeakWarning() for this, which might be a bit neater.

It does look like basically the same info.  I hacked the code to
generate this warning all the time.  Patch from Andres I've been using:

WARNING:  refcount of buf 1 containing global/12886 blockNum=0,
flags=0x106 is 0 should be 0, globally: 0

And using PrintBufferLeakWarning with the same input:

WARNING:  buffer refcount leak: [001] (rel=global/12886, blockNum=0,
flags=0x106, refcount=0 0)

Attached is a change I'd propose be committed.  This doesn't do anything
in a non-assertion build, it just follows all of the "why don't you
try..." suggestions I've gotten here.  I still have no idea why these
are popping out for me.  I'm saving the state on all this to try and
track it down again later.  I'm out of time to bang my head on this
particular thing anymore right now, it doesn't seem that important given
it's not reproducible anywhere else.

Any assertion errors that come out will be more useful with just this
change though.  Output looks like this (from hacked code to always trip
it and shared_buffers=16):

...
WARNING:  buffer refcount leak: [016]
(rel=pg_tblspc/0/PG_9.3_201212081/0/0_(null), blockNum=4294967295,
flags=0x0, refcount=0 0)
WARNING:  buffers with non-zero refcount is 16
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line:
1724)

I intentionally used a regular diff for this small one because it shows
the subtle changes I made here better.  I touched more than I strictly
had to because this area of the code is filled with off by one
corrections, and adding this warning highlighted one I objected to.
Note how the function is defined:

PrintBufferLeakWarning(Buffer buffer)

The official buffer identifier description says:

typedef int Buffer;
Zero is invalid, positive is the index of a shared buffer (1..NBuffers),
negative is the index of a local buffer (-1 .. -NLocBuffer).

However, the loop in AtEOXact_Buffers aimed to iterate over
PrivateRefCount, using array indexes 0...NBuffers-1.  It was using an
array index rather than a proper buffer identification number.  And that
meant when I tried to pass it to PrintBufferLeakWarning, it asserted on
the buffer id--because "Zero is invalid".

I could have just fixed that with an off by one fix in the other
direction.  That seemed wrong to me though.  All of the other code like
PrintBufferLeakWarning expects to see a Buffer value.  It already
corrects for the off by one problem like this:

                 buf = &BufferDescriptors[buffer - 1];
                 loccount = PrivateRefCount[buffer - 1];

It seemed cleaner to me to make the i variable in AtEOXact_Buffers be a
Buffer number.  Then you access PrivateRefCount[buffer - 1] in that
code, making it look like more of the surrounding references.  And the
value goes directly into PrintBufferLeakWarning without a problem.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Further pg_upgrade analysis for many tables
Next
From: Stephen Frost
Date:
Subject: Re: Index build temp files