Re: buffer assertion tripping under repeat pgbench load - Mailing list pgsql-hackers

From anarazel@anarazel.de
Subject Re: buffer assertion tripping under repeat pgbench load
Msg-id 2bf7602e-35ab-4af8-98f5-f66f93437045@email.android.com
In response to Re: buffer assertion tripping under repeat pgbench load  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: buffer assertion tripping under repeat pgbench load  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers

Tom Lane <tgl@sss.pgh.pa.us> wrote:

>Greg Smith <greg@2ndQuadrant.com> writes:
>> To try and speed up replicating this problem I switched to a smaller
>> database scale, 100, and I was able to get a crash there.  Here's the
>> latest:
>
>> 2012-12-26 00:01:19 EST [2278]: WARNING:  refcount of base/16384/57610 blockNum=118571, flags=0x106 is 1073741824 should be 0, globally: 0
>> 2012-12-26 00:01:19 EST [2278]: WARNING:  buffers with non-zero refcount is 1
>> TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 1720)
>
>> That's the same weird 1073741824 count as before.  I was planning to
>> dump some index info, but then I saw this:
>
>> $ psql -d pgbench -c "select relname,relkind,relfilenode from pg_class where relfilenode=57610"
>>       relname      | relkind | relfilenode
>> ------------------+---------+-------------
>>   pgbench_accounts | r       |       57610
>
>> Making me think this isn't isolated to being an index problem.
>
>Yeah, that destroys my theory that there's something broken about index
>management specifically.  Now we're looking for something that can
>affect any buffer's refcount, which more than likely means it has
>nothing to do with the buffer's contents ...
>
>> I tried to soldier on with pg_filedump anyway.  It looks like the last
>> version I saw there (9.2.0 from November) doesn't compile anymore:
>
>Meh, looks like it needs fixes for Heikki's int64-xlogrecoff patch.
>I haven't gotten around to doing that yet, but would gladly take a
>patch if anyone wants to do it.  However, I now doubt that examining
>the buffer content will help much on this problem.
>
>Now that we know the bug's reproducible on smaller instances, could you
>put together an exact description of what you're doing to trigger
>it?  What is the DB configuration, pgbench parameters, etc?
>
>Also, it'd be worthwhile to just repeat the test a few more times
>to see if there's any sort of pattern in which buffers get affected.
>I'm now suspicious that it might not always be just one buffer,
>for example.

I don't think it's necessarily only one buffer: if I read the above output correctly, Greg used the suggested debug
output, which just puts the elog(WARN) before the Assert...

Greg, could you output all "bad" buffers and only assert after the loop if there was at least one refcounted buffer?

Andres

---
Please excuse the brevity and formatting - I am writing this on my mobile phone.


