Re: buffer assertion tripping under repeat pgbench load - Mailing list pgsql-hackers

From Greg Smith
Subject Re: buffer assertion tripping under repeat pgbench load
Msg-id 50F2474F.5040204@2ndQuadrant.com
In response to Re: buffer assertion tripping under repeat pgbench load  (Greg Stark <stark@mit.edu>)
Responses Re: buffer assertion tripping under repeat pgbench load  (Bruce Momjian <bruce@momjian.us>)
Re: buffer assertion tripping under repeat pgbench load  (Satoshi Nagayasu <snaga@uptime.jp>)
List pgsql-hackers
On 12/26/12 7:23 PM, Greg Stark wrote:
> It's also possible it's a bad cpu, not bad memory. If it affects
> decrement or increment in particular it's possible that the pattern of
> usage on LocalRefCount is particularly prone to triggering it.

This looks to be the winning answer.  It turns out that under extended 
multi-hour loads at high concurrency, something related to CPU 
overheating was occasionally flipping a bit.  One round of compressed 
air for all the fans/vents, a little tweaking of the fan controls, and 
now the system goes >24 hours with no problems.

Sorry about all the noise over this.  I do think the improved warning 
messages that came out of the diagnosis ideas are useful.  The reworked 
code must slow down the checking by a few cycles, but if you care about 
performance, these assertions are tacked onto the biggest pig around.

I added the patch to the January CF as "Improve buffer refcount leak 
warning messages".  The sample I showed with the patch submission was a 
simulated one.  Here's the output from the last crash before resolving 
the issue, where the assertion really triggered:

WARNING:  buffer refcount leak: [170583] (rel=base/16384/16578, blockNum=302295, flags=0x106, refcount=0 1073741824)
WARNING:  buffers with non-zero refcount is 1
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 1712)
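
The second refcount value in that warning is worth a closer look: 1073741824 is exactly 2^30, so it differs from zero in a single bit, which fits the flipped-bit explanation. Below is a minimal sketch of that check. The helper name flipped_bits is made up for illustration, and it assumes the second number is the backend-local count and that its correct value at that point was 0; neither assumption comes from the server code.

```python
def flipped_bits(observed, expected=0):
    """Return the bit positions where observed differs from expected."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Second refcount value from the WARNING line above.
# Assumption: the correct value at that point would have been 0.
leaked_refcount = 1073741824

print(flipped_bits(leaked_refcount))  # [30]
```

A software bug that leaks pins would normally leave a small count such as 1 or 2. A value that is off by one high-order bit is more typical of a hardware fault.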

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


