Re: buffer assertion tripping under repeat pgbench load - Mailing list pgsql-hackers

From Tom Lane
Subject Re: buffer assertion tripping under repeat pgbench load
Date
Msg-id 14021.1356626035@sss.pgh.pa.us
In response to Re: buffer assertion tripping under repeat pgbench load  (Greg Stark <stark@mit.edu>)
Responses Re: buffer assertion tripping under repeat pgbench load  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Greg Stark <stark@mit.edu> writes:
> On Thu, Dec 27, 2012 at 3:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The thing that this theory has a hard time with is that the buffer's
>> global refcount is zero.  If you assume that there's a bit that
>> sometimes randomly goes to 1 when it should be 0, then what I'd expect
>> to typically happen is that UnpinBuffer sees nonzero LocalRefCount and
>> hence doesn't drop the session's global pin when it should.  The only
>> way that doesn't happen is if decrementing LocalRefCount to zero stores
>> a nonzero pattern when it should store zero, but nonetheless the CPU
>> thinks it stored zero.

> It seems entirely plausible when you factor in the L2 cache. The CPU
> could be happily incrementing and decrementing the count entirely
> correctly, blissfully unaware that the value being stored in the DRAM
> has this extra bit set every time. Not until the transaction ends and
> it has to refetch the cache line because enough time has passed for it
> to age out of L2 cache does it find the corrupt value.

Hmm ... that could be plausible.  It would be good if we could reproduce
this (or fail to) on some other machine.  Greg mentioned running some
memory diagnostics as well.
        regards, tom lane


