Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: hung backends stuck in spinlock heavy endless loop
Date
Msg-id CAMkU=1woHWPyJzdmTf34M1zXHa4C9N06YmpuUw=PES3dK3euKQ@mail.gmail.com
In response to Re: hung backends stuck in spinlock heavy endless loop  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Thu, Jan 22, 2015 at 1:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

So far, the 'nasty' damage seems to generally if not always follow a
checksum failure and the checksum failures are always numerically
adjacent.  For example:

[cds2 12707 2015-01-22 12:51:11.032 CST 2754]WARNING:  page
verification failed, calculated checksum 9465 but expected 9477 at
character 20
[cds2 21202 2015-01-22 13:10:18.172 CST 3196]WARNING:  page
verification failed, calculated checksum 61889 but expected 61903 at
character 20
[cds2 29153 2015-01-22 14:49:04.831 CST 4803]WARNING:  page
verification failed, calculated checksum 27311 but expected 27316

I'm not up on the intricacies of our checksum algorithm but this is
making me suspicious that we are looking at an improperly flipped
visibility bit via some obscure problem -- almost certainly with
vacuum playing a role. 

That very much sounds like the block is getting duplicated from one place to another.

Even flipping one hint bit (aren't these index pages? Do they have hint bits?) should thoroughly scramble the checksum.

Because the checksum adds in the block number after the scrambling has been done, copying a page to another nearby location will just move the (expected) checksum a little bit.
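
To make that argument concrete, here is a minimal sketch in Python. It is not PostgreSQL's code: the page hash is replaced by zlib.crc32 as a stand-in for the real FNV-based routine, and the names scramble, page_checksum and circular_distance are hypothetical. The only part meant to mirror the real checksum is the final step, which (as far as I understand it) XORs the block number into the already-scrambled value and then folds the result into the range 1..65535. Details such as zeroing the stored checksum field before hashing are left out.

```python
import zlib


def scramble(page: bytes) -> int:
    """Stand-in page hash (CRC-32), NOT PostgreSQL's FNV-based routine."""
    return zlib.crc32(page) & 0xFFFFFFFF


def page_checksum(page: bytes, blkno: int) -> int:
    """Hash the page, then mix in the block number after the scrambling."""
    h = scramble(page)
    h ^= blkno              # block number only touches the low bits
    return (h % 65535) + 1  # fold to 16 bits; the result is never zero


def circular_distance(a: int, b: int) -> int:
    """Distance between two checksums, allowing for wrap-around at 65535."""
    d = (a - b) % 65535
    return min(d, 65535 - d)


# An 8192-byte dummy page and a copy with a single bit flipped.
page = bytes(range(256)) * 32
flipped = bytearray(page)
flipped[100] ^= 0x01

# Same contents at two adjacent block numbers: the checksums stay adjacent.
print(page_checksum(page, 100), page_checksum(page, 101))

# One flipped bit at the same block number: the hash changes entirely.
print(page_checksum(bytes(flipped), 100))
```

With this structure, the same page contents checked against a nearby block number give a checksum that is off by no more than the XOR of the two block numbers, which fits the small gaps in the warnings above (12, 14 and 5). A changed bit in the page contents goes through the scrambling step first, so it would not be expected to produce a near miss.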

Cheers,

Jeff
