Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Checksums by default?
Msg-id 34e15a92-0bde-6809-b7ba-0cc1681635ab@2ndquadrant.com
In response to Re: [HACKERS] Checksums by default?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On 02/13/2017 02:29 AM, Jim Nasby wrote:
> On 2/10/17 6:38 PM, Tomas Vondra wrote:
>> And no, backups may not be a suitable solution - the failure happens on
>> a standby, and the page (luckily) is not corrupted on the master. Which
>> means that perhaps the standby got corrupted by a WAL, which would
>> affect the backups too. I can't verify this, though, because the WAL got
>> removed from the archive, already. But it's a possibility.
>
> Possibly related... I've got a customer that periodically has SR replicas
> stop in their tracks due to WAL checksum failure. I don't think there's
> any hardware correlation (they've seen this on multiple machines).
> Studying the code, it occurred to me that if there are any bugs in the
> handling of individual WAL record sizes or pointers during SR then you
> could get CRC failures. So far every one of these occurrences has been
> repairable by replacing the broken WAL file on the replica. I've
> requested that next time this happens they save the bad WAL.

I don't follow. You're talking about WAL checksums, but this thread is
about data checksums. I'm not seeing any WAL checksum failure. The WAL
itself passes its checks, but when the standby attempts to apply it (in
particular a Btree/DELETE WAL record), it detects an incorrect data
checksum in the underlying table.
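
To make the distinction concrete, here is a toy sketch of the two
independent checks. It is only an illustration. zlib.crc32 stands in for
both checksums, and the record and page layouts (a 4-byte checksum
followed by the payload) and all function names are made up for the
example. None of this is PostgreSQL's actual algorithm or on-disk format.

```python
import zlib

def with_checksum(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte CRC-32 (toy layout, not PostgreSQL's)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def checksum_ok(blob: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored one."""
    stored = int.from_bytes(blob[:4], "big")
    return stored == zlib.crc32(blob[4:])

def flip_bit(blob: bytes, index: int) -> bytes:
    """Return a copy of blob with the lowest bit of one byte flipped."""
    damaged = bytearray(blob)
    damaged[index] ^= 0x01
    return bytes(damaged)

# A hypothetical WAL record and a hypothetical data page, each protected
# by its own checksum.
wal_record = with_checksum(b"delete index tuple 7")
data_page = with_checksum(b"heap page contents")

# Damage the page body (byte 10 lies past the 4-byte checksum prefix).
damaged_page = flip_bit(data_page, 10)

print(checksum_ok(wal_record))    # the WAL record itself is fine
print(checksum_ok(data_page))     # the original page is fine
print(checksum_ok(damaged_page))  # the damaged page fails its own check
```

The point is that the record verifies cleanly while the page it touches
does not, so a data checksum failure during replay says nothing about
the integrity of the WAL record being applied.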

So either there's a hardware issue, or the heap got corrupted by some 
preceding WAL. Or maybe one of the tiny gnomes in the CPU got tired and 
punched the bits wrong.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


