Re: Online verification of checksums - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Online verification of checksums
Date
Msg-id cf720a09-a060-efa7-4cbb-9c5f5eb2d1ab@2ndquadrant.com
Whole thread Raw
In response to Re: Online verification of checksums  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 3/6/19 6:26 PM, Robert Haas wrote:
> On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> FWIW I don't think this qualifies as torn page - i.e. it's not a full
>> read with a mix of old and new data. This is partial write, most likely
>> because we read the blocks one by one, and when we hit the last page
>> while the table is being extended, we may only see the fist 4kB. And if
>> we retry very fast, we may still see only the first 4kB.
> 
> I see the distinction you're making, and you're right.  The problem
> is, whether in this case or whether for a real torn page, we don't
> seem to have a way to distinguish between a state that occurs
> transiently due to lack of synchronization and a situation that is
> permanent and means that we have corruption.  And that worries me,
> because it means we'll either report bogus complaints that will scare
> easily-panicked users (and anybody who is running this tool has a good
> chance of being in the "easily-panicked" category ...), or else we'll
> skip reporting real problems.  Neither is good.
> 

Sure, I'd also prefer having a tool that reliably detects all cases of
data corruption, and I certainly do share your concerns about false
positives and false negatives.

But maybe we shouldn't expect a tool meant to verify checksums to detect
various other issues.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Jeremy Schneider
Date:
Subject: Re: few more wait events to add to docs
Next
From: Andres Freund
Date:
Subject: Re: Pluggable Storage - Andres's take