Re: Online verification of checksums - Mailing list pgsql-hackers

From: David Steele
Subject: Re: Online verification of checksums
Msg-id: 8a4df8ee-0381-26ad-d09a-0367f03914a1@pgmasters.net
In response to: Re: Online verification of checksums (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Online verification of checksums (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

Hi Michael,

On 11/23/20 8:10 PM, Michael Paquier wrote:
> On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote:
> 
>> Also- what is the point of reading the page from shared buffers
>> anyway..?  All we need to do is prove that the page will be rewritten
>> during WAL replay.  If we can prove that, we don't actually care what
>> the contents of the page are.  We certainly can't calculate the
>> checksum on a page we plucked out of shared buffers since we only
>> calculate the checksum when we go to write the page out.
> 
> A LSN-based check makes the thing tricky.  How do you make sure that
> pd_lsn is not itself broken?  It could be perfectly possible that a
> random on-disk corruption makes pd_lsn seen as having a correct value,
> still the rest of the page is borked.

We are not just looking at one LSN value. Here are the steps we are 
proposing (I'll skip checks for zero pages here; a rough code sketch 
follows the list):

1) Test the page checksum. If it passes, the page is OK.
2) If the checksum does not pass, record the page offset and LSN and 
continue.
3) After the file is copied, reopen and reread the file, seeking to the 
offsets where possibly invalid pages were recorded in the first pass.
     a) If the page is now valid, then it is OK.
     b) If the page is still not valid but its LSN has increased from the 
LSN recorded in the previous pass, then it is OK. We can infer this 
because an advancing LSN means PostgreSQL rewrote the page between our 
two reads, which is not consistent with storage corruption.
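
To make the flow concrete, here is a rough sketch in C. This is not 
the actual pgBackRest code: page_is_valid() is a hypothetical stand-in 
for the real test (PostgreSQL computes page checksums with 
pg_checksum_page() from storage/checksum.h, plus the zero-page check I 
skipped above), and page_lsn() reads pd_lsn from the start of the page 
header, high 32 bits then low.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192             /* PostgreSQL's default page size */

typedef struct SuspectPage
{
    long        offset;         /* byte offset of the page in the file */
    uint64_t    lsn;            /* pd_lsn observed on the first pass */
} SuspectPage;

/* pd_lsn is the first field of the page header: high word, low word. */
static uint64_t
page_lsn(const unsigned char *page)
{
    uint32_t    hi;
    uint32_t    lo;

    memcpy(&hi, page, sizeof(hi));
    memcpy(&lo, page + sizeof(hi), sizeof(lo));
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical stand-in: always passes here; the real test computes
 * and compares the page checksum. */
static bool
page_is_valid(const unsigned char *page, long offset)
{
    (void) page;
    (void) offset;
    return true;
}

/* Pass 1 (steps 1 and 2): runs during the copy; records the offset
 * and pd_lsn of every page that fails the checksum. */
static size_t
first_pass(FILE *fp, SuspectPage *suspects, size_t max)
{
    unsigned char page[BLCKSZ];
    long        offset = 0;
    size_t      n = 0;

    while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
    {
        if (!page_is_valid(page, offset) && n < max)
        {
            suspects[n].offset = offset;
            suspects[n].lsn = page_lsn(page);
            n++;
        }
        offset += BLCKSZ;
    }
    return n;
}

/* Pass 2 (step 3): seek back to each suspect page and apply 3a/3b. */
static void
second_pass(FILE *fp, const SuspectPage *suspects, size_t n)
{
    unsigned char page[BLCKSZ];
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (fseek(fp, suspects[i].offset, SEEK_SET) != 0 ||
            fread(page, 1, BLCKSZ, fp) != BLCKSZ)
            continue;           /* short read: past a truncation point */

        if (page_is_valid(page, suspects[i].offset))
            continue;           /* 3a: valid on the second read */
        if (page_lsn(page) > suspects[i].lsn)
            continue;           /* 3b: LSN advanced, PG rewrote it */

        fprintf(stderr, "checksum error at offset %ld\n",
                suspects[i].offset);
    }
}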

This is what we are planning for the first round of improving our page 
checksum validation. We believe that doing the retry in a second pass 
will be faster and more reliable: by the time we reread a suspect page, 
enough time has passed for any in-flight write to complete, without our 
having to build in a delay after each page error.

A further improvement is to check the ascending LSNs found in 3b against 
PostgreSQL to be completely sure they are valid. We are planning this 
for our second round of improvements.
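
One plausible shape for that check (a sketch only; we have not settled 
on this): no genuine page can carry a pd_lsn beyond the server's 
current WAL insert position, so the ascending LSNs from 3b can at 
least be bounded by asking the server over libpq:

/*
 * Fetch an upper bound for any genuine pd_lsn; connection settings
 * come from the environment (PGHOST, PGDATABASE, etc.).  Build with
 * -lpq.  Error handling trimmed for brevity.
 */
#include <libpq-fe.h>
#include <stdio.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    res = PQexec(conn, "SELECT pg_current_wal_insert_lsn()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("no valid page can have pd_lsn > %s\n",
               PQgetvalue(res, 0, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}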

Reopening the file for the second pass does require some additional logic:

1) The file may have been deleted by PG since the first pass; in that 
case we won't report any page errors for the file.
2) The file may have been truncated by PG since the first pass, so we 
won't report any errors past the point of truncation.
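
Continuing the earlier two-pass sketch (again, not the actual 
implementation; it reuses the hypothetical helpers defined above), the 
reopen logic might look like this:

#include <errno.h>

/*
 * Reopen for the second pass.  A missing file means PG dropped the
 * relation since the copy (case 1), and second_pass() already skips
 * pages whose read comes back short, which covers offsets past a
 * truncation point (case 2).
 */
static void
recheck_file(const char *path, const SuspectPage *suspects, size_t n)
{
    FILE       *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        if (errno != ENOENT)    /* deletion is expected; other */
            perror(path);       /* failures are real I/O errors */
        return;
    }

    second_pass(fp, suspects, n);
    fclose(fp);
}

int
main(int argc, char **argv)
{
    SuspectPage suspects[256];
    size_t      n = 0;
    FILE       *fp;

    if (argc != 2)
        return 1;

    if ((fp = fopen(argv[1], "rb")) != NULL)
    {
        n = first_pass(fp, suspects, 256);  /* runs during the copy */
        fclose(fp);
    }

    recheck_file(argv[1], suspects, n);     /* the second pass */
    return 0;
}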

A malicious attacker could easily defeat these checks, but as Stephen 
pointed out elsewhere, such an attacker would more likely just write 
pages with valid checksums, which would escape detection anyway.

We believe that the chances of random storage corruption passing all 
these checks are incredibly small, but eventually we'll also check 
against the WAL to be completely sure.

Regards,
-- 
-David
david@pgmasters.net


