Re: Online verification of checksums - Mailing list pgsql-hackers

From: Michael Paquier
Subject: Re: Online verification of checksums
Date:
Msg-id: X8M6e4nhjMR2e8t6@paquier.xyz
In response to: Re: Online verification of checksums  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Online verification of checksums  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier <michael@paquier.xyz> wrote:
>>> But here the checksum is broken, so while the offset is something we
>>> can rely on, how do you make sure that the LSN is fine?  A broken
>>> checksum could perfectly well mean that the LSN is actually *not*
>>> fine if the page header got corrupted.
>
> Of course that could be the case, but the chance gets smaller and
> smaller if we check that the LSN read falls within reasonable
> bounds.

FWIW, I find that scary.

>> We cannot rely on the LSN itself. But it's a lot more likely that we
>> can rely on the LSN changing, and on the LSN changing in a "correct
>> way". That is, if the LSN *decreases* we know it's corrupt. If the LSN
>> *doesn't change* we know it's corrupt. But if the LSN *increases* AND
>> the new page now has a correct checksum, it's most likely correct.
>> You could perhaps even put a cap on it saying "if the LSN increased,
>> but by less than <n>", where <n> is a sufficiently high number that
>> it's entirely unreasonable to advance that far between the reading of
>> two blocks. But it has to have a very high margin in that case.
>
> This is, in fact, included in what was proposed: the "max increase"
> would be "the ending LSN of the backup".  I don't think we can make it
> any tighter than that without risking false positives, which are
> surely worse than a false negative in this particular case; we already
> risk false negatives due to the fact that our checksum isn't perfect, so
> even a perfect check to make sure that the page will, in fact, be
> replayed over during crash recovery doesn't guarantee that there's no
> corruption.
>
> If there's an easy and cheap way to see if there was concurrent i/o
> happening for the page, then let's hear it.

This has been discussed for a couple of months now.  I would recommend
going through this thread:
https://www.postgresql.org/message-id/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com

And this bit is interesting, because it would give the guarantees you
are looking for with a page retry (just grep for BM_IO_IN_PROGRESS in
the thread):
https://www.postgresql.org/message-id/20201102193457.fc2hoen7ahth4bbc@alap3.anarazel.de
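
To make the retry idea concrete, here is a minimal sketch of the
recheck logic described upthread.  This is not actual PostgreSQL code:
read_block() is a hypothetical stand-in for however the tool re-reads
a raw block from disk, while pg_checksum_page() and PageGetLSN() are
the existing server primitives:

    #include "postgres.h"
    #include "storage/bufpage.h"
    #include "storage/checksum.h"

    /* Hypothetical helper: (re-)read one raw block from the data file. */
    extern void read_block(char *buf, BlockNumber blkno);

    /*
     * Called when the first read of a block came back with a bad
     * checksum.  Returns true if the failure can be explained by a
     * concurrent write, false if it should be reported as corruption.
     */
    static bool
    recheck_page(char *buf, BlockNumber blkno, XLogRecPtr old_lsn,
                 XLogRecPtr backup_end_lsn)
    {
        XLogRecPtr  new_lsn;

        read_block(buf, blkno);     /* second read of the same block */

        if (pg_checksum_page(buf, blkno) != ((PageHeader) buf)->pd_checksum)
            return false;           /* still broken */

        new_lsn = PageGetLSN((Page) buf);

        /*
         * A concurrently rewritten page must show an LSN that moved
         * forward, but not past the end of the backup.  A decreasing
         * or unchanged LSN means the first failure was real corruption.
         */
        return new_lsn > old_lsn && new_lsn <= backup_end_lsn;
    }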

> One idea that has occurred
> to me, which hasn't been discussed, is checking the file's mtime to see
> if it has changed since the backup started.  In that case, I would
> think it'd be something like:
>
> - Checksum is invalid
> - LSN is within range
> - Close file
> - Stat file
> - If the mtime is from before the backup started, then signal possible corruption

I suspect that relying on mtime may cause problems.  One case that
comes to mind is NFS.
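
For reference, the check proposed in that list would boil down to
something like the following sketch, where path and backup_start are
whatever the tool already tracks.  As noted, mtime granularity and
client/server clock skew make this a heuristic at best:

    #include <stdbool.h>
    #include <sys/stat.h>
    #include <time.h>

    /*
     * Returns true if the file may have been written since the backup
     * started.  A false return, combined with a bad checksum and an
     * in-range LSN, would turn into a corruption report.
     */
    static bool
    file_modified_since(const char *path, time_t backup_start)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return true;        /* be conservative if stat() fails */

        /* On NFS, st_mtime reflects the server's clock, not ours. */
        return st.st_mtime >= backup_start;
    }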

> In general, however, I don't like the idea of reaching into PG and
> asking PG for this page.

It seems to me that if we don't ask PG what it thinks about a page,
we will never have a fully bullet-proof design either.
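
For illustration, one way of asking PG would be to read the block back
through the server, for example with pageinspect's get_raw_page(),
which goes through shared buffers and so cannot return a torn
concurrent write.  A rough libpq sketch, with error handling and
escaping of the relation name omitted:

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Sketch: fetch one block through the server instead of from the
     * filesystem.  Requires the pageinspect extension, and assumes
     * "rel" is trusted (no escaping is done here).
     */
    static PGresult *
    fetch_page_via_server(PGconn *conn, const char *rel, unsigned blkno)
    {
        char        query[256];

        snprintf(query, sizeof(query),
                 "SELECT get_raw_page('%s', 'main', %u)", rel, blkno);
        return PQexec(conn, query);
    }

The bytea result can then be unescaped with PQunescapeBytea() and
verified like any block read from disk.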
--
Michael
