Re: Online verification of checksums - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Online verification of checksums
Date
Msg-id 20201130142712.GE16415@tamriel.snowman.net
Whole thread Raw
In response to Re: Online verification of checksums  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Online verification of checksums  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier <michael@paquier.xyz> wrote:
> >>> But here the checksum is broken, so while the offset is something we
> >>> can rely on how do you make sure that the LSN is fine?  A broken
> >>> checksum could perfectly mean that the LSN is actually *not* fine if
> >>> the page header got corrupted.
> >
> > Of course that could be the case, but it gets to be a smaller and
> > smaller chance by checking that the LSN read falls within reasonable
> > bounds.
>
> FWIW, I find that scary.

There's ultimately different levels of 'scary' and the risk here that
something is actually wrong following these checks strikes me as being
on the same order as random bits being flipped in the page and still
getting a valid checksum (which is entirely possible with our current
checksum...), or maybe even less.  Both cases would result in a false
negative, which is surely bad, though that strikes me as better than a
false positive, where we say there's corruption when there isn't.

> And this bit is interesting, because that would give the guarantees
> you are looking for with a page retry (just grep for BM_IO_IN_PROGRESS
> on the thread):
> https://www.postgresql.org/message-id/20201102193457.fc2hoen7ahth4bbc@alap3.anarazel.de

There's no guarantee that the page is still in shared buffers or that we
have a buffer descriptor still for it by the time we're doing this, as I
said up-thread.  This approach requires that we reach into PG, acquire
at least a buffer descriptor and set BM_IO_IN_PROGRESS on it and then
read the page again and checksum it again before finally looking at the
(now 'trusted' LSN, even though it might have had some bits flipped in
it and we wouldn't know..) and see if it's higher than the start of the
backup, and maybe less than the current LSN.  Maybe we can avoid
actually pulling the page into shared buffers (reading it into our own
memory instead) and just have the buffer descriptor but none of this
seems like it's going to be very unobtrusive in either code or the
running system, and it isn't going to give us an actual guarantee that
there's been no corruption.  The amount that it improves on the checks
that I outline above seems to be exceedingly small and the question is
if it's worth it for, most likely, exclusively pg_basebackup (unless
we're going to figure out a way to expose this via SQL, which seems
unlikely).

> > One idea that has occured
> > to me which hasn't been discussed is checking the file's mtime to see if
> > it's changed since the backup started.  In that case, I would think it'd
> > be something like:
> >
> > - Checksum is invalid
> > - LSN is within range
> > - Close file
> > - Stat file
> > - If mtime is from before the backup then signal possible corruption
>
> I suspect that relying on mtime may cause problems.  One case coming
> to my mind is NFS.

I agree that it might not be perfect but it also seems like something
which could be reasonably cheaply checked and the window (between when
the backup started and the time we hit this torn page) is very likely to
be large enough that the mtime will have been updated and be different
(and forward, if it was modified) of what it was at the time the backup
started.  It's also something that incremental backups may be looking
at, so if there's serious problems with it then there's a good chance
you've got bigger issues.

> > In general, however, I don't like the idea of reaching into PG and
> > asking PG for this page.
>
> It seems to me that if we don't ask to PG what it thinks about a page,
> we will never have a fully bullet-proof design either.

None of this is bullet-proof, it's all trade-offs.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Alexey Kondratov
Date:
Subject: Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Next
From: Jesper Pedersen
Date:
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey