Hi,
On 2019-03-19 22:39:16 +0100, Michael Banck wrote:
> Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> > a) checks that the page is all zeroes if PageIsNew() (like
> > PageIsVerified() does for the backend). That avoids missing cases
> > where corruption just zeroed out the header, but not the whole page.
>
> We can't run pg_checksum_page() on those afterwards though as it would
> fire an assertion:
>
> |pg_checksums: [...]/../src/include/storage/checksum_impl.h:194:
> |pg_checksum_page: Assertion `!(((PageHeader) (&cpage->phdr))->pd_upper
> |== 0)' failed.
>
> But we should count it as a checksum error and generate an appropriate
> error message in that case.
All I'm saying is that if PageIsNew() you need to run the same checks
that PageIsVerified() runs in that case. Namely verifying that the page
is all-zeroes, rather than just the pd_upper field. That's separate
from running pg_checksum_page().
> > b) Check that pd_lsn is between startlsn and the insertion pointer. That
> > avoids accepting just about all random data.
>
> However, for pg_checksums being a stand-alone application it can't just
> access the insertion pointer, can it? We could maybe set a threshold
> from the last checkpoint after which we consider the pd_lsn bogus. But
> what's a good threshold here?
That's *PRECISELY* my point. I think it's a bad idea to do online
checksumming from outside the backend. It needs to be inside the
backend, and if there's any verification failures on a block, it needs
to acquire the IO lock on the page, and reread from disk.
Greetings,
Andres Freund