Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Online checksums verification in the backend
Date
Msg-id 20201103093650.GA2298@paquier.xyz
Whole thread Raw
In response to Re: Online checksums verification in the backend  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On Tue, Nov 03, 2020 at 09:31:20AM +0100, Magnus Hagander wrote:
> On Mon, Nov 2, 2020 at 8:35 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2020-11-02 12:35:30 -0500, Robert Haas wrote:
>>> It feels really confusing to me that the user-exposed function here is
>>> called pg_relation_check_pages(). How is the user supposed to
>>> understand the difference between what this function does and what the
>>> new verify_heapam() in amcheck does? The answer is that the latter
>>> does far more extensive checks, but this isn't obvious from the SGML
>>> documentation, which says only that the blocks are "verified," as if
>>> an end-user can reasonably be expected to know what that means. It
>>> seems likely to lead users to the belief that if this function passes,
>>> they are in good shape, which is extremely far from being true. Just
>>> look at what PageIsVerified() checks compared to what verify_heapam()
>>> checks.

The cases of verify_heapam() are much wider as they target only one
AM, while this stuff should remain more general.  There seems to be
some overlap in terms of the basic checks done by bufmgr.c, and
the fact that you may not want to be that much intrusive with the
existing buffer pool as well when running the AM checks.  It also
seems to me that the use cases are quite different for both, the
original goal of this thread is to detect physical corruptions for all
AMs, while verify_heapam() looks after logical corruptions in the way
heap is handled.

>> Yea I had similar thoughts, it should just be called
>> pg_checksum_verify_relation() or something.
>
> +1.

I mentioned that upthread, is there really a dependency with checksums
here?  There are two cases where we can still apply some checks on a
page, without any need of checksums:
- The state of the page header.
- Zeroed page if pd_upper is 0.  Those pages are valid, and don't have
a checksum computed.
So it seems to me that when it comes to relation pages, then the
check of a page should answer to the question: is this page loadable
in shared buffers, or not?

>>> In fact, I would argue that this functionality ought to live in
>>> amcheck rather than core, though there could usefully be enabling
>>> functions in core.
>>
>> I'm not really convinced by this though. It's not really AM
>> specific - works for all types of relations with storage; don't really
>> object either...
>
> Yeah, I'm not sure about that one either. Also what would happen
> if/when we get checksums on things that aren't even relations? (though
> maybe that goes for other parts of amcheck at some point as well?)

I also thought about amcheck when looking at this thread, but it did
not seem the right place as this applies to any AM able that could
load stuff into the shared buffers.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: automatic analyze: readahead - add "IO read time" log message
Next
From: Magnus Hagander
Date:
Subject: Re: -O switch