Re: Online checksums verification in the backend - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Online checksums verification in the backend
Date
Msg-id CAD21AoBNyywFbO-S=vdjdpkVQHzLhNPTQ33=KPRTBE0qS9W=xA@mail.gmail.com
In response to Online checksums verification in the backend  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Online checksums verification in the backend
List pgsql-hackers
On Fri, Dec 6, 2019 at 11:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> This topic was discussed several times, with the most recent
> discussions found at [1] and [2].  Based on those discussions, my
> understanding is that the current approach in BASE_BACKUP has too many
> drawbacks and we should instead do this check in the backend.  I've
> been working on such an approach at VMware, and I'm submitting it here
> to discuss the approach and rationales, and hopefully have such a
> feature integrated.

Thank you for working on this!

>
> First, this was originally developed as an extension.  It means that
> the check is performed using an SRF.  That's maybe not the best
> approach, as a transaction has to be kept open for the total processing
> time.
> It can be mitigated by checking each relation independently, but
> that's still not ideal.  Maybe using some utility commands (as part of
> VACUUM or a new CHECK command for instance) would be a better
> approach.
>
> This brings the second consideration: how to report the list of corrupted
> blocks to end users.  As I said this is for now returned via the SRF,
> but this is clearly not ideal and should rather be made available more
> globally.  One usage of this information could be block level
> recovery.  I'm Cc-ing Sawada-san, as I know he's working on this and
> mentioned to me that he had ideas on passing the list of corrupted blocks
> using the stat collector.

Yes, the list of corrupted pages is necessary for single-page recovery.
Apart from single-page recovery, I think it would also be helpful for
DBAs if they could find the corrupted blocks in the server logs and in
a system view.
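
To make the server log side more concrete, here is a rough sketch of
what I have in mind (the function name and message wording below are
only placeholders, not taken from the patch); a system view could then
be built on top of whatever accumulates these reports:

/*
 * Sketch only: emit a WARNING for each corrupted block so that it
 * shows up in the server log in addition to the SRF output.
 */
#include "postgres.h"

#include "storage/block.h"
#include "utils/rel.h"

static void
report_corrupted_block(Relation rel, BlockNumber blkno,
                       uint16 expected, uint16 found)
{
    ereport(WARNING,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg("invalid checksum in block %u of relation \"%s\": expected %u, found %u",
                    blkno, RelationGetRelationName(rel),
                    expected, found)));
}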

I also tried reporting corrupted pages to the stats collector while
researching single-page recovery in PostgreSQL, but one problem is that
the statistics in the stats collector are cleared during crash
recovery. I want the block corruption information to survive even when
the server goes down. And we might want to add checksums to the
permanent file holding the corruption information. The correctness of
this information is important, because we can repair the database by
restoring some tables from a logical backup, by reindexing, etc. only
as long as we have non-broken information about the corruption.
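
For the checksum on that permanent file, I'm thinking of something
along these lines (just a sketch; the file layout and names are made
up for illustration and the real stats collector files look
different):

/*
 * Sketch only: persist the list of corrupted blocks together with a
 * CRC covering the payload, so that a damaged file can be detected
 * when it is read back after a crash.  fwrite() error handling is
 * omitted for brevity.
 */
#include "postgres.h"

#include "common/pg_crc32c.h"
#include "storage/block.h"
#include "storage/fd.h"

typedef struct CorruptedBlockRecord
{
    Oid         relid;
    BlockNumber blkno;
} CorruptedBlockRecord;

static void
write_corruption_file(const char *path,
                      CorruptedBlockRecord *records, int nrecords)
{
    FILE       *fp;
    pg_crc32c   crc;

    INIT_CRC32C(crc);
    COMP_CRC32C(crc, records, sizeof(CorruptedBlockRecord) * nrecords);
    FIN_CRC32C(crc);

    fp = AllocateFile(path, PG_BINARY_W);
    if (fp == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    /* record count, payload, then the CRC covering the payload */
    fwrite(&nrecords, sizeof(int), 1, fp);
    fwrite(records, sizeof(CorruptedBlockRecord), nrecords, fp);
    fwrite(&crc, sizeof(pg_crc32c), 1, fp);

    FreeFile(fp);
}

The reader would then recompute the CRC over the payload on load and
discard the file if it doesn't match.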

>
> Finally, the read and locking considerations.  I tried to cover that
> extensively in the comments, but here are some details on how I tried
> to make the check safe while trying to keep the overhead as low as
> possible.  First thing is that this is only doing buffered reads,
> without any attempt to discard OS cache.  Therefore, any discrepancy
> between the OS cache and the disk cannot be detected unless you do
> other actions, such as sync / drop_caches on GNU/Linux.
>
> An access share lock on the currently checked relation is held,
> meaning that it can't get deleted/truncated.  The total number of
> blocks for the given fork is retrieved first, so any new block will be
> ignored.  Such new blocks are considered out of scope as being written
> after the start of the check.
>
> Each time a buffer is being checked, the target buffer mapping
> partition lock is acquired in shared mode, to prevent concurrent
> eviction.  If the buffer is found in shared buffers, it's pinned and
> released immediately, just to get the state.

I wonder if there is a possibility that blocks on disk are corrupted
even though copies of them are loaded in shared buffers. ISTM the
above method cannot detect such corruption. Reading and checking
blocks quickly is attractive, but I think it's also important to check
blocks precisely, without overlooking anything.
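
For instance, I imagine something like the following (just a sketch,
not the patch's code) that re-reads the block from disk and verifies
its checksum even when a copy is already cached; it assumes data
checksums are enabled on the cluster, and it ignores the race with a
concurrent write of the same block, which the real code would have to
deal with:

/*
 * Sketch only: verify the on-disk version of a block even if a clean
 * copy sits in shared buffers.  This bypasses the buffer manager, so
 * it can race with a concurrent write of the same block; handling
 * that is omitted here.
 */
#include "postgres.h"

#include "storage/block.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"
#include "storage/smgr.h"
#include "utils/rel.h"

static bool
verify_block_on_disk(Relation rel, ForkNumber forknum, BlockNumber blkno)
{
    PGAlignedBlock buf;
    Page        page = (Page) buf.data;
    uint16      expected;

    /* read the block from disk, not from shared buffers */
    RelationOpenSmgr(rel);
    smgrread(rel->rd_smgr, forknum, blkno, buf.data);

    /* new (all-zero) pages carry no checksum to verify */
    if (PageIsNew(page))
        return true;

    expected = pg_checksum_page(buf.data, blkno);
    return ((PageHeader) page)->pd_checksum == expected;
}

If the on-disk copy fails such a check while the cached copy is fine,
that is exactly the kind of corruption I'm worried about overlooking.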

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


