Re: Online checksums verification in the backend - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Online checksums verification in the backend |
Date | |
Msg-id | 20201023072832.GE5180@paquier.xyz Whole thread Raw |
In response to | Re: Online checksums verification in the backend (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Online checksums verification in the backend
|
List | pgsql-hackers |
On Mon, Oct 19, 2020 at 04:52:48PM +0900, Michael Paquier wrote: > No issues with reporting the block number and the fork type in the SRF > at least of course as this is helpful to detect the position of the > broken blocks. For the checksum found in the header and the one > calculated (named expected in the patch), I am less sure how to put a > clear definition on top of that but we could always consider that > later and extend the SRF as needed. Once the user knows that both do > not match, he/she knows that there is a problem. So, I have reviewed your patch set, and heavily reworked the logic to be more consistent on many thinks, resulting in a largely simplified patch without sacrificing its usefulness: - The logic of the core routine of bufmgr.c is unchanged. I have simplified it a bit though by merging the subroutines that were part of the patch. SMgrRelation is used as argument instead of a Relation. That's more consistent with the surroundings. The initial read of a page without locks is still on the table as an extra optimization, though I am not completely sure if this should be part of CheckBuffer() or not. I also thought about the previous benchmarks and I think that not using the most-optimized improved performance, because it reduced the successive runes of the SQL functions, reducing the back-pressure on the partition locks (we held on of them at the same time for a longer period, meaning that the other 127 ran free for a longer time). Please note that this part still referred to a "module", which was incorrect. - Removal of the expected and found checksums from the SRF of the function. Based on my recent business with the page checks for base backups, I have arrived at the conclusion that the SRF should return data that we can absolutely trust, and the minimum I think we have to trust here is if a given page is thought as safe or not, considering all the sanity checks done by PageIsVerified() as the main entry point for everything. This has led to a bit of confusion with the addition of NoComputedChecksum for a page that was empty as of the initial of the patch, so it happens that we don't need it anymore. - Removal of the dependency with checksums for this feature. While simplifying the code, I have noticed that this feature can also be beneficial for clusters that do not have have data checksums, as PageIsVerified() is perfectly able to run some page header checks and the zero-page case. That's of course less useful than having the checksums, but there is no need to add a dependency here. The file for the SQL functions is renamed from checksumfuncs.c to pagefuncs.c. - The function is changed to return no tuples if the relkind is not supported, and the same applies for temporary relations. That's more consistent with other system functions like the ones in charge of partition information, and this makes full scans of pg_class much easier to work with. Temporary tables were not handled correctly anyway as these are in local buffers, but the use case of this function in this case is not really obvious to me. - Having the forknum in the SRF is kind of confusing, as the user would need to map a number with the physical on-disk name. Instead, I have made the choice to return the *path* of the corrupted file with a block number. This way, an operator can know immediately where a problem comes from. The patch does not append the segment number, and I am not sure if we should actually do that, but adding it is straight-forward as we have the block number. There is a dependency with table AMs here as well, as this goes down to fd.c, explaining why I have not added it and just. - I really don't know what kind of default ACL should apply for such functions, but I am sure that SCAN_TABLES is not what we are looking for here, and there is nothing preventing us from having a safe default from the start of times, so I moved the function to be superuser-only by default, and GRANT can be used to allow its execution to other roles. We could relax that in the future, of course, this can be discussed separately. - The WARNING report for each block found as corrupted gains an error context, as a result of a switch to PageIsVerified(), giving a user all the information needed in the logs on top of the result in the SRF. That's useful as well if combined with CHECK_FOR_INTERRUPTS(), and I got to wonder if we should have some progress report for this stuff, though that's a separate discussion. - The function is renamed to something less generic, pg_relation_check_pages(), and I have reduced the number of functions from two to one, where the user can specify the fork name with a new option. The default of NULL means that all the forks of a relation are checked. - The TAP tests are rather bulky. I have moved all the non-corruption test cases into a new SQL test file. That's useful for people willing to do some basic sanity checks with a non-default table AM. At least it provides a minimum coverage. I have not completely finished its review, but I have done some work. Doing some debugging of corrupt_and_test_block() was proving to be rather difficult as the same test names are assigned multiple times. I am tempted to move this test suite to src/test/recovery/ instead. - Reworked the docs and some comments. That's quite a lot of changes, and I think that most of the C code, the main tests in src/test/regress/ and the docs are getting in a rather committable state. The TAP tests emulating corruptions still need a closer lookup (say, check_pg_stat_database_nb_error() should have an error prefix at least). The portions in bufmgr.c and the rest should of course be split into two separate commits, that can easily be done. And the code needs an indentation run and a catalog bump. -- Michael
Attachment
pgsql-hackers by date: