Re: Online verification of checksums - Mailing list pgsql-hackers
From | Michael Banck |
---|---|
Subject | Re: Online verification of checksums |
Date | |
Msg-id | 20181121123535.GD23740@nighthawk.caipicrew.dd-dns.de |
In response to | Re: Online verification of checksums (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: Online verification of checksums |
List | pgsql-hackers |
Hi,

On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >I am not convinced we need to differentiate further between online and
> >offline operation, can you explain in more detail which other
> >differences are ok in online mode and why?
>
> For instance the "file/directory was removed" do not look okay at all when
> offline, even if unlikely. Moreover, the checks hides the error message and
> is fully silent in this case, while it was not beforehand on the same error
> when offline.

OK, I kinda see the point here and added that.

> The "check if page was modified since checkpoint" does not look useful when
> offline. Maybe it lacks a comment to say that this cannot (should not ?)
> happen when offline, but even then I would not like it to be true: ISTM that
> no page should be allowed to be skipped on the checkpoint condition when
> offline, but it is probably ok to skip with the new page test, which make me
> still think that they should be counted and reported separately, or at least
> the checkpoint skip test should not be run when offline.

What is the rationale for not skipping on the checkpoint condition when
the instance is offline? If it was shut down cleanly, this should not
happen; if the instance crashed, those would be spurious errors that
would get repaired on recovery.

I have not changed that for now.

> When offline, the retry logic does not make much sense, it should complain
> directly on the first error? Also, I'm unsure of the read & checksum retry
> logic *without any delay*.

I think the small overhead of retrying in offline mode, even if useless,
is worth it to avoid making the code more complicated in order to cater
for both modes.

Initially there was a delay, but this was removed after analysis and
requests by several other reviewers.

> >>This might suggest some option to tell the command that it should work in
> >>online or offline mode, so that it may be stricter in some cases. The
> >>default may be one of the option, eg the stricter offline mode, or maybe
> >>guessed at startup.
> >
> >If we believe the operation should be different, the patch removes the
> >"is cluster online?" check (as it is no longer necessary), so we could
> >just replace the current error message with a global variable with the
> >result of that check and use it where needed (if any).
>
> That could let open the issue of someone starting the check offline, and
> then starting the database while it is not finished. Maybe it is not worth
> sweating about such a narrow use case.

I don't think we need to cater for that, yeah.

> If operations are to be different, and it seems to me they should be, I'd
> suggest (1) auto detect default based one the existing "is cluster online"
> code, (2) force options, eg --online vs --offline, which would complain and
> exit if the cluster is not in the right state on startup.

The current code bails out if it thinks the cluster is online. What is
wrong with just setting a flag now in case it is?

> I'd suggest to add a failing checksum online test, if possible. At least a
> "foo" file?

Ok, done so.

> It would also be nice if the test could apply on an active database,
> eg with a low-rate pgbench running in parallel to the verification,
> but I'm not sure how easy it is to add such a thing.

That sounds much more complicated so I have not tackled that yet.

Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz
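For readers following the thread, the two mechanisms discussed above (an immediate re-read on checksum failure, and skipping pages modified since the last checkpoint) could fit together roughly as in the sketch below. It is not taken from the patch: the ToyPage struct, the byte-sum checksum, and the names verify_page, page_verifies and PageResult are all hypothetical stand-ins, and the order of the two tests is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page; NOT PostgreSQL's real page header. */
typedef struct {
    uint64_t lsn;       /* LSN of the last change to this page */
    uint16_t checksum;  /* checksum stored with the page */
    uint8_t  data[16];  /* payload */
} ToyPage;

typedef enum { PAGE_OK, PAGE_SKIPPED, PAGE_BAD } PageResult;

/* Toy checksum (byte sum); a stand-in for the real page checksum. */
static uint16_t toy_checksum(const ToyPage *p)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < sizeof p->data; i++)
        sum = (uint16_t) (sum + p->data[i]);
    return sum;
}

static bool page_verifies(const ToyPage *p)
{
    return toy_checksum(p) == p->checksum;
}

/*
 * Verify one page. "buf" is what the first read returned; "on_disk" stands
 * in for what an immediate second read would return (assumed interface).
 */
static PageResult verify_page(ToyPage *buf, const ToyPage *on_disk,
                              uint64_t checkpoint_lsn)
{
    assert(buf != NULL && on_disk != NULL);

    if (page_verifies(buf))
        return PAGE_OK;

    /* First failure may be a torn read: re-read once, without delay. */
    *buf = *on_disk;
    if (page_verifies(buf))
        return PAGE_OK;

    /* Still failing: a page changed since the checkpoint is skipped. */
    if (buf->lsn > checkpoint_lsn)
        return PAGE_SKIPPED;

    return PAGE_BAD;
}
```

Returning PAGE_SKIPPED separately from PAGE_OK would let skipped pages be counted and reported on their own, as requested in the quoted review; a stricter offline mode could, under the same assumptions, simply bypass the LSN comparison.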