Hallo Michael,
>>> I'm wondering (possibly again) about the existing early exit if one block
>>> cannot be read on retry: the command should count this as a kind of bad
>>> block, proceed on checking other files, and obviously fail in the end, but
>>> having checked everything else and generated a report. I do not think that
>>> this condition warrants a full stop. ISTM that under rare race conditions
>>> (eg, an unlucky concurrent "drop database" or "drop table") this could
>>> happen when online, although I could not trigger one despite heavy testing,
>>> so I'm possibly mistaken.
>>
>> This seems like a defensible judgement call either way.
>
> Right now we have a few tests that explicitly check that
> pg_verify_checksums fails on broken data ("foo" in the file). Those
> would then just get skipped AFAICT, which I think is the worse
> behaviour, but if everybody thinks that should be the way to go, we
> can drop/adjust those tests and make pg_verify_checksums skip them.
>
> Thoughts?
My point is that it should fail as it does, only not immediately (early
exit) but after having checked everything else. This means avoiding
calling "exit(1)" here and there (lseek, fopen...), instead taking note
that something bad happened, and calling exit only at the end.
--
Fabien.