Change checkpoint‑record‑missing PANIC to FATAL - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Change checkpoint‑record‑missing PANIC to FATAL
Date
Msg-id CAMm1aWZ9Tv=Wrx52_2Ppw+6ULf_twRZuQm=ZWLA_a-kXWykHkQ@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi,

While working on [1], we discussed whether the redo-record-missing error should be a PANIC or a FATAL. We concluded that FATAL is more appropriate, as it is more appropriate for the current situation and achieves the intended behavior and also it is consistent with the backup_label path, which already reports FATAL in the same scenario.

However, when the checkpoint record is missing, the behavior remains inconsistent: Without a backup_label, we currently raise a PANIC. With a backup_label, the same code path reports a FATAL.Since we have already made the redo‑record‑missing case to FATAL in 15f68ce, it seems reasonable to align the checkpoint‑record‑missing case as well. The existing PANIC dates back to an era before online backups and archive recovery existed, when external manipulation of WAL was not expected and such conditions were treated as internal faults. With all such features, it is much more realistic for WAL segments to go missing due to operational issues, and such cases are often recoverable. So switching this to FATAL appears appropriate.

Please share your thoughts.

I am happy to share a patch including a TAP test to cover this behavior once we agree to proceed.

[1]: https://www.postgresql.org/message-id/flat/CAMm1aWaaJi2w49c0RiaDBfhdCL6ztbr9m%3DdaGqiOuVdizYWYaA%40mail.gmail.com

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

pgsql-hackers by date:

Previous
From: Xuneng Zhou
Date:
Subject: Re: Optimize SnapBuildPurgeOlderTxn: use in-place compaction instead of temporary array
Next
From: "cca5507"
Date:
Subject: A small problem when rehashing catalog cache