On Mon, Dec 08, 2025 at 12:48:52PM +0530, Nitin Jadhav wrote:
>> An update of src/test/recovery/meson.build to add the new test is
>> also required.
>
> I will work on improving the test accordingly and include the changes
> in the next version.
Cool, thanks.
> The main reason I chose PANIC is that when the ReadCheckpointRecord()
> fails, we already use PANIC for the error message ‘could not locate a
> valid checkpoint record…’ in the no_backup_label case, whereas the
> similar flow with backup_label uses FATAL. I am not entirely sure why
> this difference exists. I looked into it but couldn’t find much. If we
> decide to change this patch to use FATAL for ‘could not find redo
> location…’, should we also consider changing the existing PANIC to
> FATAL for consistency?
Using PANIC is an inherited historical artifact that has been
introduced around 4d14fe0048cf with the introduction of WAL. There
was nothing like archiving or even base backup back then. Switching
the existing surrounding one to also use a FATAL is something that
seems worth considering to me for the checkpoint record, at least
based on the pattern that there could be a driver error even if there
is no backup_label file (aka for example the case of FS-levelsnapshots
with one partition used for the data folder, no?).
This offers bonus points in the shape of more tests like the one you
have sent upthread. It's not something that I would backpatch as it
is a behavior change, but I'm open to seeing that as an improvement in
usability for future releases: PANIC is for cases that should never
happen for internal states, due to an internal logic error, or an OS
going crazy. Here we have a should-no-happen case triggered by a
user, and a FATAL still provides the same information about what's
wrong. Let's make such changes separate patches, of course, depending
on what we find on the way.
> Yes. As noted in my initial email, this patch is focused solely on
> fixing the crash issue. It does not address error handling in
> StartupXLOG(), CreateCheckPoint(), or involve any broader code
> cleanup.
That sounds fine to me.
--
Michael