Home > mailing lists

Re: Fix crash during recovery when redo segment is missing - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: Fix crash during recovery when redo segment is missing
Date	December 8, 2025 11:23:08
Msg-id	aTaK7CTUOsLy5UWw@paquier.xyz Whole thread Raw
In response to	Re: Fix crash during recovery when redo segment is missing (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
Responses	Re: Fix crash during recovery when redo segment is missing
List	pgsql-hackers

Tree view

On Mon, Dec 08, 2025 at 12:48:52PM +0530, Nitin Jadhav wrote:
>> An update of src/test/recovery/meson.build to add the new test is
>> also required.
>
> I will work on improving the test accordingly and include the changes
> in the next version.

Cool, thanks.

> The main reason I chose PANIC is that when the ReadCheckpointRecord()
> fails, we already use PANIC for the error message ‘could not locate a
> valid checkpoint record…’ in the no_backup_label case, whereas the
> similar flow with backup_label uses FATAL. I am not entirely sure why
> this difference exists. I looked into it but couldn’t find much. If we
> decide to change this patch to use FATAL for ‘could not find redo
> location…’, should we also consider changing the existing PANIC to
> FATAL for consistency?

Using PANIC is an inherited historical artifact that has been
introduced around 4d14fe0048cf with the introduction of WAL.  There
was nothing like archiving or even base backup back then.  Switching
the existing surrounding one to also use a FATAL is something that
seems worth considering to me for the checkpoint record, at least
based on the pattern that there could be a driver error even if there
is no backup_label file (aka for example the case of FS-levelsnapshots
with one partition used for the data folder, no?).

This offers bonus points in the shape of more tests like the one you
have sent upthread.  It's not something that I would backpatch as it
is a behavior change, but I'm open to seeing that as an improvement in
usability for future releases: PANIC is for cases that should never
happen for internal states, due to an internal logic error, or an OS
going crazy.  Here we have a should-no-happen case triggered by a
user, and a FATAL still provides the same information about what's
wrong.  Let's make such changes separate patches, of course, depending
on what we find on the way.

> Yes. As noted in my initial email, this patch is focused solely on
> fixing the crash issue. It does not address error handling in
> StartupXLOG(), CreateCheckPoint(), or involve any broader code
> cleanup.

That sounds fine to me.
--
Michael

Attachment

signature.asc

pgsql-hackers by date:

From: Pavel Stehule
Date: 08 December 2025, 11:15:50
Subject: Re: proposal: schema variables

From: David Geier
Date: 08 December 2025, 11:33:10
Subject: Re: Make copyObject work in C++

Re: Fix crash during recovery when redo segment is missing - Mailing list pgsql-hackers

Attachment

Previous

Next