Thread: Re: Fix logging for invalid recovery timeline

Re: Fix logging for invalid recovery timeline

From: "Andrey M. Borodin"

> On 20 Dec 2024, at 20:37, David Steele <david@pgbackrest.org> wrote:
>
> "Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off
> from that timeline at %X/%X."

I think the errdetail here is very hard to follow. I seem to understand what is going on after reading the errmsg, but
the errdetail makes me uncertain.

If we call "tliSwitchPoint(CheckPointTLI, expectedTLEs, NULL);",
don't we risk hitting this error again?
    ereport(ERROR,
            (errmsg("requested timeline %u is not in this server's history",
                    tli)));

Best regards, Andrey Borodin.


Re: Fix logging for invalid recovery timeline

From: David Steele
On 12/20/24 23:28, Andrey M. Borodin wrote:
> 
>> On 20 Dec 2024, at 20:37, David Steele <david@pgbackrest.org> wrote:
>>
>> "Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off
>> from that timeline at %X/%X."
> 
> I think the errdetail here is very hard to follow. I seem to understand what is going on after reading the errmsg, but
> the errdetail makes me uncertain.

Yeah, this one confuses users a lot. We see it mostly when a user 
accidentally promotes a standby and that standby pushes a history file 
and maybe some WAL on a new timeline, e.g. 2. The original primary 
continues to make backups on the original timeline 1. At some point a 
restore is required and Postgres by default wants to recover to the most 
recent timeline, but timeline 2 forks from timeline 1 before the latest 
backup was started so it is not accessible.

The solution is to set the target timeline to current, but first the user
needs to figure out what is going on, and this error message just
doesn't contain enough information to do that. I have some ideas on how
to make it better but that would probably be for HEAD only.
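
To make the scenario concrete, here is a minimal C sketch of the reachability check, assuming a simplified model of a timeline history. The types and the functions timeline_of_point and checkpoint_reachable are hypothetical illustrations, not PostgreSQL source, and the LSN values mentioned below are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's timeline types. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

typedef struct
{
    TimeLineID tli;     /* timeline this entry describes */
    XLogRecPtr begin;   /* first LSN on this timeline (inclusive) */
    XLogRecPtr end;     /* LSN where the next timeline forked off; 0 = open */
} HistoryEntry;

/*
 * Return the timeline that owns `ptr` according to `history`
 * (ordered oldest first), or 0 if no entry covers it.
 */
static TimeLineID
timeline_of_point(XLogRecPtr ptr, const HistoryEntry *history, size_t n)
{
    assert(history != NULL);

    for (size_t i = 0; i < n; i++)
    {
        bool after_begin = ptr >= history[i].begin;
        bool before_end = history[i].end == 0 || ptr < history[i].end;

        if (after_begin && before_end)
            return history[i].tli;
    }
    return 0;
}

/*
 * A backup is usable for a target timeline only if its checkpoint lies on
 * the timeline that the target's history expects at that LSN.
 */
static bool
checkpoint_reachable(XLogRecPtr checkpoint_loc, TimeLineID checkpoint_tli,
                     const HistoryEntry *history, size_t n)
{
    return timeline_of_point(checkpoint_loc, history, n) == checkpoint_tli;
}
```

Suppose the history of timeline 2 says timeline 1 ended at 0/3000000. A checkpoint taken on timeline 1 at 0/5000000 then fails this check, which is the situation described above. Against a history containing only timeline 1, the same checkpoint passes, which is why setting recovery_target_timeline = 'current' lets the restore proceed.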

> If we call "tliSwitchPoint(CheckPointTLI, expectedTLEs, NULL);",
> don't we risk hitting this error again?
>     ereport(ERROR,
>             (errmsg("requested timeline %u is not in this server's history",
>                     tli)));

I'm not sure what you mean. For primary backups CheckPointTLI will 
always equal ControlFile->checkPointCopy.ThisTimeLineID so that 
shouldn't be a problem. For standby backups CheckPointTLI will be <= 
ControlFile->checkPointCopy.ThisTimeLineID since CheckPointTLI 
represents the timeline at the start of the backup. If a route from that 
timeline to the current timeline can't be found then I'd certainly 
expect an error.
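
For reference, the lookup under discussion can be sketched roughly as follows. This is a simplified model, not the PostgreSQL implementation: HistoryEntry and find_switch_point are hypothetical names, and a false return stands in for the ereport(ERROR, ...) quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's timeline types. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

typedef struct
{
    TimeLineID tli;     /* timeline this entry describes */
    XLogRecPtr begin;   /* first LSN on this timeline */
    XLogRecPtr end;     /* LSN where the next timeline forked off; 0 = open */
} HistoryEntry;

/*
 * Look up `tli` in `history`. On success, store the LSN at which that
 * timeline ended in *switchpoint and return true. Return false when the
 * timeline is not part of the history at all, which is the case where the
 * real code would report "requested timeline %u is not in this server's
 * history".
 */
static bool
find_switch_point(TimeLineID tli, const HistoryEntry *history, size_t n,
                  XLogRecPtr *switchpoint)
{
    assert(switchpoint != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (history[i].tli == tli)
        {
            *switchpoint = history[i].end;
            return true;
        }
    }
    return false;
}
```

In this model the lookup fails only when the timeline passed in has no entry in the history of the requested timeline, i.e. when there is no route from the backup's timeline to the target.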

I'll add this patch to the January CF.

Regards,
-David