Re: pause recovery if pitr target not reached - Mailing list pgsql-hackers

From Leif Gunnar Erlandsen
Subject Re: pause recovery if pitr target not reached
Date
Msg-id 06a0abbf3c8511f7548956f1a92997be@lako.no
In response to Re: pause recovery if pitr target not reached  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
"Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote on 22 November 2019 at 05:26:

> Hello, Leif, Peter.
>
> At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <leif@lako.no> wrote in
>
>> Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.
>>
>> After studying this a bit more, I think the current behavior is totally bogus and needs a serious
>> rethink.
>>
>> If you specify a recovery target and it is reached, recovery pauses (depending on
>> recovery_target_action).
>>
>> If you specify a recovery target and it is not reached when the end of the archive is reached
>> (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
>> information. This is clearly wrong in multiple ways.
>>
>> Yes, that is why I have created the patch.
>
> It seems premised on use in repeated trial-and-error recovery by
> well-experienced operators. When it is used that way, I think that the
> target goes back gradually through repetitions, so in the expected usage
> we need to start from a clean backup for each repetition anyway.
> Unintended promotion doesn't do any harm in that case.
If you are going back in time and gradually recovering less WAL, today's behaviour is adequate.
The patch is for circumstances where, for some reason, you do not have all the WAL files ready at once.

>
> From this perspective, I don't think the behavior is totally wrong, but
> FATAL'ing at EO-WAL before the target is reached seems good to do.
>
>> I think what we should do is if we specify a recovery target and we don't reach it, we should
>> ereport(FATAL). Somewhere around
>>
>> Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible
>> to get some more WAL and continue recovery. Pausing has the benefit that the tables in the
>> database can be inspected.
>>
>> in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
>> recovery. Then a user can make fixes either by getting more WAL files to restore and adjusting the
>> recovery target and starting again. I don't think pausing is the right behavior, but perhaps an
>> argument could be made to offer it as a nondefault behavior.
>>
>> Pausing was chosen in the patch because pausing was the expected behaviour if the target was reached.
>>
>> And the patch does not interfere with any other functionality as far as I know.
>
> With the current behavior, if the server promotes without stopping as told
> by the target_action variables, it is a sign that something's wrong. But
> if the server pauses before reaching the target, operators may overlook the
> message if they don't know of the behavior. And if the server pauses in that
> case, I think there's nothing to do.
Yes, that is correct. FATAL might be the correct behaviour.
>
> So +1 for FATAL.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
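
To summarize the three alternatives discussed above for the case where a recovery target is set but the end of the archive is reached first (promote as today, pause as in the patch, or FATAL as proposed), here is a minimal standalone sketch in C. It is not PostgreSQL source code: the enum and function names are hypothetical and exist only for this illustration, and the no-target case is simplified to a plain promotion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in PostgreSQL. */
typedef enum
{
    POLICY_PROMOTE,         /* current behaviour: end recovery and promote */
    POLICY_PAUSE,           /* behaviour chosen in the patch */
    POLICY_FATAL            /* behaviour proposed in this thread */
} EndOfWalPolicy;

typedef enum
{
    OUTCOME_PROMOTE,
    OUTCOME_PAUSE,
    OUTCOME_FATAL,
    OUTCOME_TARGET_ACTION   /* target reached: recovery_target_action decides */
} RecoveryOutcome;

/*
 * Decide what happens when WAL replay stops.
 *
 * target_requested: a recovery target (time, xid, name or lsn) was configured.
 * target_reached:   replay got to that target before running out of WAL.
 * policy:           which of the discussed behaviours applies when the
 *                   target was requested but not reached.
 */
RecoveryOutcome
end_of_recovery_outcome(bool target_requested, bool target_reached,
                        EndOfWalPolicy policy)
{
    /* Simplification: with no target, recovery just ends and the server starts. */
    if (!target_requested)
        return OUTCOME_PROMOTE;

    /* Target reached: unchanged by the patch, recovery_target_action applies. */
    if (target_reached)
        return OUTCOME_TARGET_ACTION;

    /* Target requested but the archive ended first: the case under discussion. */
    switch (policy)
    {
        case POLICY_PROMOTE:
            return OUTCOME_PROMOTE;
        case POLICY_PAUSE:
            return OUTCOME_PAUSE;
        case POLICY_FATAL:
            return OUTCOME_FATAL;
    }

    assert(!"unknown end-of-WAL policy");
    return OUTCOME_FATAL;
}
```

With POLICY_FATAL the operator would fetch the missing WAL files (or adjust the target) and start recovery again; with POLICY_PAUSE the server stays up for inspection, at the risk that the pause message is overlooked.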


