Re: pause recovery if pitr target not reached - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: pause recovery if pitr target not reached
Date
Msg-id 20191122.132616.111879793970372216.horikyota.ntt@gmail.com
In response to Re: pause recovery if pitr target not reached  ("Leif Gunnar Erlandsen" <leif@lako.no>)
Responses Re: pause recovery if pitr target not reached  ("Leif Gunnar Erlandsen" <leif@lako.no>)
List pgsql-hackers
Hello, Leif, Peter.

At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <leif@lako.no> wrote in 
> Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.
> 
> > After studying this a bit more, I think the current behavior is totally bogus and needs a serious
> > rethink.
> > 
> > If you specify a recovery target and it is reached, recovery pauses (depending on
> > recovery_target_action).
> > 
> > If you specify a recovery target and it is not reached when the end of the archive is reached
> > (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
> > information. This is clearly wrong in multiple ways.
> 
> Yes, that is why I have created the patch.

It seems promising for use in repeated trial-and-error recovery by
well-experienced operators. In that usage, I think the target moves
back gradually with each repetition, so in the expected usage we need
to start from a clean backup for each repetition anyway. Unintended
promotion does no harm in that case.

From this perspective, I don't think the behavior is totally wrong,
but FATAL'ing at the end of WAL before the target is reached seems
like a good thing to do.

> > I think what we should do is if we specify a recovery target and we don't reach it, we should
> > ereport(FATAL). Somewhere around
> > 
> Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible to get more WAL and
> continue recovery. Pausing has the benefit that tables in the database can be inspected.
> 
> > in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
> > recovery. Then a user can make fixes either by getting more WAL files to restore and adjusting the
> > recovery target and starting again. I don't think pausing is the right behavior, but perhaps an
> > argument could be made to offer it as a nondefault behavior.
> 
> Pausing was chosen in the patch because pausing was the expected behavior if the target was reached.
> 
> And the patch does not interfere with any other functionality as far as I know.

With the current behavior, if the server promotes without stopping as
told by the recovery_target_action setting, that is a sign that
something is wrong. But if the server pauses before reaching the
target, operators may overlook the message if they don't know about
that behavior. And if the server pauses in that case, I think there
is nothing to be done.

So +1 for FATAL.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


