Re: pause recovery if pitr target not reached - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: pause recovery if pitr target not reached |
Date | |
Msg-id | 20191122.132616.111879793970372216.horikyota.ntt@gmail.com |
In response to | Re: pause recovery if pitr target not reached ("Leif Gunnar Erlandsen" <leif@lako.no>) |
Responses |
Re: pause recovery if pitr target not reached
|
List | pgsql-hackers |
Hello, Leif, Peter.

At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <leif@lako.no> wrote in
> Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.
>
> > After studying this a bit more, I think the current behavior is totally bogus and needs a serious
> > rethink.
> >
> > If you specify a recovery target and it is reached, recovery pauses (depending on
> > recovery_target_action).
> >
> > If you specify a recovery target and it is not reached when the end of the archive is reached
> > (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
> > information. This is clearly wrong in multiple ways.
>
> Yes, that is why I have created the patch.

It seems promising for repeated trial-and-error recovery by
experienced operators. In that usage, I think the target moves back
gradually across repetitions, so we need to start from a clean backup
for each repetition anyway. Unintended promotion does no harm in that
case.

From this perspective, I don't think the behavior is totally wrong,
but raising FATAL at end of WAL before the target is reached seems
good to do.

> > I think what we should do is if we specify a recovery target and we don't reach it, we should
> > ereport(FATAL). Somewhere around
> >
> Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible
> to get some more WAL and continue recovery. Pause has the benefit of the possibility to inspect
> tables in the database.
>
> > in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
> > recovery. Then a user can make fixes either by getting more WAL files to restore and adjusting the
> > recovery target and starting again. I don't think pausing is the right behavior, but perhaps an
> > argument could be made to offer it as a nondefault behavior.
>
> Pausing was chosen in the patch as pause was the expected behavior if target was reached.
>
> And the patch does not interfere with any other functionality as far as I know.

With the current behavior, if the server promotes without stopping as
directed by recovery_target_action, it is a sign that something is
wrong. But if the server pauses before reaching the target, operators
may overlook the message if they don't know about the behavior. And if
the server pauses in that case, I think there is nothing to do.

So +1 for FATAL.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center