Thread: Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Fwd: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
From
leif@lako.no
Date:
Hi

I have reported a bug via the PostgreSQL bug report form, but haven't got any response so far.
This might not be a bug, but a feature not implemented yet.
I suggest making a small addition to StartupXLOG to solve the issue.

git diff src/backend/access/transam/xlog.c
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..d0e5bb3f84 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7277,6 +7277,19 @@ StartupXLOG(void)
 				case RECOVERY_TARGET_ACTION_PROMOTE:
 					break;
+			}
+		} else if (recoveryTarget == RECOVERY_TARGET_TIME)
+		{
+			/*
+			 * Stop point not reached but next WAL could not be read
+			 * Some explanation and warning should be logged
+			 */
+			switch (recoveryTargetAction)
+			{
+				case RECOVERY_TARGET_ACTION_PAUSE:
+					SetRecoveryPause(true);
+					recoveryPausesHere();
+					break;
 			}
 		}

The scenario I want to solve is:
Need to restore a backup to another server.
Restore pg_basebackup files
Restore some WAL files
Extract pg_basebackup files
Create recovery.conf with a point-in-time target
Start PostgreSQL
Recovery ends before the point-in-time target due to missing WAL files
Database opens read/write

I think the database should have paused recovery; then I could restore
additional WAL files and restart PostgreSQL to continue the recovery.

With large databases and a lot of WAL files it is time-consuming to repeat parts of the process.

Best regards
Leif Gunnar Erlandsen
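To make the intent of the proposed branch easier to follow, here is a minimal standalone model of the decision taken after the redo loop. It is only a sketch: the enum and function names are hypothetical stand-ins and not the definitions in xlog.c, and it ignores details such as hot standby state. It contrasts the behaviour described in the report with the behaviour the patch above appears to aim for.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real definitions. */
typedef enum { TARGET_UNSET, TARGET_TIME, TARGET_OTHER } TargetKind;
typedef enum { ACTION_PAUSE, ACTION_PROMOTE, ACTION_SHUTDOWN } TargetAction;
typedef enum { RESULT_PAUSE, RESULT_PROMOTE, RESULT_SHUTDOWN } Result;

/* Map the configured action onto what the server does. */
Result apply_action(TargetAction action)
{
    switch (action)
    {
        case ACTION_PAUSE:    return RESULT_PAUSE;
        case ACTION_SHUTDOWN: return RESULT_SHUTDOWN;
        case ACTION_PROMOTE:  break;
    }
    return RESULT_PROMOTE;
}

/* Behaviour described in the report: the action counts only if the
 * stop point was actually reached. */
Result end_of_redo_reported(bool reached_stop_point, TargetAction action)
{
    if (reached_stop_point)
        return apply_action(action);
    return RESULT_PROMOTE;      /* WAL ran out early: database opens read/write */
}

/* Model of the proposed branch: an unreached time target still honours
 * "pause"; every other combination keeps the reported behaviour. */
Result end_of_redo_proposed(TargetKind kind, bool reached_stop_point,
                            TargetAction action)
{
    if (reached_stop_point)
        return apply_action(action);
    if (kind == TARGET_TIME && action == ACTION_PAUSE)
        return RESULT_PAUSE;
    return RESULT_PROMOTE;
}
```

In this model, a time target that is not reached with the action set to pause yields RESULT_PROMOTE under the reported behaviour and RESULT_PAUSE under the proposed one, which is the difference the scenario above depends on.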
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
From
Kyotaro HORIGUCHI
Date:
At Wed, 30 Jan 2019 15:53:51 +0000, leif@lako.no wrote in <a3bf3b8910cd5adb8a5fbc8113eac0ab@lako.no>
> Hi
> I have reported a bug via the PostgreSQL bug report form, but haven't got any response so far.
> This might not be a bug, but a feature not implemented yet.
> I suggest making a small addition to StartupXLOG to solve the issue.

I can understand what you want, but it doesn't seem acceptable
since it introduces inconsistency among target kinds.

> The scenario I want to solve is:
> Need to restore a backup to another server.
> Restore pg_basebackup files
> Restore some WAL files
> Extract pg_basebackup files
> Create recovery.conf with a point-in-time target
> Start PostgreSQL
> Recovery ends before the point-in-time target due to missing WAL files
> Database opens read/write
>
> I think the database should have paused recovery; then I could restore
> additional WAL files and restart PostgreSQL to continue the recovery.

I don't think anyone expected the server to follow
recovery_target_action without a target being set, so we can change
the behavior when any kind of target is specified. So I propose
to follow recovery_target_action even if the target is not reached,
whenever any recovery target is specified.

With the attached PoC (for master), recovery stops as follows:

LOG: consistent recovery state reached at 0/2F000000
LOG: database system is ready to accept read only connections
rc_work/00000001000000000000002F’: No such file or directory
WARNING: not reached specfied recovery target, take specified action anyway
DETAIL: This means a wrong target or missing of expected WAL files.
LOG: recovery has paused
HINT: Execute pg_wal_replay_resume() to continue.

If no target is specified, it promotes immediately, ignoring
recovery_target_action.

If this is acceptable I'll post a complete version (including
documentation). I don't think this is back-patchable.

> With large databases and a lot of WAL files it is time-consuming to repeat parts of the process.

I understand your concern.

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ab7d804f0..081bdd86ec 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7246,12 +7246,25 @@ StartupXLOG(void)
 		 * end of main redo apply loop
 		 */
 
-		if (reachedStopPoint)
+		/*
+		 * If recovery target is specified, specified action is expected
+		 * to be taken regardless whether the target is reached or not .
+		 */
+		if (recoveryTarget != RECOVERY_TARGET_UNSET)
 		{
+			/*
+			 * At this point we don't consider the case where we are
+			 * before consistent point even if not reached stop point.
+			 */
 			if (!reachedConsistency)
 				ereport(FATAL,
 						(errmsg("requested recovery stop point is before consistent recovery point")));
 
+			if (!reachedStopPoint)
+				ereport(WARNING,
+						(errmsg ("not yet reached specfied recovery target, take specified action anyway"),
+						 errdetail("This means a wrong target or missing WAL files.")));
+
 			/*
 			 * This is the last point where we can restart recovery with a
 			 * new recovery target, if we shutdown and begin again. After
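The control flow of the PoC hunk above can be summarised in a small standalone model. This is a sketch under stated assumptions, not the server code: the types and the function end_of_redo_poc are hypothetical names introduced for illustration, and the mapping from action to result is simplified. It shows the three cases the hunk distinguishes: no target set, a target set but consistency not reached, and a target set with or without the stop point being reached.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real definitions. */
typedef enum { TARGET_UNSET, TARGET_TIME, TARGET_OTHER } TargetKind;
typedef enum { ACTION_PAUSE, ACTION_PROMOTE, ACTION_SHUTDOWN } TargetAction;
typedef enum { RESULT_PAUSE, RESULT_PROMOTE, RESULT_SHUTDOWN, RESULT_FATAL } Result;

typedef struct
{
    Result result;  /* what the server does after the redo loop */
    bool   warned;  /* true if the "target not reached" WARNING is logged */
} Decision;

Decision end_of_redo_poc(TargetKind kind, bool reached_consistency,
                         bool reached_stop_point, TargetAction action)
{
    Decision d = { RESULT_PROMOTE, false };

    /* No target configured: promote immediately, the action is ignored. */
    if (kind == TARGET_UNSET)
        return d;

    /* Mirrors the FATAL check kept by the hunk. */
    if (!reached_consistency)
    {
        d.result = RESULT_FATAL;
        return d;
    }

    /* Target configured but not reached: warn, then take the action anyway. */
    if (!reached_stop_point)
        d.warned = true;

    switch (action)
    {
        case ACTION_PAUSE:    d.result = RESULT_PAUSE;    break;
        case ACTION_SHUTDOWN: d.result = RESULT_SHUTDOWN; break;
        case ACTION_PROMOTE:  d.result = RESULT_PROMOTE;  break;
    }
    return d;
}
```

Unlike a branch limited to time targets, this model treats every target kind the same way, which is the consistency argument made in the message above.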
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
From
leif@lako.no
Date:
"Kyotaro HORIGUCHI" <horiguchi.kyotaro@lab.ntt.co.jp> wrote on 31 January 2019 at 13:28:
> If this is acceptable I'll post a complete version (including
> documentation). I don't think this is back-patchable.
>

If you are asking me, then I think this is exactly what I wanted. Thank you for your effort.

>> With large databases and a lot of WAL files it is time-consuming to repeat parts of the process.
>
> I understand your concern.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

regards
Leif Gunnar Erlandsen
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
From
Michael Paquier
Date:
On Thu, Jan 31, 2019 at 09:26:48PM +0900, Kyotaro HORIGUCHI wrote:
> I don't think anyone expected the server to follow
> recovery_target_action without a target being set, so we can change
> the behavior when any kind of target is specified. So I propose
> to follow recovery_target_action even if the target is not reached,
> whenever any recovery target is specified.

Quoting the docs:
https://www.postgresql.org/docs/current/recovery-target-settings.html
recovery_target_action (enum)
"Specifies what action the server should take once the recovery target
is *reached*."

So what we have now is that an action would be taken iff a stop point
is defined and reached. What this patch changes is that the action
would be taken even if the stop point has *not* been reached once the
end of a WAL stream is found.

+ * to be taken regardless whether the target is reached or not .
Nit 1: Dot at the end has an extra space.
Nit 2: s/specfied/specified/

Please do not get me wrong, I can see that there could be use cases
where it is possible to take an action at the end of a WAL stream if
there is less WAL than what was planned, perhaps if the OP has set
an incorrect stop position too far in the future. Still, too much WAL
would have been replayed, so it would make the base backup unusable for
future uses. Also, it looks incorrect to me to change an existing
behavior and to use the same semantics for triggering an action if a
stop point is defined and reached.
--
Michael
Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
From
leif@lako.no
Date:
"Michael Paquier" <michael@paquier.xyz> wrote on 26 February 2019 at 09:13:
> On Thu, Jan 31, 2019 at 09:26:48PM +0900, Kyotaro HORIGUCHI wrote:
>
>> I don't think anyone expected the server to follow
>> recovery_target_action without a target being set, so we can change
>> the behavior when any kind of target is specified. So I propose
>> to follow recovery_target_action even if the target is not reached,
>> whenever any recovery target is specified.
>
> Quoting the docs:
> https://www.postgresql.org/docs/current/recovery-target-settings.html
> recovery_target_action (enum)
> "Specifies what action the server should take once the recovery target
> is *reached*."

I know this, and recovery_target_action in my case was "pause".
The recovery target was specified with a date and time.

> So what we have now is that an action would be taken iff a stop point
> is defined and reached. What this patch changes is that the action
> would be taken even if the stop point has *not* been reached once the
> end of a WAL stream is found.

Yes, and this is the expected behaviour in my use case.

This was a PITR scenario, to a new server, and not crash recovery.
I restored a backup and placed WAL files in a separate directory, then I created a recovery.conf with the correct recovery_target_time.
After PostgreSQL started, it stopped after a short while and opened the database in read/write.
Checks showed the target was not reached. The log showed that no more WAL could be found.
If PostgreSQL had followed recovery_target_action, then I could have restored the missing WAL files and continued replay of WAL.
As this was not the case I had to restart the process from the beginning, which took many hours.

Another thing to consider is that in instances such as this one, where a lot of WAL was needed for replay, it is not always a given that we have enough available disk space to store it all at the same time.

> Please do not get me wrong, I can see that there could be use cases
> where it is possible to take an action at the end of a WAL stream if
> there is less WAL than what was planned, perhaps if the OP has set
> an incorrect stop position too far in the future. Still, too much WAL
> would have been replayed, so it would make the base backup unusable for
> future uses. Also, it looks incorrect to me to change an existing
> behavior and to use the same semantics for triggering an action if a
> stop point is defined and reached.

I did not set an incorrect stop position. I see this change as something most people in a similar situation would expect from their database system.

AFAIK the docs do not specify what happens if recovery_target_time is specified but not reached. But as the default recovery_target_action is "pause", I would have assumed "pause" to be the action.

regards
Leif Gunnar Erlandsen