Re: BUG #15346: Replica fails to start after the crash - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: BUG #15346: Replica fails to start after the crash |
Date | |
Msg-id | 20180831.145206.05203037.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: BUG #15346: Replica fails to start after the crash (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: BUG #15346: Replica fails to start after the crash
Re: BUG #15346: Replica fails to start after the crash |
List | pgsql-hackers |
At Thu, 30 Aug 2018 18:48:55 -0700, Michael Paquier <michael@paquier.xyz> wrote in <20180831014855.GJ15446@paquier.xyz> > On Fri, Aug 31, 2018 at 09:48:46AM +0900, Kyotaro HORIGUCHI wrote: > > Please wait a bit.. I have a concern about this. > > Sure, please feel free. Thanks. I looked though the patch and related code and have a concern. The patch inhibits turning off updateMinRecoveryPoint on other than the startup process running crash-recovery except at the end of XLogNeedsFlush. > /* > * Check minRecoveryPoint for any other process than the startup > * process doing crash recovery, which should not update the control > * file value if crash recovery is still running. > */ > if (XLogRecPtrIsInvalid(minRecoveryPoint)) > updateMinRecoveryPoint = false; Even if any other processes than the startup calls the function during crash recovery, they don't have a means to turn on updateMinRecoveryPoint again. Actually the only process that calls the function druing crash recovery is the startup. bwriter and checkpointer doesn't. Hot-standby backends come after crash-recvery ends. After all, we just won't have an invalid minRecoveryPoint value there, and updateMinRecoverypoint must be true. Other than that I didn't find a problem. Please find the attached patch. I also have not come up with how to check the case automatically.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 493f1db7b9..74692a128d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2723,9 +2723,13 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) * i.e., we're doing crash recovery. We never modify the control file's * value in that case, so we can short-circuit future checks here too. The * local values of minRecoveryPoint and minRecoveryPointTLI should not be - * updated until crash recovery finishes. + * updated until crash recovery finishes. We only do this for the startup + * process as it should not update its own reference of minRecoveryPoint + * until it has finished crash recovery to make sure that all WAL + * available is replayed in this case. This also saves from extra locks + * taken on the control file from the startup process. */ - if (XLogRecPtrIsInvalid(minRecoveryPoint)) + if (XLogRecPtrIsInvalid(minRecoveryPoint) && InRecovery) { updateMinRecoveryPoint = false; return; @@ -2737,7 +2741,9 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) minRecoveryPoint = ControlFile->minRecoveryPoint; minRecoveryPointTLI = ControlFile->minRecoveryPointTLI; - if (force || minRecoveryPoint < lsn) + if (XLogRecPtrIsInvalid(minRecoveryPoint)) + updateMinRecoveryPoint = false; + else if (force || minRecoveryPoint < lsn) { XLogRecPtr newMinRecoveryPoint; TimeLineID newMinRecoveryPointTLI; @@ -3124,12 +3130,15 @@ XLogNeedsFlush(XLogRecPtr record) if (RecoveryInProgress()) { /* - * An invalid minRecoveryPoint means that we need to recover all the - * WAL, i.e., we're doing crash recovery. We never modify the control - * file's value in that case, so we can short-circuit future checks - * here too. + * An invalid minRecoveryPoint on the startup process means that we + * need to recover all the WAL, i.e., we're doing crash recovery. We + * never modify the control file's value in that case, so we can + * short-circuit future checks here too. This triggers a quick exit + * path for the startup process, which cannot update its local copy of + * minRecoveryPoint as long as it has not replayed all WAL available + * when doing crash recovery. */ - if (XLogRecPtrIsInvalid(minRecoveryPoint)) + if (XLogRecPtrIsInvalid(minRecoveryPoint) && InRecovery) updateMinRecoveryPoint = false; /* Quick exit if already known to be updated or cannot be updated */ @@ -3146,6 +3155,12 @@ XLogNeedsFlush(XLogRecPtr record) minRecoveryPointTLI = ControlFile->minRecoveryPointTLI; LWLockRelease(ControlFileLock); + /* + * No other process than the startup doesn't reach here before crash + * recovery ends. So minRecoveryPoint must have a valid value here. + */ + Assert(minRecoveryPoint != InvalidXLogRecPtr); + /* check again */ return record > minRecoveryPoint; }
pgsql-hackers by date: