Re: Is Recovery actually paused? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Is Recovery actually paused? |
Date | |
Msg-id | 20210209.105804.245840302061999932.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Is Recovery actually paused? (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: Is Recovery actually paused?
Re: Is Recovery actually paused? |
List | pgsql-hackers |
At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in > > > > > > > I think the right fix should be that the state should never go from > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > > > > > > care of that. > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > > > > it can not handle the case that a user resume and pause again while a sleep. > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > allow the state to go from paused to pause requested irrespective of > > > > > this. > > > > > > > > I agree with you. > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > immediately change the state to PAUSE always we see REQUESTED in the > > > waiting loop, despite that we allow change the state from PAUSE to > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > observe 'pause requested' during a sleep alghough the time window is short. > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > because This state meeans that recovery may not be 'paused'. > > Yeah, this appears wrong that after 'paused' we go back to 'pause > requested'. the logical state transition should always be as below > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > request and then paused but there is nothing wrong with going to > paused) > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused) > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > PAUSE_REQUESTED without going to NOT PAUSED) I didn't asked about the internal logical correctness, but asked about *actual harm* revealed to users. I don't see any actual harm in the "wrong" transition because: 1. It is not wrong nor strange that the invoker of pg_wal_replay_pause sees the state PAUSE_REQUESTED before it changes to PAUSED. Even if the previous state was PAUSED, it is no business of the requestors. 2. It is no harm in the recovery side since PAUSE_REQUESTED and PAUSED are effectively the same state. 3. After we inhibited the direct transition from PAUSED->PAUSE_REQUESTED, the effectively the same transition PAUSED->NOT_PAUSED->PAUSE_REQUESTED is still allowed. The inhibition of the former transition doesn't protect anything other than seeming correctness of the transition. If we are going to introduce that complexity, I'd like to re-propose to introduce interlocking between the recovery side and the pause-requestor side instead of introducing the intermediate state, which is the cause of the complexity. The problem is due to the looseness of checking for pause requests in the existing checkponts, and the window after the last checkpoint until calling rm_redo(). The attached PoC patch adds: - A solid checkpoint just before calling rm_redo. It doesn't add a info_lck since the check is done in the existing lock section. - Interlocking between the above and SetRecoveryPause without adding a shared variable. (This is what I called "synchronous" before.) There's a concern about pausing after updating XlogCtl->replayEndRecPtr but I don't see an issue yet.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df7dc..26aadb4178 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6076,6 +6076,20 @@ void SetRecoveryPause(bool recoveryPause) { SpinLockAcquire(&XLogCtl->info_lck); + + /* + * Wait for the application of the record being applied to finish, so that + * no records will be applied after this function returns. We don't need to + * wait when ending a pause. Anyway we are requesting a recovery pause, we + * don't mind a possible slow down of recovery by the info_lck here. + */ + while(recoveryPause && !XLogCtl->recoveryPause && + XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr) + { + SpinLockRelease(&XLogCtl->info_lck); + pg_usleep(10000L); /* 10 ms */ + SpinLockAcquire(&XLogCtl->info_lck); + } XLogCtl->recoveryPause = recoveryPause; SpinLockRelease(&XLogCtl->info_lck); } @@ -7262,6 +7276,7 @@ StartupXLOG(void) do { bool switchedTLI = false; + bool pause_requested = false; #ifdef WAL_DEBUG if (XLOG_DEBUG || @@ -7292,11 +7307,9 @@ StartupXLOG(void) * Note that we intentionally don't take the info_lck spinlock * here. We might therefore read a slightly stale value of * the recoveryPause flag, but it can't be very stale (no - * worse than the last spinlock we did acquire). Since a - * pause request is a pretty asynchronous thing anyway, - * possibly responding to it one WAL record later than we - * otherwise would is a minor issue, so it doesn't seem worth - * adding another spinlock cycle to prevent that. + * worse than the last spinlock we did acquire). We eventually + * make sure catching the pause request if any just before + * applying this record. */ if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) recoveryPausesHere(false); @@ -7385,12 +7398,19 @@ StartupXLOG(void) /* * Update shared replayEndRecPtr before replaying this record, * so that XLogFlush will update minRecoveryPoint correctly. + * Also we check for the correct value of the recoveryPause + * flag here not to have redo overrun during a pause. See + * SetRecoveryPuase() for details. */ SpinLockAcquire(&XLogCtl->info_lck); XLogCtl->replayEndRecPtr = EndRecPtr; XLogCtl->replayEndTLI = ThisTimeLineID; + pause_requested = XLogCtl->recoveryPause; SpinLockRelease(&XLogCtl->info_lck); + if (pause_requested) + recoveryPausesHere(false); + /* * If we are attempting to enter Hot Standby mode, process * XIDs we see
pgsql-hackers by date: