Re: Is Recovery actually paused? - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Is Recovery actually paused?
Date
Msg-id 20210209.145523.959421492944827422.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Is Recovery actually paused?  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Is Recovery actually paused?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in 
> On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >
> > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> >
> > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbalaut@gmail.com> wrote in
> > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > > > >
> > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > > > Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > > > >
> > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in
> > > > > > > > > > I think the right fix should be that the state should never go from
> > > > > > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
> > > > > > > > > > care of that.
> > > > > > > > >
> > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > > > > > > > it can not handle the case that a user resume and pause again while a sleep.
> > > > > > > >
> > > > > > > > Right, we will have to check and set in the loop.  But we should not
> > > > > > > > allow the state to go from paused to pause requested irrespective of
> > > > > > > > this.
> > > > > > >
> > > > > > > I agree with you.
> > > > > >
> > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > > > >
> > > > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > > > observe 'pause requested' during a sleep alghough the time window is short.
> > > > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > > > because This state meeans that recovery may not be 'paused'.
> > > >
> > > > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > > > requested'.  the logical state transition should always be as below
> > > >
> > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > > > request and then paused but there is nothing wrong with going to
> > > > paused)
> > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused)
> > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > > > PAUSE_REQUESTED without going to NOT PAUSED)
> > >
> > > I didn't asked about the internal logical correctness, but asked about
> > > *actual harm* revealed to users. I don't see any actual harm in the
> > > "wrong" transition because:
> >
> > Actually, the incorrect state transition is not so harmful except that
> > users can observe unnecessary state changes. However, I don't think any
> > actual harm in prohibit the incorrect state transition. So, I think we
> > can do it.
> >
> > > If we are going to introduce that complexity, I'd like to re-propose
> > > to introduce interlocking between the recovery side and the
> > > pause-requestor side instead of introducing the intermediate state,
> > > which is the cause of the complexity.
> > >
> > > The attached PoC patch adds:
> > >
> > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > >   info_lck since the check is done in the existing lock section.
> > >
> > > - Interlocking between the above and SetRecoveryPause without adding a
> > >   shared variable.
> > >   (This is what I called "synchronous" before.)
> >
> > I think waiting in pg_wal_replay_pasue is a possible option, but this will
> > also introduce other complexity to codes such as possibility of waiting for
> > long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> > patch appears to make recovery stuck in  RecoveryRequiresIntParameter.

Ah. Yes, startup process does not need to wait. That is a bug of the
patch. No other callers don't cause the self dead lock.

> I agree with this,  I think we previously discussed these approaches
> where we can wait in pg_wal_replay_pasue() or
> pg_is_wal_replay_pasued().  In fact, we had an older version where we
> put the wait in pg_is_wal_replay_pasued().  But it appeared that doing

Note that the expected waiting period is while calling rmgr_redo(). If
it is stuck for a long time, that suggests something's going wrong.

> so will add extra complexity as well as instead of waiting in these
> APIs the wait logic can be implemented in the application code which
> is actually using these APIs and IMHO that will give better control to
> the users.

Year, with the PoC pg_wal_replay_pause() can make a short wait as a
side-effect but the tri-state patch also can add a function to wait
for the state suffices.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e3b5df7dc..194a2f9998 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6076,6 +6076,23 @@ void
 SetRecoveryPause(bool recoveryPause)
 {
     SpinLockAcquire(&XLogCtl->info_lck);
+
+    /*
+     * Wait for the application of the record being applied to finish, so that
+     * no records will be applied after this function returns. We don't need to
+     * wait when ending a pause. Anyway we are requesting a recovery pause, we
+     * don't mind a possible slow down of recovery by the info_lck here.
+     * We don't need to wait in the startup process.
+     */
+    while(InRecovery &&
+          recoveryPause && !XLogCtl->recoveryPause &&
+          XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr)
+    {
+        SpinLockRelease(&XLogCtl->info_lck);
+        CHECK_FOR_INTERRUPTS();
+        pg_usleep(10000L);        /* 10 ms */
+        SpinLockAcquire(&XLogCtl->info_lck);
+    }
     XLogCtl->recoveryPause = recoveryPause;
     SpinLockRelease(&XLogCtl->info_lck);
 }
@@ -7262,6 +7279,7 @@ StartupXLOG(void)
             do
             {
                 bool        switchedTLI = false;
+                bool        pause_requested = false;
 
 #ifdef WAL_DEBUG
                 if (XLOG_DEBUG ||
@@ -7292,11 +7310,9 @@ StartupXLOG(void)
                  * Note that we intentionally don't take the info_lck spinlock
                  * here.  We might therefore read a slightly stale value of
                  * the recoveryPause flag, but it can't be very stale (no
-                 * worse than the last spinlock we did acquire).  Since a
-                 * pause request is a pretty asynchronous thing anyway,
-                 * possibly responding to it one WAL record later than we
-                 * otherwise would is a minor issue, so it doesn't seem worth
-                 * adding another spinlock cycle to prevent that.
+                 * worse than the last spinlock we did acquire). We eventually
+                 * make sure catching the pause request if any just before
+                 * applying this record.
                  */
                 if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
                     recoveryPausesHere(false);
@@ -7385,12 +7401,19 @@ StartupXLOG(void)
                 /*
                  * Update shared replayEndRecPtr before replaying this record,
                  * so that XLogFlush will update minRecoveryPoint correctly.
+                 * Also we check for the correct value of the recoveryPause
+                 * flag here not to have redo overrun during a pause. See
+                 * SetRecoveryPuase() for details.
                  */
                 SpinLockAcquire(&XLogCtl->info_lck);
                 XLogCtl->replayEndRecPtr = EndRecPtr;
                 XLogCtl->replayEndTLI = ThisTimeLineID;
+                pause_requested = XLogCtl->recoveryPause;
                 SpinLockRelease(&XLogCtl->info_lck);
 
+                if (pause_requested)
+                    recoveryPausesHere(false);
+                    
                 /*
                  * If we are attempting to enter Hot Standby mode, process
                  * XIDs we see

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: pg_replication_origin_drop API potential race condition
Next
From: "Andrey V. Lepikhov"
Date:
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions