Re: Assertion failure in WaitForWALToBecomeAvailable state machine - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date
Msg-id 20220214.171428.735280610520122187.horikyota.ntt@gmail.com
In response to Re: Assertion failure in WaitForWALToBecomeAvailable state machine  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Assertion failure in WaitForWALToBecomeAvailable state machine
List pgsql-hackers
At Fri, 11 Feb 2022 22:25:49 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> > > I don't think
> > > just making InstallXLogFileSegmentActive false is enough. By looking
> > > at the comment [1], it doesn't make sense to move ahead for restoring
> > > from the archive location without the WAL receiver fully stopped.
> > > IMO, the real fix is to just remove WalRcvStreaming() and call
> > > XLogShutdownWalRcv() unconditionally. Anyways, we have the
> > > Assert(!WalRcvStreaming()); down below. I don't think it will create
> > > any problem.
> >
> > If WalRcvStreaming() is returning false that means walreceiver is
> > already stopped so we don't need to shutdown it externally.  I think
> > like we are setting this flag outside start streaming we can reset it
> > also outside XLogShutdownWalRcv.  Or I am fine even if we call
> > XLogShutdownWalRcv() because if walreceiver is stopped it will just
> > reset the flag we want it to reset and it will do nothing else.
> 
> As I said, I'm okay with just calling XLogShutdownWalRcv()
> unconditionally as it just returns in case walreceiver has already
> stopped and updates the InstallXLogFileSegmentActive flag to false.
> 
> Let's also hear what other hackers have to say about this.

Firstly, good catch:)  And the direction seems right.

It seems like an oversight in cc2c7d65fc. The only time we cannot
install new WAL segments is while we're in archive recovery.  So we
must turn the flag off when entering archive recovery (not when
exiting streaming recovery).  So, *I* would rather do that at the
beginning of XLOG_FROM_ARCHIVE/PG_WAL than at the end of XLOG_FROM_STREAM.

(And I would like to remove XLogShutdownWalRcv() and turn off the flag
 in StartupXLOG explicitly, but that would be overkill.)

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12800,6 +12800,16 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                  */
                 Assert(!WalRcvStreaming());
 
+                /*
+                 * WAL segment installation conflicts with archive
+                 * recovery. Make sure it is turned off.  XLogShutdownWalRcv()
+                 * does that, but it is not called when the walreceiver has
+                 * exited voluntarily, for example due to replication timeout.
+                 */
+                LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+                XLogCtl->InstallXLogFileSegmentActive = false;
+                LWLockRelease(ControlFileLock);
+
                 /* Close any old file we might have open. */
                 if (readFile >= 0)
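
To illustrate the case the hunk above guards against, here is a toy model in
plain C.  It is only a sketch, not PostgreSQL code: ToyXLogCtl and every
toy_* function are hypothetical stand-ins, and it assumes (as the quoted
discussion suggests) that the flag is set when streaming is requested and
that the existing code calls XLogShutdownWalRcv() only when WalRcvStreaming()
returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared state involved; not the real XLogCtl. */
typedef struct ToyXLogCtl
{
	bool		walrcv_streaming;	/* models the result of WalRcvStreaming() */
	bool		InstallXLogFileSegmentActive;	/* models the XLogCtl flag */
} ToyXLogCtl;

/* Startup process requests streaming: the flag is set before streaming starts. */
void
toy_start_streaming(ToyXLogCtl *ctl)
{
	ctl->InstallXLogFileSegmentActive = true;
	ctl->walrcv_streaming = true;
}

/* The walreceiver exits by itself (e.g. timeout); it does not touch the flag. */
void
toy_walrcv_voluntary_exit(ToyXLogCtl *ctl)
{
	ctl->walrcv_streaming = false;
}

/* Models XLogShutdownWalRcv(): stop the walreceiver and reset the flag. */
void
toy_shutdown_walrcv(ToyXLogCtl *ctl)
{
	ctl->walrcv_streaming = false;
	ctl->InstallXLogFileSegmentActive = false;
}

/* Leaving XLOG_FROM_STREAM as assumed today: shut down only if still streaming. */
void
toy_leave_stream_conditional(ToyXLogCtl *ctl)
{
	if (ctl->walrcv_streaming)
		toy_shutdown_walrcv(ctl);
}

/* Entering XLOG_FROM_ARCHIVE/PG_WAL with the proposed change applied. */
void
toy_enter_archive_patched(ToyXLogCtl *ctl)
{
	assert(!ctl->walrcv_streaming);
	ctl->InstallXLogFileSegmentActive = false;	/* reset unconditionally */
}
```

In this model, the sequence toy_start_streaming(), toy_walrcv_voluntary_exit(),
toy_leave_stream_conditional() leaves the flag set, which is the stale state
in question; toy_enter_archive_patched() then clears it regardless of how the
walreceiver went away.  Calling toy_shutdown_walrcv() unconditionally would
clear it as well, which matches the alternative discussed upthread.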


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


