Hi,
The problem is that whenever we are going for streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but while switching
from streaming to archive we do not always reset it so it hits
assertion in some cases. Basically we reset it inside
XLogShutdownWalRcv() but while switching from the streaming mode we
only call it conditionally if WalRcvStreaming(). But it is very much
possible that even before we call WalRcvStreaming() the walreceiver
might have set alrcv->walRcvState to WALRCV_STOPPED. So now
WalRcvStreaming() will return false. So I agree now we do not want to
really shut down the walreceiver but who will reset the flag?
I just ran some tests on primary and attached the walreceiver to gdb
and waited for it to exit with timeout and then the recovery process
hit the assertion.
2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver
due to timeout
cp: cannot stat
‘/home/dilipkumar/work/PG/install/bin/wal_archive/00000002.history’:
No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG: restored log file
"000000010000000000000003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File:
"xlog.c", Line: 3823, PID: 60973)
I have just applied a quick fix and that solved the issue, basically
if the last failed source was streaming and the WalRcvStreaming() is
false then just reset this flag.
@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr
RecPtr, bool randAccess,
*/
if (WalRcvStreaming())
XLogShutdownWalRcv();
+ else
+ {
+
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
XLogCtl->InstallXLogFileSegmentActive = false;
+ LWLockRelease(ControlFileLock);
+ }
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com