Re: Assertion failure in WaitForWALToBecomeAvailable state machine - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Date
Msg-id CALj2ACWCk2it_A=ubKc6r3jt9N3WrfGsD7FWD_YCF2AYyraD+w@mail.gmail.com
Whole thread Raw
In response to Re: Assertion failure in WaitForWALToBecomeAvailable state machine  (Noah Misch <noah@leadboat.com>)
Responses Re: Assertion failure in WaitForWALToBecomeAvailable state machine
Re: Assertion failure in WaitForWALToBecomeAvailable state machine
List pgsql-hackers
On Tue, Sep 13, 2022 at 8:52 AM Noah Misch <noah@leadboat.com> wrote:
>
> > > > [1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com
>
> I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
> the code location being changed.  Today, in the unlikely event of
> !WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
> proceeds without waiting for WALRCV_STOPPED.

Hm. That was the original fix [2] proposed and it works. The concern
is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
- it calls ConditionVariablePrepareToSleep(),
ConditionVariableCancelSleep() (has lock 2 acquisitions and
requisitions) and 1 function call WalRcvRunning()) even for
WALRCV_STOPPED case, all this is unnecessary IMO when we determine the
walreceiver is state is already WALRCV_STOPPED. I think we can do
something like [3] to allay this concern and go with the fix proposed
at [1] unconditionally calling XLogShutdownWalRcv().

> If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
> perhaps should back-patch the change to released versions.  Does anyone know
> whether one or both can happen?

IMO, we must back-patch to the version where
cc2c7d65fc27e877c9f407587b0b92d46cd6dd16  got introduced irrespective
of any of the above happening.

Thoughts?

[1] - https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com
[2] -
https://www.postgresql.org/message-id/flat/CALj2ACUoBWbaFo_t0gew%2Bx6n0V%2BmpvB_23HLvsVD9abgCShV5A%40mail.gmail.com#e762ee94cbbe32a0b8c72c60793626b3
[3] -diff --git a/src/backend/replication/walreceiverfuncs.c
b/src/backend/replication/walreceiverfuncs.c
index 90798b9d53..3487793c7a 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -181,6 +181,7 @@ ShutdownWalRcv(void)
        WalRcvData *walrcv = WalRcv;
        pid_t           walrcvpid = 0;
        bool            stopped = false;
+       bool            was_stopped = false;

        /*
         * Request walreceiver to stop. Walreceiver will switch to
WALRCV_STOPPED
@@ -191,6 +192,7 @@ ShutdownWalRcv(void)
        switch (walrcv->walRcvState)
        {
                case WALRCV_STOPPED:
+                       was_stopped = true;
                        break;
                case WALRCV_STARTING:
                        walrcv->walRcvState = WALRCV_STOPPED;
@@ -208,6 +210,10 @@ ShutdownWalRcv(void)
        }
        SpinLockRelease(&walrcv->mutex);

+       /* Quick exit if walreceiver was already stopped. */
+       if (was_stopped)
+               return;
+
        /* Unnecessary but consistent. */
        if (stopped)
                ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Error "initial slot snapshot too large" in create replication slot
Next
From: Peter Eisentraut
Date:
Subject: Re: postgres_fdw hint messages