Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain() - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Date
Msg-id 3abb8c54-11de-4220-ad65-c2785df01124@app.fastmail.com
Whole thread Raw
In response to Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
List pgsql-hackers
On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
beginning of WalReceiverMain().

But it seems that after this assignment things could be wrong before the
walreicever actually starts streaming (like not being able to connect
to the primary).

It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
returns true: this is the proposal of this patch.

Per the state name (streaming), it seems it should be set later as you
proposed, however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():

                        SetInstallXLogFileSegmentActive();
                        RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
                                             PrimarySlotName,
                                             wal_receiver_create_temp_slot);
                        flushedUpto = 0; 
                    }

                    /*
                     * Check if WAL receiver is active or wait to start up.
                     */
                    if (!WalRcvStreaming())
                    {    
                        lastSourceFailed = true;
                        break;
                    }

After a few lines the function WalRcvStreaming() has:

    SpinLockRelease(&walrcv->mutex);

    /*  
     * If it has taken too long for walreceiver to start up, give up. Setting
     * the state to STOPPED ensures that if walreceiver later does start up
     * after all, it will see that it's not supposed to be running and die
     * without doing anything.
     */
    if (state == WALRCV_STARTING)
    {   
        pg_time_t   now = (pg_time_t) time(NULL);

        if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
        {   
            bool        stopped = false;

            SpinLockAcquire(&walrcv->mutex);
            if (walrcv->walRcvState == WALRCV_STARTING)
            {   
                state = walrcv->walRcvState = WALRCV_STOPPED;
                stopped = true;
            }
            SpinLockRelease(&walrcv->mutex);

            if (stopped)
                ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
        }
    }   

Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.


--
Euler Taveira

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Next
From: Alvaro Herrera
Date:
Subject: Re: Report planning memory in EXPLAIN ANALYZE