Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Minimal logical decoding on standbys
Date
Msg-id 096259be-d8cf-e3c3-cfdc-81365d73b9b4@gmail.com
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  (Andres Freund <andres@anarazel.de>)
Responses Re: Minimal logical decoding on standbys
List pgsql-hackers
Hi,

On 1/6/23 4:40 AM, Andres Freund wrote:
> Hi,
> On 2023-01-05 16:15:39 -0500, Robert Haas wrote:
>> On Tue, Jan 3, 2023 at 2:42 AM Drouvot, Bertrand
>> <bertranddrouvot.pg@gmail.com> wrote:
> 0006:
> 
>> diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
>> index bc3c3eb3e7..98c96eb864 100644
>> --- a/src/backend/access/transam/xlogrecovery.c
>> +++ b/src/backend/access/transam/xlogrecovery.c
>> @@ -358,6 +358,9 @@ typedef struct XLogRecoveryCtlData
>>       RecoveryPauseState recoveryPauseState;
>>       ConditionVariable recoveryNotPausedCV;
>>
>> +    /* Replay state (see getReplayedCV() for more explanation) */
>> +    ConditionVariable replayedCV;
>> +
>>       slock_t        info_lck;        /* locks shared variables shown above */
>>   } XLogRecoveryCtlData;
>>
> 
> getReplayedCV() doesn't seem to fit into any of the naming scheems in use for
> xlogrecovery.h.

Changed to check_for_replay() in V46 attached.

>> -         * Sleep until something happens or we time out.  Also wait for the
>> -         * socket becoming writable, if there's still pending output.
>> +         * When not in recovery, sleep until something happens or we time out.
>> +         * Also wait for the socket becoming writable, if there's still pending output.
> 
> Hm. Is there a problem with not handling the becoming-writable case in the
> in-recovery case?
> 
> 

Yes, when not in recovery we'd wait for the timeout to occur in  ConditionVariableTimedSleep()
(as the CV is broadcasted only in ApplyWalRecord()).

>> +        else
>> +        /*
>> +         * We are in the logical decoding on standby case.
>> +         * We are waiting for the startup process to replay wal record(s) using
>> +         * a timeout in case we are requested to stop.
>> +         */
>> +        {
> 
> I don't think pgindent will like that formatting....

Oops, fixed.

> 
> 
>> +            ConditionVariablePrepareToSleep(replayedCV);
>> +            ConditionVariableTimedSleep(replayedCV, 1000,
>> +                                        WAIT_EVENT_WAL_SENDER_WAIT_REPLAY);
>> +        }
> 
> I think this is racy, see ConditionVariablePrepareToSleep()'s comment:
> 
>   * Caution: "before entering the loop" means you *must* test the exit
>   * condition between calling ConditionVariablePrepareToSleep and calling
>   * ConditionVariableSleep.  If that is inconvenient, omit calling
>   * ConditionVariablePrepareToSleep.
> 
> Basically, the ConditionVariablePrepareToSleep() should be before the loop
> body.
> 

I missed it, thanks! Moved it before the loop body.

> 
> I don't think the fixed timeout here makes sense. For one, we need to wake up
> based on WalSndComputeSleeptime(), otherwise we're ignoring wal_sender_timeout
> (which can be quite small).  

Good point. Making use of WalSndComputeSleeptime() instead in V46.

> It's also just way too frequent - we're trying to
> avoid constantly waking up unnecessarily.
> 
> 
> Perhaps we could deal with the pq_is_send_pending() issue by having a version
> of ConditionVariableTimedSleep() that accepts a WaitEventSet?
> 

What issue do you see?
The one that I see with V46 (keeping the in/not recovery branches) is that one may need to wait
for wal_sender_timeout to see changes that occurred right after the promotion.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Logical replication timeout problem
Next
From: Masahiko Sawada
Date:
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()