Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id 20220908175356.GA2225645@nathanxps13
Whole thread Raw
In response to Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
On Thu, Sep 08, 2022 at 05:16:53PM +0530, Bharath Rupireddy wrote:
> On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> It's not clear to me how this is expected to interact with the pg_wal phase
>> of standby recovery.  As the docs note [0], standby servers loop through
>> archive recovery, recovery from pg_wal, and streaming replication.  Does
>> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
>> from archive recovery to streaming replication)?  I wonder if it'd be
>> better for this mechanism to simply move the standby to the pg_wal phase so
>> that the usual ordering isn't changed.
> 
> It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In
> standby mode when recovery_command is specified, the initial value of
> currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is
> exhausted of WAL or the standby fails to fetch from the archive, then
> it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL
> from primary, it switches back to XLOG_FROM_ARCHIVE. This continues
> unless the standby gets promoted. With the patch, we enable the
> standby to try fetching from the primary, instead of waiting for WAL
> in the archive to get exhausted or for an error to occur in the
> standby while receiving from the archive.

Okay.  I see that you are checking for XLOG_FROM_ARCHIVE.

>> Shouldn't the last_switch_time be set when the state machine first enters
>> XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
>> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
>> source switch.  This would mean that a standby that has spent a lot of time
>> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
>> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
>> XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
>> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
>> seems like we could end up rapidly looping between sources.  Perhaps I am
>> misunderstanding how this is meant to work.
> 
> last_switch_time indicates the time when the standby last attempted to
> switch to primary. For instance, a standby:
> 1) for the first time, sets last_switch_time = current_time when in archive mode
> 2) if current_time < last_switch_time + interval, continues to be in
> archive mode
> 3) if current_time >= last_switch_time + interval, attempts to switch
> to primary and sets last_switch_time = current_time
> 3.1) if successfully switches to primary, continues in there and for
> any reason fails to fetch from primary, then enters archive mode and
> loops from step (2)
> 3.2) if fails to switch to primary, then enters archive mode and loops
> from step (2)

Let's say I have this new parameter set to 5 minutes, and I have a standby
that's been at step 3.1 for 5 days before failing and going back to step 2.
Won't the standby immediately jump back to step 3.1?  I think we should
place the limit on how long the server stays in XLOG_FROM_ARCHIVE, not how
long it's been since we last tried XLOG_FROM_STREAM.

>> I wonder if the lower bound should be higher to avoid switching
>> unnecessarily rapidly between WAL sources.  I see that
>> WaitForWALToBecomeAvailable() ensures that standbys do not switch from
>> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
>> wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
>> the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
>> should make sure that the standby makes at least once attempt to restore
>> the file from archive before switching to streaming replication.
> 
> No, we need a way to disable the feature, so I'm not changing the
> lower bound. And let's not make this GUC dependent on any other GUC, I
> would like to keep it simple for better usability. However, I've
> increased the default value to 5min and added a note in the docs about
> the lower values.
> 
> I'm attaching the v3 patch with the review comments addressed, please
> review it further.

My general point is that we should probably offer some basic preventative
measure against flipping back and forth between streaming and archive
recovery while making zero progress.  As I noted, maybe that's as simple as
having WaitForWALToBecomeAvailable() attempt to restore a file from archive
at least once before the new parameter forces us to switch to streaming
replication.  There might be other ways to handle this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: SYSTEM_USER reserved word implementation
Next
From: Tom Lane
Date:
Subject: Re: Reducing the chunk header sizes on all memory context types