Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id CALj2ACVpCNuqtbj-2DmwvkZH588S7-h7Q4HvKBO-S-E0YhpPKQ@mail.gmail.com
Whole thread Raw
In response to Re: Switching XLog source from archive to streaming when primary available  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
On Wed, Sep 7, 2022 at 3:27 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> +      <indexterm>
> +       <primary><varname>wal_source_switch_interval</varname> configuration parameter</primary>
> +      </indexterm>
>
> I don't want to bikeshed on the name too much, but I do think we need
> something more descriptive.  I'm thinking of something like
> streaming_replication_attempt_interval or
> streaming_replication_retry_interval.

I could come up with wal_source_switch_interval after a log of
bikeshedding myself :). However, streaming_replication_retry_interval
looks much better, I've used it in the latest patch. Thanks.

> +        Specifies how long the standby server should wait before switching WAL
> +        source from WAL archive to primary (streaming replication). This can
> +        happen either during the standby initial recovery or after a previous
> +        failed attempt to stream WAL from the primary.
>
> I'm not sure what the second sentence means.  In general, I think the
> explanation in your commit message is much clearer:

I polished the comments, docs and commit message a bit, please check now.

> 5 seconds seems low.  I would expect the default to be 1-5 minutes.  I
> think it's important to strike a balance between interrupting archive
> recovery to attempt streaming replication and letting archive recovery make
> progress.

+1 for a default value of 5 minutes to avoid frequent interruptions
for archive mode when primary is really down for a long time. I've
also added a cautionary note in the docs about the lower values.

> +        * Try reading WAL from primary after every wal_source_switch_interval
> +        * milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
> +        * successful, the state machine moves to XLOG_FROM_STREAM state, otherwise
> +        * it falls back to XLOG_FROM_ARCHIVE state.
>
> It's not clear to me how this is expected to interact with the pg_wal phase
> of standby recovery.  As the docs note [0], standby servers loop through
> archive recovery, recovery from pg_wal, and streaming replication.  Does
> this cause the pg_wal phase to be skipped (i.e., the standby goes straight
> from archive recovery to streaming replication)?  I wonder if it'd be
> better for this mechanism to simply move the standby to the pg_wal phase so
> that the usual ordering isn't changed.
>
> [0] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

It doesn't change any behaviour as such for XLOG_FROM_PG_WAL. In
standby mode when recovery_command is specified, the initial value of
currentSource would be XLOG_FROM_ARCHIVE (see [1]). If the archive is
exhausted of WAL or the standby fails to fetch from the archive, then
it switches to XLOG_FROM_STREAM. If the standby fails to receive WAL
from primary, it switches back to XLOG_FROM_ARCHIVE. This continues
unless the standby gets promoted. With the patch, we enable the
standby to try fetching from the primary, instead of waiting for WAL
in the archive to get exhausted or for an error to occur in the
standby while receiving from the archive.

> +                                       if (!first_time &&
> +                                               TimestampDifferenceExceeds(last_switch_time, curr_time,
> +
wal_source_switch_interval))
>
> Shouldn't this also check that wal_source_switch_interval is not set to 0?

Corrected.

> +                                               elog(DEBUG2,
> +                                                        "trying to switch WAL source to %s after fetching WAL from
%sfor %d milliseconds",
 
> +                                                        xlogSourceNames[XLOG_FROM_STREAM],
> +                                                        xlogSourceNames[currentSource],
> +                                                        wal_source_switch_interval);
> +
> +                                               last_switch_time = curr_time;
>
> Shouldn't the last_switch_time be set when the state machine first enters
> XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
> elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
> source switch.  This would mean that a standby that has spent a lot of time
> in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
> immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
> XLOG_FROM_ARCHIVE when it failed again.  Given the standby will wait for
> wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
> seems like we could end up rapidly looping between sources.  Perhaps I am
> misunderstanding how this is meant to work.

last_switch_time indicates the time when the standby last attempted to
switch to primary. For instance, a standby:
1) for the first time, sets last_switch_time = current_time when in archive mode
2) if current_time < last_switch_time + interval, continues to be in
archive mode
3) if current_time >= last_switch_time + interval, attempts to switch
to primary and sets last_switch_time = current_time
3.1) if successfully switches to primary, continues in there and for
any reason fails to fetch from primary, then enters archive mode and
loops from step (2)
3.2) if fails to switch to primary, then enters archive mode and loops
from step (2)

Hope this clarifies the behaviour.

> +       {
> +               {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
> +                       gettext_noop("Sets the time to wait before switching WAL source from archive to primary"),
> +                       gettext_noop("0 turns this feature off."),
> +                       GUC_UNIT_MS
> +               },
> +               &wal_source_switch_interval,
> +               5000, 0, INT_MAX,
> +               NULL, NULL, NULL
> +       },
>
> I wonder if the lower bound should be higher to avoid switching
> unnecessarily rapidly between WAL sources.  I see that
> WaitForWALToBecomeAvailable() ensures that standbys do not switch from
> XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
> wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
> the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
> should make sure that the standby makes at least once attempt to restore
> the file from archive before switching to streaming replication.

No, we need a way to disable the feature, so I'm not changing the
lower bound. And let's not make this GUC dependent on any other GUC, I
would like to keep it simple for better usability. However, I've
increased the default value to 5min and added a note in the docs about
the lower values.

I'm attaching the v3 patch with the review comments addressed, please
review it further.

[1]
    if (!InArchiveRecovery)
        currentSource = XLOG_FROM_PG_WAL;
    else if (currentSource == XLOG_FROM_ANY ||
             (!StandbyMode && currentSource == XLOG_FROM_STREAM))
    {
        lastSourceFailed = false;
        currentSource = XLOG_FROM_ARCHIVE;
    }

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Peter Eisentraut
Date:
Subject: pg_walinspect float4/float8 confusion