Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id 20220906215704.GA2084086@nathanxps13
In response to Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
+      <indexterm>
+       <primary><varname>wal_source_switch_interval</varname> configuration parameter</primary>
+      </indexterm>

I don't want to bikeshed on the name too much, but I do think we need
something more descriptive.  I'm thinking of something like
streaming_replication_attempt_interval or
streaming_replication_retry_interval.

+        Specifies how long the standby server should wait before switching WAL
+        source from WAL archive to primary (streaming replication). This can
+        happen either during the standby initial recovery or after a previous
+        failed attempt to stream WAL from the primary.

I'm not sure what the second sentence means.  In general, I think the
explanation in your commit message is much clearer:

    The standby makes an attempt to read WAL from primary after
    wal_retrieve_retry_interval milliseconds reading from archive.

+        If this value is specified without units, it is taken as milliseconds.
+        The default value is 5 seconds. A setting of <literal>0</literal>
+        disables the feature.

5 seconds seems low.  I would expect the default to be 1-5 minutes.  I
think it's important to strike a balance between interrupting archive
recovery to attempt streaming replication and letting archive recovery make
progress.

+     * Try reading WAL from primary after every wal_source_switch_interval
+     * milliseconds, when state machine is in XLOG_FROM_ARCHIVE state. If
+     * successful, the state machine moves to XLOG_FROM_STREAM state, otherwise
+     * it falls back to XLOG_FROM_ARCHIVE state.

It's not clear to me how this is expected to interact with the pg_wal phase
of standby recovery.  As the docs note [0], standby servers loop through
archive recovery, recovery from pg_wal, and streaming replication.  Does
this cause the pg_wal phase to be skipped (i.e., the standby goes straight
from archive recovery to streaming replication)?  I wonder if it'd be
better for this mechanism to simply move the standby to the pg_wal phase so
that the usual ordering isn't changed.
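To make the ordering concern concrete, here is a rough standalone sketch of the loop described in the docs (archive, then pg_wal, then streaming). The enum and both helper names are hypothetical simplifications for illustration, not the actual XLogSource state machine in xlogrecovery.c.

```c
#include <assert.h>

/* Simplified, hypothetical mirror of the three recovery phases. */
typedef enum
{
    SRC_ARCHIVE,
    SRC_PG_WAL,
    SRC_STREAM
} WalSource;

/* Usual ordering: archive -> pg_wal -> streaming -> back to archive. */
static WalSource
next_source(WalSource current)
{
    switch (current)
    {
        case SRC_ARCHIVE:
            return SRC_PG_WAL;
        case SRC_PG_WAL:
            return SRC_STREAM;
        case SRC_STREAM:
            return SRC_ARCHIVE;
    }
    return SRC_ARCHIVE;         /* not reached; keeps compilers quiet */
}

/*
 * Where a timer-forced switch would go if it simply advanced the loop
 * instead of jumping straight to streaming.  Only meaningful while
 * reading from the archive.
 */
static WalSource
forced_switch_target(WalSource current)
{
    assert(current == SRC_ARCHIVE);
    return next_source(current);
}
```

With this shape, a forced switch out of archive recovery lands in the pg_wal phase first, and streaming is attempted only after that phase has had its turn.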

+                    if (!first_time &&
+                        TimestampDifferenceExceeds(last_switch_time, curr_time,
+                                                   wal_source_switch_interval))

Shouldn't this also check that wal_source_switch_interval is not set to 0?
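A minimal standalone sketch of the guard I have in mind. The function name and the millisecond arguments are made up for illustration; in the patch this would be an extra condition next to the TimestampDifferenceExceeds() call. As far as I can tell, an interval of 0 would otherwise make the elapsed-time test trivially true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's condition, not PostgreSQL code.
 * elapsed_ms is the time since the last switch attempt; interval_ms
 * plays the role of wal_source_switch_interval.
 */
static bool
should_try_streaming(bool first_time, int64_t elapsed_ms, int interval_ms)
{
    assert(interval_ms >= 0);   /* the GUC's range is 0..INT_MAX */

    /* 0 means the feature is disabled, so never force a switch. */
    if (interval_ms == 0)
        return false;

    return !first_time && elapsed_ms >= interval_ms;
}
```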

+                        elog(DEBUG2,
+                             "trying to switch WAL source to %s after fetching WAL from %s for %d milliseconds",
+                             xlogSourceNames[XLOG_FROM_STREAM],
+                             xlogSourceNames[currentSource],
+                             wal_source_switch_interval);
+
+                        last_switch_time = curr_time;

Shouldn't the last_switch_time be set when the state machine first enters
XLOG_FROM_ARCHIVE?  IIUC this logic is currently counting time spent
elsewhere (e.g., XLOG_FROM_STREAM) when determining whether to force a
source switch.  This would mean that a standby that has spent a lot of time
in streaming replication before failing would flip to XLOG_FROM_ARCHIVE,
immediately flip back to XLOG_FROM_STREAM, and then likely flip back to
XLOG_FROM_ARCHIVE when it fails again.  Given the standby will wait for
wal_retrieve_retry_interval before going back to XLOG_FROM_ARCHIVE, it
seems like we could end up rapidly looping between sources.  Perhaps I am
misunderstanding how this is meant to work.
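For illustration, a small standalone sketch of the alternative: start the clock when the state machine enters the archive source, so time spent streaming is never counted. All names here are hypothetical and the two-state enum is a deliberate simplification of the real state machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical WAL sources (pg_wal phase omitted). */
typedef enum
{
    SRC_ARCHIVE,
    SRC_STREAM
} WalSource;

typedef struct
{
    WalSource   source;
    int64_t     archive_entered_ms; /* when SRC_ARCHIVE was last entered */
} SourceState;

/* Change source; restart the timer only on a transition into archive. */
static void
enter_source(SourceState *st, WalSource next, int64_t now_ms)
{
    if (next == SRC_ARCHIVE && st->source != SRC_ARCHIVE)
        st->archive_entered_ms = now_ms;
    st->source = next;
}

/* True once interval_ms has been spent in archive; 0 disables the check. */
static bool
switch_due(const SourceState *st, int64_t now_ms, int interval_ms)
{
    assert(interval_ms >= 0);
    return interval_ms > 0 &&
        st->source == SRC_ARCHIVE &&
        now_ms - st->archive_entered_ms >= interval_ms;
}
```

With this bookkeeping, a standby that streamed for an hour and then fell back to the archive would not be pushed back to streaming right away, because the interval is measured from the fallback and not from the last switch attempt.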

+    {
+        {"wal_source_switch_interval", PGC_SIGHUP, REPLICATION_STANDBY,
+            gettext_noop("Sets the time to wait before switching WAL source from archive to primary"),
+            gettext_noop("0 turns this feature off."),
+            GUC_UNIT_MS
+        },
+        &wal_source_switch_interval,
+        5000, 0, INT_MAX,
+        NULL, NULL, NULL
+    },

I wonder if the lower bound should be higher to avoid switching
unnecessarily rapidly between WAL sources.  I see that
WaitForWALToBecomeAvailable() ensures that standbys do not switch from
XLOG_FROM_STREAM to XLOG_FROM_ARCHIVE more often than once per
wal_retrieve_retry_interval.  Perhaps wal_retrieve_retry_interval should be
the lower bound for this GUC, too.  Or maybe WaitForWALToBecomeAvailable()
should make sure that the standby makes at least one attempt to restore
the file from archive before switching to streaming replication.
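One way to picture the first option is to clamp the effective interval at the point of use. This is only a sketch under that assumption; the helper name is hypothetical, and a real patch might enforce the bound differently (for example, in a GUC check hook).

```c
#include <assert.h>

/*
 * Hypothetical helper: treat wal_retrieve_retry_interval as the floor
 * for the switch interval.  Both arguments are in milliseconds.
 */
static int
effective_switch_interval(int switch_interval_ms, int retry_interval_ms)
{
    assert(switch_interval_ms >= 0 && retry_interval_ms >= 0);

    /* 0 disables the feature; do not turn it back on by clamping. */
    if (switch_interval_ms == 0)
        return 0;

    return (switch_interval_ms < retry_interval_ms) ?
        retry_interval_ms : switch_interval_ms;
}
```

For example, a switch interval of 1000 with a retry interval of 5000 yields 5000, while 60000 with 5000 stays at 60000.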

[0] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


