Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id CALj2ACUaELLztrsdD_YOA3YEqJt7RagBL4pffzK5rc0CWKmLxQ@mail.gmail.com
Whole thread Raw
In response to Re: Switching XLog source from archive to streaming when primary available  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
On Sun, Oct 9, 2022 at 3:22 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> As I mentioned upthread [0], I'm still a little concerned that this patch
> will cause the state machine to go straight from archive recovery to
> streaming replication, skipping recovery from pg_wal.
>
> [0] https://postgr.es/m/20220906215704.GA2084086%40nathanxps13

Yes, it goes straight to streaming replication skipping recovery from
pg_wal with the patch.

> I wonder if this
> could be resolved by moving the standby to the pg_wal phase instead.
> Concretely, this line
>
> +                               if (switchSource)
> +                                       break;
>
> would instead change currentSource from XLOG_FROM_ARCHIVE to
> XLOG_FROM_PG_WAL before the call to XLogFileReadAnyTLI().  I suspect the
> behavior would be basically the same, but it would maintain the existing
> ordering.

We can give it a chance to restore from pg_wal before switching to
streaming to not change any behaviour of the state machine. But, not
definitely by setting currentSource to XLOG_FROM_WAL, we basically
never explicitly set currentSource to XLOG_FROM_WAL, other than when
not in archive recovery i.e. InArchiveRecovery is false. Also, see the
comment [1].

Instead, the simplest would be to just pass XLOG_FROM_WAL to
XLogFileReadAnyTLI() when we're about to switch the source to stream
mode. This doesn't change the existing behaviour.

> However, I do see the following note elsewhere in xlogrecovery.c:
>
>  * The segment can be fetched via restore_command, or via walreceiver having
>  * streamed the record, or it can already be present in pg_wal. Checking
>  * pg_wal is mainly for crash recovery, but it will be polled in standby mode
>  * too, in case someone copies a new segment directly to pg_wal. That is not
>  * documented or recommended, though.
>
> Given this information, the present behavior might not be too important,
> but I don't see a point in changing it without good reason.

Yeah, with the attached patch we don't skip pg_wal before switching to
streaming mode.

I've also added a note in the 'Standby Server Operation' section about
the new feature.

Please review the v8 patch further.

Unrelated to this patch, the fact that the standby polls pg_wal is not
documented or recommended, is not true, it is actually documented [2].
Whether or not we change the docs to be something like [3], is a
separate discussion.

[1]
            /*
             * We just successfully read a file in pg_wal. We prefer files in
             * the archive over ones in pg_wal, so try the next file again
             * from the archive first.
             */

[2] https://www.postgresql.org/docs/current/warm-standby.html#STANDBY-SERVER-OPERATION
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed.

[3]
The standby server will also attempt to restore any WAL found in the
standby cluster's pg_wal directory. That typically happens after a
server restart, when the standby replays again WAL that was streamed
from the primary before the restart, but you can also manually copy
files to pg_wal at any time to have them replayed. However, copying of
WAL files manually is not recommended.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Japin Li
Date:
Subject: Re: Remove unnecessary commas for goto labels
Next
From: "Anton A. Melnikov"
Date:
Subject: Re: [BUG] Logical replica crash if there was an error in a function.