Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Switching XLog source from archive to streaming when primary available
Date
Msg-id CALj2ACVB5eVTpj6YLkTqE1+mBGhyVLu9SdJg9CAj595rW5XSFA@mail.gmail.com
Whole thread Raw
In response to Re: Switching XLog source from archive to streaming when primary available  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> With your patch, we might replay one of these "old" files in pg_wal instead
> >> of the complete version of the file from the archives,
> >
> > That's true even today, without the patch, no? We're not changing the
> > existing behaviour of the state machine. Can you explain how it
> > happens with the patch?
>
> My point is that on HEAD, we will always prefer a complete archive file.
> With your patch, we might instead choose to replay an old file in pg_wal
> because we are artificially advancing the state machine.  IOW even if
> there's a complete archive available, we might not use it.  This is a
> behavior change, but I think it is okay.

Oh, yeah, I too agree that it's okay because manually copying WAL
files directly to pg_wal (which eventually get replayed before
switching to streaming) isn't recommended anyway for production level
servers. I think, we covered it in the documentation that it exhausts
all the WAL present in pg_wal before switching. Isn't that enough?

+        Specifies amount of time after which standby attempts to switch WAL
+        source from WAL archive to streaming replication (get WAL from
+        primary). However, exhaust all the WAL present in pg_wal before
+        switching. If the standby fails to switch to stream mode, it falls
+        back to archive mode.

> >> Would you mind testing this scenario?
ndby should receive f6 via archive and replay it (check the
> > replay lsn an> >
>
> I meant testing the scenario where there's an old file in pg_wal, a
> complete file in the archives, and your new GUC forces replay of the
> former.  This might be difficult to do in a TAP test.  Ultimately, I just
> want to validate the assumptions discussed above.

I think testing the scenario [1] is achievable. I could write a TAP
test for it - https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1.
It's a bit flaky and needs a little more work (1 - writing a custom
script for restore_command  that sleeps only after fetching an
existing WAL file from archive, not sleeping for a history file or a
non-existent WAL file. 2- finding a command-line way to sleep on
Windows.) to stabilize it, but it seems doable. I can spend some more
time, if one thinks that the test is worth adding to the core, perhaps
discussing it separately from this thread.

[1] RestoreArchivedFile():
    /*
     * When doing archive recovery, we always prefer an archived log file even
     * if a file of the same name exists in XLOGDIR.  The reason is that the
     * file in XLOGDIR could be an old, un-filled or partly-filled version
     * that was copied and restored as part of backing up $PGDATA.
     *

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Split index and table statistics into different types of stats
Next
From: Dean Rasheed
Date:
Subject: Re: Supporting MERGE on updatable views