Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Switching XLog source from archive to streaming when primary available
Msg-id 20230119005014.GA3838170@nathanxps13
In response to Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> With your patch, we might replay one of these "old" files in pg_wal instead
>> of the complete version of the file from the archives,
> 
> That's true even today, without the patch, no? We're not changing the
> existing behaviour of the state machine. Can you explain how it
> happens with the patch?

My point is that on HEAD, we will always prefer a complete archive file.
With your patch, we might instead choose to replay an old file in pg_wal
because we are artificially advancing the state machine.  IOW even if
there's a complete archive available, we might not use it.  This is a
behavior change, but I think it is okay.
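
To make the difference concrete, here is a toy model in Python. It is only
a sketch of the ordering described above, not PostgreSQL code: the function
pick_wal_file, its skip_archive flag (a stand-in for the patch advancing
the state machine past the archive), and the segment names are all made up
for illustration.

```python
from typing import Dict, Optional, Tuple


def pick_wal_file(
    segment: str,
    archive: Dict[str, str],
    pg_wal: Dict[str, str],
    skip_archive: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (source, contents) for a segment, or None if it is nowhere.

    archive and pg_wal map a segment name to a description of its contents.
    skip_archive is a hypothetical flag modelling the state machine being
    advanced past the archive before the archive has been consulted.
    """
    # Default ordering: a copy in the archive wins over a copy in pg_wal.
    if not skip_archive and segment in archive:
        return ("archive", archive[segment])
    # Otherwise fall back to whatever is lying around in pg_wal.
    if segment in pg_wal:
        return ("pg_wal", pg_wal[segment])
    return None


# The same segment exists in both places: complete in the archive,
# stale in pg_wal.
archive = {"f3": "complete"}
pg_wal = {"f3": "old"}

default_choice = pick_wal_file("f3", archive, pg_wal)
forced_choice = pick_wal_file("f3", archive, pg_wal, skip_archive=True)
```

In this model default_choice comes from the archive, while forced_choice
comes from pg_wal even though a complete archive copy exists.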

>> Would you mind testing this scenario?
> 
> How about something like below for testing the above scenario? If it
> looks okay, I can add it as a new TAP test file.
> 
> 1. Generate WAL files f1 and f2 and archive them.
> 2. Check the replay lsn and WAL file name on the standby, when it
> replays up to f2, stop the standby.
> 3. Set recovery to fail on the standby, and stop the standby.
> 4. Generate f3, f4 (partially filled) on the primary.
> 5. Manually copy f3, f4 to the standby's pg_wal.
> 6. Start the standby, since recovery is set to fail, and there're new
> WAL files (f3, f4) under its pg_wal, it must replay those WAL files
> (check the replay lsn and WAL file name, it must be f4) before
> switching to streaming.
> 7. Generate f5 on the primary.
> 8. The standby should receive f5 and replay it (check the replay lsn
> and WAL file name, it must be f5).
> 9. Set streaming to fail on the standby and set recovery to succeed.
> 10. Generate f6 on the primary.
> 11. The standby should receive f6 via archive and replay it (check the
> replay lsn and WAL file name, it must be f6).

I meant testing the scenario where there's an old file in pg_wal, a
complete file in the archives, and your new GUC forces replay of the
former.  This might be difficult to do in a TAP test.  Ultimately, I just
want to validate the assumptions discussed above.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


