On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> With your patch, we might replay one of these "old" files in pg_wal instead
>> of the complete version of the file from the archives,
>
> That's true even today, without the patch, no? We're not changing the
> existing behaviour of the state machine. Can you explain how it
> happens with the patch?
My point is that on HEAD, we will always prefer a complete archive file.
With your patch, we might instead choose to replay an old file in pg_wal
because we are artificially advancing the state machine. IOW even if
there's a complete archive available, we might not use it. This is a
behavior change, but I think it is okay.
>> Would you mind testing this scenario?
>
> How about something like below for testing the above scenario? If it
> looks okay, I can add it as a new TAP test file.
>
> 1. Generate WAL files f1 and f2 and archive them.
> 2. Check the replay lsn and WAL file name on the standby, when it
> replays upto f2, stop the standby.
> 3. Set recovery to fail on the standby, and stop the standby.
> 4. Generate f3, f4 (partially filled) on the primary.
> 5. Manually copy f3, f4 to the standby's pg_wal.
> 6. Start the standby, since recovery is set to fail, and there're new
> WAL files (f3, f4) under its pg_wal, it must replay those WAL files
> (check the replay lsn and WAL file name, it must be f4) before
> switching to streaming.
> 7. Generate f5 on the primary.
> 8. The standby should receive f5 and replay it (check the replay lsn
> and WAL file name, it must be f5).
> 9. Set streaming to fail on the standby and set recovery to succeed.
> 10. Generate f6 on the primary.
> 11. The standby should receive f6 via archive and replay it (check the
> replay lsn and WAL file name, it must be f6).
I meant testing the scenario where there's an old file in pg_wal, a
complete file in the archives, and your new GUC forces replay of the
former. This might be difficult to do in a TAP test. Ultimately, I just
want to validate the assumptions discussed above.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com