On Tue, Oct 18, 2022 at 07:31:33AM +0530, Bharath Rupireddy wrote:
> In summary, the standby state machine in WaitForWALToBecomeAvailable()
> exhausts all the WAL in pg_wal before switching to streaming after
> failing to fetch from archive. The v8 patch proposed upthread deviates
> from this behaviour. Hence, attaching v9 patch that keeps the
> behaviour as-is, that means, the standby exhausts all the WAL in
> pg_wal before switching to streaming after fetching WAL from archive
> for at least streaming_replication_retry_interval milliseconds.
I think this is okay. The following comment explains why archives are
preferred over existing files in pg_wal:
* When doing archive recovery, we always prefer an archived log file even
* if a file of the same name exists in XLOGDIR. The reason is that the
* file in XLOGDIR could be an old, un-filled or partly-filled version
* that was copied and restored as part of backing up $PGDATA.
With your patch, we might replay one of these "old" files in pg_wal instead
of the complete version of the file from the archives, but I think that is
still correct. We'll just replay whatever exists in pg_wal (which may be
un-filled or partly-filled) before attempting streaming. If that fails,
we'll go back to trying the archives again.
Would you mind testing this scenario?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com