On Tue, Oct 11, 2022 at 8:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Oct 10, 2022 at 11:33:57AM +0530, Bharath Rupireddy wrote:
> > On Mon, Oct 10, 2022 at 3:17 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> I wonder if it would be better to simply remove this extra polling of
> >> pg_wal as a prerequisite to your patch. The existing commentary leads me
> >> to think there might not be a strong reason for this behavior, so it could
> >> be a nice way to simplify your patch.
> >
> > I don't think it's a good idea to remove that completely. As said
> > above, it might help someone, we never know.
>
> It would be great to hear whether anyone is using this functionality. If
> no one is aware of existing usage and there is no interest in keeping it
> around, I don't think it would be unreasonable to remove it in v16.
It seems like exhausting all the WAL in pg_wal before switching to
streaming after failing to fetch from archive is unremovable. I found
this after experimenting with it, here are my findings:
1. The standby has to recover initial WAL files in the pg_wal
directory even for the normal post-restart/first-time-start case, I
mean, in non-crash recovery case.
2. The standby received WAL files from primary (walreceiver just
writes and flushes the received WAL to WAL files under pg_wal)
pretty-fast and/or standby recovery is slow, say both the standby
connection to primary and archive connection are broken for whatever
reasons, then it has WAL files to recover in pg_wal directory.
I think the fundamental behaviour for the standy is that it has to
fully recover to the end of WAL under pg_wal no matter who copies WAL
files there. I fully understand the consequences of manually copying
WAL files into pg_wal, for that matter, manually copying/tinkering any
other files into/under the data directory is something we don't
recommend and encourage.
In summary, the standby state machine in WaitForWALToBecomeAvailable()
exhausts all the WAL in pg_wal before switching to streaming after
failing to fetch from archive. The v8 patch proposed upthread deviates
from this behaviour. Hence, attaching v9 patch that keeps the
behaviour as-is, that means, the standby exhausts all the WAL in
pg_wal before switching to streaming after fetching WAL from archive
for at least streaming_replication_retry_interval milliseconds.
Please review the v9 patch further.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com