On Fri, Jul 8, 2022 at 9:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Jun 25, 2022 at 1:31 AM Cary Huang <cary.huang@highgo.ca> wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world: tested, passed
> > Implements feature: tested, passed
> > Spec compliant: not tested
> > Documentation: not tested
> >
> > Hello
> >
> > I tested this patch in a setup where the standby is in the middle of
> > replicating and REDOing primary's WAL files during a very large data
> > insertion. During this time, I keep killing the walreceiver process
> > to cause a stream failure and force standby to read from archive.
> > The system will restore from archive for "wal_retrieve_retry_interval"
> > seconds before it attempts to stream again. Without this patch, once
> > the streaming is interrupted, it keeps reading from archive until
> > standby reaches the same consistent state of primary and then it will
> > switch back to streaming again. So it seems that the patch does the
> > job as described and does bring some benefit during a very large REDO
> > job where it will try to re-stream after restoring some WALs from
> > archive to speed up this "catch up" process. But if the recovery job
> > is not a large one, PG is already switching back to streaming once it
> > hits consistent state.
>
> Thanks a lot Cary for testing the patch.
>
> > Here's a v1 patch that I've come up with. I'm right now using the
> > existing GUC wal_retrieve_retry_interval to switch to stream mode from
> > archive mode as opposed to switching only after the failure to get WAL
> > from archive mode. If okay with the approach, I can add tests, change
> > the docs and add a new GUC to control this behaviour. I'm open to
> > thoughts and ideas here.
>
> It would be great to hear some thoughts on the above points (as
> posted upthread).

Here's the v2 patch with a separate GUC. A new GUC was necessary
because the existing GUC wal_retrieve_retry_interval already serves
multiple purposes. When the feature is enabled, it lets the standby
switch to stream mode, i.e. fetch WAL from the primary, even before
fetching from the archive fails. The switch from archive to stream
mode happens in two scenarios: 1) when the standby is in initial
recovery, and 2) when there was a failure in receiving WAL from the
primary (the walreceiver was killed, crashed or timed out, or the
connection to the primary was broken, for whatever reason).
I also added test cases to the v2 patch.
Please review the patch.
--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/