Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Switching XLog source from archive to streaming when primary available
Msg-id CALj2ACW1qQ3-mTQXASc2UJHfS4iyqiidG=rG5U9042FcmuCtXg@mail.gmail.com
In response to Re: Switching XLog source from archive to streaming when primary available  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available
List pgsql-hackers
On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Fri, Sep 09, 2022 at 12:14:25PM +0530, Bharath Rupireddy wrote:
> > On Fri, Sep 9, 2022 at 10:57 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> >> At Thu, 8 Sep 2022 10:53:56 -0700, Nathan Bossart <nathandbossart@gmail.com> wrote in
> >> > My general point is that we should probably offer some basic preventative
> >> > measure against flipping back and forth between streaming and archive
> >> > recovery while making zero progress.  As I noted, maybe that's as simple as
> >> > having WaitForWALToBecomeAvailable() attempt to restore a file from archive
> >> > at least once before the new parameter forces us to switch to streaming
> >> > replication.  There might be other ways to handle this.
> >>
> >> +1.
> >
> > Hm. In that case, I think we can get rid of timeout based switching
> > mechanism and have this behaviour - the standby can attempt to switch
> > to streaming mode from archive, say, after fetching 1, 2 or a
> > configurable number of WAL files. In fact, this is the original idea
> > proposed by Satya in this thread.
>
> IMO the timeout approach would be more intuitive for users.  When it comes
> to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
> segment size can differ between clusters, and WAL files can have different
> amounts of data or take different amounts of time to replay.

How about using the number of WAL bytes fetched from the archive as
the threshold after which a standby attempts to connect to the primary
or enter streaming mode? Of late, we've changed some GUCs to be
expressed in bytes instead of WAL files/segments, see [1].

> So I think it
> would be difficult for the end user to decide on a value.  However, even
> the timeout approach has this sort of problem.  If your parameter is set to
> 1 minute, but the current archive takes 5 minutes to recover, you won't
> really be testing streaming replication once a minute.  That would likely
> need to be documented.

If we have a configurable number of WAL bytes instead of a timeout for
switching the standby's WAL source from archive to primary, we don't
have the above problem, right?
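
To make the byte-based idea concrete, here is a rough standalone
sketch, not a patch. All names in it (ArchiveFetchState,
switch_after_bytes, archive_bytes_fetched) are made up for
illustration and do not exist in PostgreSQL. It assumes the check
runs only after a file has been restored from the archive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; none of these names exist in PostgreSQL. */
typedef struct ArchiveFetchState
{
    uint64_t switch_after_bytes;   /* hypothetical GUC; 0 disables the switch */
    uint64_t bytes_since_attempt;  /* WAL bytes restored since last streaming attempt */
} ArchiveFetchState;

/*
 * Account for nbytes of WAL just restored from the archive.  Returns
 * true when the accumulated bytes reach the threshold, meaning the
 * standby should now try streaming; the counter is reset in that case.
 */
static bool
archive_bytes_fetched(ArchiveFetchState *state, uint64_t nbytes)
{
    state->bytes_since_attempt += nbytes;

    if (state->switch_after_bytes == 0)
        return false;           /* feature disabled */

    if (state->bytes_since_attempt < state->switch_after_bytes)
        return false;           /* keep reading from the archive */

    state->bytes_since_attempt = 0;
    return true;                /* time to try the primary */
}
```

Since the counter only advances after a successful restore, a nonzero
threshold in this sketch cannot trigger a switch before at least one
file has been fetched from the archive.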

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3fe108c025e4a080315562d4c15ecbe3f00405e

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


