On Fri, Sep 09, 2022 at 11:07:00PM +0530, Bharath Rupireddy wrote:
> On Fri, Sep 9, 2022 at 10:29 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> IMO the timeout approach would be more intuitive for users.  When it comes
>> to archive recovery, "WAL segment" isn't a standard unit of measure.  WAL
>> segment size can differ between clusters, and WAL files can have different
>> amounts of data or take different amounts of time to replay.
> 
> How about the amount of WAL bytes fetched from the archive after which
> a standby attempts to connect to primary or enter streaming mode? Of
> late, we've changed some GUCs to represent bytes instead of WAL
> files/segments, see [1].

Well, for wal_keep_size, using bytes makes sense.  Given you know how much
disk space you have, you can set this parameter accordingly to avoid
retaining too much of it for standby servers.  For your proposed parameter,
it's not so simple.  The same setting could have wildly different timing
behavior depending on the server.  I still think that a timeout is the most
intuitive.
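
To put rough numbers on that: with a byte-based threshold, the wall-clock time before a standby tries streaming is roughly the threshold divided by that standby's replay rate. Here is a quick Python sketch. The 1 GiB threshold, the `seconds_until_switch` helper, and both replay rates are made up for illustration. They are not measurements and not an existing GUC.

```python
# Hypothetical byte threshold for switching the WAL source from archive to
# streaming (the parameter under discussion; the value here is made up).
SWITCH_AFTER_BYTES = 1024 ** 3  # 1 GiB

MiB = 1024 ** 2


def seconds_until_switch(replay_bytes_per_sec: float) -> float:
    """Seconds of replay needed before the byte threshold is reached."""
    return SWITCH_AFTER_BYTES / replay_bytes_per_sec


# Illustrative replay rates for two different standbys (assumed, not measured).
for name, rate in [("fast standby", 200 * MiB), ("slow standby", 2 * MiB)]:
    print(f"{name}: {seconds_until_switch(rate):.1f} s")
```

With these assumed rates, the same setting means about 5 seconds on one standby and over 8 minutes on the other, a 100x difference. A timeout would mean the same thing on both.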

>> So I think it
>> would be difficult for the end user to decide on a value.  However, even
>> the timeout approach has this sort of problem.  If your parameter is set to
>> 1 minute, but the current archived WAL file takes 5 minutes to replay, you won't
>> really be testing streaming replication once a minute.  That would likely
>> need to be documented.
> 
> If we have configurable WAL bytes instead of a timeout for the standby's WAL
> source switch from archive to primary, we don't have the above problem,
> right?

If you are going to stop replaying in the middle of a WAL file restored from
the archive, then maybe.  But I don't think I'd recommend that.
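
To illustrate: if the switch condition is only evaluated after a whole WAL file has been replayed, both kinds of threshold overshoot by up to one file's replay time. Here is a toy simulation under that assumption. `WalFile`, `switch_point`, and the file sizes and replay times are hypothetical. They don't correspond to actual PostgreSQL recovery code.

```python
from dataclasses import dataclass


@dataclass
class WalFile:
    size_bytes: int
    replay_seconds: float


def switch_point(files, max_seconds=None, max_bytes=None):
    """Return (elapsed_seconds, replayed_bytes) at the first file boundary
    where either threshold has been reached, or None if never reached.

    Assumption: the check happens only after a whole file is replayed,
    i.e. replay is never interrupted in the middle of a file.
    """
    elapsed = 0.0
    replayed = 0
    for f in files:
        elapsed += f.replay_seconds
        replayed += f.size_bytes
        if max_seconds is not None and elapsed >= max_seconds:
            return elapsed, replayed
        if max_bytes is not None and replayed >= max_bytes:
            return elapsed, replayed
    return None


MiB = 1024 ** 2
# Made-up example: one slow-to-replay 16 MiB file followed by quick ones.
files = [WalFile(16 * MiB, 300.0)] + [WalFile(16 * MiB, 2.0)] * 10

print(switch_point(files, max_seconds=60))     # timeout of 1 minute
print(switch_point(files, max_bytes=1 * MiB))  # byte threshold of 1 MiB
```

Both calls print `(300.0, 16777216)`. With either a 1-minute timeout or a 1 MiB byte threshold, the switch still waits for the 5-minute file to finish.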
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com