Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres? - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?
Msg-id CAHg+QDewHQU6LHNw0hRmV2Ca0=BxD538V1wLnwuBwrwwQWNtQQ@mail.gmail.com
In response to Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers


On Fri, Apr 8, 2022 at 6:44 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Apr 6, 2022 at 4:30 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Apr 5, 2022 at 9:23 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm thinking if there's a way in core postgres to achieve $subject. In
> > reality, the sync/async standbys can either be closer/farther (which
> > means sync/async standbys can receive WAL at different times) to
> > primary, especially in cloud HA environments with primary in one
> > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > $subject may not be possible on dev systems (say, for testing some HA
> > features) unless we can inject a delay in WAL senders before sending
> > WAL.

Such a simulation would also help end customers inject faults in production environments during availability zone or disaster recovery drills.

> >
> > How about having two developer-only GUCs {async,
> > sync}_wal_sender_delay? When set, the async and sync WAL senders will
> > delay sending WAL by {async, sync}_wal_sender_delay
> > milliseconds/seconds? Although, I can't think of any immediate use, it
> > will be useful someday IMO, say for features like [1], if it gets in.
> > With this set of GUCs, one can even add core regression tests for HA
> > features.

I would suggest doing this at the slot level, instead of with two GUCs that control the behavior of all slots (physical and logical). Something like "pg_suspend_replication_slot" and "pg_resume_replication_slot"?
Alternatively, how about a GUC on the standby side rather than on the primary, so that the WAL receiver stops responding to the WAL sender? This achieves the same effect as above, but with granularity at the level of the individual replica.
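
To make the slot-level idea a bit more concrete, here is a rough toy model in Python, not PostgreSQL code. Everything in it is an assumption made for illustration: `ToySlot`, `delay_ms`, `suspended` and `send_pending_wal` are hypothetical names, LSNs are plain integers, and the delay is just a sleep before "sending". It only shows how a per-slot suspend flag and a per-slot delay would differ from a single global setting.

```python
import time
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Hypothetical stand-in for a replication slot (not a PostgreSQL struct)."""
    name: str
    delay_ms: int = 0        # assumed per-slot artificial send delay
    suspended: bool = False  # set by a hypothetical suspend call
    sent_lsn: int = 0        # last position "sent" to this standby


def send_pending_wal(slot, flush_lsn, sleep=time.sleep):
    """Pretend to send WAL up to flush_lsn; return the number of bytes 'sent'.

    A suspended slot sends nothing. Otherwise the sender waits for the
    slot's own delay first, which mimics a standby that is farther away.
    """
    if slot.suspended or slot.sent_lsn >= flush_lsn:
        return 0
    if slot.delay_ms > 0:
        sleep(slot.delay_ms / 1000.0)
    nbytes = flush_lsn - slot.sent_lsn
    slot.sent_lsn = flush_lsn
    return nbytes


if __name__ == "__main__":
    near = ToySlot("near_az", delay_ms=1)
    far = ToySlot("far_region", delay_ms=50)
    far.suspended = True                 # simulate losing the far standby
    print(send_pending_wal(near, 100))   # near standby still receives WAL
    print(send_pending_wal(far, 100))    # suspended slot receives nothing
    far.suspended = False                # "resume" the slot
    print(send_pending_wal(far, 100))    # far standby catches up after its delay
```

The point of the sketch is only that each slot carries its own state, so one standby can be delayed or paused without affecting the others.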
 
Thanks,
Satya
