Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres? - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?
Date
Msg-id CALj2ACWd2fds-LagF=VfSgr9fQwTaByV40urNZjhpqvaa1F6dQ@mail.gmail.com
In response to Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?  (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>)
Responses Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
>> > <bharath.rupireddyforpostgres@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I'm wondering whether there's a way in core postgres to achieve $subject.
>> > > In reality, sync/async standbys can be closer to or farther from the
>> > > primary (which means they can receive WAL at different times),
>> > > especially in cloud HA environments with the primary in one
>> > > Availability Zone (AZ)/Region and the standbys in different AZs/Regions.
>> > > $subject may not be possible on dev systems (say, for testing some HA
>> > > features) unless we can inject a delay in WAL senders before sending
>> > > WAL.
>
> Simulation will be helpful even for end customers to simulate faults in production environments during
> availability zone/disaster recovery drills.

Right.

>> > > How about having two developer-only GUCs {async,
>> > > sync}_wal_sender_delay? When set, the async and sync WAL senders would
>> > > delay sending WAL by {async, sync}_wal_sender_delay
>> > > milliseconds/seconds. Although I can't think of any immediate use, IMO
>> > > it will be useful someday, say for features like [1], if that gets in.
>> > > With this set of GUCs, one can even add core regression tests for HA
>> > > features.
>
> I would suggest doing this at the slot level, instead of two GUCs that control the behavior of all the slots
> (physical/logical). Something like "pg_suspend_replication_slot and pg_resume_replication_slot"?

Controlling this at the replication slot level rather than at the WAL
sender level seems reasonable. Since there can be many slots on the
primary, we need a way to specify which slots should be delayed, and by
how long, before WAL is sent. With GUCs, they would have to be list-typed,
and I'm not sure that would work out well.
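To make the list-typed GUC option concrete, here is a minimal C sketch of what parsing such a value might involve. It assumes a hypothetical setting (imagined as `wal_sender_slot_delays = 'slot_a:100,slot_b:2000'`; neither the name nor the syntax exists in PostgreSQL) and a hypothetical parser `parse_slot_delay_list`; it is an illustration of the trade-off, not a proposed implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One entry of a hypothetical list-typed GUC such as
 * wal_sender_slot_delays = 'slot_a:100,slot_b:2000' (name and syntax are made up). */
typedef struct SlotDelay
{
    char slot_name[64];
    long delay_ms;
} SlotDelay;

/* Parses "name:ms[,name:ms...]" into out[]. Leading spaces before each name are
 * skipped. Returns the number of entries, or -1 if an entry is malformed or
 * there are more than max_out entries. */
static int parse_slot_delay_list(const char *value, SlotDelay *out, int max_out)
{
    char buf[1024];
    char *saveptr = NULL;
    int n = 0;

    assert(value != NULL && out != NULL);
    if (strlen(value) >= sizeof(buf))
        return -1;
    strcpy(buf, value);         /* strtok_r modifies its input, so work on a copy */

    for (char *tok = strtok_r(buf, ",", &saveptr); tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr))
    {
        char *colon;
        char *end;
        long ms;

        while (*tok == ' ')
            tok++;
        colon = strchr(tok, ':');
        if (colon == NULL || colon == tok || n >= max_out)
            return -1;
        *colon = '\0';          /* split "name:ms" into "name" and "ms" */
        if (strlen(tok) >= sizeof(out[n].slot_name))
            return -1;

        ms = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || ms < 0)
            return -1;          /* missing, non-numeric, or negative delay */

        strcpy(out[n].slot_name, tok);
        out[n].delay_ms = ms;
        n++;
    }
    return n;
}
```

Even this small parser has to deal with malformed entries and name lengths, and changing one slot's delay would likely mean rewriting the whole list value; that overhead is what a per-slot function avoids.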

Instead, how about two functions (for superusers/users with the
replication role), such as
pg_replication_slot_set_delay(slot_name, delay_in_milliseconds) and
pg_replication_slot_unset_delay(slot_name)? pg_replication_slot_set_delay
would set ReplicationSlot->delay, and the WAL sender would check
MyReplicationSlot->delay > 0 and wait before sending WAL.
pg_replication_slot_unset_delay would reset ReplicationSlot->delay to 0.
Alternatively, instead of pg_replication_slot_unset_delay,
pg_replication_slot_set_delay(slot_name, 0) could be used, so that there
is only a single function.
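A minimal C sketch of the per-slot delay idea, assuming a heavily simplified stand-in for `ReplicationSlot` with a hypothetical `delay` field (the real struct has no such field) and hypothetical helpers `create_slot`, `replication_slot_set_delay`, and `walsender_maybe_delay`. It uses `nanosleep` only to stay self-contained; inside PostgreSQL the WAL sender would more likely wait on its latch (e.g. via `WaitLatch`) so that it still reacts to interrupts and shutdown requests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical, heavily simplified stand-in for PostgreSQL's ReplicationSlot.
 * The "delay" field is the proposed addition; the real struct has no such field. */
typedef struct ReplicationSlot
{
    char name[64];
    int  delay;                 /* milliseconds to wait before sending WAL; 0 = no delay */
} ReplicationSlot;

#define MAX_SLOTS 8
static ReplicationSlot slots[MAX_SLOTS];
static int nslots = 0;

/* Hypothetical helper standing in for replication slot creation. */
static ReplicationSlot *create_slot(const char *name)
{
    assert(nslots < MAX_SLOTS);
    ReplicationSlot *slot = &slots[nslots++];
    snprintf(slot->name, sizeof(slot->name), "%s", name);
    slot->delay = 0;            /* new slots start without a delay */
    return slot;
}

static ReplicationSlot *find_slot(const char *name)
{
    for (int i = 0; i < nslots; i++)
        if (strcmp(slots[i].name, name) == 0)
            return &slots[i];
    return NULL;
}

/* Sketch of pg_replication_slot_set_delay(slot_name, delay_in_milliseconds).
 * A delay of 0 clears it, so no separate "unset" function is needed.
 * Returns false for an unknown slot or a negative delay. */
static bool replication_slot_set_delay(const char *slot_name, int delay_ms)
{
    ReplicationSlot *slot = find_slot(slot_name);

    if (slot == NULL || delay_ms < 0)
        return false;
    slot->delay = delay_ms;
    return true;
}

/* Sketch of the WAL sender side: if MyReplicationSlot->delay > 0, wait that
 * long before sending WAL. Returns the number of milliseconds waited. */
static int walsender_maybe_delay(const ReplicationSlot *my_slot)
{
    struct timespec ts;

    assert(my_slot != NULL);
    if (my_slot->delay <= 0)
        return 0;

    ts.tv_sec = my_slot->delay / 1000;
    ts.tv_nsec = (long) (my_slot->delay % 1000) * 1000000L;
    nanosleep(&ts, NULL);       /* a real WAL sender would rather wait on its latch */
    return my_slot->delay;
}
```

For example, after `replication_slot_set_delay("standby_far", 200)`, each call to `walsender_maybe_delay` for that slot pauses for about 200 ms before WAL would be sent, and `replication_slot_set_delay("standby_far", 0)` turns the delay off again.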

If users want a standby to receive WAL with a delay, they can call
pg_replication_slot_set_delay on its replication slot after creating it.

Thoughts?

> Alternatively, a GUC on the standby side instead of the primary, so that the wal receiver stops responding to the wal
> sender?

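I think the wal_receiver_status_interval GUC on the WAL receiver gets
close to the above, i.e. the standby reporting back to the primary only
rarely: one can set wal_receiver_status_interval to, say, 1 day. Note,
though, that it only caps the interval between status reports [1]; the
WAL receiver still sends a reply whenever its write or flush position
changes, so it does not stop responding to the primary entirely.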

[1]
    {
        {"wal_receiver_status_interval", PGC_SIGHUP, REPLICATION_STANDBY,
            gettext_noop("Sets the maximum interval between WAL receiver status reports to the sending server."),
            NULL,
            GUC_UNIT_S          /* value is in seconds */
        },
        &wal_receiver_status_interval,
        10, 0, INT_MAX / 1000,  /* boot value 10 s, min 0, max INT_MAX / 1000 s */
        NULL, NULL, NULL        /* no check, assign, or show hooks */
    },

Regards,
Bharath Rupireddy.


