Fwd: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Fwd: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Date
Msg-id CAHg+QDetitTL0bzuMjymzbScs9X8W0LLHrVc+4HsWhP382KTDg@mail.gmail.com
Whole thread Raw
In response to Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
List pgsql-hackers
Please find the attached draft patch.

On Thu, Dec 23, 2021 at 2:47 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Dec 23, 2021 at 5:53 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> Hi Hackers,
>
> I am considering implementing RPO (recovery point objective) enforcement feature for Postgres where the WAL writes on the primary are stalled when the WAL distance between the primary and standby exceeds the configured (replica_lag_in_bytes) threshold. This feature is useful particularly in the disaster recovery setups where primary and standby are in different regions and synchronous replication can't be set up for latency and performance reasons yet requires some level of RPO enforcement.

+1 for the idea in general. However, blocking writes on primary seems
an extremely radical idea. The replicas can fall behind transiently at
times and blocking writes on the primary may stop applications failing
for these transient times. This is not a problem if the applications
have retry logic for the writes. How about blocking writes on primary
if the replicas fall behind the primary for a certain period of time?

My proposal is to block the caller from writing until the lag situation is improved. Don't want to throw any errors and fail the tranaction. I think we are aligned?
 

> The idea here is to calculate the lag between the primary and the standby (Async?) server during XLogInsert and block the caller until the lag is less than the threshold value. We can calculate the max lag by iterating over ReplicationSlotCtl->replication_slots.

The "falling behind" can also be quantified by the number of
write-transactions on the primary. I think it's good to have the users
choose what the "falling behind" means for them. We can have something
like the "recovery_target" param with different options name, xid,
time, lsn.

The transactions can be of arbitrary size and length and these options may not provide the desired results. Time is a worthy option to add.
 

> If this is not something we don't want to do in the core, at least adding a hook for XlogInsert is of great value.

IMHO, this feature may not be needed by everyone, the hook-way seems
reasonable so that the postgres vendors can provide different
implementations (for instance they can write an extension that
implements this hook which can block writes on primary, write some log
messages, inform some service layer of the replicas falling behind the
primary etc.). If we were to have the hook in XLogInsert which gets
called so frequently or XLogInsert is a hot-path, the hook really
should do as little work as possible, otherwise the write operations
latency may increase.

A Hook is a good start. If there is enough interest then an extension can be added to the contrib module.


> A few other scenarios I can think of with the hook are:
>
> Enforcing RPO as described above
> Enforcing rate limit and slow throttling when sync standby is falling behind (could be flush lag or replay lag)
> Transactional log rate governance - useful for cloud providers to provide SKU sizes based on allowed WAL writes.
>
> Thoughts?

The hook can help to achieve the above objectives but where to place
it and what parameters it should take as input (or what info it should
emit out of the server via the hook) are important too.

XLogInsert in my opinion is the best place to call it and the hook can be something like this "void xlog_insert_hook(NULL)" as all the throttling logic required is the current flush position which can be obtained from GetFlushRecPtr and the ReplicationSlotCtl. Attached a draft patch.
 

Having said all, the RPO feature can also be implemented outside of
the postgres, a simple implementation could be - get the primary
current wal lsn using pg_current_wal_lsn and all the replicas
restart_lsn using pg_replication_slot, if they differ by certain
amount, then issue ALTER SYSTEM SET READ ONLY command [1] on the
primary, this requires the connections to the server and proper access
rights. This feature can also be implemented as an extension (without
the hook) which doesn't require any connections to the server yet can
access the required info primary current_wal_lsn, restart_lsn of the
replication slots etc, but the RPO enforcement may not be immediate as
the server doesn't have any hooks in XLogInsert or some other area.
 
READ ONLY is a decent choice but can fail the writes or not take into effect until the end of the transaction? 
 
[1] - https://www.postgresql.org/message-id/CAAJ_b967uKBiW6gbHr5aPzweURYjEGv333FHVHxvJmMhanwHXA%40mail.gmail.com

Regards,
Bharath Rupireddy.
Attachment

pgsql-hackers by date:

Previous
From: Joshua Brindle
Date:
Subject: Re: CREATEROLE and role ownership hierarchies
Next
From: Michael Paquier
Date:
Subject: Re: Delay the variable initialization in get_rel_sync_entry