Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Date
Msg-id 20220225193819.GB662561@nathanxps13
Whole thread Raw
In response to Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  ("Hsu, John" <hsuchen@amazon.com>)
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> Thanks Satya and others for the inputs. Here's the v1 patch that
> basically allows async wal senders to wait until the sync standbys
> report their flush lsn back to the primary. Please let me know your
> thoughts.

I haven't had a chance to look too closely yet, but IIUC this adds a new
function that waits for synchronous replication.  This new function
essentially spins until the synchronous LSN has advanced.

I don't think it's a good idea to block sending any WAL like this.  AFAICT
it is possible that there will be a lot of synchronously replicated WAL
that we can send, and it might just be the last several bytes that cannot
yet be replicated to the asynchronous standbys.  І believe this patch will
cause the server to avoid sending _any_ WAL until the synchronous LSN
advances.

Perhaps we should instead just choose the SendRqstPtr based on the current
synchronous LSN.  Presumably there are other things we'd need to consider,
but in general, I think we ought to send as much WAL as possible for a
given call to XLogSendPhysical().

> I've done pgbench testing to see if the patch causes any problems. I
> ran tests two times, there isn't much difference in the txns per
> seconds (tps), although there's a delay in the async standby receiving
> the WAL, after all, that's the feature we are pursuing.

I'm curious what a longer pgbench run looks like when the synchronous
replicas are in the same region.  That is probably a more realistic
use-case.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Next
From: Jacob Champion
Date:
Subject: Re: Proposal: Support custom authentication methods using hooks