Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) - Mailing list pgsql-hackers
From | Nathan Bossart |
---|---|
Subject | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) |
Date | |
Msg-id | 20220226160711.GA671429@nathanxps13 |
In response to | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers) |
List | pgsql-hackers |
On Sat, Feb 26, 2022 at 02:17:50PM +0530, Bharath Rupireddy wrote:
> A global min LSN of SendRqstPtr of all the sync standbys can be
> calculated and the async standbys can send WAL up to global min LSN.
> This is unlike what the v1 patch does i.e. async standbys will wait
> until the sync standbys report flush LSN back to the primary. Problem
> with the global min LSN approach is that there can still be a small
> window where async standbys can get ahead of sync standbys. Imagine
> async standbys being closer to the primary than sync standbys and if
> the failover has to happen while the WAL at SendRqstPtr isn't received
> by the sync standbys, but the async standbys can receive them as they
> are closer. We hit the same problem that we are trying to solve with
> this patch. This is the reason, we are waiting till the sync flush LSN
> as it guarantees more transactional protection.

Do you mean that the application of WAL gets ahead on your async standbys,
or that the writing/flushing of WAL gets ahead?

If synchronous_commit is set to 'remote_write' or 'on', I think either
approach can lead to situations where the async standbys are ahead of the
sync standbys in WAL application. For example, a conflict between WAL
replay and a query on your sync standby could delay WAL replay, but the
primary will not wait for this conflict to resolve before considering a
transaction synchronously replicated and sending it to the async standbys.

If writing/flushing WAL gets ahead on async standbys, I think something is
wrong with the patch. If you aren't sending WAL to async standbys until it
is synchronously replicated to the sync standbys, it should by definition
be impossible for this to happen.

If you wanted to make sure that WAL was not applied to async standbys
before it was applied to sync standbys, I think you'd need to set
synchronous_commit to 'remote_apply'.
This would ensure that the WAL is replayed on sync standbys before the
primary considers the transaction synchronously replicated and sends it to
the async standbys.

> Do you think allowing async standbys optionally wait for either remote
> write or flush or apply or global min LSN of SendRqstPtr so that users
> can choose what they want?

I'm not sure I follow the difference between "global min LSN of
SendRqstPtr" and remote write/flush/apply. IIUC you are saying that we
could use the LSN of what is being sent to sync standbys instead of the
LSN of what the primary considers synchronously replicated. I don't think
we should do that because it provides no guarantee that the WAL has even
been sent to the sync standbys before it is sent to the async standbys.

For this feature, I think we always need to go by what the primary
considers synchronously replicated. My suggested approach doesn't change
that. I'm saying that instead of spinning in a loop waiting for the WAL to
be synchronously replicated, we just immediately send WAL up to the LSN
that is presently known to be synchronously replicated.

You do bring up an interesting point, though. Is there a use case for
specifying synchronous_commit='on' but not sending WAL to async replicas
until it is synchronously applied? Or, alternatively, would anyone want to
set synchronous_commit='remote_apply' but send WAL to async standbys as
soon as it is written to the sync standbys?

My initial reaction is that we should depend on the synchronous
replication setup. As long as the primary considers an LSN synchronously
replicated, it would be okay to send it to the async standbys. I
personally don't think it is worth taking on the extra complexity for that
level of configuration just yet.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
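A minimal C sketch of the approach suggested in the reply: rather than having an async walsender loop until WAL becomes synchronously replicated, cap what it may send at the LSN the primary already considers synchronously replicated at the configured level. Everything here is hypothetical and simplified. `XLogRecPtr` is redefined as a plain 64-bit integer, and `SyncWaitMode`, `SyncRepState`, and `async_send_limit()` are illustrative names, not PostgreSQL's actual walsender or syncrep code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical: the level of synchronous replication the primary waits for,
 * corresponding to synchronous_commit = 'remote_write', 'on' and
 * 'remote_apply' respectively.
 */
typedef enum
{
    WAIT_WRITE = 0,
    WAIT_FLUSH,
    WAIT_APPLY,
    NUM_WAIT_MODES
} SyncWaitMode;

/*
 * Hypothetical shared state: for each wait mode, the highest LSN the primary
 * currently considers synchronously replicated at that level.
 */
typedef struct
{
    XLogRecPtr synced_lsn[NUM_WAIT_MODES];
} SyncRepState;

/*
 * How far an async walsender may send right now. Instead of waiting in a
 * loop until flush_ptr is synchronously replicated, it sends immediately,
 * but never past the LSN already known to be synchronously replicated.
 */
static XLogRecPtr
async_send_limit(XLogRecPtr flush_ptr, const SyncRepState *state,
                 SyncWaitMode mode)
{
    XLogRecPtr synced;

    assert(state != NULL && mode < NUM_WAIT_MODES);
    synced = state->synced_lsn[mode];
    return (synced < flush_ptr) ? synced : flush_ptr;
}
```

For example, if WAL is flushed locally up to 1000 and the sync standbys have written up to 900, flushed up to 800 and replayed up to 700, this sketch lets an async walsender send up to 900 with synchronous_commit='remote_write', 800 with 'on', and 700 with 'remote_apply'. Because the limit simply follows whichever level synchronous_commit selects, it depends on the existing synchronous replication setup rather than adding a separate configuration knob.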