Re: [PATCH] Add cascade synchronous replication - Mailing list pgsql-hackers

From Grigoriy Novikov
Subject Re: [PATCH] Add cascade synchronous replication
Date
Msg-id d1ff931e-6605-4af7-b02f-300106aee8c4@gmail.com
In response to [PATCH] Add cascade synchronous replication  (Григорий Новиков <grigoriy.novikov220@gmail.com>)
List pgsql-hackers
Hello hackers.
My apologies for not fixing the code indentation according to the
requirements right away. I'm attaching the corrected version of the patch.
On 11/12/25 20:05, Григорий Новиков wrote:
> Hello hackers,
>
> Introduction
> Using a large number of synchronous standbys creates excessive load on 
> the primary node. To solve this problem, cascading synchronous 
> replication can be used.
>
> Overview of Changes
> This patch adds synchronous cascading replication to PostgreSQL. With
> it, standby servers honor the configuration parameters related to
> synchronous replication. A standby reads the LSN positions of its
> walsenders from the walsender data structures, computes the
> synchronous write, flush, and apply positions among them using the
> synchronous replication algorithm, and then takes the minimum of each
> of these values and its own corresponding position. To avoid
> synchronization problems and unnecessary overhead, these calculations
> are performed by the walreceiver process. The resulting positions are
> transmitted in the standby reply message instead of the server's own
> positions. This happens only if the SyncRepRequested condition is met
> and at least one synchronous standby is specified in
> synchronous_standby_names.
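
To illustrate the calculation described above, here is a rough standalone
sketch. It is not code from the patch: the ReplyPositions struct and the
cascade_reply_positions function are hypothetical names, and XLogRecPtr is
modeled as a plain uint64_t. It shows only the final step, taking the
minimum of the standby's own positions and the synchronous positions
computed from its downstream standbys.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical container for the three positions sent in a standby reply. */
typedef struct ReplyPositions
{
    XLogRecPtr  write;
    XLogRecPtr  flush;
    XLogRecPtr  apply;
} ReplyPositions;

static XLogRecPtr
min_lsn(XLogRecPtr a, XLogRecPtr b)
{
    return (a < b) ? a : b;
}

/*
 * own:  positions this standby has reached locally
 * sync: synchronous positions computed from its downstream standbys
 *
 * Returns the positions a cascading standby would report upstream.
 */
ReplyPositions
cascade_reply_positions(ReplyPositions own, ReplyPositions sync)
{
    ReplyPositions result;

    result.write = min_lsn(own.write, sync.write);
    result.flush = min_lsn(own.flush, sync.flush);
    result.apply = min_lsn(own.apply, sync.apply);

    /* A standby never reports more than it has reached itself. */
    assert(result.write <= own.write);
    assert(result.flush <= own.flush);
    assert(result.apply <= own.apply);

    return result;
}
```

For example, with own positions (write 300, flush 250, apply 200) and
downstream synchronous positions (280, 260, 100), the reported positions
would be (280, 250, 100).
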
> If the walsender processes fail to calculate synchronous LSN values
> (for example, because there are not enough synchronous standbys), the
> server sends DefaultSendingLSN instead. This value lies between
> InvalidXLogRecPtr and FirstNormalUnloggedLSN. Sending
> InvalidXLogRecPtr is not allowed because a standby sending such a
> value would be displayed in the pg_stat_replication view as
> asynchronous, although it is not. The value 2 was chosen for
> DefaultSendingLSN since 1 is already used by one of the access methods.
> When the server receives DefaultSendingLSN from a synchronous standby,
> it treats it as a regular LSN. This allows transaction execution to
> continue if the configuration permits it. If not, transaction
> execution stops until the cluster failure is resolved.
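
The fallback described above can be sketched as follows. This is a
simplified standalone illustration rather than the patch code:
choose_sending_lsn is a hypothetical helper, XLogRecPtr is modeled as
uint64_t, and the values 0 and 1000 for InvalidXLogRecPtr and
FirstNormalUnloggedLSN are assumed to match the current PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Assumed to mirror the definitions in the PostgreSQL headers. */
#define InvalidXLogRecPtr       ((XLogRecPtr) 0)
#define FirstNormalUnloggedLSN  ((XLogRecPtr) 1000)

/* Value described in this mail: 2, since 1 is taken by an access method. */
#define DefaultSendingLSN       ((XLogRecPtr) 2)

/*
 * Hypothetical helper: pick the position to report upstream.
 *
 * have_sync_lsn is false when the synchronous position could not be
 * computed, e.g. because too few synchronous standbys are connected.
 */
XLogRecPtr
choose_sending_lsn(bool have_sync_lsn, XLogRecPtr sync_lsn)
{
    /* The sentinel must be valid but below any normal LSN. */
    assert(DefaultSendingLSN > InvalidXLogRecPtr);
    assert(DefaultSendingLSN < FirstNormalUnloggedLSN);

    if (!have_sync_lsn)
        return DefaultSendingLSN;

    return sync_lsn;
}
```

Because the sentinel is smaller than any position a commit can be waiting
for, the upstream server can compare it like any other LSN: a commit that
needs this standby keeps waiting, while a configuration that can be
satisfied by other standbys proceeds.
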
>
> Overview of Individual Patch Parts
> The first part adds the SyncRepGetSendingSyncRecPtr function, which is 
> written similarly to SyncRepGetSyncRecPtr and is responsible for 
> calculating the LSN positions to be sent. These functions contained a 
> large common code section, which was moved to the 
> SyncRepGetSyncRecPtrBySyncRepMethod function. Also, for optimization 
> purposes, the walsender process serving a synchronous standby can call 
> the WalRcvForceReply function.
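
As a rough model of the logic shared by the two functions, the sketch below
picks a synchronous position by replication method. It is a standalone
simplification, not the body of SyncRepGetSyncRecPtrBySyncRepMethod: the
SyncMethod enum, the sync_lsn_by_method function, and the flat array of
positions are hypothetical, and the priority branch assumes the caller has
already narrowed the candidates down to the chosen synchronous standbys.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical stand-ins for the priority and quorum methods. */
typedef enum SyncMethod
{
    METHOD_PRIORITY,
    METHOD_QUORUM
} SyncMethod;

/* qsort comparator: descending order of LSN. */
static int
cmp_lsn_desc(const void *a, const void *b)
{
    XLogRecPtr  la = *(const XLogRecPtr *) a;
    XLogRecPtr  lb = *(const XLogRecPtr *) b;

    if (la > lb)
        return -1;
    if (la < lb)
        return 1;
    return 0;
}

/*
 * lsns:     positions reported by the candidate synchronous standbys
 *           (sorted in place in quorum mode)
 * n:        number of candidates
 * num_sync: number of standbys that must confirm
 *
 * Returns 0 if there are not enough candidates.
 */
XLogRecPtr
sync_lsn_by_method(SyncMethod method, XLogRecPtr *lsns, int n, int num_sync)
{
    assert(lsns != NULL || n == 0);

    if (num_sync <= 0 || n < num_sync)
        return 0;

    if (method == METHOD_PRIORITY)
    {
        /* Priority: the oldest position among the chosen standbys. */
        XLogRecPtr  oldest = lsns[0];

        for (int i = 1; i < n; i++)
        {
            if (lsns[i] < oldest)
                oldest = lsns[i];
        }
        return oldest;
    }

    /* Quorum: the num_sync-th latest position among all candidates. */
    qsort(lsns, (size_t) n, sizeof(XLogRecPtr), cmp_lsn_desc);
    return lsns[num_sync - 1];
}
```

With candidate positions 100, 300, and 200 and a quorum of 2, the result is
200, since two standbys have confirmed at least that position.
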
> The second part of the patch reorganizes the code in the syncrep.c
> file into sections. This is necessary to preserve the semantics of
> the sections used in this file, since some functions can now be used
> by the walreceiver process, while others can be used by both
> walreceiver and walsender.
> The third part adds a special notation in pg_stat_replication for
> standbys sending DefaultSendingLSN. If such a standby is synchronous,
> it is marked with a "?" symbol. In the author's opinion, this notation
> can simplify troubleshooting in the cluster, but it is not meant to be
> a complete solution for failure detection.
> The fourth part of the patch contains fixes in recovery tests numbered
> 9 and 12. These tests created circular dependencies between servers.
> This was not a problem as long as standbys ignored the synchronous
> replication parameters, but with this patch the tests broke. Also,
> tests for the new mechanism were added to test 7, which covers
> synchronous replication.
>
> Possible Topologies
> The patch allows both asynchronous and synchronous standbys to
> connect to a synchronous standby. However, positions sent by
> asynchronous standbys are not taken into account, since the
> synchronous replication algorithm is used. For the same reason,
> connecting a synchronous standby to an asynchronous one is
> theoretically possible but meaningless.
>
> Additional Information
> The patch contains no platform-dependent elements, compiles with the 
> -Wall flag, and successfully passes tests. Performance optimization is 
> a separate task, and in the author's opinion, deserves a separate 
> patch. Nevertheless, local testing using Docker containers showed 
> insignificant performance degradation when using synchronous cascading 
> chains.
> This patch is intended primarily for discussion. It was developed for 
> the master branch, commit hash: b227b0bb4e032e19b3679bedac820eba3ac0d1cf.
> Best wishes, Grigoriy Novikov!
Attachment