Re: [PATCH] Add cascade synchronous replication - Mailing list pgsql-hackers
| From | Grigoriy Novikov |
|---|---|
| Subject | Re: [PATCH] Add cascade synchronous replication |
| Date | |
| Msg-id | d1ff931e-6605-4af7-b02f-300106aee8c4@gmail.com |
| In response to | [PATCH] Add cascade synchronous replication (Григорий Новиков <grigoriy.novikov220@gmail.com>) |
| List | pgsql-hackers |
Hello hackers,

My apologies for not fixing the code indentation to match the project's requirements right away. I'm attaching the corrected version of the patch.

On 11/12/25 20:05, Григорий Новиков wrote:
> Hello hackers,
>
> Introduction
>
> Using a large number of synchronous standbys creates excessive load on the primary node. Cascading synchronous replication can be used to solve this problem.
>
> Overview of Changes
>
> This patch adds cascading synchronous replication to PostgreSQL. With it, standby servers take the synchronous replication configuration parameters into account. A standby reads the LSN positions of its walsenders from the walsender data structures, computes the synchronous write, flush, and apply positions among them using the synchronous replication algorithm, and then takes the minimum of each of these values and the corresponding position of the standby itself. To avoid synchronization problems and unnecessary overhead, these calculations are performed by the walreceiver process. The resulting positions are transmitted in the standby reply message instead of the server's own positions. This happens if the SyncRepRequested condition is met and at least one synchronous standby is specified in synchronous_standby_names.
>
> If the synchronous LSN values cannot be calculated from the walsender positions (for example, because there are not enough synchronous standbys), the server sends DefaultSendingLSN. This value lies between InvalidXLogRecPtr and FirstNormalUnloggedLSN. Sending InvalidXLogRecPtr is not allowed because the pg_stat_replication view would then display such a standby as asynchronous, although it is not. The value 2 was chosen for DefaultSendingLSN since 1 is already used by one of the access methods.
>
> When a server receives a DefaultSendingLSN position from a synchronous standby, it treats it as a regular LSN.
> This allows transaction execution to continue if the configuration permits it. If not, transaction execution stops until the cluster failure is resolved.
>
> Overview of Individual Patch Parts
>
> The first part adds the SyncRepGetSendingSyncRecPtr function, which is written similarly to SyncRepGetSyncRecPtr and is responsible for calculating the LSN positions to be sent. These two functions shared a large common code section, which was moved to the SyncRepGetSyncRecPtrBySyncRepMethod function. Also, as an optimization, the walsender process serving a synchronous standby can call the WalRcvForceReply function.
>
> The second part of the patch reorganizes the code in syncrep.c into sections. This is necessary to preserve the semantics of the sections used in this file, since some functions can now be used by the walreceiver process, while others can be used by both the walreceiver and the walsender.
>
> The third part adds a special notation in pg_stat_replication for standbys sending DefaultSendingLSN: if such a standby is synchronous, it is marked with a "?" symbol. In the author's opinion, this notation can simplify troubleshooting in the cluster, but it is not meant as a serious failure-detection mechanism.
>
> The fourth part of the patch contains fixes to recovery tests 9 and 12. These tests created circular dependencies between servers. This was not a problem as long as standbys ignored the synchronous replication parameters, but with this patch the tests broke. Also, tests for the new mechanism were added to test 7, which covers synchronous replication.
>
> Possible Topologies
>
> As part of the patch, both asynchronous and synchronous standbys may connect to a synchronous standby. However, positions sent by asynchronous standbys are not considered, since the synchronous replication algorithm is used.
> For the same reason, connecting a synchronous standby to an asynchronous one is theoretically possible but meaningless.
>
> Additional Information
>
> The patch contains no platform-dependent code, compiles with the -Wall flag, and passes the tests. Performance optimization is a separate task that, in the author's opinion, deserves a separate patch. Nevertheless, local testing with Docker containers showed insignificant performance degradation when using cascading synchronous chains.
>
> This patch is intended primarily for discussion. It was developed against the master branch, commit b227b0bb4e032e19b3679bedac820eba3ac0d1cf.
>
> Best wishes,
> Grigoriy Novikov
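The computation described in the Overview of Changes can be illustrated with a minimal, standalone C sketch. It is not code from the patch: it assumes a simplified array of downstream walsender positions instead of PostgreSQL's shared-memory walsender structures, and the names SenderPos, MAX_SENDERS, sync_positions, and sending_positions are hypothetical. It models the two synchronous_standby_names methods in a simplified way: FIRST n takes the oldest position among the n highest-priority synchronous standbys, and ANY n takes the n-th newest position among all of them. Only XLogRecPtr, InvalidXLogRecPtr, FirstNormalUnloggedLSN, and DefaultSendingLSN come from the text; the value 1000 used for FirstNormalUnloggedLSN is assumed to match the PostgreSQL source and should be checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t XLogRecPtr;

#define InvalidXLogRecPtr      ((XLogRecPtr) 0)
#define FirstNormalUnloggedLSN ((XLogRecPtr) 1000) /* assumed to match PostgreSQL */
#define DefaultSendingLSN      ((XLogRecPtr) 2)    /* value proposed in the patch */
#define MAX_SENDERS            16                  /* hypothetical bound for the sketch */

/* DefaultSendingLSN must be valid but below any normal unlogged LSN. */
_Static_assert(InvalidXLogRecPtr < DefaultSendingLSN &&
               DefaultSendingLSN < FirstNormalUnloggedLSN,
               "DefaultSendingLSN out of range");

enum { WRITE_POS, FLUSH_POS, APPLY_POS, NUM_POS };

/* Simplified FIRST n (priority) and ANY n (quorum) methods. */
typedef enum { METHOD_PRIORITY, METHOD_QUORUM } SyncMethod;

/* Hypothetical, simplified view of one downstream walsender. */
typedef struct
{
    XLogRecPtr lsn[NUM_POS];
    int        priority;    /* 0 = asynchronous; otherwise lower is preferred */
} SenderPos;

static int cmp_priority(const void *a, const void *b)
{
    return ((const SenderPos *) a)->priority - ((const SenderPos *) b)->priority;
}

static int cmp_lsn_desc(const void *a, const void *b)
{
    XLogRecPtr x = *(const XLogRecPtr *) a;
    XLogRecPtr y = *(const XLogRecPtr *) b;

    return (x < y) - (x > y);   /* newest first */
}

/*
 * Synchronous write/flush/apply positions among the downstream walsenders.
 * Returns false if fewer than num_sync synchronous standbys are connected.
 */
static bool sync_positions(const SenderPos *s, int n, int num_sync,
                           SyncMethod method, XLogRecPtr out[NUM_POS])
{
    SenderPos cand[MAX_SENDERS];
    int       ncand = 0;

    assert(n <= MAX_SENDERS);
    for (int i = 0; i < n; i++)
        if (s[i].priority > 0)  /* asynchronous standbys are not considered */
            cand[ncand++] = s[i];
    if (num_sync <= 0 || ncand < num_sync)
        return false;

    if (method == METHOD_PRIORITY)
        qsort(cand, ncand, sizeof(SenderPos), cmp_priority);

    for (int p = 0; p < NUM_POS; p++)
    {
        if (method == METHOD_PRIORITY)
        {
            /* Oldest position among the num_sync highest-priority standbys. */
            out[p] = cand[0].lsn[p];
            for (int i = 1; i < num_sync; i++)
                if (cand[i].lsn[p] < out[p])
                    out[p] = cand[i].lsn[p];
        }
        else
        {
            /* num_sync-th newest position among all synchronous standbys. */
            XLogRecPtr v[MAX_SENDERS];

            for (int i = 0; i < ncand; i++)
                v[i] = cand[i].lsn[p];
            qsort(v, ncand, sizeof(XLogRecPtr), cmp_lsn_desc);
            out[p] = v[num_sync - 1];
        }
    }
    return true;
}

/*
 * Positions the cascading standby reports upstream: per position, the
 * minimum of its own value and the synchronous one, or DefaultSendingLSN
 * for all three when no synchronous value can be computed.
 */
static void sending_positions(const XLogRecPtr own[NUM_POS], const SenderPos *s,
                              int n, int num_sync, SyncMethod method,
                              XLogRecPtr out[NUM_POS])
{
    XLogRecPtr sync_lsn[NUM_POS];
    bool       ok = sync_positions(s, n, num_sync, method, sync_lsn);

    for (int p = 0; p < NUM_POS; p++)
        out[p] = ok ? (own[p] < sync_lsn[p] ? own[p] : sync_lsn[p]) : DefaultSendingLSN;
}
```

For example, if the cascading standby's own positions are {700, 450, 350} and two synchronous downstream standbys report {500, 400, 300} and {450, 550, 250} (plus an asynchronous one at 900, which is ignored), FIRST 2 yields {450, 400, 250} and ANY 1 yields {500, 450, 300}. With FIRST 3 only two synchronous standbys are available, so all three reported positions fall back to DefaultSendingLSN.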
Attachment