Thread: [MASSMAIL] Resetting synchronous_standby_names can wait for CHECKPOINT to finish
[MASSMAIL] Resetting synchronous_standby_names can wait for CHECKPOINT to finish
From
"Yusuke Egashira (Fujitsu)"
Date:
Hello, hackers.

When the checkpointer process is busy, backend processes waiting in SyncRep are not released until the checkpoint completes, even if synchronous_standby_names is reset. This prevents application processing from resuming promptly when a problem occurs on the standby server in a synchronous replication system. I confirmed this in PostgreSQL 12.18.

This issue has actually become a major problem for our customer. When a problem occurred in the replication network, the backend processes did not respond even after synchronous_standby_names was reset, resulting in timeout errors in many client applications.

The customer has set the checkpoint_completion_target parameter to 0.9, and it seems to have been working fine under normal conditions. However, there was a time when VACUUM was concentrated on a huge table. At that time, more than five times max_wal_size of WAL was generated during checkpoint processing. Unfortunately, communication with the synchronous standby was lost during that checkpoint, and despite resetting synchronous_standby_names, multiple client applications could not return a response because they were still waiting for SyncRep.

I wrote a script (reset-synchronous_standby_names-during-checkpoint.sh) to illustrate the issue. The script stops the synchronous standby during a transaction, and then resets synchronous_standby_names during a checkpoint. When I run this on my 1-core RHEL7 machine, I see that COMMIT does wait until the CHECKPOINT finishes, even though synchronous_standby_names has been reset.

I am attaching a patch (REL_12_STABLE) for the simplest-seeming solution. It moves the checkpointer's handling of SIGHUP outside of the sleep processing. However, I am concerned that this change could affect the performance of checkpoint execution when the checkpoint is behind schedule. Can PostgreSQL tolerate this overhead?

Regards,
Yusuke Egashira.
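To make the timing argument concrete, here is a toy model in Python. It is not PostgreSQL code. It assumes, as the patch description implies, that during a checkpoint the checkpointer looks at a pending reload request only in the branch it takes when the checkpoint is on schedule (the branch that also sleeps). All names (`reload_processed_at`, `behind_schedule_until`, `reload_outside_sleep`) are hypothetical and exist only for this sketch.

```python
from typing import Optional


def reload_processed_at(
    total_buffers: int,
    behind_schedule_until: int,
    reload_arrives_at: int,
    reload_outside_sleep: bool,
) -> Optional[int]:
    """Return the write count at which a pending reload is processed.

    Returns None if the reload is still pending when the simulated
    checkpoint finishes, i.e. it is handled only after the checkpoint.
    """
    reload_pending = False
    for written in range(1, total_buffers + 1):
        if written == reload_arrives_at:
            # Stands in for SIGHUP arriving and setting a flag.
            reload_pending = True

        # The checkpoint counts as "behind schedule" for the first
        # behind_schedule_until writes (e.g. because of heavy WAL output).
        on_schedule = written > behind_schedule_until

        # Patched variant: check the flag on every write delay call,
        # whether or not we are about to sleep.
        if reload_outside_sleep and reload_pending:
            return written

        if on_schedule:
            # Unpatched variant: the flag is only checked here, in the
            # branch that would also take a nap.
            if reload_pending:
                return written
    return None


# Checkpoint behind schedule for its whole duration, reload arrives early.
print(reload_processed_at(100, 100, 10, reload_outside_sleep=False))
print(reload_processed_at(100, 100, 10, reload_outside_sleep=True))
```

In this model, a checkpoint that never catches up leaves the reload unprocessed until it ends, while checking the flag outside the sleep branch handles it on the next write. The extra cost is one flag test per write, which is the overhead the question above asks about.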
Attachment
RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish
From
"Yusuke Egashira (Fujitsu)"
Date:
Hello,

> When the checkpointer process is busy, even if we reset synchronous_standby_names, the resumption of the backend processes waiting in SyncRep are made to wait until the checkpoint is completed.
> This prevents the prompt resumption of application processing when a problem occurs on the standby server in a synchronous replication system.
> I confirmed this in PostgreSQL 12.18.

I have tested this issue on Postgres built from the master branch (17devel) and observed the same behavior: the release of backends waiting in SyncRep is blocked until the CHECKPOINT completes.

In situations where a synchronous standby instance encounters an error and needs to be detached, I believe that the current behavior is inappropriate, because backends keep waiting in SyncRep and are delayed.

I don't think changing the position of SIGHUP processing in the checkpointer process carries much risk. Am I overlooking anything?

Regards,
Yusuke Egashira.