On Mon, Dec 8, 2025 at 7:34 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> Previously, the slotsync worker used SIGINT to receive a graceful shutdown
> signal from the startup process on promotion. However, SIGINT is also used by
> the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. Given that the
> slotsync worker can access and lock catalog tables while parsing libpq tuples,
> this overlapping use of SIGINT led to the slotsync worker ignoring LOCK_TIMEOUT
> signals and consequently waiting indefinitely on locks.
>
> I can reproduce the issue by:
>
> 1) create a failover replication slot for slotsync on primary.
> 2) start the slotsync worker on standby and use gdb to make the slotsync
> worker block before accessing the pg_type catalog via walrcv_exec -> libpqrcv_exec ->
> libpqrcv_processTuples -> TupleDescInitEntry -> SearchSysCache1.
> 3) take ACCESS EXCLUSIVE lock on pg_type on primary.
> 4) log standby snapshot to replicate the lock to standby.
> 5) release the slotsync worker; it will start waiting for the lock on pg_type to
> be released. On HEAD, this wait is not canceled by the lock_timeout
> setting.
>
> Here is a patch to resolve this by replacing the current signal handler with the
> appropriate StatementCancelHandler for SIGINT within the slotsync worker.
> Furthermore, it updates the startup process to send a SIGUSR1 signal to notify
> slotsync of the need to stop during promotion. The slotsync worker now stops
> upon detecting that the shared memory flag (stopSignaled) is set to true.
>
> I did not add a tap-test in the patch for now. Although feasible, it requires
> a strong lock on a catalog and an injection point to control the
> process.
>
Thanks for the patch. I agree with the issue described and can
reproduce it on HEAD. I also verified that the patch fixes it.
The patch looks good to me.
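
For anyone following the thread, here is a rough standalone sketch of the
signal split described above: SIGINT is left to mean "cancel the current
query", while a stop request is a shared flag plus a SIGUSR1 wakeup. This
is only an illustration under stated assumptions, not the patch's code.
All names below (stop_signaled, cancel_pending, request_worker_stop, and
so on) are hypothetical; stop_signaled is a plain process-local variable
standing in for the shared memory stopSignaled flag, and PostgreSQL's
StatementCancelHandler and procsignal machinery are not reproduced.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the shared-memory stopSignaled flag. In the
 * real server it lives in shared memory and is set by the startup process. */
static volatile sig_atomic_t stop_signaled = 0;

/* Set by the SIGINT handler: models a pending query cancel (e.g. lock_timeout). */
static volatile sig_atomic_t cancel_pending = 0;

/* Set by the SIGUSR1 handler: models "wake up and re-check shared state". */
static volatile sig_atomic_t wakeup_pending = 0;

static void handle_sigint(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

static void handle_sigusr1(int signo)
{
    (void) signo;
    wakeup_pending = 1;
}

/* Install both handlers; a failure is treated as a programming error here. */
static void install_worker_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = handle_sigint;
    assert(sigaction(SIGINT, &sa, NULL) == 0);

    sa.sa_handler = handle_sigusr1;
    assert(sigaction(SIGUSR1, &sa, NULL) == 0);
}

/* What the promoting side would do: set the flag first, then wake the worker. */
static void request_worker_stop(void)
{
    stop_signaled = 1;
    raise(SIGUSR1);
}

typedef enum
{
    WORKER_CONTINUE,
    WORKER_CANCEL_QUERY,
    WORKER_STOP
} WorkerAction;

/* Called from the worker's main loop: a stop request wins over a cancel. */
static WorkerAction worker_check_interrupts(void)
{
    wakeup_pending = 0;

    if (stop_signaled)
        return WORKER_STOP;

    if (cancel_pending)
    {
        cancel_pending = 0;
        return WORKER_CANCEL_QUERY;
    }

    return WORKER_CONTINUE;
}
```

The point of the split is that a SIGINT raised by a lock timeout can no
longer be mistaken for a shutdown request, since only the flag decides
whether the worker stops.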
Thanks,
Shveta