Re: Fix LOCK_TIMEOUT handling in slotsync worker - Mailing list pgsql-hackers

From shveta malik
Subject Re: Fix LOCK_TIMEOUT handling in slotsync worker
Date
Msg-id CAJpy0uDzEwSm=1=xnK3o_=W5SfgP2TEVz50-xGk434AKWpE1Og@mail.gmail.com
In response to Fix LOCK_TIMEOUT handling in slotsync worker  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Mon, Dec 8, 2025 at 7:34 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> Previously, the slotsync worker used SIGINT to receive a graceful shutdown
> signal from the startup process on promotion. However, SIGINT is also used by
> the LOCK_TIMEOUT handler to trigger a query-cancel interrupt. Given that the
> slotsync worker can access and lock catalog tables while parsing libpq tuples,
> this overlapping use of SIGINT led to the slotsync worker ignoring LOCK_TIMEOUT
> signals and consequently waiting indefinitely on locks.
>
> I can reproduce the issue by:
>
> 1) create a failover replication slot for slotsync on primary.
> 2) start the slotsync worker on the standby and use gdb to make the slotsync
> worker block before accessing the pg_type catalog via walrcv_exec -> libpqrcv_exec ->
> libpqrcv_processTuples -> TupleDescInitEntry -> SearchSysCache1.
> 3) take ACCESS EXCLUSIVE lock on pg_type on primary.
> 4) log standby snapshot to replicate the lock to standby.
> 5) release the slotsync worker; it will start waiting for the lock on pg_type to
>    be released. On HEAD, that wait is not canceled by the lock_timeout
>    setting.
>
> Here is a patch to resolve this by replacing the current signal handler with the
> appropriate StatementCancelHandler for SIGINT within the slotsync worker.
> Furthermore, it updates the startup process to send a SIGUSR1 signal to notify
> slotsync of the need to stop during promotion. The slotsync worker now stops
> upon detecting that the shared memory flag (stopSignaled) is set to true.
>
> I did not add a TAP test in the patch for now. Although feasible, it requires
> a strong lock on a catalog and an injection point to control the
> process.
>

Thanks for the patch. I agree with the analysis of the issue. I can
reproduce it on HEAD and have verified that the patch fixes it.
The patch looks good to me.

thanks
Shveta
