Re: Add progressive backoff to XactLockTableWait functions - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Add progressive backoff to XactLockTableWait functions |
Date | |
Msg-id | d71d67c9-0fdc-4825-a902-d69ea8deb714@oss.nttdata.com Whole thread Raw |
In response to | Add progressive backoff to XactLockTableWait functions (Xuneng Zhou <xunengzhou@gmail.com>) |
Responses |
Re: Add progressive backoff to XactLockTableWait functions
Re: Add progressive backoff to XactLockTableWait functions |
List | pgsql-hackers |
On 2025/06/24 1:32, Xuneng Zhou wrote: > Hi, > > > Here's patch version 4. > > > 1. The problem > > I conducted further investigation on this issue. The 1ms sleep in XactLockTableWait that falls back to polling was notproblematic in certain scenarios prior to v16. It became an potential issue after the "Allow logical decoding on standbys"feature was introduced [1]. After that commit, we can now create logical replication slots on hot standby serversand perform many other logical decoding operations. [2] > > > > 2. The analysis > > Is this issue limited to calling sites related to logical decoding? I believe so. As we've discussed, the core issue isthat primary transactions are not running locally on the standby, which breaks the assumption of XactLockTableWait—thatthere is an xid lock to wait on. Other call stacks of this function, except those from logical decoding,are unlikely to encounter this problem because they are unlikely to be invoked from standby servers. > > > Analysis of XactLockTableWait calling sites: > > > Problematic Case (Hot Standby): > > - Logical decoding operations in snapbuild.c during snapshot building > > - This is thecase that can run on hot standby and experience the issue > > > Non-problematic Cases (Primary Only): > > - Heap operations (heapam.c): DML operations not possible on read-only standby > > - B-tree operations (nbtinsert.c): Index modifications impossible on standby > > - Logical apply workers (execReplication.c): Cannot start during recovery (BgWorkerStart_RecoveryFinished) > > - All other cases: Require write operations unavailable on standby > > > > 3. The proposed solution > > If the above analysis is sound, one potential fix would be to add separate branching for standby in XactLockTableWait.However, this seems inconsistent with the function's definition—there's simply no lock entry in the locktable for waiting. We could implement a new function for this logic, To be honest, I'm fine with v3, since it only increases the sleep time after 5000 loop iterations, which has negligible performance impact. But if these functions aren't intended to be used during recovery and the loop shouldn't reach that many iterations, I'm also okay with the v4 approach of introducing a separate function specifically for recovery. But this amakes me wonder if we should add something like Assert(!RecoveryInProgress()) to those two functions, to prevent them from being used during recovery in the future. > but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not thatideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot. Since the purpose and logic of the new function are similar to XactLockTableWait(), I think it would be better to place it nearby in lmgr.c, even though it doesn't handle a lock directly. That would help keep the related logic together and improve readability. Regards, -- Fujii Masao NTT DATA Japan Corporation
pgsql-hackers by date: