Thread: Add progressive backoff to XactLockTableWait functions
Hi hackers, This patch implements progressive backoff in XactLockTableWait() and ConditionalXactLockTableWait(). As Kevin reported in this thread [1], XactLockTableWait() can enter a tight polling loop during logical replication slot creation on standby servers, sleeping for fixed 1ms intervals that can continue for a long time. This creates significant CPU overhead. The patch implements a time-based threshold approach based on Fujii’s idea [1]: keep sleeping for 1ms until the total sleep time reaches 10 seconds, then start exponential backoff (doubling the sleep duration each cycle) up to a maximum of 10 seconds per sleep. This balances responsiveness for normal operations (which typically complete within seconds) against CPU efficiency for the long waits in some logical replication scenarios. [1] https://www.postgresql.org/message-id/flat/CAM45KeELdjhS-rGuvN%3DZLJ_asvZACucZ9LZWVzH7bGcD12DDwg%40mail.gmail.com Best regards, Xuneng
Attachment
On 2025/06/08 23:33, Xuneng Zhou wrote: > Hi hackers, > > This patch implements progressive backoff in XactLockTableWait() and > ConditionalXactLockTableWait(). > > As Kevin reported in this thread [1], XactLockTableWait() can enter a > tight polling loop during logical replication slot creation on standby > servers, sleeping for fixed 1ms intervals that can continue for a long > time. This creates significant CPU overhead. > > The patch implements a time-based threshold approach based on Fujii’s > idea [1]: keep sleeping for 1ms until the total sleep time reaches 10 > seconds, then start exponential backoff (doubling the sleep duration > each cycle) up to a maximum of 10 seconds per sleep. This balances > responsiveness for normal operations (which typically complete within > seconds) against CPU efficiency for the long waits in some logical > replication scenarios. Thanks for the patch! When I first suggested this idea, I used 10s as an example for the maximum sleep time. But thinking more about it now, 10s might be too long. Even if the target transaction has already finished, XactLockTableWait() could still wait up to 10 seconds, which seems excessive. What about using 1s instead? That value is already used as a max sleep time in other places, like WaitExceedsMaxStandbyDelay(). If we agree on 1s as the max, then using exponential backoff from 1ms to 1s after the threshold might not be necessary. It might be simpler and sufficient to just sleep for 1s once we hit the threshold. Based on that, I think a change like the following could work well. Thought? ---------------------------------------- XactLockTableWaitInfo info; ErrorContextCallback callback; bool first = true; + int left_till_hibernate = 5000; <snip> if (!first) { CHECK_FOR_INTERRUPTS(); - pg_usleep(1000L); + + if (left_till_hibernate > 0) + { + pg_usleep(1000L); + left_till_hibernate--; + } + else + pg_usleep(1000000L); ---------------------------------------- Regards, -- Fujii Masao NTT DATA Japan Corporation
Thanks for the feedback!
When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.
What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().
If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.
Based on that, I think a change like the following could work well.
Thought?
Although it’s clear that replacing tight 1 ms polling loops will reduce CPU usage, I'm curious about the hard numbers. To that end, I ran a 60 s logical-replication slot–creation workload on a standby using three different XactLockTableWait() variants—on an 8-core, 16 GB AMD system—and collected both profiling traces and hardware-counter metrics.
1. Hardware‐counter results
- CPU cycles drop by 58% moving from 1 ms to exp. backoff, and another 25% to the 1 s threshold variant.
- Cache‐misses and context‐switches see similarly large reductions.
- IPC remains around 0.45, dipping slightly under longer sleeps.
Attachment
Hi, On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote: > This patch implements progressive backoff in XactLockTableWait() and > ConditionalXactLockTableWait(). > > As Kevin reported in this thread [1], XactLockTableWait() can enter a > tight polling loop during logical replication slot creation on standby > servers, sleeping for fixed 1ms intervals that can continue for a long > time. This creates significant CPU overhead. > > The patch implements a time-based threshold approach based on Fujii’s > idea [1]: keep sleeping for 1ms until the total sleep time reaches 10 > seconds, then start exponential backoff (doubling the sleep duration > each cycle) up to a maximum of 10 seconds per sleep. This balances > responsiveness for normal operations (which typically complete within > seconds) against CPU efficiency for the long waits in some logical > replication scenarios. ISTM that this is going to wrong way - the real problem is that we seem to have extended periods where XactLockTableWait() doesn't actually work, not that the sleep time is too short. The sleep in XactLockTableWait() was intended to address a very short race, not something that's essentially unbound. Greetings, Andres Freund
On 2025/06/24 1:32, Xuneng Zhou wrote: > Hi, > > > Here's patch version 4. > > > 1. The problem > > I conducted further investigation on this issue. The 1ms sleep in XactLockTableWait that falls back to polling was notproblematic in certain scenarios prior to v16. It became an potential issue after the "Allow logical decoding on standbys"feature was introduced [1]. After that commit, we can now create logical replication slots on hot standby serversand perform many other logical decoding operations. [2] > > > > 2. The analysis > > Is this issue limited to calling sites related to logical decoding? I believe so. As we've discussed, the core issue isthat primary transactions are not running locally on the standby, which breaks the assumption of XactLockTableWait—thatthere is an xid lock to wait on. Other call stacks of this function, except those from logical decoding,are unlikely to encounter this problem because they are unlikely to be invoked from standby servers. > > > Analysis of XactLockTableWait calling sites: > > > Problematic Case (Hot Standby): > > - Logical decoding operations in snapbuild.c during snapshot building > > - This is thecase that can run on hot standby and experience the issue > > > Non-problematic Cases (Primary Only): > > - Heap operations (heapam.c): DML operations not possible on read-only standby > > - B-tree operations (nbtinsert.c): Index modifications impossible on standby > > - Logical apply workers (execReplication.c): Cannot start during recovery (BgWorkerStart_RecoveryFinished) > > - All other cases: Require write operations unavailable on standby > > > > 3. The proposed solution > > If the above analysis is sound, one potential fix would be to add separate branching for standby in XactLockTableWait.However, this seems inconsistent with the function's definition—there's simply no lock entry in the locktable for waiting. We could implement a new function for this logic, To be honest, I'm fine with v3, since it only increases the sleep time after 5000 loop iterations, which has negligible performance impact. But if these functions aren't intended to be used during recovery and the loop shouldn't reach that many iterations, I'm also okay with the v4 approach of introducing a separate function specifically for recovery. But this amakes me wonder if we should add something like Assert(!RecoveryInProgress()) to those two functions, to prevent them from being used during recovery in the future. > but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not thatideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot. Since the purpose and logic of the new function are similar to XactLockTableWait(), I think it would be better to place it nearby in lmgr.c, even though it doesn't handle a lock directly. That would help keep the related logic together and improve readability. Regards, -- Fujii Masao NTT DATA Japan Corporation
Hi, On 2025-07-02 22:55:16 +0900, Fujii Masao wrote: > On 2025/06/24 1:32, Xuneng Zhou wrote: > > 3. The proposed solution > > > > If the above analysis is sound, one potential fix would be to add > > separate branching for standby in XactLockTableWait. However, this seems > > inconsistent with the function's definition—there's simply no lock entry > > in the lock table for waiting. We could implement a new function for > > this logic, > > To be honest, I'm fine with v3, since it only increases the sleep time > after 5000 loop iterations, which has negligible performance impact. I think this is completely the wrong direction. We should make XactLockTableWait() on standbys, not make the polling smarter. I think neither v3 nor v4 are viable patches. Greetings, Andres Freund
On 2025/07/02 23:04, Andres Freund wrote: > Hi, > > On 2025-07-02 22:55:16 +0900, Fujii Masao wrote: >> On 2025/06/24 1:32, Xuneng Zhou wrote: >>> 3. The proposed solution >>> >>> If the above analysis is sound, one potential fix would be to add >>> separate branching for standby in XactLockTableWait. However, this seems >>> inconsistent with the function's definition—there's simply no lock entry >>> in the lock table for waiting. We could implement a new function for >>> this logic, >> >> To be honest, I'm fine with v3, since it only increases the sleep time >> after 5000 loop iterations, which has negligible performance impact. > > I think this is completely the wrong direction. We should make > XactLockTableWait() on standbys, not make the polling smarter. On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps. But are you suggesting that this doesn't need to be addressed? Or do you have another idea for how to handle it? Regards, -- Fujii Masao NTT DATA Japan Corporation
Hi, On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > >On 2025/07/02 23:04, Andres Freund wrote: >> Hi, >> >> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote: >>> On 2025/06/24 1:32, Xuneng Zhou wrote: >>>> 3. The proposed solution >>>> >>>> If the above analysis is sound, one potential fix would be to add >>>> separate branching for standby in XactLockTableWait. However, this seems >>>> inconsistent with the function's definition—there's simply no lock entry >>>> in the lock table for waiting. We could implement a new function for >>>> this logic, >>> >>> To be honest, I'm fine with v3, since it only increases the sleep time >>> after 5000 loop iterations, which has negligible performance impact. >> >> I think this is completely the wrong direction. We should make >> XactLockTableWait() on standbys, not make the polling smarter. > >On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps. Right. >But are you suggesting that this doesn't need to be addressed? No. >Or do you have another idea for how to handle it? We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that'swhat needs to happen, instead on putting on another layer of hacks. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 2025/07/02 23:19, Andres Freund wrote: > Hi, > > On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> On 2025/07/02 23:04, Andres Freund wrote: >>> Hi, >>> >>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote: >>>> On 2025/06/24 1:32, Xuneng Zhou wrote: >>>>> 3. The proposed solution >>>>> >>>>> If the above analysis is sound, one potential fix would be to add >>>>> separate branching for standby in XactLockTableWait. However, this seems >>>>> inconsistent with the function's definition—there's simply no lock entry >>>>> in the lock table for waiting. We could implement a new function for >>>>> this logic, >>>> >>>> To be honest, I'm fine with v3, since it only increases the sleep time >>>> after 5000 loop iterations, which has negligible performance impact. >>> >>> I think this is completely the wrong direction. We should make >>> XactLockTableWait() on standbys, not make the polling smarter. >> >> On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps. > > Right. > >> But are you suggesting that this doesn't need to be addressed? > > No. > >> Or do you have another idea for how to handle it? > > We have all the information to make it work properly on standby. I've not find through the code to figure out not, butthat's what needs to happen, instead on putting on another layer of hacks. Sorry, maybe I failed to get your point... Could you explain your idea or reasoning in a bit more detail? Regards, -- Fujii Masao NTT DATA Japan Corporation
To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for recovery.
But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.
> but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not that ideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot.
Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.
Hi,
>>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
>>>> On 2025/06/24 1:32, Xuneng Zhou wrote:
>>>>> 3. The proposed solution
>>>>>
>>>>> If the above analysis is sound, one potential fix would be to add
>>>>> separate branching for standby in XactLockTableWait. However, this seems
>>>>> inconsistent with the function's definition—there's simply no lock entry
>>>>> in the lock table for waiting. We could implement a new function for
>>>>> this logic,
>>>>
>>>> To be honest, I'm fine with v3, since it only increases the sleep time
>>>> after 5000 loop iterations, which has negligible performance impact.
>>>
>>> I think this is completely the wrong direction. We should make
>>> XactLockTableWait() on standbys, not make the polling smarter.
>>
>> On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
>
> Right.
>
>> But are you suggesting that this doesn't need to be addressed?
>
> No.
>
>> Or do you have another idea for how to handle it?
>
> We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.
Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?My take is that XactLockTableWait() isn’t really designed to work on standby. Instead of adding another layer on top like v4 did, maybe we can tweak the function itself to support standby. One possible approach could be to add a check like RecoveryInProgress() to handle the logic when running on a standby.
After thinking about this further, Andres's suggestion might be replacing polling(whether smart or not) with event-driven like waiting in XactLockTableWait. To achieve this, implementing a new notification mechanism for standby servers seems to be required. From what I can observe, the codebase appears to lack IPC infrastructure for waiting on remote transaction completion and receiving notifications when those transactions finish. I'm not familiar with this area, so additional inputs would be very helpful here.