Thread: Add progressive backoff to XactLockTableWait functions

Add progressive backoff to XactLockTableWait functions

From
Xuneng Zhou
Date:
Hi hackers,

This patch implements progressive backoff in XactLockTableWait() and
ConditionalXactLockTableWait().

As Kevin reported in this thread [1], XactLockTableWait() can enter a
tight polling loop during logical replication slot creation on standby
servers, sleeping for fixed 1ms intervals that can continue for a long
time. This creates significant CPU overhead.

The patch implements a time-based threshold approach based on Fujii’s
idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
seconds, then start exponential backoff (doubling the sleep duration
each cycle) up to a maximum of 10 seconds per sleep. This balances
responsiveness for normal operations (which typically complete within
seconds) against CPU efficiency for the long waits in some logical
replication scenarios.

[1] https://www.postgresql.org/message-id/flat/CAM45KeELdjhS-rGuvN%3DZLJ_asvZACucZ9LZWVzH7bGcD12DDwg%40mail.gmail.com

Best regards,
Xuneng

Attachment

Re: Add progressive backoff to XactLockTableWait functions

From
Fujii Masao
Date:

On 2025/06/08 23:33, Xuneng Zhou wrote:
> Hi hackers,
>
> This patch implements progressive backoff in XactLockTableWait() and
> ConditionalXactLockTableWait().
>
> As Kevin reported in this thread [1], XactLockTableWait() can enter a
> tight polling loop during logical replication slot creation on standby
> servers, sleeping for fixed 1ms intervals that can continue for a long
> time. This creates significant CPU overhead.
>
> The patch implements a time-based threshold approach based on Fujii’s
> idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> seconds, then start exponential backoff (doubling the sleep duration
> each cycle) up to a maximum of 10 seconds per sleep. This balances
> responsiveness for normal operations (which typically complete within
> seconds) against CPU efficiency for the long waits in some logical
> replication scenarios.

Thanks for the patch!

When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.

What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().

If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.

Based on that, I think a change like the following could work well.
Thought?

----------------------------------------
         XactLockTableWaitInfo info;
         ErrorContextCallback callback;
         bool            first = true;
+       int             left_till_hibernate = 5000;

<snip>

                 if (!first)
                 {
                         CHECK_FOR_INTERRUPTS();
-                       pg_usleep(1000L);
+
+                       if (left_till_hibernate > 0)
+                       {
+                               pg_usleep(1000L);
+                               left_till_hibernate--;
+                       }
+                       else
+                               pg_usleep(1000000L);
----------------------------------------

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




Re: Add progressive backoff to XactLockTableWait functions

From
Xuneng Zhou
Date:
Hi,

Thanks for the feedback! 

On Thu, Jun 12, 2025 at 10:02 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.

+1, this could be a problem 
 
What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().

1s should be generally good
 
If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.

That makes sense to me.

Based on that, I think a change like the following could work well.
Thought?

I'll update the patch accordingly.

Best regards,
Xuneng

Re: Add progressive backoff to XactLockTableWait functions

From
Xuneng Zhou
Date:
Hi,

Although it’s clear that replacing tight 1 ms polling loops will reduce CPU usage, I'm curious about the hard numbers. To that end, I ran a 60 s logical-replication slot–creation workload on a standby using three different XactLockTableWait() variants—on an 8-core, 16 GB AMD system—and collected both profiling traces and hardware-counter metrics. 


1. Hardware‐counter results


image.png

  • CPU cycles drop by 58% moving from 1 ms to exp. backoff, and another 25% to the 1 s threshold variant.
  • Cache‐misses and context‐switches see similarly large reductions.
  • IPC remains around 0.45, dipping slightly under longer sleeps.

2. Flame‐graph 
See attached files

Best regards, 
Xuneng

Attachment

Re: Add progressive backoff to XactLockTableWait functions

From
Andres Freund
Date:
Hi,

On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote:
> This patch implements progressive backoff in XactLockTableWait() and
> ConditionalXactLockTableWait().
> 
> As Kevin reported in this thread [1], XactLockTableWait() can enter a
> tight polling loop during logical replication slot creation on standby
> servers, sleeping for fixed 1ms intervals that can continue for a long
> time. This creates significant CPU overhead.
> 
> The patch implements a time-based threshold approach based on Fujii’s
> idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> seconds, then start exponential backoff (doubling the sleep duration
> each cycle) up to a maximum of 10 seconds per sleep. This balances
> responsiveness for normal operations (which typically complete within
> seconds) against CPU efficiency for the long waits in some logical
> replication scenarios.

ISTM that this is going to wrong way - the real problem is that we seem to
have extended periods where XactLockTableWait() doesn't actually work, not
that the sleep time is too short.  The sleep in XactLockTableWait() was
intended to address a very short race, not something that's essentially
unbound.

Greetings,

Andres Freund



Re: Add progressive backoff to XactLockTableWait functions

From
Fujii Masao
Date:

On 2025/06/24 1:32, Xuneng Zhou wrote:
> Hi,
> 
> 
> Here's patch version 4.
> 
> 
> 1. The problem
> 
> I conducted further investigation on this issue. The 1ms sleep in XactLockTableWait that falls back to polling was
notproblematic in certain scenarios prior to v16. It became an potential issue after the "Allow logical decoding on
standbys"feature was introduced [1]. After that commit, we can now create logical replication slots on hot standby
serversand perform many other logical decoding operations. [2]
 
> 
> 
> 
> 2. The analysis
> 
> Is this issue limited to calling sites related to logical decoding? I believe so. As we've discussed, the core issue
isthat primary transactions are not running locally on the standby, which breaks the assumption of
XactLockTableWait—thatthere is an xid lock to wait on. Other call stacks of this function, except those from logical
decoding,are unlikely to encounter this problem because they are unlikely to be invoked from standby servers.
 
> 
> 
> Analysis of XactLockTableWait calling sites:
> 
> 
> Problematic Case (Hot Standby):
> 
> - Logical decoding operations in snapbuild.c during snapshot building
> 
> - This is thecase that can run on hot standby and experience the issue
> 
> 
> Non-problematic Cases (Primary Only):
> 
> - Heap operations (heapam.c): DML operations not possible on read-only standby
> 
> - B-tree operations (nbtinsert.c): Index modifications impossible on standby
> 
> - Logical apply workers (execReplication.c): Cannot start during recovery (BgWorkerStart_RecoveryFinished)
> 
> - All other cases: Require write operations unavailable on standby
> 
> 
> 
> 3. The proposed solution
> 
> If the above analysis is sound, one potential fix would be to add separate branching for standby in
XactLockTableWait.However, this seems inconsistent with the function's definition—there's simply no lock entry in the
locktable for waiting. We could implement a new function for this logic, 
 

To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for recovery.

But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.

> but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not
thatideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot.
 

Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




Re: Add progressive backoff to XactLockTableWait functions

From
Andres Freund
Date:
Hi,

On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
> On 2025/06/24 1:32, Xuneng Zhou wrote:
> > 3. The proposed solution
> > 
> > If the above analysis is sound, one potential fix would be to add
> > separate branching for standby in XactLockTableWait. However, this seems
> > inconsistent with the function's definition—there's simply no lock entry
> > in the lock table for waiting. We could implement a new function for
> > this logic,
> 
> To be honest, I'm fine with v3, since it only increases the sleep time
> after 5000 loop iterations, which has negligible performance impact.

I think this is completely the wrong direction. We should make
XactLockTableWait() on standbys, not make the polling smarter.

I think neither v3 nor v4 are viable patches.

Greetings,

Andres Freund



Re: Add progressive backoff to XactLockTableWait functions

From
Fujii Masao
Date:

On 2025/07/02 23:04, Andres Freund wrote:
> Hi,
>
> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
>> On 2025/06/24 1:32, Xuneng Zhou wrote:
>>> 3. The proposed solution
>>>
>>> If the above analysis is sound, one potential fix would be to add
>>> separate branching for standby in XactLockTableWait. However, this seems
>>> inconsistent with the function's definition—there's simply no lock entry
>>> in the lock table for waiting. We could implement a new function for
>>> this logic,
>>
>> To be honest, I'm fine with v3, since it only increases the sleep time
>> after 5000 loop iterations, which has negligible performance impact.
>
> I think this is completely the wrong direction. We should make
> XactLockTableWait() on standbys, not make the polling smarter.

On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
But are you suggesting that this doesn't need to be addressed?
Or do you have another idea for how to handle it?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




Re: Add progressive backoff to XactLockTableWait functions

From
Andres Freund
Date:
Hi,

On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>On 2025/07/02 23:04, Andres Freund wrote:
>> Hi,
>>
>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
>>> On 2025/06/24 1:32, Xuneng Zhou wrote:
>>>> 3. The proposed solution
>>>>
>>>> If the above analysis is sound, one potential fix would be to add
>>>> separate branching for standby in XactLockTableWait. However, this seems
>>>> inconsistent with the function's definition—there's simply no lock entry
>>>> in the lock table for waiting. We could implement a new function for
>>>> this logic,
>>>
>>> To be honest, I'm fine with v3, since it only increases the sleep time
>>> after 5000 loop iterations, which has negligible performance impact.
>>
>> I think this is completely the wrong direction. We should make
>> XactLockTableWait() on standbys, not make the polling smarter.
>
>On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.

Right.

>But are you suggesting that this doesn't need to be addressed?

No.

>Or do you have another idea for how to handle it?

We have all the information to make it work properly on standby. I've not find through the code to figure out not, but
that'swhat needs to happen, instead on putting on another layer of hacks. 

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Add progressive backoff to XactLockTableWait functions

From
Fujii Masao
Date:

On 2025/07/02 23:19, Andres Freund wrote:
> Hi,
>
> On July 2, 2025 10:15:09 AM EDT, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>> On 2025/07/02 23:04, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
>>>> On 2025/06/24 1:32, Xuneng Zhou wrote:
>>>>> 3. The proposed solution
>>>>>
>>>>> If the above analysis is sound, one potential fix would be to add
>>>>> separate branching for standby in XactLockTableWait. However, this seems
>>>>> inconsistent with the function's definition—there's simply no lock entry
>>>>> in the lock table for waiting. We could implement a new function for
>>>>> this logic,
>>>>
>>>> To be honest, I'm fine with v3, since it only increases the sleep time
>>>> after 5000 loop iterations, which has negligible performance impact.
>>>
>>> I think this is completely the wrong direction. We should make
>>> XactLockTableWait() on standbys, not make the polling smarter.
>>
>> On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
>
> Right.
>
>> But are you suggesting that this doesn't need to be addressed?
>
> No.
>
>> Or do you have another idea for how to handle it?
>
> We have all the information to make it work properly on standby. I've not find through the code to figure out not,
butthat's what needs to happen, instead on putting on another layer of hacks. 

Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




Re: Add progressive backoff to XactLockTableWait functions

From
Xuneng Zhou
Date:
Thanks for the feedbacks!

To be honest, I'm fine with v3, since it only increases the sleep time
after 5000 loop iterations, which has negligible performance impact.
But if these functions aren't intended to be used during recovery and
the loop shouldn't reach that many iterations, I'm also okay with
the v4 approach of introducing a separate function specifically for recovery.

But this amakes me wonder if we should add something like
Assert(!RecoveryInProgress()) to those two functions, to prevent them
from being used during recovery in the future.

> but it's hard to find a proper place for it to fit well: lmgr.c is for locks, while standby.c or procarray.c are not that ideal either. Therefore, I placed the special handling for standby in SnapBuildWaitSnapshot.

Since the purpose and logic of the new function are similar to
XactLockTableWait(), I think it would be better to place it nearby
in lmgr.c, even though it doesn't handle a lock directly. That would
help keep the related logic together and improve readability.


Now I see two possible approaches here. One is to extend XactLockTableWait(), and the other is to introduce a new wait function specifically for standby. For the first option, adding standby-specific logic might not align well with the function’s name or its original design. If we go with a new function, we might need to consider what other scenarios it could be reused in. If this is the only place it would apply, whether it’s worth introducing a separate function just for this case.

Best regards,
Xuneng

Re: Add progressive backoff to XactLockTableWait functions

From
Xuneng Zhou
Date:
Hi, 

On Thu, Jul 3, 2025 at 9:30 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
Hi, 

>>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote:
>>>> On 2025/06/24 1:32, Xuneng Zhou wrote:
>>>>> 3. The proposed solution
>>>>>
>>>>> If the above analysis is sound, one potential fix would be to add
>>>>> separate branching for standby in XactLockTableWait. However, this seems
>>>>> inconsistent with the function's definition—there's simply no lock entry
>>>>> in the lock table for waiting. We could implement a new function for
>>>>> this logic,
>>>>
>>>> To be honest, I'm fine with v3, since it only increases the sleep time
>>>> after 5000 loop iterations, which has negligible performance impact.
>>>
>>> I think this is completely the wrong direction. We should make
>>> XactLockTableWait() on standbys, not make the polling smarter.
>>
>> On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps.
>
> Right.
>
>> But are you suggesting that this doesn't need to be addressed?
>
> No.
>
>> Or do you have another idea for how to handle it?
>
> We have all the information to make it work properly on standby. I've not find through the code to figure out not, but that's what needs to happen, instead on putting on another layer of hacks.

Sorry, maybe I failed to get your point...
Could you explain your idea or reasoning in a bit more detail?

My take is that XactLockTableWait() isn’t really designed to work on standby. Instead of adding another layer on top like v4 did, maybe we can tweak the function itself to support standby. One possible approach could be to add a check like RecoveryInProgress() to handle the logic when running on a standby. 


After thinking about this further, Andres's suggestion might be replacing polling(whether smart or not) with event-driven like waiting in XactLockTableWait. To achieve this, implementing a new notification mechanism for standby servers seems to be required. From what I can observe, the codebase appears to lack IPC infrastructure for waiting on remote transaction completion and receiving notifications when those transactions finish. I'm not familiar with this area, so additional inputs would be very helpful here.