Re: Add progressive backoff to XactLockTableWait functions - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Add progressive backoff to XactLockTableWait functions |
Date | |
Msg-id | s6tzrpjevsjesgnsccoagowyhyq4zoz2dpiiq4gmtw4gh2e2xn@z7yjmqm4ldfk Whole thread Raw |
In response to | Re: Add progressive backoff to XactLockTableWait functions (Fujii Masao <masao.fujii@oss.nttdata.com>) |
List | pgsql-hackers |
On 2025-07-05 01:14:45 +0900, Fujii Masao wrote: > > > On 2025/07/04 17:57, Xuneng Zhou wrote: > > Hi, > > > > On Thu, Jul 3, 2025 at 9:30 AM Xuneng Zhou <xunengzhou@gmail.com <mailto:xunengzhou@gmail.com>> wrote: > > > > Hi, > > > > > > >>> On 2025-07-02 22:55:16 +0900, Fujii Masao wrote: > > >>>> On 2025/06/24 1:32, Xuneng Zhou wrote: > > >>>>> 3. The proposed solution > > >>>>> > > >>>>> If the above analysis is sound, one potential fix would be to add > > >>>>> separate branching for standby in XactLockTableWait. However, this seems > > >>>>> inconsistent with the function's definition—there's simply no lock entry > > >>>>> in the lock table for waiting. We could implement a new function for > > >>>>> this logic, > > >>>> > > >>>> To be honest, I'm fine with v3, since it only increases the sleep time > > >>>> after 5000 loop iterations, which has negligible performance impact. > > >>> > > >>> I think this is completely the wrong direction. We should make > > >>> XactLockTableWait() on standbys, not make the polling smarter. > > >> > > >> On standby, XactLockTableWait() can enter a busy loop with 1ms sleeps. > > > > > > Right. > > > > > >> But are you suggesting that this doesn't need to be addressed? > > > > > > No. > > > > > >> Or do you have another idea for how to handle it? > > > > > > We have all the information to make it work properly on standby. I've not find through the code to figureout not, but that's what needs to happen, instead on putting on another layer of hacks. > > > > Sorry, maybe I failed to get your point... > > Could you explain your idea or reasoning in a bit more detail? > > > > > > My take is that XactLockTableWait() isn’t really designed to work on standby. Instead of adding another layer ontop like v4 did, maybe we can tweak the function itself to support standby. One possible approach could be to add a checklike RecoveryInProgress() to handle the logic when running on a standby. > > > > > > After thinking about this further, Andres's suggestion might be replacing polling(whether smart or not) with event-drivenlike waiting in XactLockTableWait. To achieve this, implementing a new notification mechanism for standby serversseems to be required. From what I can observe, the codebase appears to lack IPC infrastructure for waiting on remotetransaction completion and receiving notifications when those transactions finish. I'm not familiar with this area,so additional inputs would be very helpful here. > > Your guess might be right, or maybe not. It's hard for me to say for sure. > It seems better to wait for Andres to explain his idea in more detail, > rather than trying to guess... My position is basically: 1) We should *never* add new long-duration polling loops to postgres. We've regretted it every time. It just ends up masking bugs and biting us in scenarios we didn't predict (increased wakeups increasing power usage, increased latency because our more eager wakeup mechanisms were racy). 2) We should try rather hard to not even have any new very short lived polling code. The existing code in XactLockTableWait() isn't great, even on the primary, but the window during the polling addresses is really short, so it's *kinda* acceptable. 3) There are many ways to address the XactLockTableWait() issue here. One way would be to simply make XactLockTableWait() work on standbys, by maintaining the lock table. Another would be to teach it to add some helper to procarray.c that allows XactLockTableWait() to work based on the KnownAssignedXids machinery. I don't have a clear preference for how to make this work in a non-polling way. But it's clear to me that making it poll smarter is the completely wrong direction. Greetings, Andres Freund
pgsql-hackers by date: