Hi,
On Fri, Aug 8, 2025 at 7:06 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi, Tom!
>
> Thanks for looking at this.
>
> On Fri, Aug 8, 2025 at 2:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Xuneng Zhou <xunengzhou@gmail.com> writes:
> > > V9 replaces the original partitioned xid-wait htab with a single,
> > > unified one, reflecting the modest entry count and rare contention for
> > > waiting. To prevent possible races when multiple backends wait on the
> > > same XID for the first time in XidWaitOnStandby, a dedicated lock has
> > > been added to protect the hash table.
> >
> > This seems like adding quite a lot of extremely subtle code in
> > order to solve a very small problem. I thought the v1 patch
> > was about the right amount of complexity.
>
> Yeah, this patch is indeed complex, and the complexity might not be
> well-justified—given the current use cases, it feels like we’re paying
> a lot for very little. TBH, getting the balance right between
> efficiency gains and cost, in terms of both code complexity and
> runtime overhead, is beyond my current ability here, since I’m
> touching many parts of the code for the first time. Every time I
> thought I’d figured it out, new subtleties surfaced—though I’ve
> learned a lot from the exploration and hacking. We may agree on the
> necessity of fixing this issue, but not yet on how to fix it. I’m open
> to discussion and suggestions.
>
Some changes in v10:
1) XidWaitHashLock is now taken for all operations on XidWaitHash,
though it might be unnecessary in some cases.
2) The pg_atomic_uint32 field waiter_count was removed from
XidWaitEntry. The startup process now takes charge of cleaning up the
XidWaitHash entry after waking up the waiting processes.
3) A pg_atomic_uint32 xidWaiterNum was added to avoid unnecessary lock
acquisition and release, and the hash table lookup, while no XID is
being waited on.
I hope this eliminates some of the subtleties.
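To illustrate the idea behind 3), here is a minimal standalone sketch,
not the patch code. It assumes the counter tracks the number of entries
in the table, and it uses C11 atomics, a pthread mutex and a small array
as stand-ins for pg_atomic_uint32, XidWaitHashLock and XidWaitHash. All
names in it (register_xid_wait, wake_xid_waiters, locked_lookups) are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WAITED_XIDS 8		/* hypothetical capacity of the table */

typedef struct XidWaitSlot
{
	uint32_t	xid;
	bool		in_use;
} XidWaitSlot;

/* Stand-ins for XidWaitHashLock, XidWaitHash and xidWaiterNum. */
static pthread_mutex_t xid_wait_lock = PTHREAD_MUTEX_INITIALIZER;
static XidWaitSlot xid_wait_table[MAX_WAITED_XIDS];
static atomic_uint xid_waiter_num;	/* number of entries in the table */

/* Instrumentation only: how often the waker took the lock. */
unsigned int locked_lookups = 0;

/* Waiter side: add an entry for xid unless one already exists. */
bool
register_xid_wait(uint32_t xid)
{
	int			free_slot = -1;
	bool		ok = false;

	assert(xid != 0);			/* 0 is not a valid XID to wait on */

	pthread_mutex_lock(&xid_wait_lock);
	for (int i = 0; i < MAX_WAITED_XIDS; i++)
	{
		if (xid_wait_table[i].in_use && xid_wait_table[i].xid == xid)
		{
			ok = true;			/* another waiter created the entry */
			break;
		}
		if (!xid_wait_table[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (!ok && free_slot >= 0)
	{
		xid_wait_table[free_slot].xid = xid;
		xid_wait_table[free_slot].in_use = true;
		atomic_fetch_add(&xid_waiter_num, 1);
		ok = true;
	}
	pthread_mutex_unlock(&xid_wait_lock);
	return ok;
}

/* Waker side: skip the lock and the lookup when nothing is waited on. */
bool
wake_xid_waiters(uint32_t xid)
{
	bool		found = false;

	if (atomic_load(&xid_waiter_num) == 0)
		return false;			/* fast path: no lock, no lookup */

	pthread_mutex_lock(&xid_wait_lock);
	locked_lookups++;
	for (int i = 0; i < MAX_WAITED_XIDS; i++)
	{
		if (xid_wait_table[i].in_use && xid_wait_table[i].xid == xid)
		{
			/* Waking the waiters is omitted; the waker removes the entry. */
			xid_wait_table[i].in_use = false;
			atomic_fetch_sub(&xid_waiter_num, 1);
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&xid_wait_lock);
	return found;
}
```

The sketch ignores the case where a waiter registers right after the
waker has read the counter; a real implementation would need the waiter
to recheck the XID state after registering.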
The exponential backoff in earlier patches is simple and effective at
reducing CPU overhead during extended waits; however, it could also
add unwanted latency for more sensitive use cases like a logical
walsender on a cascading standby. Unfortunately, I have been unable to
come up with a solution that is correct, effective and simple in all
cases.
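For reference, the latency concern can be seen in a generic capped
exponential backoff like the sketch below. This is not the code from
the earlier patches; the function name, the 1 ms initial delay and the
1 s cap are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BACKOFF_INITIAL_US	1000		/* hypothetical: 1 ms first sleep */
#define BACKOFF_MAX_US		1000000		/* hypothetical: 1 s cap */

/*
 * Return the sleep time in microseconds before retry number "attempt"
 * (0-based). The delay doubles on every attempt until it reaches the cap.
 */
int64_t
backoff_delay_us(int attempt)
{
	int64_t		delay = BACKOFF_INITIAL_US;

	assert(attempt >= 0);

	/* Stop doubling once the cap is reached, so delay cannot overflow. */
	while (attempt-- > 0 && delay < BACKOFF_MAX_US)
		delay *= 2;

	return (delay < BACKOFF_MAX_US) ? delay : BACKOFF_MAX_US;
}
```

If the awaited transaction finishes just after a sleep begins, the
waiter notices it only when that sleep ends, so the added latency can
be as large as the current delay (up to the cap in this sketch).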
Best,
Xuneng