Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Chao Li
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 290910DE-9A03-4AE6-B348-073D5DA96ACC@gmail.com
Whole thread Raw
In response to Re: Optimize LISTEN/NOTIFY  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers

> On Oct 29, 2025, at 18:33, Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Oct 29, 2025, at 08:05, Chao Li wrote:
>>
>> I think the current implementation still has a race problem.
>>
>> Let’s say notifier N1 notifies listener’s L1 to read message.
>> L1 starts to read: it acquires the look, gets reading range, then
>> releases the lock, start performs reading without holding the lock.
>> Notifier N2 comes, N2 doesn’t have anything L1 is interested in. N2 now
>> holds the look, when it checks "if (QUEUE_POS_EQUAL(pos,
>> queueHeadBeforeWrite))”, here comes the race. Because the lock is in
>> N2’s hand, L1 cannot get the lock to update its pos, so "if
>> (QUEUE_POS_EQUAL(pos, queueHeadBeforeWrite))” will not be satisfied, so
>> direct advancement won’t happen.
>
> I'm not sure I agree that qualifies as a race "problem" per se, since I
> think that just sounds like a case where we would do an unnecessary
> wakeup, right?
>

Why unnecessary? Say there are 100 listeners L1 - L100. When N2 is checking state of L1, L100 has finished reading,
ideallyL100 should update its pos, then when N2 reaches L100, it should do direct advancement, right? 

But now the problem is, we use a single notification lock to handle all notifiers and listeners. Assume if every
backendprocess has a notification lock, then the race is no longer there. When N2 is checking state of L1, it just
holdsL1’s lock, so L100 can go ahead update its pos, then when N2 reaches L100, N2 can do direct advancement. 

I ever thought to propose to use a lock for every backend process, but I didn’t, because a lock is underlying an
expensivesemaphore, if there are hundreds of backends, adding the same number of semaphores doesn’t seem a good thing,
whichwould be a too many overheads to the system. 

> Without more sophisticated data structures (e.g. skip ranges) and
> increased code complexity, there will always be cases where we will by
> do unnecessary wakeups, which IMO need not be a design goal to
> completely avoid, until we have benchmark data that indicates otherwise.
>

The other problem I see is that, we don’t have a way to evaluate if the “direct advancement” is really effective, such
as1) if a case that can perform “direct advancement” is really applied the advancement; 2) in a test model, how many
“directadvancement” are applied. 

> I think we should iterate by first trying to reason about correctness of
> the code, trying to prove/disprove if a notifications could ever end up
> not being delivered. The bug I fixed in v22 is an example of such a
> case, that would cause a listening backend to never be awaken, since
> notifiers would not signal it due to the pending wake that was not
> cleared.
>
> I wonder if there could be more such serious bugs in the current code. I
> will focus my efforts now trying to answer that question. Would be
> really nice if we could find a way to reason formally about this. I've
> been looking into the P programming language, which seems suitable for
> modeling and verifying these kind of asynchronous concurrency protocols,
> I will give it a try.
>

I don’t think we need to rush. From my observation, none of the “big” patches can get merged quickly anyway. Rather
thanhurrying to make it “ready,” I think it’s better to take the time to make it “perfect”. I have also spent a lot of
timeon this patch, and I don’t mind to spend more. If you need a hand, I will be happy to offer. 

TBH, with all the problems I described earlier still in my brain, I just cannot convince myself to let this patch go
yet.Sorry about that. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: shveta malik
Date:
Subject: Re: Report bytes and transactions actually sent downtream