Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Chao Li
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 0BCA1C2D-B92C-459E-B1A6-6D06BA4C62CF@gmail.com
Whole thread Raw
In response to Re: Optimize LISTEN/NOTIFY  (Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>)
Responses Re: Optimize LISTEN/NOTIFY
List pgsql-hackers

> On Oct 23, 2025, at 18:02, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>
> Hi,
>
> On Thu, Oct 23, 2025 at 11:17 AM Chao Li <li.evan.chao@gmail.com> wrote:
>>
>>
>>
>>> On Oct 21, 2025, at 00:43, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>>>
>>>
>>> I managed to reproduce the race with v20-alt3. I tried to write a TAP
>>> test reproducing the issue, so it was easier to validate changes.
>>> Please find the attached TAP test. I added it to some random package
>>> for simplicity.
>>>
>>
>> With alt3, as we have acquired the notification lock after reading every message to update the POS, I think we can
doa little bit more optimization: 
>>
>> The notifier: in SignalBackend()
>>    * Now we check if a listener’s pos equals to beforeWritePos, then we do “directly advancement”
>>    * We can change to if a listener’s pos is between beforeWritePos and afterWritePos, then we can do the
advancement.
>>
>> The listener: in asyncQueueReadAllNotifications():
>>    * With alt3, we only lock and update pos
>>    * We can do more. If current pos in shared memory is after that local pos, then meaning some notifier has done an
advancement,so it can stop reading. 
>>
>
> I think this would be a reasonable optimization if it weren't for the
> race condition mentioned above. The problem is that if the local pos
> lags behind the shared memory pos, it could point to a truncated queue
> segment, so we shouldn't allow that.
>

I figured out a way to resolve the race condition for alt3:

* add an awakening flag for every listener, this flag is only set by listeners
* add an advisory pos for every listener, similar to alt1
* if a listener is awaken, notify only updates the listener’s advisory pos; otherwise directly advance its position.
* If a running listener see current pos is behind advisory pos, then stop reading

See more details in attach patch file, I added code comments for my changes. Now the TAP test won’t hit the race
condition. 
```
# +++ tap check in src/test/authentication +++
t/008_listen-pos-race.pl .. skipped: Injection points not supported by this build
Files=1, Tests=0,  0 wallclock secs ( 0.00 usr  0.00 sys +  0.03 cusr  0.01 csys =  0.04 CPU)
Result: NOTESTS
```

And with my solution, listeners longer will still use local pos, so that no longer need to acquire notification lock in
everyloop. 

The patch stack is: v20 patch -> alt3 patch -> tap patch -> my patch. Please see if my solution works.

I also made a tiny change in the TAP script to allow it to terminate gracefully.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: another autovacuum scheduling thread
Next
From: "Joel Jacobson"
Date:
Subject: Re: Optimize LISTEN/NOTIFY