Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Arseniy Mukhin
Subject Re: Optimize LISTEN/NOTIFY
Msg-id CAE7r3MK-3AOdh1mpZ8hw9h6F_i0D5RMoAy7CttnfCJRpB8GJDA@mail.gmail.com
In response to Re: Optimize LISTEN/NOTIFY  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
Hi,

On Thu, Oct 23, 2025 at 11:17 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
> > On Oct 21, 2025, at 00:43, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> >
> >
> > I managed to reproduce the race with v20-alt3. I tried to write a TAP
> > test reproducing the issue, so it was easier to validate changes.
> > Please find the attached TAP test. I added it to some random package
> > for simplicity.
> >
>
> With alt3, as we have acquired the notification lock after reading every message to update the POS, I think we can do a little bit more optimization:
>
> The notifier: in SignalBackend()
>     * Now we check if a listener’s pos equals to beforeWritePos, then we do “directly advancement”
>     * We can change to if a listener’s pos is between beforeWritePos and afterWritePos, then we can do the advancement.
>
> The listener: in asyncQueueReadAllNotifications():
>     * With alt3, we only lock and update pos
>     * We can do more. If current pos in shared memory is after that local pos, then meaning some notifier has done an advancement, so it can stop reading.
>

I think this would be a reasonable optimization if it weren't for the
race condition mentioned above. The problem is that if the local pos
lags behind the shared memory pos, it could point to a truncated queue
segment, so we shouldn't allow that.
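
To make the concern concrete, here is a minimal sketch in C. It is not
the async.c code: QueuePos, pos_precedes, pos_in_written_range and
pos_may_be_truncated are hypothetical names, and the model ignores
SLRU segment granularity and page wraparound. It assumes the queue
tail is derived from the shared positions only, so a backend whose
local pos is behind its shared pos no longer protects the pages it is
still about to read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified queue position: page number plus byte offset. */
typedef struct QueuePos
{
    int64_t page;
    int     offset;
} QueuePos;

/* True if position a is strictly before position b. */
bool
pos_precedes(QueuePos a, QueuePos b)
{
    return a.page < b.page || (a.page == b.page && a.offset < b.offset);
}

/*
 * Notifier-side check from the proposal: the listener's pos lies anywhere
 * in [before, after], not only exactly at 'before'.
 */
bool
pos_in_written_range(QueuePos pos, QueuePos before, QueuePos after)
{
    return !pos_precedes(pos, before) && !pos_precedes(after, pos);
}

/*
 * The hazard: if the tail was advanced based on shared positions, a local
 * pos on an earlier page may refer to data that has already been truncated.
 * (Assumption: truncation removes whole pages before the tail page.)
 */
bool
pos_may_be_truncated(QueuePos local, QueuePos tail)
{
    return local.page < tail.page;
}
```

In this model, a listener that keeps reading from a local pos on page
9 after its shared pos was advanced to page 10 is exactly the case
where pos_may_be_truncated() returns true once the tail reaches page
10.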

> I tried to run your TAP test on my MacBook, but failed:
>
> ```
> t/008_listen-pos-race.pl .. Dubious, test returned 32 (wstat 8192, 0x2000)
> No subtests run
>
> Test Summary Report
> -------------------
> t/008_listen-pos-race.pl (Wstat: 8192 (exited 32) Tests: 0 Failed: 0)
>   Non-zero exit status: 32
>   Parse errors: No plan found in TAP output
> Files=1, Tests=0,  3 wallclock secs ( 0.01 usr  0.01 sys +  0.10 cusr  0.29 csys =  0.41 CPU)
> Result: FAIL
> ```
>
> I didn’t spend time debugging the problem. If you can figure the problem, maybe I can run the test from my side.
>

Thank you for trying the test. I think it works as expected on your
side: it is supposed to fail with an error, and I get the same exit
status. Sorry, I didn't realize this could be confusing. It probably
would have been better to fail on an assert instead, but I thought an
error was enough for a temporary reproducer. Please see
008_listen-pos-race_test.log for details.


Best regards,
Arseniy Mukhin


