Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id af75d742-1b74-43aa-8777-e1de7a36fdba@app.fastmail.com
In response to Re: Optimize LISTEN/NOTIFY  (Rishu Bagga <rishu.postgres@gmail.com>)
List pgsql-hackers
On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
> Hi Joel,
>
> Thanks for sharing the patch.
> I have a few questions based on a cursory first look.
>
>> If a single listener is found, we signal only that backend.
>> Otherwise, we fall back to the existing broadcast behavior.
>
> The idea of not wanting to wake up all backends makes sense to me,
> but I don’t understand why we want this optimization only for the case
> where there is a single backend listening on a channel.
>
> Is there a pattern of usage in LISTEN/NOTIFY where users typically
> have either just one or several backends listening on a channel?
>
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for the thoughtful question. You've hit on the central design trade-off
in this optimization: how to provide targeted signaling for some workloads
without degrading performance for others.

While we don't have telemetry on real-world usage patterns of LISTEN/NOTIFY,
it seems likely that most applications fall into one of three categories,
which I've been thinking of in networking terms:

1. Broadcast-style ("hub mode")

Many backends listening on the *same* channel (e.g., for cache invalidation).
The current implementation is already well-optimized for this, behaving like
an Ethernet hub that broadcasts to all ports. Waking all listeners is efficient
because they all need the message.

2. Targeted notifications ("switch mode")

Each backend listens on its own private channel (e.g., for session events or
worker queues). This is where the current implementation scales poorly, as every
NOTIFY wakes up all listeners regardless of relevance. My patch is designed
to make this behave like an efficient Ethernet switch.

3. Selective multicast-style ("group mode")

A subset of backends shares a channel, but not all. This is the tricky middle
ground. Your question, "why not maintain a list of backends for each channel,
and only wake up those channels?" is exactly the right one to ask.
A full listener list seems like the obvious path to optimizing for *all* cases.
However, the devil is in the details of concurrency and performance. Managing
such a list would require heavier locking, which would create a new bottleneck
and degrade the scalability of LISTEN/UNLISTEN operations—especially for
the "hub mode" case where many backends rapidly subscribe to the same popular
channel.

This patch makes a deliberate architectural choice:
Prioritize a massive, low-risk win for "switch mode" while rigorously protecting
the performance of "hub mode".

It introduces a targeted fast path for single-listener channels and cleanly
falls back to the existing, well-performing broadcast model for everything else.
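
To make the shape of that decision concrete, here is a standalone toy sketch
of the NOTIFY-side control flow. It is not the patch code; the type and
function names are hypothetical stand-ins for what actually lives in async.c
and the shared hash entry.

    #include <stdbool.h>
    #include <stdio.h>

    typedef int ProcNumber;             /* stand-in for the real ProcNumber */

    typedef struct ChannelEntry
    {
        ProcNumber  listener;               /* meaningful only while the    */
        bool        has_multiple_listeners; /* flag is still false          */
    } ChannelEntry;

    static void signal_backend(ProcNumber p) { printf("signal backend %d\n", p); }
    static void signal_all_listeners(void)   { printf("broadcast to all listeners\n"); }

    /* Single listener known?  Wake just that backend; otherwise broadcast. */
    static void dispatch_notify(const ChannelEntry *entry)
    {
        if (entry != NULL && !entry->has_multiple_listeners)
            signal_backend(entry->listener);    /* "switch mode" fast path  */
        else
            signal_all_listeners();             /* existing "hub mode" path */
    }

    int main(void)
    {
        ChannelEntry one  = { .listener = 7, .has_multiple_listeners = false };
        ChannelEntry many = { .listener = 7, .has_multiple_listeners = true  };

        dispatch_notify(&one);          /* targeted wakeup */
        dispatch_notify(&many);         /* broadcast       */
        return 0;
    }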

This brings us back to "group mode", which remains an open optimization problem.
A possible approach could be to track listeners up to a small threshold *K*
(e.g., store up to 4 ProcNumbers in the hash entry). If the count exceeds *K*,
we would flip a "broadcast" flag and revert to hub-mode behavior.
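
For concreteness, the per-channel entry in that hypothetical variant might
look something like the sketch below. This is not proposed code; the layout
and names are made up purely to make the trade-off easier to discuss.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_TRACKED_LISTENERS 4               /* the threshold K */

    /* Hypothetical per-channel entry for the K-listener variant. */
    typedef struct ChannelEntryK
    {
        bool  broadcast;                          /* > K listeners: revert to hub mode */
        int   nlisteners;                         /* valid while !broadcast            */
        int   listeners[MAX_TRACKED_LISTENERS];   /* ProcNumbers of tracked listeners  */
    } ChannelEntryK;

    /*
     * Registering a listener must append to the array, so unlike the
     * read-only flag check in the current patch it would need an exclusive
     * lock every time (locking omitted in this toy).
     */
    static void add_listener(ChannelEntryK *entry, int me)
    {
        if (entry->broadcast)
            return;                               /* already gave up tracking  */
        if (entry->nlisteners < MAX_TRACKED_LISTENERS)
            entry->listeners[entry->nlisteners++] = me;
        else
            entry->broadcast = true;              /* K exceeded: flip the flag */
    }

    int main(void)
    {
        ChannelEntryK entry = { .broadcast = false, .nlisteners = 0 };

        for (int i = 1; i <= 6; i++)
            add_listener(&entry, i);              /* 5th listener flips broadcast */

        printf("tracked=%d broadcast=%d\n", entry.nlisteners, entry.broadcast);
        return 0;
    }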

However, this path has two critical drawbacks:

1. Performance Penalty for Hub Mode

With the current patch, after the second listener joins a channel,
the has_multiple_listeners flag is set. Every subsequent listener can acquire
a shared lock, see the flag is true, and immediately continue. This is
a highly concurrent, read-only operation that does not require mutating shared
state.
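
Here is a toy model of that LISTEN-side pattern, using a pthread rwlock as a
stand-in for the lock that protects the entry. The one-time promotion details
are my paraphrase of the obvious implementation, not a quote of the patch.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef int ProcNumber;

    typedef struct ChannelEntry
    {
        ProcNumber  listener;                 /* -1 until someone listens */
        bool        has_multiple_listeners;
    } ChannelEntry;

    static pthread_rwlock_t entry_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void register_listener(ChannelEntry *entry, ProcNumber me)
    {
        bool done;

        /* Common case on a popular channel: shared lock, read the flag, leave. */
        pthread_rwlock_rdlock(&entry_lock);
        done = entry->has_multiple_listeners;
        pthread_rwlock_unlock(&entry_lock);
        if (done)
            return;

        /* Slow path: in this toy only the first two listeners get here. */
        pthread_rwlock_wrlock(&entry_lock);
        if (!entry->has_multiple_listeners)
        {
            if (entry->listener == -1)
                entry->listener = me;                 /* first listener           */
            else if (entry->listener != me)
                entry->has_multiple_listeners = true; /* second distinct listener */
        }
        pthread_rwlock_unlock(&entry_lock);
    }

    int main(void)
    {
        ChannelEntry entry = { .listener = -1, .has_multiple_listeners = false };

        register_listener(&entry, 11);        /* recorded as the single listener */
        register_listener(&entry, 12);        /* flips has_multiple_listeners    */
        register_listener(&entry, 13);        /* shared lock only, returns early */

        printf("has_multiple_listeners=%d\n", entry.has_multiple_listeners);
        return 0;
    }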

In contrast, the K-listener approach would force every new listener (from the
third up to the K-th) to acquire an exclusive lock to mutate the shared
listener array. This would serialize LISTEN operations on popular channels,
creating the very contention point this patch successfully avoids and directly
harming the hub-mode use case that currently works well.

2. Uncertainty

Compounding this, without clear data on typical "group" sizes, choosing a value
for *K* is a shot in the dark. A small *K* might not help much, while
a large *K* would increase the shared memory footprint and worsen the
serialization penalty.

For these reasons, attempting to build a switch that also optimizes for
multicast risks undermining the architectural clarity and performance of
both the switch and hub models.

This patch, therefore, draws a clean line. It provides a precise,
low-cost path for switch-mode workloads and preserves the existing,
well-performing path for hub-mode workloads. While this leaves "group mode"
unoptimized for now, it ensures we make two common use cases better without
making any use case worse. The new infrastructure is flexible, leaving
the door open should a better approach for "group mode" emerge in
the future—one that doesn't compromise the other two.

Updated benchmarks comparing master vs 0001-optimize_listen_notify-v3.patch:
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections.png

I've not included the benchmark CSV data in this mail, since it's quite heavy
(160 kB) and I couldn't see any significant performance changes since v2.

/Joel


