Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Optimize LISTEN/NOTIFY
Msg-id 96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com
In response to Re: Optimize LISTEN/NOTIFY  (Rishu Bagga <rishu.postgres@gmail.com>)
List pgsql-hackers
On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for contributing a great idea; it actually turned out to work
really well in practice!

The attached new v4 of the patch implements your multicast idea:

---

Improve NOTIFY scalability with multicast signaling

Previously, NOTIFY would signal all listening backends in a database for
any channel with more than one listener. This broadcast approach scales
poorly for workloads that rely on targeted notifications to small groups
of backends, as every NOTIFY could wake up many unrelated processes.

This commit introduces a multicast signaling optimization to improve
scalability for such use-cases. A new GUC, `notify_multicast_threshold`,
is added to control the maximum number of listeners to track per
channel. When a NOTIFY is issued, if the number of listeners is at or
below this threshold, only those specific backends are signaled. If the
limit is exceeded, the system falls back to the original broadcast
behavior.
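
The threshold rule described above can be sketched as follows. This is a
minimal illustration, not the patch's code: the function name
`backends_to_signal`, the dict-of-sets listener table, and the integer
backend ids are assumptions made for the example.

```python
from typing import Dict, Set


def backends_to_signal(
    channel: str,
    listeners_by_channel: Dict[str, Set[int]],  # hypothetical listener table
    all_listening_backends: Set[int],
    threshold: int,  # plays the role of notify_multicast_threshold
) -> Set[int]:
    """Return the backend ids to wake for a NOTIFY on `channel`."""
    targets = listeners_by_channel.get(channel, set())
    # Multicast: the listener count is at or below the threshold,
    # so only those specific backends are signaled.
    if threshold > 0 and len(targets) <= threshold:
        return set(targets)
    # Broadcast fallback: the limit is exceeded, or multicast is
    # disabled because the threshold is 0.
    return set(all_listening_backends)
```

With a threshold of 2, a channel with two listeners wakes only those two
backends, while a channel with three listeners wakes every listening
backend.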

The default for this threshold is set to 16. Benchmarks show this
provides a good balance, with significant performance gains for small to
medium-sized listener groups and diminishing returns for higher values.
Setting the threshold to 0 disables multicast signaling, forcing a
fallback to the broadcast path for all notifications.

To implement this, a new partitioned hash table is introduced in shared
memory to track listeners. Locking is managed with an optimistic
read-then-upgrade pattern. This allows concurrent LISTEN/UNLISTEN
operations on *different* channels to proceed in parallel, as they will
only acquire locks on their respective partitions.
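
As a rough illustration of the partitioning idea, the sketch below maps
each channel to one of a fixed number of partitions, each guarded by its
own lock. Everything in it is an assumption for the example: the class
name `ListenerTable`, the partition count of 16, and the CRC32 hash.
Python's standard library has no reader-writer lock, so both phases of
the read-then-upgrade pattern take the same mutex here; a real
implementation would presumably take the lock in shared mode for the
first phase and in exclusive mode for the second.

```python
import threading
import zlib

NUM_PARTITIONS = 16  # hypothetical; the patch's partition count is not stated


class ListenerTable:
    """Toy channel -> listeners table split into independently locked partitions."""

    def __init__(self):
        self._locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]
        self._parts = [{} for _ in range(NUM_PARTITIONS)]

    def partition_of(self, channel):
        # Deterministic hash so a channel always maps to the same partition.
        return zlib.crc32(channel.encode("utf-8")) % NUM_PARTITIONS

    def listen(self, channel, backend_id):
        p = self.partition_of(channel)
        # Phase 1 (optimistic read): already registered? Then nothing to do.
        with self._locks[p]:
            if backend_id in self._parts[p].get(channel, ()):
                return False
        # Phase 2 (upgrade): re-check under the lock, since another backend
        # may have changed the entry between the two phases.
        with self._locks[p]:
            listeners = self._parts[p].setdefault(channel, set())
            if backend_id in listeners:
                return False
            listeners.add(backend_id)
            return True

    def unlisten(self, channel, backend_id):
        p = self.partition_of(channel)
        with self._locks[p]:
            listeners = self._parts[p].get(channel)
            if listeners is None:
                return
            listeners.discard(backend_id)
            if not listeners:
                del self._parts[p][channel]  # drop empty channel entries

    def listeners(self, channel):
        p = self.partition_of(channel)
        with self._locks[p]:
            return set(self._parts[p].get(channel, ()))
```

Two backends calling `listen` on channels that hash to different
partitions never contend on the same lock, which is where the
parallelism comes from.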

For correctness and to prevent deadlocks, a strict lock ordering
hierarchy (NotifyQueueLock before any partition lock) is observed. The
signaling path in NOTIFY must acquire the global NotifyQueueLock first
before consulting the partitioned hash table, which serializes
concurrent NOTIFYs. The primary concurrency win is for LISTEN/UNLISTEN
operations, which are now much more scalable.
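
One way to picture the lock ordering rule is a small rank check: every
lock gets a rank, and a thread may only acquire a lock whose rank is
higher than any lock it already holds. The helper `ordered` and the rank
constants below are hypothetical and exist only for this sketch; they
are not part of the patch.

```python
import threading
from contextlib import contextmanager

QUEUE_RANK = 0      # stands in for NotifyQueueLock
PARTITION_RANK = 1  # stands in for any hash table partition lock

_held = threading.local()  # per-thread stack of ranks currently held


@contextmanager
def ordered(lock, rank):
    """Acquire `lock`, refusing any acquisition that violates the hierarchy."""
    stack = getattr(_held, "ranks", None)
    if stack is None:
        stack = _held.ranks = []
    if stack and rank <= stack[-1]:
        raise RuntimeError("lock order violation")
    lock.acquire()
    stack.append(rank)
    try:
        yield
    finally:
        stack.pop()
        lock.release()
```

Under this rule the NOTIFY path (queue lock, then partition lock) and
the LISTEN/UNLISTEN path (partition lock only) are both allowed, while
taking a partition lock first and the queue lock second is rejected.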

The "wake only tail" optimization, which signals backends that are far
behind in the queue, is also included to ensure the global queue tail
can always advance.
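
A minimal sketch of selecting backends that are far behind is shown
below. The function name `lagging_backends`, the integer queue
positions, and the `max_lag` cutoff are all assumptions for
illustration; the patch's actual criterion for "far behind" is not
spelled out here.

```python
def lagging_backends(positions, head, max_lag, already_signaled=frozenset()):
    """Return backends whose read position trails `head` by more than `max_lag`.

    positions: dict mapping backend id -> queue position (hypothetical integers)
    already_signaled: backends that are being woken anyway and can be skipped
    """
    return {
        backend
        for backend, pos in positions.items()
        if head - pos > max_lag and backend not in already_signaled
    }
```

Waking these stragglers lets them catch up, so the slowest reader no
longer pins the global queue tail.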

Thanks to Rishu Bagga for the multicast idea.

---

BENCHMARK

To find the optimal default value for notify_multicast_threshold,
I created a new benchmark tool. It spawns one "ping" worker that sends
notifications to a channel and multiple "pong" workers that listen on
channels and immediately reply to the "ping" worker. Once all replies
have been received, the cycle repeats.

By measuring how many complete round-trips can be performed per second,
it evaluates the impact of different multicast threshold settings.

The results below compare broadcast and multicast for different sizes of
multicast groups. For each group size N (backends per channel),
notify_multicast_threshold is set either below N, which forces broadcast,
or exactly at N, which enables multicast. A threshold of 1 corresponds to
the old targeted mode, which earlier versions specifically optimized for.

K = notify_multicast_threshold

With 2 backends per channel (32 channels total):
  patch-v4 (K=1): 8,477 TPS
  patch-v4 (K=2): 27,748 TPS (3.3x improvement)

With 4 backends per channel (16 channels total):  
  patch-v4 (K=1): 7,367 TPS
  patch-v4 (K=4): 18,777 TPS (2.6x improvement)

With 8 backends per channel (8 channels total):
  patch-v4 (K=1): 5,892 TPS  
  patch-v4 (K=8): 8,620 TPS (1.5x improvement)

With 16 backends per channel (4 channels total):
  patch-v4 (K=1):  4,202 TPS
  patch-v4 (K=16): 4,750 TPS (1.1x improvement)
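
Each configuration above multiplies out to 64 "pong" backends in total
(2 x 32, 4 x 16, 8 x 8, 16 x 4). The sketch below shows one way such a
layout could be generated; the function name `assign_channels` and the
`channel_<n>` naming are hypothetical and are not taken from the
benchmark tool.

```python
def assign_channels(total_workers, per_channel):
    """Map each worker index to a channel so every channel has `per_channel` listeners."""
    if per_channel <= 0 or total_workers % per_channel != 0:
        raise ValueError("total_workers must be a positive multiple of per_channel")
    # Consecutive workers share a channel: workers 0..per_channel-1 get channel_0, etc.
    return {w: f"channel_{w // per_channel}" for w in range(total_workers)}
```

For example, `assign_channels(64, 2)` yields 32 distinct channels and
`assign_channels(64, 16)` yields 4.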

I also reran the old ping-pong benchmark and the pgbench benchmarks with
notify_multicast_threshold set to 1, 8, and 16, and I couldn't detect
any negative impact.

Ping-pong benchmark:

Extra Connections: 0
--------------------------------------------------------------------------------
Version                   Max TPS         vs Master       All Values (sorted)
--------------------------------------------------------------------------------
master                    9119            baseline        {9088, 9095, 9119}
patch-v4 (t=1)            9116            -0.0%           {9082, 9090, 9116}
patch-v4 (t=8)            9106            -0.2%           {9086, 9102, 9106}
patch-v4 (t=16)           9134            +0.2%           {9082, 9116, 9134}

Extra Connections: 10
--------------------------------------------------------------------------------
Version                   Max TPS         vs Master       All Values (sorted)
--------------------------------------------------------------------------------
master                    6237            baseline        {6224, 6227, 6237}
patch-v4 (t=1)            9358            +50.0%          {9302, 9345, 9358}
patch-v4 (t=8)            9348            +49.9%          {9266, 9312, 9348}
patch-v4 (t=16)           9408            +50.8%          {9339, 9407, 9408}

Extra Connections: 100
--------------------------------------------------------------------------------
Version                   Max TPS         vs Master       All Values (sorted)
--------------------------------------------------------------------------------
master                    2028            baseline        {2026, 2027, 2028}
patch-v4 (t=1)            9278            +357.3%         {9222, 9235, 9278}
patch-v4 (t=8)            9227            +354.8%         {9184, 9207, 9227}
patch-v4 (t=16)           9250            +355.9%         {9180, 9243, 9250}

Extra Connections: 1000
--------------------------------------------------------------------------------
Version                   Max TPS         vs Master       All Values (sorted)
--------------------------------------------------------------------------------
master                    239             baseline        {239, 239, 239}
patch-v4 (t=1)            8841            +3594.1%        {8819, 8840, 8841}
patch-v4 (t=8)            8835            +3591.7%        {8802, 8826, 8835}
patch-v4 (t=16)           8855            +3599.8%        {8787, 8843, 8855}


Among my pgbench benchmarks, results seem unaffected in these:
listen_unique.sql
listen_common.sql
listen_unlisten_unique.sql
listen_unlisten_common.sql

The listen_notify_unique.sql benchmark shows similar improvements
for all notify_multicast_threshold values tested. This is expected:
the benchmark uses unique channels, so each channel has a single
listener and a higher notify_multicast_threshold shouldn't affect
the results, and it didn't:

# TEST `listen_notify_unique.sql`

```sql
LISTEN channel_:client_id;
NOTIFY channel_:client_id;
```

## 1 Connection, 1 Job

- **master**: 63696 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 63377 TPS (-0.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 62890 TPS (-1.3%)
- **optimize_listen_notify_v4 (t=16.0)**: 63114 TPS (-0.9%)

## 2 Connections, 2 Jobs

- **master**: 90967 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 109423 TPS (+20.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 109107 TPS (+19.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 109608 TPS (+20.5%)

## 4 Connections, 4 Jobs

- **master**: 114333 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 140986 TPS (+23.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 141263 TPS (+23.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 141327 TPS (+23.6%)

## 8 Connections, 8 Jobs

- **master**: 64429 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 93787 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=8.0)**: 93828 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 93875 TPS (+45.7%)

## 16 Connections, 16 Jobs

- **master**: 41704 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 84791 TPS (+103.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 88330 TPS (+111.8%)
- **optimize_listen_notify_v4 (t=16.0)**: 84827 TPS (+103.4%)

## 32 Connections, 32 Jobs

- **master**: 25988 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 83197 TPS (+220.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 83453 TPS (+221.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 83576 TPS (+221.6%)

## 1000 Connections, 1 Job

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3097 TPS (+2852.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 3079 TPS (+2835.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 3080 TPS (+2835.9%)

## 1000 Connections, 2 Jobs

- **master**: 108 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2981 TPS (+2671.7%)
- **optimize_listen_notify_v4 (t=8.0)**: 3091 TPS (+2774.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 3097 TPS (+2779.6%)

## 1000 Connections, 4 Jobs

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2947 TPS (+2705.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 2994 TPS (+2751.0%)
- **optimize_listen_notify_v4 (t=16.0)**: 2992 TPS (+2748.7%)

## 1000 Connections, 8 Jobs

- **master**: 107 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3064 TPS (+2777.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 2981 TPS (+2698.5%)
- **optimize_listen_notify_v4 (t=16.0)**: 2979 TPS (+2696.8%)

## 1000 Connections, 16 Jobs

- **master**: 101 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3068 TPS (+2923.2%)
- **optimize_listen_notify_v4 (t=8.0)**: 2950 TPS (+2806.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 2940 TPS (+2796.8%)

## 1000 Connections, 32 Jobs

- **master**: 102 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2980 TPS (+2815.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 3034 TPS (+2867.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 2962 TPS (+2798.0%)

Here are some plots that include the above results:

https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections-v4.png

/Joel