Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers
From: Joel Jacobson
Subject: Re: Optimize LISTEN/NOTIFY
Msg-id: 96f00bf1-cc9d-4520-9d02-9e14e7767c88@app.fastmail.com
In response to: Re: Optimize LISTEN/NOTIFY (Rishu Bagga <rishu.postgres@gmail.com>)
List: pgsql-hackers
On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
> If we are doing this optimization, why not maintain a list of backends
> for each channel, and only wake up those channels?

Thanks for contributing a great idea; it turned out to work really well in practice! The attached new v4 of the patch implements your multicast idea:

---
Improve NOTIFY scalability with multicast signaling

Previously, NOTIFY would signal all listening backends in a database for any channel with more than one listener. This broadcast approach scales poorly for workloads that rely on targeted notifications to small groups of backends, as every NOTIFY could wake up many unrelated processes. This commit introduces a multicast signaling optimization to improve scalability for such use cases.

A new GUC, `notify_multicast_threshold`, is added to control the maximum number of listeners to track per channel. When a NOTIFY is issued, if the number of listeners is at or below this threshold, only those specific backends are signaled. If the limit is exceeded, the system falls back to the original broadcast behavior.

The default for this threshold is 16. Benchmarks show this provides a good balance, with significant performance gains for small to medium-sized listener groups and diminishing returns for higher values. Setting the threshold to 0 disables multicast signaling, forcing the broadcast path for all notifications.

To implement this, a new partitioned hash table is introduced in shared memory to track listeners. Locking is managed with an optimistic read-then-upgrade pattern. This allows concurrent LISTEN/UNLISTEN operations on *different* channels to proceed in parallel, as they will only acquire locks on their respective partitions. For correctness and to prevent deadlocks, a strict lock ordering hierarchy (NotifyQueueLock before any partition lock) is observed.
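For illustration, the threshold decision described above can be sketched as a toy Python model. All names here (`backends_to_signal`, `DEFAULT_THRESHOLD`) are mine, not symbols from the patch, which implements this in C:

```python
# Toy model of the multicast-vs-broadcast decision described in the
# commit message. Illustrative only; not code from the patch.

DEFAULT_THRESHOLD = 16  # proposed default for notify_multicast_threshold


def backends_to_signal(channel_listeners, all_listening_backends,
                       threshold=DEFAULT_THRESHOLD):
    """Return the set of backend IDs one NOTIFY should wake.

    channel_listeners is the per-channel listener set from the shared
    hash table, or None once the channel exceeded the tracking limit.
    """
    if (threshold == 0 or channel_listeners is None
            or len(channel_listeners) > threshold):
        # Multicast disabled or limit exceeded: broadcast to every
        # backend listening in the database (the old behavior).
        return set(all_listening_backends)
    # Multicast: wake only this channel's listeners.
    return set(channel_listeners)


all_backends = set(range(1, 101))  # 100 listening backends
assert backends_to_signal({3, 7}, all_backends) == {3, 7}
assert backends_to_signal(None, all_backends) == all_backends
assert backends_to_signal({3, 7}, all_backends, threshold=0) == all_backends
```

The same structure shows why threshold=0 is a clean kill switch: every call degenerates to the broadcast branch.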
The signaling path in NOTIFY must still acquire the global NotifyQueueLock before consulting the partitioned hash table, which serializes concurrent NOTIFYs. The primary concurrency win is therefore for LISTEN/UNLISTEN operations, which are now much more scalable.

The "wake only tail" optimization, which signals backends that have fallen far behind in the queue, is also included, to ensure the global queue tail can always advance.

Thanks to Rishu Bagga for the multicast idea.
---

BENCHMARK

To find a good default value for notify_multicast_threshold, I created a new benchmark tool that spawns one "ping" worker, which sends notifications on a channel, and multiple "pong" workers, which listen on channels and immediately reply to the "ping" worker; once all replies have been received, the cycle repeats. By measuring how many complete round trips can be performed per second, the tool evaluates the impact of different multicast threshold settings.

The results below compare broadcast with multicast by setting notify_multicast_threshold just below, or exactly at, the number of backends per channel, for different multicast group sizes (a group size of 1 corresponds to the old targeted mode that earlier versions of the patch specifically optimized for).

K = notify_multicast_threshold

With 2 backends per channel (32 channels total):
  patch-v4 (K=1):   8,477 TPS
  patch-v4 (K=2):  27,748 TPS (3.3x improvement)

With 4 backends per channel (16 channels total):
  patch-v4 (K=1):   7,367 TPS
  patch-v4 (K=4):  18,777 TPS (2.6x improvement)

With 8 backends per channel (8 channels total):
  patch-v4 (K=1):   5,892 TPS
  patch-v4 (K=8):   8,620 TPS (1.5x improvement)

With 16 backends per channel (4 channels total):
  patch-v4 (K=1):   4,202 TPS
  patch-v4 (K=16):  4,750 TPS (1.1x improvement)

I also reran the old ping-pong benchmark as well as the pgbench benchmarks, and I couldn't detect any negative impact, testing with notify_multicast_threshold {1, 8, 16}.
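The shape of these results can be summed up with a back-of-the-envelope wake-up count: broadcast wakes every listening backend in the database per NOTIFY, while multicast wakes only the channel's own listeners. This is my own toy model, not code from the patch or the benchmark tool:

```python
# Toy cost model: backends woken by a single NOTIFY. Illustrative only.

def wakeups_per_notify(channel_listeners, other_listeners, threshold):
    """channel_listeners backends listen on the notified channel;
    other_listeners unrelated backends listen on other channels."""
    total = channel_listeners + other_listeners
    if 0 < channel_listeners <= threshold:
        return channel_listeners  # multicast path: wake only the group
    return total                  # broadcast fallback: wake everyone


# One ping-pong pair while 1000 unrelated connections are listening:
assert wakeups_per_notify(1, 1000, threshold=16) == 1     # multicast
assert wakeups_per_notify(1, 1000, threshold=0) == 1001   # broadcast
# A 20-listener channel exceeds the default threshold of 16:
assert wakeups_per_notify(20, 1000, threshold=16) == 1020
```

Under this model the broadcast cost grows with the total number of listening connections, matching the collapse of master's TPS in the "Extra Connections" runs below, while the multicast cost depends only on the group size.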
Ping-pong benchmark:

Extra Connections: 0
--------------------------------------------------------------------------------
Version            Max TPS   vs Master   All Values (sorted)
--------------------------------------------------------------------------------
master                9119    baseline   {9088, 9095, 9119}
patch-v4 (t=1)        9116       -0.0%   {9082, 9090, 9116}
patch-v4 (t=8)        9106       -0.2%   {9086, 9102, 9106}
patch-v4 (t=16)       9134       +0.2%   {9082, 9116, 9134}

Extra Connections: 10
--------------------------------------------------------------------------------
Version            Max TPS   vs Master   All Values (sorted)
--------------------------------------------------------------------------------
master                6237    baseline   {6224, 6227, 6237}
patch-v4 (t=1)        9358      +50.0%   {9302, 9345, 9358}
patch-v4 (t=8)        9348      +49.9%   {9266, 9312, 9348}
patch-v4 (t=16)       9408      +50.8%   {9339, 9407, 9408}

Extra Connections: 100
--------------------------------------------------------------------------------
Version            Max TPS   vs Master   All Values (sorted)
--------------------------------------------------------------------------------
master                2028    baseline   {2026, 2027, 2028}
patch-v4 (t=1)        9278     +357.3%   {9222, 9235, 9278}
patch-v4 (t=8)        9227     +354.8%   {9184, 9207, 9227}
patch-v4 (t=16)       9250     +355.9%   {9180, 9243, 9250}

Extra Connections: 1000
--------------------------------------------------------------------------------
Version            Max TPS   vs Master   All Values (sorted)
--------------------------------------------------------------------------------
master                 239    baseline   {239, 239, 239}
patch-v4 (t=1)        8841    +3594.1%   {8819, 8840, 8841}
patch-v4 (t=8)        8835    +3591.7%   {8802, 8826, 8835}
patch-v4 (t=16)       8855    +3599.8%   {8787, 8843, 8855}

Among my pgbench benchmarks, results seem unaffected in these cases:

listen_unique.sql
listen_common.sql
listen_unlisten_unique.sql
listen_unlisten_common.sql

The listen_notify_unique.sql benchmark shows similar improvements for all notify_multicast_threshold values tested, which is expected, since this
benchmark uses unique channels, so a higher notify_multicast_threshold shouldn't affect the results, and indeed it didn't:

# TEST `listen_notify_unique.sql`

```sql
LISTEN channel_:client_id;
NOTIFY channel_:client_id;
```

## 1 Connection, 1 Job

- **master**: 63696 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 63377 TPS (-0.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 62890 TPS (-1.3%)
- **optimize_listen_notify_v4 (t=16.0)**: 63114 TPS (-0.9%)

## 2 Connections, 2 Jobs

- **master**: 90967 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 109423 TPS (+20.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 109107 TPS (+19.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 109608 TPS (+20.5%)

## 4 Connections, 4 Jobs

- **master**: 114333 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 140986 TPS (+23.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 141263 TPS (+23.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 141327 TPS (+23.6%)

## 8 Connections, 8 Jobs

- **master**: 64429 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 93787 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=8.0)**: 93828 TPS (+45.6%)
- **optimize_listen_notify_v4 (t=16.0)**: 93875 TPS (+45.7%)

## 16 Connections, 16 Jobs

- **master**: 41704 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 84791 TPS (+103.3%)
- **optimize_listen_notify_v4 (t=8.0)**: 88330 TPS (+111.8%)
- **optimize_listen_notify_v4 (t=16.0)**: 84827 TPS (+103.4%)

## 32 Connections, 32 Jobs

- **master**: 25988 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 83197 TPS (+220.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 83453 TPS (+221.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 83576 TPS (+221.6%)

## 1000 Connections, 1 Job

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3097 TPS (+2852.1%)
- **optimize_listen_notify_v4 (t=8.0)**: 3079 TPS (+2835.1%)
- **optimize_listen_notify_v4 (t=16.0)**: 3080 TPS (+2835.9%)

## 1000 Connections, 2 Jobs

- **master**: 108 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2981 TPS (+2671.7%)
- **optimize_listen_notify_v4 (t=8.0)**: 3091 TPS (+2774.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 3097 TPS (+2779.6%)

## 1000 Connections, 4 Jobs

- **master**: 105 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2947 TPS (+2705.5%)
- **optimize_listen_notify_v4 (t=8.0)**: 2994 TPS (+2751.0%)
- **optimize_listen_notify_v4 (t=16.0)**: 2992 TPS (+2748.7%)

## 1000 Connections, 8 Jobs

- **master**: 107 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3064 TPS (+2777.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 2981 TPS (+2698.5%)
- **optimize_listen_notify_v4 (t=16.0)**: 2979 TPS (+2696.8%)

## 1000 Connections, 16 Jobs

- **master**: 101 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 3068 TPS (+2923.2%)
- **optimize_listen_notify_v4 (t=8.0)**: 2950 TPS (+2806.4%)
- **optimize_listen_notify_v4 (t=16.0)**: 2940 TPS (+2796.8%)

## 1000 Connections, 32 Jobs

- **master**: 102 TPS (baseline)
- **optimize_listen_notify_v4 (t=1.0)**: 2980 TPS (+2815.0%)
- **optimize_listen_notify_v4 (t=8.0)**: 3034 TPS (+2867.9%)
- **optimize_listen_notify_v4 (t=16.0)**: 2962 TPS (+2798.0%)

Here are some plots that include the above results:

https://github.com/joelonsql/pg-bench-listen-notify/raw/master/plot-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_connections_equal_jobs-v4.png
https://github.com/joelonsql/pg-bench-listen-notify/raw/master/performance_overview_fixed_connections-v4.png

/Joel