Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From:           Joel Jacobson
Subject:        Re: Optimize LISTEN/NOTIFY
Msg-id:         30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com
In response to: Re: Optimize LISTEN/NOTIFY ("Joel Jacobson" <joel@compiler.org>)
Responses:      Re: Optimize LISTEN/NOTIFY
List:           pgsql-hackers
On Thu, Jul 17, 2025, at 09:43, Joel Jacobson wrote:
> On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
>> If we are doing this optimization, why not maintain a list of backends
>> for each channel, and only wake up those channels?
>
> Thanks for contributing a great idea, it actually turned out to work
> really well in practice!
>
> The attached new v4 of the patch implements your multicast idea:

Hi hackers,

While my previous attempts at $subject have focused only on optimizing the multi-channel scenario, I thought it would be really nice if LISTEN/NOTIFY could be optimized in the general case, benefiting all users, including those who just listen on a single channel. To my surprise, this was not only possible, but actually quite simple.

The main idea in this patch is to introduce an atomic state machine with three states, IDLE, SIGNALLED, and PROCESSING, so that we don't interrupt backends that are already in the process of catching up.

Thanks to Thomas Munro for making me aware of his, Heikki Linnakangas's, and others' work in the "Interrupts vs signals" [1] thread. Maybe my patch is redundant given their patch set; I'm not really sure.

Their patch seems to refactor the underlying wakeup mechanism. It replaces the old, complex chain of events (SIGUSR1 signal -> handler -> flag -> latch) with a single, direct function call: SendInterrupt(). For async.c, this seems to be a low-level plumbing change that simplifies how a notification wakeup is delivered.

My patch optimizes the high-level notification protocol. It introduces a state machine (IDLE, SIGNALLED, PROCESSING) to only signal backends when needed.

In their patch, in async.c's SignalBackends(), they do SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't seem to check whether the backend is already signalled, but maybe SendInterrupt() has signal coalescing built in, so it would be a no-op with almost no cost?

I'm happy to rebase my LISTEN/NOTIFY work on top of [1], but I could also see benefits of doing the opposite. I'm also happy to help with benchmarking of your work in [1].

Note that this patch doesn't contain the hash table to keep track of listeners per backend, as proposed in earlier patches. I will propose such a patch again later, but first we need to figure out if I should rebase onto [1] or master (HEAD).

--- PATCH ---

Optimize NOTIFY signaling to avoid redundant backend signals

Previously, a NOTIFY would send SIGUSR1 to all listening backends, which could lead to a "thundering herd" of redundant signals under high traffic.

To address this inefficiency, this patch replaces the simple volatile notifyInterruptPending flag with a per-backend atomic state machine, stored in asyncQueueControl->backend[i].state. This state variable can be in one of three states: IDLE (awaiting signal), SIGNALLED (signal received, work pending), or PROCESSING (actively reading the queue).

From the notifier's perspective, SignalBackends now uses an atomic compare-and-swap (CAS) to transition a listener from IDLE to SIGNALLED. Only on a successful transition is a signal sent. If the listener is already SIGNALLED or another notifier wins the race, no redundant signal is sent. If the listener is in the PROCESSING state, the notifier will also transition it to SIGNALLED to ensure the listener re-scans the queue after its current work is done.
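To make the notifier's side concrete, here is a minimal sketch of that CAS loop, written against PostgreSQL's atomics API (port/atomics.h). The ASYNC_* state names and the TrySignalListener() helper are illustrative stand-ins, not the patch's actual identifiers:

#include "postgres.h"
#include "port/atomics.h"

#define ASYNC_IDLE        0     /* awaiting signal */
#define ASYNC_SIGNALLED   1     /* signal sent, work pending */
#define ASYNC_PROCESSING  2     /* actively reading the queue */

/*
 * Notifier side, called once per listener (state would live in shared
 * memory, e.g. asyncQueueControl->backend[i].state).  Returns true iff
 * the caller must actually send SIGUSR1, i.e. we won the race to move
 * the listener from IDLE to SIGNALLED.
 */
static bool
TrySignalListener(volatile pg_atomic_uint32 *state)
{
    uint32      cur = pg_atomic_read_u32(state);

    for (;;)
    {
        if (cur == ASYNC_SIGNALLED)
            return false;       /* already signalled: send nothing */

        /*
         * IDLE -> SIGNALLED: we must wake the listener up.
         * PROCESSING -> SIGNALLED: the listener is awake and mid-scan;
         * the state change alone makes it re-scan, no signal needed.
         */
        if (pg_atomic_compare_exchange_u32(state, &cur, ASYNC_SIGNALLED))
            return cur == ASYNC_IDLE;

        /* CAS failed: cur was refreshed with the current value; retry. */
    }
}

Either way, at most one wakeup is ever in flight per listener, so under contention concurrent notifiers pay a CAS instead of a kill() syscall.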
On the listener side, ProcessIncomingNotify first transitions its state from SIGNALLED to PROCESSING. After reading notifications, it attempts to transition from PROCESSING back to IDLE. If this CAS fails, it means a new notification arrived during processing and a notifier has already set the state back to SIGNALLED. The listener then simply re-latches itself to process the new notifications, avoiding a tight loop.
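A matching sketch of the listener's side, under the same assumptions (hypothetical ASYNC_* constants as above; asyncQueueReadAllNotifications() is async.c's existing queue-reading routine):

#include "postgres.h"
#include "port/atomics.h"
#include "miscadmin.h"          /* MyLatch */
#include "storage/latch.h"      /* SetLatch */

/* ASYNC_IDLE / ASYNC_SIGNALLED / ASYNC_PROCESSING as in the sketch above */

static void
ProcessIncomingNotifySketch(volatile pg_atomic_uint32 *state)
{
    uint32      expected = ASYNC_SIGNALLED;

    /* SIGNALLED -> PROCESSING before touching the queue. */
    if (!pg_atomic_compare_exchange_u32(state, &expected, ASYNC_PROCESSING))
        return;                 /* spurious wakeup: no work pending */

    asyncQueueReadAllNotifications();

    /*
     * PROCESSING -> IDLE.  If this CAS fails, a notifier flipped us back
     * to SIGNALLED while we were reading; set our own latch so the main
     * loop comes straight back here instead of spinning.
     */
    expected = ASYNC_PROCESSING;
    if (!pg_atomic_compare_exchange_u32(state, &expected, ASYNC_IDLE))
        SetLatch(MyLatch);
}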
The primary benefit is a significant reduction in syscall overhead and unnecessary kernel wakeups in high-traffic scenarios. This dramatically improves performance for workloads with many concurrent notifiers. Benchmarks show a substantial increase in NOTIFY-only transaction throughput, with gains exceeding 200% at higher concurrency levels.

 src/backend/commands/async.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 src/backend/tcop/postgres.c  |   4 ++--
 src/include/commands/async.h |   4 +++-
 3 files changed, 185 insertions(+), 32 deletions(-)

--- BENCHMARK ---

The attached benchmark script does LISTEN on one connection, and then uses pgbench to send NOTIFY on a varying number of connections and jobs, to cause a high procsignal load.

I've run the benchmark on my MacBook Pro M3 Max, 10 seconds per run, 3 runs. (I reused the same benchmark script as in the other thread, "Optimize ProcSignal to avoid redundant SIGUSR1 signals".)

 Connections=Jobs | TPS (master) | TPS (patch) | Relative Diff (%) | StdDev (master) | StdDev (patch)
------------------+--------------+-------------+-------------------+-----------------+----------------
                1 |       118833 |      151510 |            27.50% |             484 |            923
                2 |       156005 |      239051 |            53.23% |            3145 |           1596
                4 |       177351 |      250910 |            41.48% |            4305 |           4891
                8 |       116597 |      171944 |            47.47% |            1549 |           2752
               16 |        40835 |      165482 |           305.25% |            2695 |           2825
               32 |        37940 |      145150 |           282.58% |            2533 |           1566
               64 |        35495 |      131836 |           271.42% |            1837 |            573
              128 |        40193 |      121333 |           201.88% |            2254 |            874
(8 rows)

/Joel

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ%40mail.gmail.com