Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 30c2aa7d-dd6c-4b68-a2e4-f217a1a34acf@app.fastmail.com
In response to Re: Optimize LISTEN/NOTIFY  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: Optimize LISTEN/NOTIFY
List pgsql-hackers
On Thu, Jul 17, 2025, at 09:43, Joel Jacobson wrote:
> On Wed, Jul 16, 2025, at 02:20, Rishu Bagga wrote:
>> If we are doing this optimization, why not maintain a list of backends
>> for each channel, and only wake up those channels?
>
> Thanks for contributing a great idea, it actually turned out to work
> really well in practice!
>
> The attached new v4 of the patch implements your multicast idea:

Hi hackers,

While my previous attempts at $subject have focused only on optimizing
the multi-channel scenario, I thought it would be really nice if
LISTEN/NOTIFY could be optimized in the general case, benefiting all
users, including those who listen on just a single channel.

To my surprise, this was not only possible, but actually quite simple.

The main idea in this patch is to introduce an atomic state machine
with three states, IDLE, SIGNALLED, and PROCESSING, so that we don't
interrupt backends that are already in the process of catching up.

Thanks to Thomas Munro for making me aware of his, Heikki Linnakangas's,
and others' work in the "Interrupts vs signals" [1] thread.

Maybe my patch is redundant given their patch set; I'm not really sure.

Their patch seems to refactor the underlying wakeup mechanism. It
replaces the old, complex chain of events (SIGUSR1 signal -> handler ->
flag -> latch) with a single, direct function call: SendInterrupt(). For
async.c, this seems to be a low-level plumbing change that simplifies
how a notification wakeup is delivered.

My patch optimizes the high-level notification protocol. It introduces a
state machine (IDLE, SIGNALLED, PROCESSING) so that backends are
signalled only when needed.

In their patch, in async.c's SignalBackends(), they do
SendInterrupt(INTERRUPT_ASYNC_NOTIFY, procno) instead of
SendProcSignal(pid, PROCSIG_NOTIFY_INTERRUPT, procnos[i]). They don't
seem to check whether the backend is already signalled, but maybe
SendInterrupt() has signal coalescing built in, so it would be a no-op
with almost no cost?

I'm happy to rebase my LISTEN/NOTIFY work on top of [1], but I could
also see benefits of doing the opposite.

I'm also happy to help with benchmarking of your work in [1].

Note that this patch doesn't contain the hash table to keep track of
listeners per backend, as proposed in earlier patches. I will propose
such a patch again later, but first we need to figure out if I should
rebase onto [1] or master (HEAD).

--- PATCH ---

    Optimize NOTIFY signaling to avoid redundant backend signals

    Previously, a NOTIFY would send SIGUSR1 to all listening backends, which
    could lead to a "thundering herd" of redundant signals under high
    traffic. To address this inefficiency, this patch replaces the simple
    volatile notifyInterruptPending flag with a per-backend atomic state
    machine, stored in asyncQueueControl->backend[i].state. This state
    variable can be in one of three states: IDLE (awaiting signal),
    SIGNALLED (signal received, work pending), or PROCESSING (actively
    reading the queue).

    From the notifier's perspective, SignalBackends now uses an atomic
    compare-and-swap (CAS) to transition a listener from IDLE to SIGNALLED.
    Only on a successful transition is a signal sent. If the listener is
    already SIGNALLED or another notifier wins the race, no redundant signal
    is sent. If the listener is in the PROCESSING state, the notifier will
    also transition it to SIGNALLED to ensure the listener re-scans the
    queue after its current work is done.

    On the listener side, ProcessIncomingNotify first transitions its state
    from SIGNALLED to PROCESSING. After reading notifications, it attempts
    to transition from PROCESSING back to IDLE. If this CAS fails, it means
    a new notification arrived during processing and a notifier has already
    set the state back to SIGNALLED. The listener then simply re-latches
    itself to process the new notifications, avoiding a tight loop.

    The primary benefit is a significant reduction in syscall overhead and
    unnecessary kernel wakeups in high-traffic scenarios. This dramatically
    improves performance for workloads with many concurrent notifiers.
    Benchmarks show a substantial increase in NOTIFY-only transaction
    throughput, with gains exceeding 200% at higher
    concurrency levels.

 src/backend/commands/async.c | 209 ++++++++++++++++++++++++++++++-----
 src/backend/tcop/postgres.c  |   4 ++--
 src/include/commands/async.h |   4 +++-
 3 files changed, 185 insertions(+), 32 deletions(-)

--- BENCHMARK ---

The attached benchmark script does LISTEN on one connection,
and then uses pgbench to send NOTIFY on a varying number of
connections and jobs, to cause a high procsignal load.

I've run the benchmark on my MacBook Pro M3 Max,
10 seconds per run, 3 runs.

(I reused the same benchmark script as in the other thread, "Optimize ProcSignal to avoid redundant SIGUSR1 signals")

 Connections=Jobs | TPS (master) | TPS (patch) | Relative Diff (%) | StdDev (master) | StdDev (patch)
------------------+--------------+-------------+-------------------+-----------------+----------------
                1 |       118833 |      151510 | 27.50%            |             484 |            923
                2 |       156005 |      239051 | 53.23%            |            3145 |           1596
                4 |       177351 |      250910 | 41.48%            |            4305 |           4891
                8 |       116597 |      171944 | 47.47%            |            1549 |           2752
               16 |        40835 |      165482 | 305.25%           |            2695 |           2825
               32 |        37940 |      145150 | 282.58%           |            2533 |           1566
               64 |        35495 |      131836 | 271.42%           |            1837 |            573
              128 |        40193 |      121333 | 201.88%           |            2254 |            874
(8 rows)

/Joel

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ%40mail.gmail.com
