Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 34ccde90-2041-4388-90d5-f8519f3104dd@app.fastmail.com
Whole thread Raw
In response to Re: Optimize LISTEN/NOTIFY  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Jan 7, 2026, at 21:29, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> I've reworked the staging mechanism for LISTEN/UNLISTEN. The new design
>> tracks LISTEN state at three levels:
>> * pendingListenChannels: per-transaction pending changes
>> * listenChannelsHash: per-backend committed state cache
>> * channelHash: cluster-wide shared state
>
> I spent some time looking through this version.

Many thanks for a great review with awesome ideas.

>> The main benefit is that pendingListenChannels is now a hash table
>> instead of a simple List.  In the old design, LISTEN foo; UNLISTEN foo;
>> LISTEN foo would create three list entries that all had to be processed
>> at commit.  The new design collapses this to one hash entry storing the
>> final state, which we just apply at commit.
>
> That seems a little weird: surely such usage is not something we need
> to optimize.  Maybe it can be justified because it makes the code
> simpler, but that's not immediately obvious to me.

The reason for not sticking to the two-boolean approach (staged/current)
like in v32 was to minimize shared dshash operations in favor of local
hash table operations.

Consider `LISTEN foo; UNLISTEN foo; LISTEN foo:`

With two booleans and pendingListenActions as a List (as in v32):

Each LISTEN/UNLISTEN touches the shared dshash during PreCommit to set
the staged value, then commit iterates the entire list (including
duplicates) to finalize. That's 6 dshash operations in total.

With a local hash table (current design, v34):

UNLISTEN during PreCommit doesn't touch the shared dshash at all, it
just records the intent locally.  LISTEN pre-allocates once per channel.
At commit we iterate the de-duplicated hash table, so we do one dshash
operation per unique channel rather than per action.  That's 2 dshash
operations in total.

Since dshash operations involve shared memory access and potential
contention, doing more work locally with a cheap hash table seemed like
the right trade-off to me.

I also tried making the current v34 approach work with
pendingListenActions as a List.  The problem is that at commit, we need
to know the final intent per channel.  With a List containing
duplicates, we'd iterate in reverse for "last action wins" semantics,
but then we'd need to track which channels we've already processed,
essentially reimplementing a hash table manually.  The hash table gives
us this for free: inserting the same channel just overwrites the
previous action, and at commit each channel appears exactly once with
its final intent.

> I found the terminology pretty confusing, in particular there's too
> little cognitive distance between "channel hash table" and "local
> listen channels hash table" and "pending listen channels hash table".
> I think you could improve matters by renaming, perhaps along the
> lines of

I really struggled trying to come up with meaningful names, so big
thanks for great naming improvements.

Renames per suggestions:

    channelHash -> globalChannelTable
    listenChannelsHash -> localChannelTable
    pendingListenChannels -> pendingListenActions
    NotificationList.channelSet -> NotificationList.uniqueChannelNames
    Exec_ListenPreCommit -> BecomeRegisteredListener
    Exec_ListenPreCommitStage -> PrepareTableEntriesForListen
    Exec_UnlistenPreCommitStage -> PrepareTableEntriesForUnlisten
    Exec_UnlistenAllPreCommitStage -> PrepareTableEntriesForUnlistenAll

I also renamed the following to make their purpose clear:

    ChannelNameKey -> GlobalChannelKey
    ChannelHashKey -> GlobalChannelKey
    ChannelListeners -> GlobalChannelEntry
    ChannelListeners.ChannelHashKey -> GlobalChannelEntry.GlobalChannelKey
    AsyncQueueControl.channelHashDSA -> AsyncQueueControl.globalChannelTableDSA
    AsyncQueueControl.channelHashDSH -> AsyncQueueControl.globalChannelTableDSH
    channelDSA -> globalChannelDSA

I also renamed PendingListenEntry.listening to
PendingListenEntry.action, and changed it from a bool to a new enum
PendingListenAction.  I didn't like how this field was a bool named
"listening", since ListenerEntry's also has a bool field named
"listening".  It felt better to make it an enum with the explicit values
{PENDING_LISTEN, PENDING_UNLISTEN}.

> I noted while looking at code coverage results that AtAbort_Notify
> seems almost entirely unreached, and soon realized that that's because
> none of it runs unless we are cleaning up a transaction that fails
> after Exec_ListenPreCommit has created pendingListenChannels.
> That's obviously extremely difficult to reach in a test, so it's a bit
> sad that we have that big chunk of unreachable code, especially when
> the coverage elsewhere in this file is pretty stellar.  I wonder if
> there is a way to merge that chunk with the code in AtCommit_Notify
> that also has to scan pendingListenChannels, so as to reduce the
> amount of untested code.  Also, the comments for AtAbort_Notify
> don't really make it clear that there's nothing to do unless we're
> reverting an almost-committed transaction.

Great idea, thanks!  I also found it troubling not being able to test
AtAbort_Notify without e.g. manually adding code to simulate OOM, which
is how I manually tested the old code.

Two new helper-functions have been introduced:
    ApplyPendingListenActions
        -> RemoveListenerFromChannel

RemoveListenerFromChannel is now used both by UNLISTEN being committed...

AtCommit_Notify
    -> ApplyPendingListenActions
        -> RemoveListenerFromChannel

...as well as the harder-to-hit case of a staged LISTEN being aborted:

    AtAbort_Notify
        -> ApplyPendingListenActions
            -> RemoveListenerFromChannel

> Speaking of AtCommit_Notify:
>
>             entry = dshash_find(channelHash, &key, true);
>             if (entry == NULL)
>                 continue;
>
> How can it be okay to just silently "continue" here?  Certainly
> this case should be unreachable, but that doesn't make it okay
> to silently ignore the failure if one were to occur.  Given
> that this is post-commit processing, probably we can do no
> better than

The silent continue was handling UNLISTEN of a channel we weren't
listening on (which is allowed as a no-op).  However, I've now moved
that check earlier: PrepareTableEntriesForUnlisten now returns early if
the channel isn't in localChannelTable.  This means
ApplyPendingListenActions can safely PANIC on entry == NULL, since that
case is now truly unreachable.

> Also, a bit further down:
>
...
>
> This seems like a weird order of operations: why didn't you finish
> dealing with the channelHash entry before dealing with
> listenChannelsHash?

Right. Fixed.

New version attached.

/Joel

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: Correct comment wording in extension.c
Next
From: Amul Sul
Date:
Subject: Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions