Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Optimize LISTEN/NOTIFY
Date
Msg-id 2067239.1763670416@sss.pgh.pa.us
Whole thread Raw
In response to Re: Optimize LISTEN/NOTIFY  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers
I took a brief look through the v28 patch, and I'm fairly distressed
at how much new logic has been stuffed into what's effectively a
critical section.  It's totally not okay for AtCommit_Notify or
anything it calls to throw an error; if it does, something
like this will happen:

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index e1cf659..ece820d 100644
*** a/src/backend/commands/async.c
--- b/src/backend/commands/async.c
*************** Exec_ListenCommit(const char *channel)
*** 1148,1153 ****
--- 1148,1154 ----
     * doesn't seem worth trying to guard against that, but maybe improve this
     * later.
     */
+   elog(ERROR, "phony OOM in Exec_ListenCommit");
    oldcontext = MemoryContextSwitchTo(TopMemoryContext);
    listenChannels = lappend(listenChannels, pstrdup(channel));
    MemoryContextSwitchTo(oldcontext);

regression=# begin;
BEGIN
regression=*# listen foo;
LISTEN
regression=*# notify foo;
NOTIFY
regression=*# commit;
ERROR:  phony OOM in Exec_ListenCommit
WARNING:  AbortTransaction while in COMMIT state
PANIC:  cannot abort transaction 21558, it was already committed
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

(The NOTIFY in my example could be replaced by anything that
causes the transaction to obtain an XID.)

Now, we had skated past this issue in a few places already,
such as the above-quoted fragment in Exec_ListenCommit, arguing
that the probability of failure there was small enough to tolerate.
But I see no such arguments being made in this patch, and I doubt
I'd believe it anyway for things like DSA segment creation.

So I think there needs to be a serious effort made to move as
much as we possibly can of the potentially-risky stuff into
PreCommit_Notify.  In particular I think we ought to create
the shared channel hash entry then, and even insert our PID
into it.  We could expand the listenersArray entries to include
both a PID and a boolean "is it REALLY listening?", and then
during Exec_ListenCommit we'd only be required to find an
entry we already added and set its boolean, so there's no OOM
hazard.  Possibly do something similar with the local
listenChannelsHash, so as to remove that possibility of OOM
failure as well.

(An alternative design could be to go ahead and insert our
PID during PreCommit_Notify, and just tolerate the small
risk of getting signaled when we didn't need to be.  But
then we'd need some mechanism for cleaning out the bogus
entry during AtAbort_Notify.)

I'm not sure what I think about the new logic in SignalBackends
from this standpoint.  Making it very-low-probability-of-error
definitely needs some consideration though.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: another autovacuum scheduling thread
Next
From: Bruce Momjian
Date:
Subject: Re: 10% drop in code line count in PG 17