Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events
Date
Msg-id CA+hUKGKV5aM1K3gc_kAMtUq-BkD5AugLUCDVa-RMnbfFypALwQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events
List pgsql-hackers
On Thu, Aug 7, 2025 at 11:55 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> On Wed, Aug 6, 2025 at 9:13 AM Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
> > Maybe "drain" would no longer be the
> > verb to use there.
>
> I keep describing this as "combing" the queue when I talk about it in
> person, so v3-0001 renames this new operation to comb_multiplexer().
> And the CI (plus the more strenuous TLS tests) confirms that the
> callback count is still stable with this weaker guarantee, so I've
> gotten rid of the event-counting code.

I was about to hit send on an email suggesting "reset_multiplexer()",
and an attempt at distilling the explanation to a short paragraph, but
your names and explanations are also good so please feel 100% free to
ignore these suggestions.

"Unlike epoll descriptors, kqueue descriptors only transition from
readable to unreadable when kevent() is called and finds nothing,
after removing level-triggered conditions that have gone away.  We
therefore need a dummy kevent() call after operations might have been
performed on the monitored sockets or timer_fd.  Any event returned is
ignored here, but it also remains queued (being level-triggered) and
leaves the descriptor readable.  This is a no-op for epoll
descriptors."

Reviewing the timer stuff, it is again tempting to try to use an
EVFILT_TIMER directly, but I like your approach: it's nice to be able
to mirror the Linux coding, with this minor kink ironed out.

FWIW I re-read the kqueue paper's discussion of the goals of making
kqueue descriptors themselves monitorable/pollable, and it seems it
was mainly intended for hierarchies of kqueues, like your timer_fd,
with the specific aim of expressing priorities.  It doesn't talk about
giving them to code that doesn't know it has a kqueue fd (the client)
and never calls kevent() and infers the events instead (libcurl).
That said, the fact that the filter function for kqueue fds just
checks queue size > 0 without running the filter functions for the
queued events does seem like a bit of an abstraction leak from this
vantage point.  At least it's easy enough to work around in the
kqueue-managing middleman code once you understand it.

> Now that I'm no longer counting events, I can collapse the changes to
> register_socket(). I can't revert those changes entirely, because then
> we regress the case where Curl switches a socket from IN to OUT (this
> is enforced by the new unit tests). But I'm not sure that the existing
> comment adequately explained that fix anyway, and I didn't remember to
> call it out in my initial email, so I've split it out into v3-0002.
> It's much smaller.

Much nicer!  Yeah, that all makes sense.

> The tests (now in 0005) have been adjusted for the new "combing"
> behavior, and I've added a case to ensure that multiple stale events
> are swept up by a single call to comb_multiplexer().

This all looks pretty good to me.  I like the C TAP test.  PostgreSQL
needs more of this.

s/signalled/signaled/ (= US spelling) in a couple of places.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Datum as struct
Next
From: Shinya Kato
Date:
Subject: Speed up COPY FROM text/CSV parsing using SIMD