Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date
Msg-id dg7ipxk4onfxczykzzslrbyznmc4zjhqc5dnrshwcljlxeacuw@jxpayo4oe4ep
Whole thread Raw
In response to Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
List pgsql-hackers
Hi,

On 2025-03-07 09:03:18 -0800, Jacob Champion wrote:
> On Fri, Mar 7, 2025 at 8:38 AM Andres Freund <andres@anarazel.de> wrote:
> > FWIW, I continue to think that this is a misuse of wait events. We shouldn't
> > use them as a poor man's general purpose tracing framework.
>
> Well, okay. That's frustrating.

I should have clarified - there are a few that I think are ok, basically the
places where we wrap syscalls, e.g. around the sendto, select and recvfrom in
PerformRadiusTransaction().

OTOH that code is effectively completely broken. Doing a blocking select() is
just a no-go, the code isn't interruptible, breaking authentication
timeout. And using select() means that we theoretically could crash due to an
fd that's above FD_SETSIZE.


Most of the other places I'm not on board with, that's wrapping large amounts
of code in a wait event, which pretty much means we're not waiting.

I think some of the wrapped calls into library code might actually call back
into our code (to receive/send data), and our code then will use wait events
around lower level operations done as part of that.

Which pretty much explains my main issue with this - either the code can't
wait in those function calls, in which case it's wrong to use wait events, or
the code is flat out broken.


It's also IMO quite wrong to do something that can throw an error inside a
wait event, because that means that the wait event will still be reported
during error recovery. Probably not the only place doing so, but it's still
wrong.


> If I return to the original design, but replace all of the high-level
> wait events with calls to pgstat_report_activity(), does that work?

It'd be less wrong.

But I really doubt that it's a good idea to encode all kinds of function calls
happening during authentication into something SQL visible. Why stop with
these functions and not just do that for *all* functions in postgres? I mean
it'd not work and slow everything down, but how do you define that line?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers
Next
From: Nathan Bossart
Date:
Subject: Re: Orphaned users in PG16 and above can only be managed by Superusers