Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date
Msg-id CAOYmi+nkEPh4724NKci7hFiZBbu4WEO99bL7=fu+EvTAb3N8uw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Mar 7, 2025 at 9:25 AM Andres Freund <andres@anarazel.de> wrote:
> I should have clarified - there are a few that I think are ok, basically the
> places where we wrap syscalls, e.g. around the sendto, select and recvfrom in
> PerformRadiusTransaction().

Okay.

> OTOH that code is effectively completely broken. Doing a blocking select() is
> just a no-go, the code isn't interruptible, breaking authentication
> timeout. And using select() means that we theoretically could crash due to an
> fd that's above FD_SETSIZE.

I think we're in agreement here; I'm just trying to improve things
incrementally. If someone actually hits the broken case, I think it'd
be helpful for them to see it.

> I think some of the wrapped calls into library code might actually call back
> into our code (to receive/send data), and our code then will use wait events
> around lower level operations done as part of that.

That would be a problem, agreed, but I didn't think I'd wrapped any
callback APIs. (Admittedly I have little experience with the SSPI
stuff.) But looking at the wrapped calls in the patch... which are you
suspicious of?

> It's also IMO quite wrong to do something that can throw an error inside a
> wait event, because that means that the wait event will still be reported
> during error recovery.

Hm, okay. I can change that for the LookupAccountSid case.

> Probably not the only place doing so, but it's still
> wrong.

It's definitely not the only place. :D

> Why stop with
> these functions and not just do that for *all* functions in postgres? I mean
> it'd not work and slow everything down,

(That seems like a good reason not to do it for all functions in
Postgres, no? I hope the slope is not all that slippery in practice.)

> but how do you define that line?

Cost/benefit. In this case, authentication hanging in an unknown place
in PAM and LDAP has caused tangible support problems. I suspect I'd
have gotten complaints if I only focused on those two places, though,
so I expanded it to the other blocking calls I could see.

Thanks,
--Jacob



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Add column name to error description
Next
From: Jeff Davis
Date:
Subject: Re: Statistics Import and Export