Thread: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

Hi all,

Recently I dealt with a server where PAM had hung a connection
indefinitely, suppressing our authentication timeout and preventing a
clean shutdown. Worse, the xmin that was pinned by the opening
transaction cascaded to replicas and started messing things up
downstream.

The DBAs didn't know what was going on, because pg_stat_activity
doesn't report the authenticating connection or its open transaction.
It just looked like a Postgres bug. And while talking about it with
Euler, he mentioned he'd seen similar "invisible" hangs with
misbehaving LDAP deployments. I think we can do better to show DBAs
what's happening.

0001, attached, changes InitPostgres() to report a nearly-complete
pgstat entry before entering client authentication, then fills it in
the rest of the way once we know who the user is. Here's a sample
entry for a client that's hung during a SCRAM exchange:

    =# select * from pg_stat_activity where state = 'authenticating';
    -[ RECORD 1 ]----+------------------------------
    datid            |
    datname          |
    pid              | 745662
    leader_pid       |
    usesysid         |
    usename          |
    application_name |
    client_addr      | 127.0.0.1
    client_hostname  |
    client_port      | 38304
    backend_start    | 2024-05-06 11:25:23.905923-07
    xact_start       |
    query_start      |
    state_change     |
    wait_event_type  | Client
    wait_event       | ClientRead
    state            | authenticating
    backend_xid      |
    backend_xmin     | 784
    query_id         |
    query            |
    backend_type     | client backend

0002 goes even further, and adds wait events for various forms of
external authentication, but it's not fully baked. The intent is for a
DBA to be able to see when a bunch of connections are piling up
waiting for PAM/Kerberos/whatever. (I'm also motivated by my OAuth
patchset, where there's a server-side plugin that we have no control
over, and we'd want to be able to correctly point fingers at it if
things go wrong.)

= Open Issues, Idle Thoughts =

Maybe it's wishful thinking, but it'd be cool if a misbehaving
authentication exchange did not impact replicas in any way. Is there a
way to make that opening transaction lighter-weight?

0001 may be a little too much code. There are only two parts of
pgstat_bestart() that need to be modified: omit the user ID, and fill
in the state as 'authenticating' rather than unknown. I could just add
the `pre_auth` boolean to the signature of pgstat_bestart() directly,
if we don't mind adjusting all the call sites. We could also avoid
changing the signature entirely, and just assume that we're
authenticating if SessionUserId isn't set. That felt like a little too
much global magic to me, though.

Would anyone like me to be more aggressive, and create a pgstat entry
as soon as we have the opening transaction? Or... as soon as a
connection is made?

0002 is abusing the "IPC" wait event class. If the general idea seems
okay, maybe we could add an "External" class that encompasses the
general idea of "it's not our fault, it's someone else's"?

I had trouble deciding how granular to make the areas that are covered
by the new wait events. Ideally they would kick in only when we call
out to an external system, but for some authentication types, that's a
lot of calls to wrap. On the other extreme, we don't want to go too
high in the call stack and accidentally nest wait events (such as
those generated during pq_getmessage()). What I have now is not very
principled.

I haven't decided how to test these patches. Seems like a potential
use case for injection points, but I think I'd need to preload an
injection library rather than using the existing extension. Does that
seem like an okay way to go?

Thanks,
--Jacob

On Mon, May 06, 2024 at 02:23:38PM -0700, Jacob Champion wrote:
>     =# select * from pg_stat_activity where state = 'authenticating';
>     -[ RECORD 1 ]----+------------------------------
>     datid            |
>     datname          |
>     pid              | 745662
>     leader_pid       |
>     usesysid         |
>     usename          |
>     application_name |
>     client_addr      | 127.0.0.1
>     client_hostname  |
>     client_port      | 38304
>     backend_start    | 2024-05-06 11:25:23.905923-07
>     xact_start       |
>     query_start      |
>     state_change     |
>     wait_event_type  | Client
>     wait_event       | ClientRead
>     state            | authenticating
>     backend_xid      |
>     backend_xmin     | 784
>     query_id         |
>     query            |
>     backend_type     | client backend

That looks like a reasonable user experience.  Is any field newly-nullable?

> = Open Issues, Idle Thoughts =
> 
> Maybe it's wishful thinking, but it'd be cool if a misbehaving
> authentication exchange did not impact replicas in any way. Is there a
> way to make that opening transaction lighter-weight?

You could release the xmin before calling PAM or LDAP.  If you've copied all
relevant catalog content to local memory, that's fine to do.  That said, it
may be more fruitful to arrange for authentication timeout to cut through PAM
etc.  Hanging connection slots hurt even if they lack an xmin.  I assume it
takes an immediate shutdown to fix them?

> Would anyone like me to be more aggressive, and create a pgstat entry
> as soon as we have the opening transaction? Or... as soon as a
> connection is made?

All else being equal, I'd like backends to have one before taking any lmgr
lock or snapshot.

> I haven't decided how to test these patches. Seems like a potential
> use case for injection points, but I think I'd need to preload an
> injection library rather than using the existing extension. Does that
> seem like an okay way to go?

Yes.

Thanks,
nm



On Sun, Jun 30, 2024 at 10:48 AM Noah Misch <noah@leadboat.com> wrote:
> That looks like a reasonable user experience.  Is any field newly-nullable?

Technically I think the answer is no, since backends such as walwriter
already have null database and user fields. It's new for a client
backend to have nulls there, though.

> That said, it
> may be more fruitful to arrange for authentication timeout to cut through PAM
> etc.

That seems mostly out of our hands -- the misbehaving modules are free
to ignore our signals (and do). Is there another way to force the
issue?

> Hanging connection slots hurt even if they lack an xmin.

Oh, would releasing the xmin not really move the needle, then?

> I assume it
> takes an immediate shutdown to fix them?

That's my understanding, yeah.

> > Would anyone like me to be more aggressive, and create a pgstat entry
> > as soon as we have the opening transaction? Or... as soon as a
> > connection is made?
>
> All else being equal, I'd like backends to have one before taking any lmgr
> lock or snapshot.

I can look at this for the next patchset version.

> > I haven't decided how to test these patches. Seems like a potential
> > use case for injection points, but I think I'd need to preload an
> > injection library rather than using the existing extension. Does that
> > seem like an okay way to go?
>
> Yes.

I misunderstood how injection points worked. No preload module needed,
so v2 adds a waitpoint and a test along with a couple of needed tweaks
to BackgroundPsql. I think 0001 should probably be applied
independently.

Thanks,
--Jacob

On Mon, Jul 08, 2024 at 02:09:21PM -0700, Jacob Champion wrote:
> On Sun, Jun 30, 2024 at 10:48 AM Noah Misch <noah@leadboat.com> wrote:
> > That said, it
> > may be more fruitful to arrange for authentication timeout to cut through PAM
> > etc.
> 
> That seems mostly out of our hands -- the misbehaving modules are free
> to ignore our signals (and do). Is there another way to force the
> issue?

Two ways at least (neither of them cheap):
- Invoke PAM in a subprocess, and SIGKILL that process if needed.
- Modify the module to be interruptible.
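
A minimal sketch of the first option, assuming a POSIX system. The callback
type and every demo_ name are hypothetical stand-ins, not PostgreSQL or PAM
APIs, and a real backend would need to pass back more than an exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback standing in for a blocking external auth call. */
typedef int (*demo_auth_fn) (void *arg);

/*
 * Run fn(arg) in a child process. Returns the child's exit status if it
 * exits on its own, -1 if it was killed (or died abnormally), and -2 if
 * fork() or waitpid() failed.
 */
int
demo_run_with_deadline(demo_auth_fn fn, void *arg, int timeout_ms)
{
    const struct timespec tick = {0, 10 * 1000 * 1000};  /* poll every 10 ms */
    int     waited_ms = 0;
    int     status = 0;
    pid_t   pid, r;

    assert(timeout_ms >= 0);

    pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0)
        _exit(fn(arg) & 0xff);  /* child: run the call, report via exit status */

    for (;;)
    {
        r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break;              /* child finished by itself */
        if (r < 0 && errno != EINTR)
            return -2;
        if (waited_ms >= timeout_ms)
        {
            /* Deadline passed: SIGKILL cannot be ignored by the module. */
            kill(pid, SIGKILL);
            while ((r = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
                ;
            if (r < 0)
                return -2;
            break;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical callbacks: one returns at once, one never returns. */
int demo_quick(void *arg) { (void) arg; return 0; }
int demo_hang(void *arg) { (void) arg; for (;;) pause(); }
```

The parent polls rather than blocking in waitpid(), so it stays in control of
the deadline regardless of what the child is doing.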

> > Hanging connection slots hurt even if they lack an xmin.
> 
> Oh, would releasing the xmin not really move the needle, then?

It still moves the needle.




On 2024-08-29 Th 4:44 PM, Jacob Champion wrote:
> As for the other patches, I'll ping Andrew about 0001,


Patch 0001 looks sane to me.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From: Jacob Champion
On Sun, Sep 1, 2024 at 5:10 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Aug 30, 2024 at 04:10:32PM -0400, Andrew Dunstan wrote:
> > Patch 0001 looks sane to me.
> So does 0002 to me.

Thanks both!

> I'm not much of a fan of the addition of
> pgstat_bestart_pre_auth(), which is just a shortcut to set a different
> state in the backend entry to tell that it is authenticating.  Is
> authenticating the term for this state of the process startups,
> actually?  Could it be more transparent to use a "startup" or
> "starting" state instead

Yeah, I think I should rename that. Especially if we adopt new wait
states to make it obvious where we're stuck.

"startup", "starting", "initializing", "connecting"...?

> that gets also used by pgstat_bestart() in
> the case of the patch where !pre_auth?

To clarify, do you want me to just add the new boolean directly to
pgstat_bestart()'s parameter list?

> The addition of the new wait event states in 0004 is a good idea,
> indeed,

Thanks! Any thoughts on the two open questions for it?
1) Should we add a new wait event class rather than reusing IPC?
2) Is the level at which I've inserted calls to
pgstat_report_wait_start()/_end() sane and maintainable?

> and these can be seen in pg_stat_activity once we get out of
> PGSTAT_END_WRITE_ACTIVITY() (err.. Right?).

It doesn't look like pgstat_report_wait_start() uses that machinery.

--Jacob



On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote:
> You are adding twelve event points with only 5
> new wait names.  Couldn't it be better to have a one-one mapping
> instead, adding twelve entries in wait_event_names.txt?

No, I think the patch's level of detail is better.  One shouldn't expect the
two ldap_simple_bind_s() calls to have different-enough performance
characteristics to justify exposing that level of detail to the DBA.
ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly
just needs to know the scale of their LDAP responsiveness problem.

(Someday, it might be good to expose the file:line and/or backtrace associated
with a wait, like we do for ereport().  As a way to satisfy rare needs for
more detail, I'd prefer that over giving every pgstat_report_wait_start() a
different name.)



On Tue, Sep 10, 2024 at 1:27 PM Noah Misch <noah@leadboat.com> wrote:
> On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote:
> > You are adding twelve event points with only 5
> > new wait names.  Couldn't it be better to have a one-one mapping
> > instead, adding twelve entries in wait_event_names.txt?
>
> No, I think the patch's level of detail is better.  One shouldn't expect the
> two ldap_simple_bind_s() calls to have different-enough performance
> characteristics to justify exposing that level of detail to the DBA.
> ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly
> just needs to know the scale of their LDAP responsiveness problem.
>
> (Someday, it might be good to expose the file:line and/or backtrace associated
> with a wait, like we do for ereport().  As a way to satisfy rare needs for
> more detail, I'd prefer that over giving every pgstat_report_wait_start() a
> different name.)

I think unique names are a good idea. If a user doesn't care about the
difference between sdjgsA and sdjgsB, they can easily ignore the
trailing suffix, and IME, people typically do that without really
stopping to think about it. If on the other hand the two are lumped
together as sdjgs and a user needs to distinguish them, they can't. So
I see unique names as having much more upside than downside.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Tue, Sep 10, 2024 at 02:51:23PM -0400, Robert Haas wrote:
> On Tue, Sep 10, 2024 at 1:27 PM Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote:
> > > You are adding twelve event points with only 5
> > > new wait names.  Couldn't it be better to have a one-one mapping
> > > instead, adding twelve entries in wait_event_names.txt?
> >
> > No, I think the patch's level of detail is better.  One shouldn't expect the
> > two ldap_simple_bind_s() calls to have different-enough performance
> > characteristics to justify exposing that level of detail to the DBA.
> > ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly
> > just needs to know the scale of their LDAP responsiveness problem.
> >
> > (Someday, it might be good to expose the file:line and/or backtrace associated
> > with a wait, like we do for ereport().  As a way to satisfy rare needs for
> > more detail, I'd prefer that over giving every pgstat_report_wait_start() a
> > different name.)
> 
> I think unique names are a good idea. If a user doesn't care about the
> difference between sdjgsA and sdjgsB, they can easily ignore the
> trailing suffix, and IME, people typically do that without really
> stopping to think about it. If on the other hand the two are lumped
> together as sdjgs and a user needs to distinguish them, they can't. So
> I see unique names as having much more upside than downside.

I agree a person can ignore the distinction, but that requires the person to
be consuming the raw event list.  It's reasonable to tell your monitoring tool
to give you the top N wait events.  Individual AuthnLdap* events may all miss
the cut even though their aggregate would have made the cut.  Before you know
to teach that monitoring tool to group AuthnLdap* together, it won't show you
any of those names.
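
To make the top-N concern concrete, here is a small sketch with invented
sample counts. The AuthnLdap... event names and all numbers are hypothetical;
only the arithmetic matters.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *name;       /* wait event name */
    int         samples;    /* how often it was observed */
} DemoWaitSample;

/* Sum the samples of all events whose name starts with prefix. */
int
demo_samples_with_prefix(const DemoWaitSample *s, size_t n, const char *prefix)
{
    size_t  plen = strlen(prefix);
    int     total = 0;

    for (size_t i = 0; i < n; i++)
        if (strncmp(s[i].name, prefix, plen) == 0)
            total += s[i].samples;
    return total;
}

/* 1-based rank of a sample count: events with strictly more samples, plus one. */
int
demo_rank_of(const DemoWaitSample *s, size_t n, int samples)
{
    int     rank = 1;

    for (size_t i = 0; i < n; i++)
        if (s[i].samples > samples)
            rank++;
    return rank;
}

/* Invented data: three busy events plus four finer-grained LDAP events. */
const DemoWaitSample demo_samples[] = {
    {"ClientRead", 40},
    {"WALWrite", 30},
    {"DataFileRead", 25},
    {"AuthnLdapBindA", 12},
    {"AuthnLdapBindB", 11},
    {"AuthnLdapSearch", 10},
    {"AuthnLdapInit", 9},
};
const size_t demo_nsamples = sizeof(demo_samples) / sizeof(demo_samples[0]);
```

With these numbers, the busiest individual LDAP event ranks fourth and so
misses a top-3 report, while the four LDAP events together (42 samples) would
outrank every single event.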

I felt commit c789f0f also chose sub-optimally in this respect, particularly
with the DblinkGetConnect/DblinkConnect pair.  I didn't feel strongly enough
to complain at the time, but a rule of "each wait event appears in one
pgstat_report_wait_start()" would be a rule I don't want.  One needs
familiarity with the dblink implementation internals to grasp the
DblinkGetConnect/DblinkConnect distinction, and a plausible refactor of dblink
would make those names cease to fit.  I see this level of fine-grained naming
as making the event name a sort of stable proxy for FILE:LINE.  I'd value
exposing such a proxy, all else being equal, but I don't think wait event
names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way.  Wait
event names should be more independent of today's code-level details.



On Tue, Sep 10, 2024 at 4:58 PM Noah Misch <noah@leadboat.com> wrote:
> ... a rule of "each wait event appears in one
> pgstat_report_wait_start()" would be a rule I don't want.

As the original committer of the wait event stuff, I intended for the
rule that you do not want to be the actual rule. However, I see that I
didn't spell that out anywhere in the commit message, or the commit
itself.

> I see this level of fine-grained naming
> as making the event name a sort of stable proxy for FILE:LINE.  I'd value
> exposing such a proxy, all else being equal, but I don't think wait event
> names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way.  Wait
> event names should be more independent of today's code-level details.

I don't agree with that. One of the most difficult parts of supporting
PostgreSQL, in my experience, is that it's often very difficult to
find out what has gone wrong when a system starts behaving badly. It
is often necessary to ask customers to install a debugger and do stuff
with it, or give them an instrumented build, in order to determine the
root cause of a problem that in some cases is not even particularly
complicated. While needing to refer to specific source code details
may not be a common experience for the typical end user, it is
extremely common for me. This problem commonly arises with error
messages, because we have lots of error messages that are exactly the
same, although thankfully it has become less common now that "could not
find tuple for THINGY %u" is no longer a message that typically
reaches users. But even when someone has a complaint about
an error message and there are multiple instances of that error
message, I know that:

(1) I can ask them to set the error verbosity to verbose. I don't have
that option for wait events.

(2) The primary function of the error message is to be understandable
to the user, which means that it needs to be written in plain English.
The primary function of a wait event is to make it possible to
understand the behavior of the system and troubleshoot problems, and
it becomes much less effective as soon as it starts saying that thing
A and thing B are so similar that nobody will ever care about the
distinction. It is very hard to be certain of that. When somebody
reports that they've got a whole bunch of wait events on some wait
event that nobody has ever complained about before, I want to go look
at the code in that specific place and try to figure out what's
happening. If I have to start imagining possible scenarios based on 2
or more call sites, or if I have to start by getting them to install a
modified build with those properly split apart and trying to reproduce
the problem, it's a lot harder.

In my experience, the number of distinct wait events that a particular
installation experiences is rarely very large. It is probably measured
in dozens. A user who wishes to disregard the distinction between
similarly-named wait events won't find it prohibitively difficult to
look over the list of all the wait events they ever see and decide
which ones they'd like to merge for reporting purposes. But a user who
really needs things separated out and finds that they aren't is simply
out of luck.

--
Robert Haas
EDB: http://www.enterprisedb.com



From: Jacob Champion
On Mon, Sep 9, 2024 at 10:30 PM Michael Paquier <michael@paquier.xyz> wrote:
> No.  My question was about splitting pgstat_bestart() and
> pgstat_bestart_pre_auth() in a cleaner way, because authenticated
> connections finish by calling both, meaning that we do twice the same
> setup for backend entries depending on the authentication path taken.
> That seems like a waste.

I can try to separate them out. I'm a little wary of messing with the
CRITICAL_SECTION guarantees, though. I thought the idea was that you
filled in the entire struct to prevent tearing. (If I've misunderstood
that, please let me know :D)

> Perhaps just use a new
> "Authentication" class, as in "The server is waiting for an
> authentication operation to complete"?

Sounds good.

> Couldn't it be better to have a one-one mapping
> instead, adding twelve entries in wait_event_names.txt?

(I have no strong opinions on this myself, but while the debate is
ongoing, I'll work on a version of the patch with more detailed wait
events. It's easy to collapse them again if that gets the most votes.)

> I am not really on board with the test based on injection points
> proposed, though.  It checks that the "authenticating" flag is set in
> pg_stat_activity, but it does nothing else.  That seems limited.  Or
> are you planning for more?

I can test for specific contents of the entry, if you'd like. My
primary goal was to test that an entry shows up if that part of the
code hangs. I think a regression would otherwise go completely
unnoticed.

Thanks!
--Jacob



On Wed, Sep 11, 2024 at 09:00:33AM -0400, Robert Haas wrote:
> On Tue, Sep 10, 2024 at 4:58 PM Noah Misch <noah@leadboat.com> wrote:
> > ... a rule of "each wait event appears in one
> > pgstat_report_wait_start()" would be a rule I don't want.
> 
> As the original committer of the wait event stuff, I intended for the
> rule that you do not want to be the actual rule. However, I see that I
> didn't spell that out anywhere in the commit message, or the commit
> itself.
> 
> > I see this level of fine-grained naming
> > as making the event name a sort of stable proxy for FILE:LINE.  I'd value
> > exposing such a proxy, all else being equal, but I don't think wait event
> > names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way.  Wait
> > event names should be more independent of today's code-level details.
> 
> I don't agree with that. One of the most difficult parts of supporting
> PostgreSQL, in my experience, is that it's often very difficult to
> find out what has gone wrong when a system starts behaving badly. It
> is often necessary to ask customers to install a debugger and do stuff
> with it, or give them an instrumented build, in order to determine the
> root cause of a problem that in some cases is not even particularly
> complicated. While needing to refer to specific source code details
> may not be a common experience for the typical end user, it is
> extremely common for me. This problem commonly arises with error
> messages

That is a problem.  Half the time, error verbosity doesn't disambiguate enough
for me, and I need backtrace_functions.  I now find it hard to believe how
long we coped without backtrace_functions.

I withdraw the objection to "each wait event appears in one
pgstat_report_wait_start()".



Hi,

On 2024-11-07 09:20:24 -0800, Jacob Champion wrote:
> From e755fdccf16cb4fcd3036e1209291750ffecd261 Mon Sep 17 00:00:00 2001
> From: Jacob Champion <jacob.champion@enterprisedb.com>
> Date: Fri, 3 May 2024 15:54:58 -0700
> Subject: [PATCH v5 1/2] pgstat: report in earlier with STATE_STARTING
> 
> Add pgstat_bestart_pre_auth(), which reports a 'starting' state while
> waiting for backend initialization and client authentication to
> complete. Since we hold a transaction open for a good amount of that,
> and some authentication methods call out to external systems, having a
> pg_stat_activity entry helps DBAs debug when things go badly wrong.

I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split
makes sense. The latter is going to redo most of the work that the former
did. What's the point of that?

Why not have a new function that initializes just the missing additional
information? Or for that matter, why not move most of what pgstat_bestart()
does into pgstat_beinit()?

As-is I'm -1 on this patch, it makes something complicated and fragile even
more so.


> From 858e95f996589461e2840047fa35675b6f96e46d Mon Sep 17 00:00:00 2001
> From: Jacob Champion <jacob.champion@enterprisedb.com>
> Date: Fri, 3 May 2024 15:58:23 -0700
> Subject: [PATCH v5 2/2] Report external auth calls as wait events
> 
> Introduce a new "Auth" wait class for various external authentication
> systems, to make it obvious what's going wrong if one of those systems
> hangs. Each new wait event is unique in order to more easily pinpoint
> problematic locations in the code.

This doesn't really seem like it's actually using wait events to describe
waits. The new wait events cover stuff like memory allocations etc, see
e.g. pg_SSPI_make_upn().

I have some sympathy for that, it'd be nice if we had some generic way to
describe what code is doing - but it doesn't really seem good to use wait
events for that. Right now a backend reporting a wait allows one to conclude
that the backend isn't running postgres code and is busy or blocked outside of
postgres - but that's not true anymore if you have wait events cover generic
things like memory allocations (or even various library functions).

This isn't just pedantry - all the relevant code really needs to be rewritten
to allow the blocking to happen in an interruptible way, otherwise
authentication timeout etc can't reliably work. Once that's done you can
actually define useful wait events too.

Greetings,

Andres Freund



From: Jacob Champion
On Thu, Nov 7, 2024 at 10:12 AM Andres Freund <andres@anarazel.de> wrote:
> I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split
> makes sense. The latter is going to redo most of the work that the former
> did. What's the point of that?
>
> Why not have a new function that initializes just the missing additional
> information? Or for that matter, why not move most of what pgstat_bestart()
> does into pgstat_beinit()?

I talk about that up above [1]. I agree that this is all complicated
and fragile, but at the moment, I think splitting things apart is not
going to reduce the complexity in any way. I'm all ears for a
different approach, though (and it sounds like Michael is taking a
stab at it too).

> This doesn't really seem like it's actually using wait events to describe
> waits. The new wait events cover stuff like memory allocations etc, see
> e.g. pg_SSPI_make_upn().

I've also asked about the "scope" of the waits in the OP [2]. I can
move them downwards in the stack, if you'd prefer.

All of these are intended to cover parts of the code that can actually
hang, but for things like SSPI I'm just working off of inspection and
Win32 documentation. So if it's not actually true that some of these
call points can hang, let me know and I can remove them. (For the
particular example you called out, I'm just trying to cover both calls
to TranslateName() in a maintainable place. The documentation says
"TranslateName fails if it cannot bind to Active Directory on a domain
controller." which seemed pretty wait-worthy to me.)

> This isn't just pedantry - all the relevant code really needs to be rewritten
> to allow the blocking to happen in an interruptible way, otherwise
> authentication timeout etc can't reliably work. Once that's done you can
> actually define useful wait events too.

I agree that would be amazing! I'm not about to tackle reliable
interrupts for all of the current blocking auth code for v18, though.
I'm just trying to make it observable when we do something that
blocks.

--Jacob

[1] https://www.postgresql.org/message-id/CAOYmi%2BkLzSWrDHZbJg8bWZ94oP_K98mkoEvetgupOBVoy5H_ag%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAOYmi%2B%3D60deN20WDyCoHCiecgivJxr%3D98s7s7-C8SkXwrCfHXg%40mail.gmail.com



Hi,

On 2024-11-07 10:44:25 -0800, Jacob Champion wrote:
> On Thu, Nov 7, 2024 at 10:12 AM Andres Freund <andres@anarazel.de> wrote:
> > I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split
> > makes sense. The latter is going to redo most of the work that the former
> > did. What's the point of that?
> >
> > Why not have a new function that initializes just the missing additional
> > information? Or for that matter, why not move most of what pgstat_bestart()
> > does into pgstat_beinit()?
> 
> I talk about that up above [1]. I agree that this is all complicated
> and fragile, but at the moment, I think splitting things apart is not
> going to reduce the complexity in any way. I'm all ears for a
> different approach, though (and it sounds like Michael is taking a
> stab at it too).

I think the patch should not be merged as-is. It's just too ugly and fragile.


> > This doesn't really seem like it's actually using wait events to describe
> > waits. The new wait events cover stuff like memory allocations etc, see
> > e.g. pg_SSPI_make_upn().
> 
> I've also asked about the "scope" of the waits in the OP [2]. I can
> move them downwards in the stack, if you'd prefer.

Well, right now they're just not actually wait events, so yes, they'd need to
be moved down.

I think it might make more sense to use pgstat_report_activity() or such here,
rather than using wait events to describe something that's not a wait.


> > This isn't just pedantry - all the relevant code really needs to be rewritten
> > to allow the blocking to happen in an interruptible way, otherwise
> > authentication timeout etc can't reliably work. Once that's done you can
> > actually define useful wait events too.
> 
> I agree that would be amazing! I'm not about to tackle reliable
> interrupts for all of the current blocking auth code for v18, though.
> I'm just trying to make it observable when we do something that
> blocks.

Well, with that justification we could end up adding wait events for large
swaths of code that might not actually ever wait.

Greetings,

Andres Freund



From: Jacob Champion
On Thu, Nov 7, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote:
> I think the patch should not be merged as-is. It's just too ugly and fragile.

Understood; I'm trying to find a way forward, and I'm pointing out
that the alternatives I've tried seem to me to be _more_ fragile.

Are there any items in this list that you disagree with/are less
concerned about?

- the pre-auth step must always initialize the entire pgstat struct
- two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
calls for _every_ backend
- two-step initialization requires us to couple against the order that
authentication information is being filled in (pre-auth, post-auth, or
both)

> I think it might make more sense to use pgstat_report_activity() or such here,
> rather than using wait events to describe something that's not a wait.

I'm not sure why you're saying these aren't waits. If pam_authenticate
is capable of hanging for hours and bringing down a production system,
is that not a "wait"?

> > I agree that would be amazing! I'm not about to tackle reliable
> > interrupts for all of the current blocking auth code for v18, though.
> > I'm just trying to make it observable when we do something that
> > blocks.
>
> Well, with that justification we could end up adding wait events for large
> swaths of code that might not actually ever wait.

If it were hypothetically useful to do so, would that be a problem?
I'm trying not to propose things that aren't actually useful.

--Jacob



Hi,

On 2024-11-07 12:11:46 -0800, Jacob Champion wrote:
> On Thu, Nov 7, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote:
> > I think the patch should not be merged as-is. It's just too ugly and fragile.
> 
> Understood; I'm trying to find a way forward, and I'm pointing out
> that the alternatives I've tried seem to me to be _more_ fragile.
> 
> Are there any items in this list that you disagree with/are less
> concerned about?
> 
> - the pre-auth step must always initialize the entire pgstat struct

Correct. And that has to happen exactly once, not twice.


> - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> calls for _every_ backend

That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
side. That's the whole point of the sequence-lock like mechanism.
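
For readers unfamiliar with the pattern, a generic sequence-lock sketch in C11
follows. It is not the PostgreSQL implementation: the Demo types and functions
are hypothetical, it assumes a single writer per entry, and it glosses over
the memory-ordering and data-race details a production version must handle.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    int     pid;
    char    state[16];
} DemoPayload;

typedef struct
{
    atomic_uint changecount;    /* odd while a write is in progress */
    DemoPayload data;
} DemoEntry;

/* Writer side: two counter bumps bracket the update, no lock is taken. */
void demo_begin_write(DemoEntry *e) { atomic_fetch_add(&e->changecount, 1); }
void demo_end_write(DemoEntry *e) { atomic_fetch_add(&e->changecount, 1); }

void
demo_write(DemoEntry *e, int pid, const char *state)
{
    demo_begin_write(e);
    e->data.pid = pid;
    snprintf(e->data.state, sizeof(e->data.state), "%s", state);
    demo_end_write(e);
}

/*
 * Reader side: copy the payload, then check that the counter was even and
 * did not move. A false result means the caller should retry.
 */
bool
demo_try_read(DemoEntry *e, DemoPayload *out)
{
    unsigned    before = atomic_load(&e->changecount);

    if (before & 1)
        return false;           /* a write is in progress */
    *out = e->data;
    return atomic_load(&e->changecount) == before;
}
```

The cost of an update is borne mostly by readers, who may have to retry; the
writer only pays for two increments.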


> - two-step initialization requires us to couple against the order that
> authentication information is being filled in (pre-auth, post-auth, or
> both)

Not sure what you mean by this?


> > I think it might make more sense to use pgstat_report_activity() or such here,
> > rather than using wait events to describe something that's not a wait.
> 
> I'm not sure why you're saying these aren't waits. If pam_authenticate
> is capable of hanging for hours and bringing down a production system,
> is that not a "wait"?

It may or may not be. If you increase the iteration count for whatever secret
"hashing" method to be very high, it's not a wait, it's just CPU
use. Similarly, if you have a cpu expensive WHERE clause, that's not a
wait. But if you wait for network IO due to pam using ldap underneath or you
need to read toast values from disk, those are waits.


> > > I agree that would be amazing! I'm not about to tackle reliable
> > > interrupts for all of the current blocking auth code for v18, though.
> > > I'm just trying to make it observable when we do something that
> > > blocks.
> >
> > Well, with that justification we could end up adding wait events for large
> > swaths of code that might not actually ever wait.
> 
> If it were hypothetically useful to do so, would that be a problem?
> I'm trying not to propose things that aren't actually useful.

My point is that you're redefining wait events to be "in some region of code"
and that once you start doing that, there's a lot of other places you could
suddenly use wait events.

But wait events aren't actually suitable for that - they're a *single-depth*
mechanism, which means that if you start waiting, the prior wait event is
lost, and
a) the nested region isn't attributed to the parent while active
b) once the nested wait event is over, the parent isn't reset
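
(To illustrate a) and b), here is a minimal sketch of a single-slot wait event.
The demo_* names and event IDs are hypothetical; the only assumption carried
over from the real code is that starting a wait overwrites one per-backend
value and ending a wait resets it to zero.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event IDs; 0 means "not waiting". */
enum
{
    DEMO_WAIT_NONE = 0,
    DEMO_WAIT_AUTH_REGION = 1,  /* coarse "in this region of code" event */
    DEMO_WAIT_SOCKET_READ = 2   /* fine-grained, real wait */
};

/* One slot per backend: there is no stack of events. */
static uint32_t demo_wait_event = DEMO_WAIT_NONE;

static void
demo_wait_start(uint32_t ev)
{
    demo_wait_event = ev;       /* whatever was here before is lost */
}

static void
demo_wait_end(void)
{
    demo_wait_event = DEMO_WAIT_NONE;   /* resets to "none", not to a parent */
}

/* A real wait that happens to run inside a coarse region. */
static uint32_t
demo_nested_read(void)
{
    uint32_t    seen_during;

    demo_wait_start(DEMO_WAIT_SOCKET_READ);
    seen_during = demo_wait_event;      /* what an observer would see */
    demo_wait_end();
    return seen_during;
}
```

If a caller does demo_wait_start(DEMO_WAIT_AUTH_REGION) and then calls
demo_nested_read(), an observer sees only the socket read while it is active,
and afterwards the slot says "not waiting" even though the region has not
ended.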

Greetings,

Andres Freund



Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres@anarazel.de> wrote:
> > - the pre-auth step must always initialize the entire pgstat struct
>
> Correct. And that has to happen exactly once, not twice.

What goes wrong if it happens twice?

> > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> > calls for _every_ backend
>
> That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
> side. That's the whole point of the sequence-lock-like mechanism.

Okay, cool. I'll retract that concern.

> > - two-step initialization requires us to couple against the order that
> > authentication information is being filled in (pre-auth, post-auth, or
> > both)
>
> Not sure what you mean with this?

In other words: if we split it, people who make changes to the order
that authentication information is determined during startup must know
to keep an eye on this code as well. Whereas with the current
patchset, the layers are decoupled and the order doesn't matter.
Quoting from above:

  Finally, if we're okay with all of that, future maintainers need to be
  careful about which fields get copied in the first (preauth) step, the
  second step, or both. GSS, for example, can be set up during transport
  negotiation (first step) or authentication (second step), so we have
  to duplicate the logic there. SSL is currently first-step-only, I
  think -- but are we sure we want to hardcode the assumption that cert
  auth can't change any of those parameters after the transport has been
  established? (I've been brainstorming ways we might use TLS 1.3's
  post-handshake CertificateRequest, for example.)

> If you increase the iteration count for whatever secret
> "hashing" method to be very high, it's not a wait, it's just CPU
> use.

I don't yet understand why this is a useful distinction to make. I
understand that they are different, but what are the bad consequences
if pg_stat_activity records a CPU busy wait in the same way it records
an I/O wait -- as long as they're not nested?

> My point is that you're redefining wait events to be "in some region of code"
> and that once you start doing that, there's a lot of other places you could
> suddenly use wait events.
>
> But wait events aren't actually suitable for that - they're a *single-depth*
> mechanism, which means that if you start waiting, the prior wait event is
> lost, and
> a) the nested region isn't attributed to the parent while active
> b) once the nested wait event is over, the parent isn't reset

I understand that they shouldn't be nested. But as long as they're
not, isn't that fine? And if the concern is that they'll accidentally
get nested, whether in this patch or in the future, can't we just
programmatically assert that we haven't?
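
(Roughly what I have in mind, as a sketch only. demo_wait_start_checked() and
friends are hypothetical names, not existing functions.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WAIT_NONE 0u

/* Single per-backend slot, as in the single-depth model discussed above. */
static uint32_t demo_wait_event = DEMO_WAIT_NONE;

static bool
demo_wait_is_idle(void)
{
    return demo_wait_event == DEMO_WAIT_NONE;
}

/* Refuse (in assert-enabled builds) to start a wait inside another wait. */
static void
demo_wait_start_checked(uint32_t ev)
{
    assert(ev != DEMO_WAIT_NONE);
    assert(demo_wait_is_idle());
    demo_wait_event = ev;
}

static void
demo_wait_end(void)
{
    demo_wait_event = DEMO_WAIT_NONE;
}
```

The check only fires on code paths that are actually exercised with
assertions enabled, so a rarely taken nested path could still slip through.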

Thanks,
--Jacob



Hi,

On 2024-11-07 14:29:18 -0800, Jacob Champion wrote:
> On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres@anarazel.de> wrote:
> > > - the pre-auth step must always initialize the entire pgstat struct
> >
> > Correct. And that has to happen exactly once, not twice.
> 
> What goes wrong if it happens twice?

Primarily it's architecturally wrong. For no reason that I can see.

It does actually make things harder - what if somebody added a
pgstat_report_activity() somewhere between the calls? It would suddenly get
lost after the second "initialization".  Actually, the proposed patch already
has weird, externally visible, consequences - the application name is set,
then suddenly becomes unset, then is set again.


> > > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> > > calls for _every_ backend
> >
> > That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
> > side. That's the whole point of the sequence-lock-like mechanism.
> 
> Okay, cool. I'll retract that concern.

Additionally, PGSTAT_BEGIN_WRITE_ACTIVITY() would already happen twice if you
initialize twice...


> > > - two-step initialization requires us to couple against the order that
> > > authentication information is being filled in (pre-auth, post-auth, or
> > > both)
> >
> > Not sure what you mean with this?
> 
> In other words: if we split it, people who make changes to the order
> that authentication information is determined during startup must know
> to keep an eye on this code as well. Whereas with the current
> patchset, the layers are decoupled and the order doesn't matter.
> Quoting from above:
> 
>   Finally, if we're okay with all of that, future maintainers need to be
>   careful about which fields get copied in the first (preauth) step, the
>   second step, or both. GSS, for example, can be set up during transport
>   negotiation (first step) or authentication (second step), so we have
>   to duplicate the logic there. SSL is currently first-step-only, I
>   think -- but are we sure we want to hardcode the assumption that cert
>   auth can't change any of those parameters after the transport has been
>   established? (I've been brainstorming ways we might use TLS 1.3's
>   post-handshake CertificateRequest, for example.)

That doesn't seem like a reason to just initialize twice to me. If you have
one initialization step that properly initializes everything to a minimal
default state, you then can have granular functions that set up the user,
database, SSL, GSS information separately.
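
(A sketch of the shape I mean, with hypothetical names throughout. DemoEntry
and the demo_entry_* functions are not the real pgstat struct or API.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified status entry. */
typedef struct DemoEntry
{
    int         initialized;
    char        state[32];
    char        usename[64];
    char        appname[64];
    int         ssl_in_use;
} DemoEntry;

/* Bounded copy that always NUL-terminates. */
static void
demo_copy(char *dst, size_t dstlen, const char *src)
{
    strncpy(dst, src, dstlen - 1);
    dst[dstlen - 1] = '\0';
}

/* Runs exactly once: puts every field into a minimal default state. */
static void
demo_entry_init(DemoEntry *e)
{
    assert(!e->initialized);
    memset(e, 0, sizeof(*e));
    demo_copy(e->state, sizeof(e->state), "authenticating");
    e->initialized = 1;
}

/* Granular setters: each touches only its own field. */
static void
demo_entry_set_appname(DemoEntry *e, const char *appname)
{
    assert(e->initialized);
    demo_copy(e->appname, sizeof(e->appname), appname);
}

static void
demo_entry_set_user(DemoEntry *e, const char *usename)
{
    assert(e->initialized);
    demo_copy(e->usename, sizeof(e->usename), usename);
}

static void
demo_entry_set_ssl(DemoEntry *e, int in_use)
{
    assert(e->initialized);
    e->ssl_in_use = in_use;
}
```

Because nothing re-zeroes the entry after demo_entry_init(), an application
name reported before authentication finishes is still there after the user is
filled in, and the setters can be called in whatever order the startup code
learns the information.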



> > If you increase the iteration count for whatever secret
> > "hashing" method to be very high, it's not a wait, it's just CPU
> > use.
> 
> I don't yet understand why this is a useful distinction to make. I
> understand that they are different, but what are the bad consequences
> if pg_stat_activity records a CPU busy wait in the same way it records
> an I/O wait -- as long as they're not nested?

Well, first, because you suddenly can't use wait events anymore to find waits.

But more importantly, because of not having nesting, adding one "coarse" "wait
event" means that anyone adding a wait event at a finer grade suddenly needs
to be aware of all the paths that could lead to the execution of the new
code and change all of them to not use the wait event anymore. It imposes a
tax on measuring actual "out of postgres" wait events.


> > My point is that you're redefining wait events to be "in some region of code"
> > and that once you start doing that, there's a lot of other places you could
> > suddenly use wait events.
> >
> > But wait events aren't actually suitable for that - they're a *single-depth*
> > mechanism, which means that if you start waiting, the prior wait event is
> > lost, and
> > a) the nested region isn't attributed to the parent while active
> > b) once the nested wait event is over, the parent isn't reset
> 
> I understand that they shouldn't be nested. But as long as they're
> not, isn't that fine? And if the concern is that they'll accidentally
> get nested, whether in this patch or in the future, can't we just
> programmatically assert that we haven't?

One very useful wait event would be for memory allocations that hit the
kernel. Those can take a fairly long time, because they might need to write
dirty buffers to disk before there is enough free memory. Now imagine that we
redefine the system memory allocator (or just postgres') so that all memory
allocations from the kernel use a wait event.  Now all that code that
uses "coarse" wait events suddenly has a *rare* path - because most of the time
we can carve memory out of a larger OS level memory allocation - where wait
events would be nested.

Greetings,

Andres Freund



Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible

From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 2:56 PM Andres Freund <andres@anarazel.de> wrote:
> It does actually make things harder - what if somebody added a
> pgstat_report_activity() somewhere between the calls? It would suddenly get
> lost after the second "initialization".  Actually, the proposed patch already
> has weird, externally visible, consequences - the application name is set,
> then suddenly becomes unset, then is set again.

Oh... I think that alone is enough to change my mind; I neglected the
effects of that little pgstat_report_appname() stinger...

> Additionally, PGSTAT_BEGIN_WRITE_ACTIVITY() would already happen twice if you
> initialize twice...

Sure. I was just trying not to introduce that to _all_ backend code
paths, but it sounds like that's not a concern. (Plus, it turns out to
be four calls, due again to the application_name reporting...)

> That doesn't seem like a reason to just initialize twice to me. If you have
> one initialization step that properly initializes everything to a minimal
> default state, you then can have granular functions that set up the user,
> database, SSL, GSS information separately.

I will start work on that then (unless Michael has already beaten me to it?).

> But more importantly, because of not having nesting, adding one "coarse" "wait
> event" means that anyone adding a wait event at a finer grade suddenly needs
> to be aware of all the paths that could lead to the execution of the new
> code and change all of them to not use the wait event anymore. It imposes a
> tax on measuring actual "out of postgres" wait events.
>
> One very useful wait event would be for memory allocations that hit the
> kernel. Those can take a fairly long time, because they might need to write
> dirty buffers to disk before there is enough free memory. Now imagine that we
> redefine the system memory allocator (or just postgres') so that all memory
> allocations from the kernel use a wait event.  Now all that code that
> uses "coarse" wait events suddenly has a *rare* path - because most of the time
> we can carve memory out of a larger OS level memory allocation - where wait
> events would be nested.

Okay, that makes a lot of sense. I will plumb these down as far as I can.

Thanks very much!

--Jacob