Thread: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Hi all, Recently I dealt with a server where PAM had hung a connection indefinitely, suppressing our authentication timeout and preventing a clean shutdown. Worse, the xmin that was pinned by the opening transaction cascaded to replicas and started messing things up downstream. The DBAs didn't know what was going on, because pg_stat_activity doesn't report the authenticating connection or its open transaction. It just looked like a Postgres bug. And while talking about it with Euler, he mentioned he'd seen similar "invisible" hangs with misbehaving LDAP deployments. I think we can do better to show DBAs what's happening. 0001, attached, changes InitPostgres() to report a nearly-complete pgstat entry before entering client authentication, then fills it in the rest of the way once we know who the user is. Here's a sample entry for a client that's hung during a SCRAM exchange: =# select * from pg_stat_activity where state = 'authenticating'; -[ RECORD 1 ]----+------------------------------ datid | datname | pid | 745662 leader_pid | usesysid | usename | application_name | client_addr | 127.0.0.1 client_hostname | client_port | 38304 backend_start | 2024-05-06 11:25:23.905923-07 xact_start | query_start | state_change | wait_event_type | Client wait_event | ClientRead state | authenticating backend_xid | backend_xmin | 784 query_id | query | backend_type | client backend 0002 goes even further, and adds wait events for various forms of external authentication, but it's not fully baked. The intent is for a DBA to be able to see when a bunch of connections are piling up waiting for PAM/Kerberos/whatever. (I'm also motivated by my OAuth patchset, where there's a server-side plugin that we have no control over, and we'd want to be able to correctly point fingers at it if things go wrong.) = Open Issues, Idle Thoughts = Maybe it's wishful thinking, but it'd be cool if a misbehaving authentication exchange did not impact replicas in any way. Is there a way to make that opening transaction lighterweight? 0001 may be a little too much code. There are only two parts of pgstat_bestart() that need to be modified: omit the user ID, and fill in the state as 'authenticating' rather than unknown. I could just add the `pre_auth` boolean to the signature of pgstat_bestart() directly, if we don't mind adjusting all the call sites. We could also avoid changing the signature entirely, and just assume that we're authenticating if SessionUserId isn't set. That felt like a little too much global magic to me, though. Would anyone like me to be more aggressive, and create a pgstat entry as soon as we have the opening transaction? Or... as soon as a connection is made? 0002 is abusing the "IPC" wait event class. If the general idea seems okay, maybe we could add an "External" class that encompasses the general idea of "it's not our fault, it's someone else's"? I had trouble deciding how granular to make the areas that are covered by the new wait events. Ideally they would kick in only when we call out to an external system, but for some authentication types, that's a lot of calls to wrap. On the other extreme, we don't want to go too high in the call stack and accidentally nest wait events (such as those generated during pq_getmessage()). What I have now is not very principled. I haven't decided how to test these patches. Seems like a potential use case for injection points, but I think I'd need to preload an injection library rather than using the existing extension. Does that seem like an okay way to go? Thanks, --Jacob
Attachment
On Mon, May 06, 2024 at 02:23:38PM -0700, Jacob Champion wrote: > =# select * from pg_stat_activity where state = 'authenticating'; > -[ RECORD 1 ]----+------------------------------ > datid | > datname | > pid | 745662 > leader_pid | > usesysid | > usename | > application_name | > client_addr | 127.0.0.1 > client_hostname | > client_port | 38304 > backend_start | 2024-05-06 11:25:23.905923-07 > xact_start | > query_start | > state_change | > wait_event_type | Client > wait_event | ClientRead > state | authenticating > backend_xid | > backend_xmin | 784 > query_id | > query | > backend_type | client backend That looks like a reasonable user experience. Is any field newly-nullable? > = Open Issues, Idle Thoughts = > > Maybe it's wishful thinking, but it'd be cool if a misbehaving > authentication exchange did not impact replicas in any way. Is there a > way to make that opening transaction lighterweight? You could release the xmin before calling PAM or LDAP. If you've copied all relevant catalog content to local memory, that's fine to do. That said, it may be more fruitful to arrange for authentication timeout to cut through PAM etc. Hanging connection slots hurt even if they lack an xmin. I assume it takes an immediate shutdown to fix them? > Would anyone like me to be more aggressive, and create a pgstat entry > as soon as we have the opening transaction? Or... as soon as a > connection is made? All else being equal, I'd like backends to have one before taking any lmgr lock or snapshot. > I haven't decided how to test these patches. Seems like a potential > use case for injection points, but I think I'd need to preload an > injection library rather than using the existing extension. Does that > seem like an okay way to go? Yes. Thanks, nm
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Sun, Jun 30, 2024 at 10:48 AM Noah Misch <noah@leadboat.com> wrote: > That looks like a reasonable user experience. Is any field newly-nullable? Technically I think the answer is no, since backends such as walwriter already have null database and user fields. It's new for a client backend to have nulls there, though. > That said, it > may be more fruitful to arrange for authentication timeout to cut through PAM > etc. That seems mostly out of our hands -- the misbehaving modules are free to ignore our signals (and do). Is there another way to force the issue? > Hanging connection slots hurt even if they lack an xmin. Oh, would releasing the xmin not really move the needle, then? > I assume it > takes an immediate shutdown to fix them? That's my understanding, yeah. > > Would anyone like me to be more aggressive, and create a pgstat entry > > as soon as we have the opening transaction? Or... as soon as a > > connection is made? > > All else being equal, I'd like backends to have one before taking any lmgr > lock or snapshot. I can look at this for the next patchset version. > > I haven't decided how to test these patches. Seems like a potential > > use case for injection points, but I think I'd need to preload an > > injection library rather than using the existing extension. Does that > > seem like an okay way to go? > > Yes. I misunderstood how injection points worked. No preload module needed, so v2 adds a waitpoint and a test along with a couple of needed tweaks to BackgroundPsql. I think 0001 should probably be applied independently. Thanks, --Jacob
Attachment
On Mon, Jul 08, 2024 at 02:09:21PM -0700, Jacob Champion wrote: > On Sun, Jun 30, 2024 at 10:48 AM Noah Misch <noah@leadboat.com> wrote: > > That said, it > > may be more fruitful to arrange for authentication timeout to cut through PAM > > etc. > > That seems mostly out of our hands -- the misbehaving modules are free > to ignore our signals (and do). Is there another way to force the > issue? Two ways at least (neither of them cheap): - Invoke PAM in a subprocess, and SIGKILL that process if needed. - Modify the module to be interruptible. > > Hanging connection slots hurt even if they lack an xmin. > > Oh, would releasing the xmin not really move the needle, then? It still moves the needle.
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andrew Dunstan
Date:
On 2024-08-29 Th 4:44 PM, Jacob Champion wrote:
As for the other patches, I'll ping Andrew about 0001,
Patch 0001 looks sane to me.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Sun, Sep 1, 2024 at 5:10 PM Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Aug 30, 2024 at 04:10:32PM -0400, Andrew Dunstan wrote: > > Patch 0001 looks sane to me. > So does 0002 to me. Thanks both! > I'm not much a fan of the addition of > pgstat_bestart_pre_auth(), which is just a shortcut to set a different > state in the backend entry to tell that it is authenticating. Is > authenticating the term for this state of the process startups, > actually? Could it be more transparent to use a "startup" or > "starting"" state instead Yeah, I think I should rename that. Especially if we adopt new wait states to make it obvious where we're stuck. "startup", "starting", "initializing", "connecting"...? > that gets also used by pgstat_bestart() in > the case of the patch where !pre_auth? To clarify, do you want me to just add the new boolean directly to pgstat_bestart()'s parameter list? > The addition of the new wait event states in 0004 is a good idea, > indeed, Thanks! Any thoughts on the two open questions for it?: 1) Should we add a new wait event class rather than reusing IPC? 2) Is the level at which I've inserted calls to pgstat_report_wait_start()/_end() sane and maintainable? > and these can be seen in pg_stat_activity once we get out of > PGSTAT_END_WRITE_ACTIVITY() (err.. Right?). It doesn't look like pgstat_report_wait_start() uses that machinery. --Jacob
On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote: > You are adding twelve event points with only 5 > new wait names. Couldn't it be better to have a one-one mapping > instead, adding twelve entries in wait_event_names.txt? No, I think the patch's level of detail is better. One shouldn't expect the two ldap_simple_bind_s() calls to have different-enough performance characteristics to justify exposing that level of detail to the DBA. ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly just needs to know the scale of their LDAP responsiveness problem. (Someday, it might be good to expose the file:line and/or backtrace associated with a wait, like we do for ereport(). As a way to satisfy rare needs for more detail, I'd prefer that over giving every pgstat_report_wait_start() a different name.)
On Tue, Sep 10, 2024 at 1:27 PM Noah Misch <noah@leadboat.com> wrote: > On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote: > > You are adding twelve event points with only 5 > > new wait names. Couldn't it be better to have a one-one mapping > > instead, adding twelve entries in wait_event_names.txt? > > No, I think the patch's level of detail is better. One shouldn't expect the > two ldap_simple_bind_s() calls to have different-enough performance > characteristics to justify exposing that level of detail to the DBA. > ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly > just needs to know the scale of their LDAP responsiveness problem. > > (Someday, it might be good to expose the file:line and/or backtrace associated > with a wait, like we do for ereport(). As a way to satisfy rare needs for > more detail, I'd prefer that over giving every pgstat_report_wait_start() a > different name.) I think unique names are a good idea. If a user doesn't care about the difference between sdgjsA and sdjgsB, they can easily ignore the trailing suffix, and IME, people typically do that without really stopping to think about it. If on the other hand the two are lumped together as sdjgs and a user needs to distinguish them, they can't. So I see unique names as having much more upside than downside. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Sep 10, 2024 at 02:51:23PM -0400, Robert Haas wrote: > On Tue, Sep 10, 2024 at 1:27 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote: > > > You are adding twelve event points with only 5 > > > new wait names. Couldn't it be better to have a one-one mapping > > > instead, adding twelve entries in wait_event_names.txt? > > > > No, I think the patch's level of detail is better. One shouldn't expect the > > two ldap_simple_bind_s() calls to have different-enough performance > > characteristics to justify exposing that level of detail to the DBA. > > ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly > > just needs to know the scale of their LDAP responsiveness problem. > > > > (Someday, it might be good to expose the file:line and/or backtrace associated > > with a wait, like we do for ereport(). As a way to satisfy rare needs for > > more detail, I'd prefer that over giving every pgstat_report_wait_start() a > > different name.) > > I think unique names are a good idea. If a user doesn't care about the > difference between sdgjsA and sdjgsB, they can easily ignore the > trailing suffix, and IME, people typically do that without really > stopping to think about it. If on the other hand the two are lumped > together as sdjgs and a user needs to distinguish them, they can't. So > I see unique names as having much more upside than downside. I agree a person can ignore the distinction, but that requires the person to be consuming the raw event list. It's reasonable to tell your monitoring tool to give you the top N wait events. Individual AuthnLdap* events may all miss the cut even though their aggregate would have made the cut. Before you know to teach that monitoring tool to group AuthnLdap* together, it won't show you any of those names. I felt commit c789f0f also chose sub-optimally in this respect, particularly with the DblinkGetConnect/DblinkConnect pair. I didn't feel strongly enough to complain at the time, but a rule of "each wait event appears in one pgstat_report_wait_start()" would be a rule I don't want. One needs familiarity with the dblink implementation internals to grasp the DblinkGetConnect/DblinkConnect distinction, and a plausible refactor of dblink would make those names cease to fit. I see this level of fine-grained naming as making the event name a sort of stable proxy for FILE:LINE. I'd value exposing such a proxy, all else being equal, but I don't think wait event names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way. Wait event names should be more independent of today's code-level details.
On Tue, Sep 10, 2024 at 4:58 PM Noah Misch <noah@leadboat.com> wrote: > ... a rule of "each wait event appears in one > pgstat_report_wait_start()" would be a rule I don't want. As the original committer of the wait event stuff, I intended for the rule that you do not want to be the actual rule. However, I see that I didn't spell that out anywhere in the commit message, or the commit itself. > I see this level of fine-grained naming > as making the event name a sort of stable proxy for FILE:LINE. I'd value > exposing such a proxy, all else being equal, but I don't think wait event > names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way. Wait > event names should be more independent of today's code-level details. I don't agree with that. One of the most difficult parts of supporting PostgreSQL, in my experience, is that it's often very difficult to find out what has gone wrong when a system starts behaving badly. It is often necessary to ask customers to install a debugger and do stuff with it, or give them an instrumented build, in order to determine the root cause of a problem that in some cases is not even particularly complicated. While needing to refer to specific source code details may not be a common experience for the typical end user, it is extremely common for me. This problem commonly arises with error messages, because we have lots of error messages that are exactly the same, although thankfully it has become less common due to "could not find tuple for THINGY %u" no longer being a message that no longer typically reaches users. But even when someone has a complaint about an error message and there are multiple instances of that error message, I know that: (1) I can ask them to set the error verbosity to verbose. I don't have that option for wait events. (2) The primary function of the error message is to be understandable to the user, which means that it needs to be written in plain English. The primary function of a wait event is to make it possible to understand the behavior of the system and troubleshoot problems, and it becomes much less effective as soon as it starts saying that thing A and thing B are so similar that nobody will ever care about the distinction. It is very hard to be certain of that. When somebody reports that they've got a whole bunch of wait events on some wait event that nobody has ever complained about before, I want to go look at the code in that specific place and try to figure out what's happening. If I have to start imagining possible scenarios based on 2 or more call sites, or if I have to start by getting them to install a modified build with those properly split apart and trying to reproduce the problem, it's a lot harder. In my experience, the number of distinct wait events that a particular installation experiences is rarely very large. It is probably measured in dozens. A user who wishes to disregard the distinction between similarly-named wait events won't find it prohibitively difficult to look over the list of all the wait events they ever see and decide which ones they'd like to merge for reporting purposes. But a user who really needs things separated out and finds that they aren't is simply out of luck. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Mon, Sep 9, 2024 at 10:30 PM Michael Paquier <michael@paquier.xyz> wrote: > No. My question was about splitting pgstat_bestart() and > pgstat_bestart_pre_auth() in a cleaner way, because authenticated > connections finish by calling both, meaning that we do twice the same > setup for backend entries depending on the authentication path taken. > That seems like a waste. I can try to separate them out. I'm a little wary of messing with the CRITICAL_SECTION guarantees, though. I thought the idea was that you filled in the entire struct to prevent tearing. (If I've misunderstood that, please let me know :D) > Perhaps just use a new > "Authentication" class, as in "The server is waiting for an > authentication operation to complete"? Sounds good. > Couldn't it be better to have a one-one mapping > instead, adding twelve entries in wait_event_names.txt? (I have no strong opinions on this myself, but while the debate is ongoing, I'll work on a version of the patch with more detailed wait events. It's easy to collapse them again if that gets the most votes.) > I am not really on board with the test based on injection points > proposed, though. It checks that the "authenticating" flag is set in > pg_stat_activity, but it does nothing else. That seems limited. Or > are you planning for more? I can test for specific contents of the entry, if you'd like. My primary goal was to test that an entry shows up if that part of the code hangs. I think a regression would otherwise go completely unnoticed. Thanks! --Jacob
On Wed, Sep 11, 2024 at 09:00:33AM -0400, Robert Haas wrote: > On Tue, Sep 10, 2024 at 4:58 PM Noah Misch <noah@leadboat.com> wrote: > > ... a rule of "each wait event appears in one > > pgstat_report_wait_start()" would be a rule I don't want. > > As the original committer of the wait event stuff, I intended for the > rule that you do not want to be the actual rule. However, I see that I > didn't spell that out anywhere in the commit message, or the commit > itself. > > > I see this level of fine-grained naming > > as making the event name a sort of stable proxy for FILE:LINE. I'd value > > exposing such a proxy, all else being equal, but I don't think wait event > > names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way. Wait > > event names should be more independent of today's code-level details. > > I don't agree with that. One of the most difficult parts of supporting > PostgreSQL, in my experience, is that it's often very difficult to > find out what has gone wrong when a system starts behaving badly. It > is often necessary to ask customers to install a debugger and do stuff > with it, or give them an instrumented build, in order to determine the > root cause of a problem that in some cases is not even particularly > complicated. While needing to refer to specific source code details > may not be a common experience for the typical end user, it is > extremely common for me. This problem commonly arises with error > messages That is a problem. Half the time, error verbosity doesn't disambiguate enough for me, and I need backtrace_functions. I now find it hard to believe how long we coped without backtrace_functions. I withdraw the objection to "each wait event appears in one pgstat_report_wait_start()".
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2024-11-07 09:20:24 -0800, Jacob Champion wrote: > From e755fdccf16cb4fcd3036e1209291750ffecd261 Mon Sep 17 00:00:00 2001 > From: Jacob Champion <jacob.champion@enterprisedb.com> > Date: Fri, 3 May 2024 15:54:58 -0700 > Subject: [PATCH v5 1/2] pgstat: report in earlier with STATE_STARTING > > Add pgstat_bestart_pre_auth(), which reports a 'starting' state while > waiting for backend initialization and client authentication to > complete. Since we hold a transaction open for a good amount of that, > and some authentication methods call out to external systems, having a > pg_stat_activity entry helps DBAs debug when things go badly wrong. I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split makes sense. The latter is going to redo most of the work that the former did. What's the point of that? Why not have a new function that initializes just the missing additional information? Or for that matter, why not move most of what pgstat_bestart() does into pgstat_beinit()? As-is I'm -1 on this patch, it makes something complicated and fragile even more so. > From 858e95f996589461e2840047fa35675b6f96e46d Mon Sep 17 00:00:00 2001 > From: Jacob Champion <jacob.champion@enterprisedb.com> > Date: Fri, 3 May 2024 15:58:23 -0700 > Subject: [PATCH v5 2/2] Report external auth calls as wait events > > Introduce a new "Auth" wait class for various external authentication > systems, to make it obvious what's going wrong if one of those systems > hangs. Each new wait event is unique in order to more easily pinpoint > problematic locations in the code. This doesn't really seem like it's actually using wait events to describe waits. The new wait events cover stuff like memory allocations etc, see e.g. pg_SSPI_make_upn(). I have some sympathy for that, it'd be nice if we had some generic way to describe what code is doing - but it doesn't really seem good to use wait events for that. Right now a backend reporting a wait allows to conclude that a backend isn't running postgres code and busy or blocked outside of postgres - but that's not true anymore if you have wait event cover generic things like memory allocations (or even various library functions). This isn't just pedantry - all the relevant code really needs to be rewritten to allow the blocking to happen in an interruptible way, otherwise authentication timeout etc can't realiably work. Once that's done you can actually define useful wait events too. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 10:12 AM Andres Freund <andres@anarazel.de> wrote: > I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split > makes sense. The latter is going to redo most of the work that the former > did. What's the point of that? > > Why not have a new function that initializes just the missing additional > information? Or for that matter, why not move most of what pgstat_bestart() > does into pgstat_beinit()? I talk about that up above [1]. I agree that this is all complicated and fragile, but at the moment, I think splitting things apart is not going to reduce the complexity in any way. I'm all ears for a different approach, though (and it sounds like Michael is taking a stab at it too). > This doesn't really seem like it's actually using wait events to describe > waits. The new wait events cover stuff like memory allocations etc, see > e.g. pg_SSPI_make_upn(). I've also asked about the "scope" of the waits in the OP [2]. I can move them downwards in the stack, if you'd prefer. All of these are intended to cover parts of the code that can actually hang, but for things like SSPI I'm just working off of inspection and Win32 documentation. So if it's not actually true that some of these call points can hang, let me know and I can remove them. (For the particular example you called out, I'm just trying to cover both calls to TranslateName() in a maintainable place. The documentation says "TranslateName fails if it cannot bind to Active Directory on a domain controller." which seemed pretty wait-worthy to me.) > This isn't just pedantry - all the relevant code really needs to be rewritten > to allow the blocking to happen in an interruptible way, otherwise > authentication timeout etc can't realiably work. Once that's done you can > actually define useful wait events too. I agree that would be amazing! I'm not about to tackle reliable interrupts for all of the current blocking auth code for v18, though. I'm just trying to make it observable when we do something that blocks. --Jacob [1] https://www.postgresql.org/message-id/CAOYmi%2BkLzSWrDHZbJg8bWZ94oP_K98mkoEvetgupOBVoy5H_ag%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAOYmi%2B%3D60deN20WDyCoHCiecgivJxr%3D98s7s7-C8SkXwrCfHXg%40mail.gmail.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2024-11-07 10:44:25 -0800, Jacob Champion wrote: > On Thu, Nov 7, 2024 at 10:12 AM Andres Freund <andres@anarazel.de> wrote: > > I don't understand why the pgstat_bestart()/pgstat_bestart_pre_auth() split > > makes sense. The latter is going to redo most of the work that the former > > did. What's the point of that? > > > > Why not have a new function that initializes just the missing additional > > information? Or for that matter, why not move most of what pgstat_bestart() > > does into pgstat_beinit()? > > I talk about that up above [1]. I agree that this is all complicated > and fragile, but at the moment, I think splitting things apart is not > going to reduce the complexity in any way. I'm all ears for a > different approach, though (and it sounds like Michael is taking a > stab at it too). I think the patch should not be merged as-is. It's just too ugly and fragile. > > This doesn't really seem like it's actually using wait events to describe > > waits. The new wait events cover stuff like memory allocations etc, see > > e.g. pg_SSPI_make_upn(). > > I've also asked about the "scope" of the waits in the OP [2]. I can > move them downwards in the stack, if you'd prefer. Well, right now they're just not actually wait events, so yes, they'd need to be moved down. I think it might make more sense to use pgstat_report_activity() or such here, rather than using wait events to describe something that's not a wait. > > This isn't just pedantry - all the relevant code really needs to be rewritten > > to allow the blocking to happen in an interruptible way, otherwise > > authentication timeout etc can't realiably work. Once that's done you can > > actually define useful wait events too. > > I agree that would be amazing! I'm not about to tackle reliable > interrupts for all of the current blocking auth code for v18, though. > I'm just trying to make it observable when we do something that > blocks. Well, with that justification we could end up adding wait events for large swaths of code that might not actually ever wait. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote: > I think the patch should not be merged as-is. It's just too ugly and fragile. Understood; I'm trying to find a way forward, and I'm pointing out that the alternatives I've tried seem to me to be _more_ fragile. Are there any items in this list that you disagree with/are less concerned about? - the pre-auth step must always initialize the entire pgstat struct - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY() calls for _every_ backend - two-step initialization requires us to couple against the order that authentication information is being filled in (pre-auth, post-auth, or both) > I think it might make more sense to use pgstat_report_activity() or such here, > rather than using wait events to describe something that's not a wait. I'm not sure why you're saying these aren't waits. If pam_authenticate is capable of hanging for hours and bringing down a production system, is that not a "wait"? > > I agree that would be amazing! I'm not about to tackle reliable > > interrupts for all of the current blocking auth code for v18, though. > > I'm just trying to make it observable when we do something that > > blocks. > > Well, with that justification we could end up adding wait events for large > swaths of code that might not actually ever wait. If it were hypothetically useful to do so, would that be a problem? I'm trying not to propose things that aren't actually useful. --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2024-11-07 12:11:46 -0800, Jacob Champion wrote: > On Thu, Nov 7, 2024 at 11:41 AM Andres Freund <andres@anarazel.de> wrote: > > I think the patch should not be merged as-is. It's just too ugly and fragile. > > Understood; I'm trying to find a way forward, and I'm pointing out > that the alternatives I've tried seem to me to be _more_ fragile. > > Are there any items in this list that you disagree with/are less > concerned about? > > - the pre-auth step must always initialize the entire pgstat struct Correct. And that has to happen exactly once, not twice. > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY() > calls for _every_ backend That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write side. That's the whole point of of the sequence-lock like mechanism. > - two-step initialization requires us to couple against the order that > authentication information is being filled in (pre-auth, post-auth, or > both) Not sure what you mean with this? > > I think it might make more sense to use pgstat_report_activity() or such here, > > rather than using wait events to describe something that's not a wait. > > I'm not sure why you're saying these aren't waits. If pam_authenticate > is capable of hanging for hours and bringing down a production system, > is that not a "wait"? It may or may not be. If you increase the iteration count for whatever secret "hashing" method to be very high, it's not a wait, it's just CPU use. Similarly, if you have a cpu expensive WHERE clause, that's not a wait. But if you wait for network IO due to pam using ldap underneath or you need to read toast values from disk, those are waits. > > > I agree that would be amazing! I'm not about to tackle reliable > > > interrupts for all of the current blocking auth code for v18, though. > > > I'm just trying to make it observable when we do something that > > > blocks. > > > > Well, with that justification we could end up adding wait events for large > > swaths of code that might not actually ever wait. > > If it were hypothetically useful to do so, would that be a problem? > I'm trying not to propose things that aren't actually useful. My point is that you're redefining wait events to be "in some region of code" and that once you start doing that, there's a lot of other places you could suddenly use wait events. But wait events aren't actually suitable for that - they're a *single-depth* mechanism, which means that if you start waiting, the prior wait event is lost, and a) the nested region isn't attributed to the parent while active b) once the nested wait event is over, the parent isn't reset Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres@anarazel.de> wrote: > > - the pre-auth step must always initialize the entire pgstat struct > > Correct. And that has to happen exactly once, not twice. What goes wrong if it happens twice? > > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY() > > calls for _every_ backend > > That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write > side. That's the whole point of of the sequence-lock like mechanism. Okay, cool. I'll retract that concern. > > - two-step initialization requires us to couple against the order that > > authentication information is being filled in (pre-auth, post-auth, or > > both) > > Not sure what you mean with this? In other words: if we split it, people who make changes to the order that authentication information is determined during startup must know to keep an eye on this code as well. Whereas with the current patchset, the layers are decoupled and the order doesn't matter. Quoting from above: Finally, if we're okay with all of that, future maintainers need to be careful about which fields get copied in the first (preauth) step, the second step, or both. GSS, for example, can be set up during transport negotiation (first step) or authentication (second step), so we have to duplicate the logic there. SSL is currently first-step-only, I think -- but are we sure we want to hardcode the assumption that cert auth can't change any of those parameters after the transport has been established? (I've been brainstorming ways we might use TLS 1.3's post-handshake CertificateRequest, for example.) > If you increase the iteration count for whatever secret > "hashing" method to be very high, it's not a wait, it's just CPU > use. I don't yet understand why this is a useful distinction to make. I understand that they are different, but what are the bad consequences if pg_stat_activity records a CPU busy wait in the same way it records an I/O wait -- as long as they're not nested? > My point is that you're redefining wait events to be "in some region of code" > and that once you start doing that, there's a lot of other places you could > suddenly use wait events. > > But wait events aren't actually suitable for that - they're a *single-depth* > mechanism, which means that if you start waiting, the prior wait event is > lost, and > a) the nested region isn't attributed to the parent while active > b) once the nested wait event is over, the parent isn't reset I understand that they shouldn't be nested. But as long as they're not, isn't that fine? And if the concern is that they'll accidentally get nested, whether in this patch or in the future, can't we just programmatically assert that we haven't? Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2024-11-07 14:29:18 -0800, Jacob Champion wrote: > On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres@anarazel.de> wrote: > > > - the pre-auth step must always initialize the entire pgstat struct > > > > Correct. And that has to happen exactly once, not twice. > > What goes wrong if it happens twice? Primarily it's architecturally wrong. For no reason that I can see. It does actually make things harder - what if somebody added a pgstat_report_activity() somewhere between the call? It would suddenly get lost after the second "initialization". Actually, the proposed patch already has weird, externally visible, consequences - the application name is set, then suddenly becomes unset, then is set again. > > > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY() > > > calls for _every_ backend > > > > That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write > > side. That's the whole point of of the sequence-lock like mechanism. > > Okay, cool. I'll retract that concern. Additionally PGSTAT_BEGIN_WRITE_ACTIVITY() would already happen twice if you initialize twice... > > > - two-step initialization requires us to couple against the order that > > > authentication information is being filled in (pre-auth, post-auth, or > > > both) > > > > Not sure what you mean with this? > > In other words: if we split it, people who make changes to the order > that authentication information is determined during startup must know > to keep an eye on this code as well. Whereas with the current > patchset, the layers are decoupled and the order doesn't matter. > Quoting from above: > > Finally, if we're okay with all of that, future maintainers need to be > careful about which fields get copied in the first (preauth) step, the > second step, or both. GSS, for example, can be set up during transport > negotiation (first step) or authentication (second step), so we have > to duplicate the logic there. SSL is currently first-step-only, I > think -- but are we sure we want to hardcode the assumption that cert > auth can't change any of those parameters after the transport has been > established? (I've been brainstorming ways we might use TLS 1.3's > post-handshake CertificateRequest, for example.) That doesn't seem like a reason to just initialize twice to me. If you have one initialization step that properly initializes everything to a minimal default state, you then can have granular functions that set up the user, database, SSL, GSS information separately. > > If you increase the iteration count for whatever secret > > "hashing" method to be very high, it's not a wait, it's just CPU > > use. > > I don't yet understand why this is a useful distinction to make. I > understand that they are different, but what are the bad consequences > if pg_stat_activity records a CPU busy wait in the same way it records > an I/O wait -- as long as they're not nested? Well, first, because you suddenly can't use wait events anymore to find waits. But more importantly, because of not having nesting, adding one "coarse" "wait event" means that anyone adding a wait event at a finer grade suddenly needs to be aware of all the paths that could lead to the execution of the new code and change all of them to not use the wait event anymore. It imposes a tax on measuring actual "out of postgres" wait events. > > My point is that you're redefining wait events to be "in some region of code" > > and that once you start doing that, there's a lot of other places you could > > suddenly use wait events. > > > > But wait events aren't actually suitable for that - they're a *single-depth* > > mechanism, which means that if you start waiting, the prior wait event is > > lost, and > > a) the nested region isn't attributed to the parent while active > > b) once the nested wait event is over, the parent isn't reset > > I understand that they shouldn't be nested. But as long as they're > not, isn't that fine? And if the concern is that they'll accidentally > get nested, whether in this patch or in the future, can't we just > programmatically assert that we haven't? One very useful wait event would be for memory allocations that hit the kernel. Those can take a fairly long time, because they might need to write dirty buffers to disk before there is enough free memory. Now imagine that we redefine the system memory allocator (or just postgres') so that all memory allocations from the kernel use a wait event. Now suddenly all that code that uses "coarse" wait events suddenly has a *rare* path - because most of the time we can carve memory out of a larger OS level memory allocation - where wait events would be nested. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Nov 7, 2024 at 2:56 PM Andres Freund <andres@anarazel.de> wrote: > It does actually make things harder - what if somebody added a > pgstat_report_activity() somewhere between the call? It would suddenly get > lost after the second "initialization". Actually, the proposed patch already > has weird, externally visible, consequences - the application name is set, > then suddenly becomes unset, then is set again. Oh... I think that alone is enough to change my mind; I neglected the effects of that little pgstat_report_appname() stinger... > Additionally PGSTAT_BEGIN_WRITE_ACTIVITY() would already happen twice if you > initialize twice... Sure. I was just trying not to introduce that to _all_ backend code paths, but it sounds like that's not a concern. (Plus, it turns out to be four calls, due again to the application_name reporting...) > That doesn't seem like a reason to just initialize twice to me. If you have > one initialization step that properly initializes everything to a minimal > default state, you then can have granular functions that set up the user, > database, SSL, GSS information separately. I will start work on that then (unless Michael has already beaten me to it?). > But more importantly, because of not having nesting, adding one "coarse" "wait > event" means that anyone adding a wait event at a finer grade suddenly needs > to be aware of all the paths that could lead to the execution of the new > code and change all of them to not use the wait event anymore. It imposes a > tax on measuring actual "out of postgres" wait events. > > One very useful wait event would be for memory allocations that hit the > kernel. Those can take a fairly long time, because they might need to write > dirty buffers to disk before there is enough free memory. Now imagine that we > redefine the system memory allocator (or just postgres') so that all memory > allocations from the kernel use a wait event. Now suddenly all that code that > uses "coarse" wait events suddenly has a *rare* path - because most of the time > we can carve memory out of a larger OS level memory allocation - where wait > events would be nested. Okay, that makes a lot of sense. I will plumb these down as far as I can. Thanks very much! --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Feb 6, 2025 at 12:35 AM Michael Paquier <michael@paquier.xyz> wrote: > /* Update app name to current GUC setting */ > + /* TODO: ask the list: maybe do this before setting STATE_UNDEFINED? */ > if (application_name) > pgstat_report_appname(application_name); > > Historically, we've always reported the application name after > STATE_UNDEFINED is set, never the reverse. There could be potential > security implications if we were to begin reporting the application > name before the connection has fully authenticated because it would be > possible for attempted connections to show malicious data in the > catalogs, and now we're sure that the catalog data is OK to query for > other users. Let's be careful about that. I think that we should > still set that as late as possible in pgstat_bestart_final(), never at > an earlier step as this data can come from a connection > string, potentially malicious if not authenticated yet. I don't want to change the order of authentication and application name, just move the application name report up above the state change in pgstat_bestart_final(). I don't feel strongly about it, though. > + if (!bootstrap) > + { > + pgstat_bestart_initial(); > + if (MyProcPort) > + pgstat_bestart_security(); /* fill in any SSL/GSS info too */ > > This part of patch 0001 is giving me a very long pause. What's the > merit of filling in twice the backend entry data if we're going to > update it at the end anyway for normal backend connections? More info > for debugging, is it? Correct; if you already know authentication information from the transport then it seemed nice to fill it in before doing the stuff that hangs. > It seems that removing this call does not > influence the tests you are adding, as well (or should the test > "pg_stat_ssl view is updated" be responsible for checking that?). The test is supposed to enforce that, but I see that it's not for some reason. That's concerning. I'll investigate, thanks for pointing it out. > With this patch, the information that we get to be able to debug a > backend entry in pg_stat_activity is st_clientaddr and > remote_hostname. If we have a backend stuck in a wait event for the > "Auth" class, would these be sufficient to provide a good user > experience? Still better than nothing as we don't know the database > and user ID used for the connection until authentication is done, I > guess, to be able to grab patterns behind authentication getting > stuck. It's better than nothing, but is there a particular reason not to trust the crypto the first time around? The SSL/GSS details are what they are; if you don't trust them at either point one or point two then I think we need to have a more urgent conversation about that? > In patch 0002, WAIT_EVENT_SSPI_TRANSLATE_NAME is used twice for two > code paths. Perhaps these should be two separate wait events? That was my question from v6: > I violated the "one event name per call site" rule with > TranslateName(). The call pattern there is "call once to figure out > the buffer length, then call again to fill it in", and IMO that didn't > deserve differentiation. But if anyone objects, I'm happy to change it > (and I'd appreciate some name suggestions in that case). So -- any name suggestions? :D (Personally, I viewed this sort of like an unrolled loop of two. I don't know if it helps to know which call is hanging as long as you know that TranslateName is hanging.) > The events attached to ldap_unbind() make that also kind of hard to > debug. What's the code path actually impacted if we see them in > pg_stat_activity? We have nine of them with the same wait event > attached to them. Do we _want_ nine separate flavors of WAIT_EVENT_LDAP_UNBIND? I figured it was enough to know that you were stuck unbinding. > The patch needs some 2024 -> 2025 updates in its copyright notices. Will do. > At the end of [1], I've been reminded of this quote from Andres about > 0002: > "This doesn't really seem like it's actually using wait events to > describe waits. The new wait events cover stuff like memory > allocations etc, see e.g. pg_SSPI_make_upn()." > > Should we do improvements in this area with interruptions before > attempting to tackle the problem of making starting backend entries > visible in pg_stat_activity, before these complete authentication? We > use wait events for syscalls on I/O already, perhaps that does not > stand as an argument in favor of this patch on its own, but I don't > see why more events associated to external library calls as done in > 0002 is a problem? I had hoped that my v6 addressed Andres' concerns by pushing the events down as far as possible (that way they can't be nested, which I took to be the primary problem). Does something else need to be done? Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Tue, Feb 11, 2025 at 11:23 PM Michael Paquier <michael@paquier.xyz> wrote: > + be_tls_get_peer_subject_name(MyProcPort, lsslstatus.ssl_client_dn, NAMEDATALEN); > + be_tls_get_peer_serial(MyProcPort, lsslstatus.ssl_client_serial, NAMEDATALEN); > + be_tls_get_peer_issuer_name(MyProcPort, lsslstatus.ssl_issuer_dn, NAMEDATALEN); > > But are these three actually OK to have showing up in the catalogs > this early? This data comes from a peer certificate, that we may know > nothing about, become set at a very early stage with > secure_initialize(). I guess I'm going to zero in on your definition of "may know nothing about". If that is true, something is very wrong IMO. My understanding of the backend code was that port->peer is only set after OpenSSL has verified that the certificate has been issued by an explicitly trusted CA. (Our verify_cb() doesn't override the internal checks to allow failed certificates through.) That may not be enough to authorize entry into the server, but it also shouldn't be untrusted data. If a CA is issuing Subject data that is somehow dangerous to the operation of the server, I think that's a security problem in and of itself: there are clientcert HBA modes that don't validate the Subject, but they're still going to push that data into the catalogs, aren't they? So if we're concerned that Subject data is dangerous at this point in the code, I agree that my patch makes it even more dangerous and I'll modify it -- but then I'm going to split off another thread to try to fix that underlying issue. A user should not have to be authorized to access the server in order for signed authentication data from the transport layer to be considered trustworthy. Being able to monitor those separately is important for auditability. > As a whole we still have a gap between what could be OK, what could > not be OK, and the fact that pgstat_bestart_security() is called twice > makes that confusing. My end goal is that all of this _should_ always be OK, so calling it once or twice or thirty times should be safe. (But again, if that's not actually true today, I'll remove it for now.) > > v8-0003 shows this approach. For the record, I think it's materially > > worse than v7-0003. IMO it increases the cognitive load for very > > little benefit and makes it more work for a newcomer to refactor the > > cleanup code for those routines. I think it's enough that you can see > > a separate LOG message for each failure case, if you want to know why > > we're unbinding. > > That's more verbose, as well. As Robert said, that makes the output > easier to debug with a 1:1 mapping between the event and a code path. I agree with Robert's goal in general. I just think that following that rule to *this* extent is counterproductive. But I won't die on that hill; in the end, I just want to be able to see when LDAP calls hang. Thanks! --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Feb 13, 2025 at 4:03 PM Michael Paquier <michael@paquier.xyz> wrote: > > If a CA is issuing Subject data that is somehow dangerous to the > > operation of the server, I think that's a security problem in and of > > itself: there are clientcert HBA modes that don't validate the > > Subject, but they're still going to push that data into the catalogs, > > aren't they? > > Is that the case before we finish authentication now? No, but I still don't understand why that's relevant. My point is that transport authentication data should be neither less trustworthy prior to ClientAuthentication, nor more trustworthy after it, since it's signed by the same authentication provider that you're trusting to make the authentication decisions in the first place. (But it doesn't seem like we're going to agree on this for now; in the meantime I'll prepare a version of the patch that only calls pgstat_bestart_security() once.) At some point in the future, I would really like to clarify what potential problems there are if we put verified Subject data into the catalogs before ClientAuthentication completes. I think that any such problems would continue to be problems after ClientAuthentication completes, too. Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Fri, Feb 14, 2025 at 5:34 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > (But it doesn't > seem like we're going to agree on this for now; in the meantime I'll > prepare a version of the patch that only calls > pgstat_bestart_security() once.) v9 removes the first call, and moves the second (now only) call up and out of the if/else chain, just past client authentication. The SSL pre-auth tests have been removed. Thanks! --Jacob
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Fri, Feb 28, 2025 at 12:40:13PM -0800, Jacob Champion wrote: > v9 removes the first call, and moves the second (now only) call up and > out of the if/else chain, just past client authentication. The SSL > pre-auth tests have been removed. I have put my eyes on 0001, and this version looks sensible here, just tweaked a bit the comments after a closer lookup and adjusted a few things, nothing huge.. /* Update app name to current GUC setting */ + /* TODO: ask the list: maybe do this before setting STATE_UNDEFINED? */ if (application_name) pgstat_report_appname(application_name); This has always been set last and it's still the case in the patch, so let's just remove that. -- Michael
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Mon, Mar 03, 2025 at 02:23:51PM +0900, Michael Paquier wrote: > This has always been set last and it's still the case in the patch, so > let's just remove that. This first one has been now applied as c76db55c9085. Attached is the rest to add the wait events (still need to have a closer look at this part). -- Michael
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Tue, Mar 4, 2025 at 12:51 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Mar 03, 2025 at 02:23:51PM +0900, Michael Paquier wrote: > > This has always been set last and it's still the case in the patch, so > > let's just remove that. > > This first one has been now applied as c76db55c9085. Thanks! > Attached is the > rest to add the wait events (still need to have a closer look at this > part). Sounds good. --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-04 17:51:17 +0900, Michael Paquier wrote: > On Mon, Mar 03, 2025 at 02:23:51PM +0900, Michael Paquier wrote: > > This has always been set last and it's still the case in the patch, so > > let's just remove that. > > This first one has been now applied as c76db55c9085. Attached is the > rest to add the wait events (still need to have a closer look at this > part). This seems to trigger a bunch of CI failures, e.g.: https://cirrus-ci.com/task/5350341408980992 https://cirrus-ci.com/task/5537391798124544 https://cirrus-ci.com/task/4657439905153024 https://api.cirrus-ci.com/v1/artifact/task/5350341408980992/testrun/build/testrun/authentication/007_pre_auth/log/regress_log_007_pre_auth [17:47:59.698](0.000s) ok 1 - authenticating connections are recorded in pg_stat_activity [17:47:59.698](0.000s) # issuing query 5 via background psql: SELECT injection_points_wakeup('init-pre-auth'); [17:47:59.752](0.054s) # pump_until: process terminated unexpectedly when searching for "(?^:(^|\n)background_psql: QUERY_SEPARATOR5:\r?\n)" with stream: "" process ended prematurely at C:/cirrus/src/test/perl/PostgreSQL/Test/Utils.pm line 439. # Postmaster PID for node "primary" is 6084 https://api.cirrus-ci.com/v1/artifact/task/5350341408980992/testrun/build/testrun/authentication/007_pre_auth/log/007_pre_auth_primary.log 2025-03-04 17:47:59.705 GMT [7624][client backend] [007_pre_auth.pl][2/9:0] LOG: statement: SELECT injection_points_wakeup('init-pre-auth'); 2025-03-04 17:47:59.705 GMT [7624][client backend] [007_pre_auth.pl][2/9:0] ERROR: could not find injection point init-pre-authto wake up 2025-03-04 17:47:59.705 GMT [7624][client backend] [007_pre_auth.pl][2/9:0] STATEMENT: SELECT injection_points_wakeup('init-pre-auth'); 2025-03-04 17:47:59.706 GMT [7624][client backend] [007_pre_auth.pl][:0] LOG: disconnection: session time: 0:00:00.333 user=SYSTEMdatabase=postgres host=[local] Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Tue, Mar 4, 2025 at 4:10 PM Andres Freund <andres@anarazel.de> wrote: > This seems to trigger a bunch of CI failures, e.g.: > > https://cirrus-ci.com/task/5350341408980992 > https://cirrus-ci.com/task/5537391798124544 > https://cirrus-ci.com/task/4657439905153024 Hm. All Windows. > https://api.cirrus-ci.com/v1/artifact/task/5350341408980992/testrun/build/testrun/authentication/007_pre_auth/log/007_pre_auth_primary.log > 2025-03-04 17:47:59.705 GMT [7624][client backend] [007_pre_auth.pl][2/9:0] LOG: statement: SELECT injection_points_wakeup('init-pre-auth'); > 2025-03-04 17:47:59.705 GMT [7624][client backend] [007_pre_auth.pl][2/9:0] ERROR: could not find injection point init-pre-authto wake up But attaching to that injection point succeeded above, for us to have gotten to this point... Does that error message indicate that the point itself doesn't exist, or that nothing is currently waiting? --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Tue, Mar 4, 2025 at 4:26 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > But attaching to that injection point succeeded above, for us to have > gotten to this point... Does that error message indicate that the > point itself doesn't exist, or that nothing is currently waiting? Looks like the latter. With the following diff I can reproduce locally: --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -730,6 +730,7 @@ InitPostgres(const char *in_dbname, Oid dboid, if (!bootstrap) { pgstat_bestart_initial(); + pg_usleep(1000000); INJECTION_POINT("init-pre-auth"); } So I've misunderstood the API. I should have added a background version of $node->wait_for_event(), or similar. I'll work on a fix, but it probably won't be fast since I need to learn more about the injection points architecture. The test may need to be disabled, or the patch backed out, depending on how painful the flake is for everybody. Thanks, and sorry, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Tue, Mar 04, 2025 at 04:53:14PM -0800, Jacob Champion wrote: > I'll work on a fix, but it probably won't be fast since I need to > learn more about the injection points architecture. The test may need > to be disabled, or the patch backed out, depending on how painful the > flake is for everybody. Oops, missed these failures.. So we have a race condition where we are trying to wake up a point that's not waiting yet, because there is a small window between the moment when the backend entry is marked as "starting" and the injection point wait. What this is telling us is that we should change the query scanning pg_stat_activity for a PID of a backend in 'starting' state so as we also check the wait_event init-pre-auth, as this is reported when using injection point waits. The attached should be enough to take care of this race condition. -- Michael
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Tue, Mar 4, 2025 at 8:45 PM Michael Paquier <michael@paquier.xyz> wrote: > What this is telling us is that we should change the query scanning > pg_stat_activity for a PID of a backend in 'starting' state so as we > also check the wait_event init-pre-auth, as this is reported when > using injection point waits. The attached should be enough to take > care of this race condition. That's a lot easier than the rabbit hole I was running down; thank you. > - "SELECT pid FROM pg_stat_activity WHERE state = 'starting';"); > + qq{SELECT pid FROM pg_stat_activity > + WHERE state = 'starting' and wait_event = 'init-pre-auth';}); I had intended for this part of the patch to also wait for client backends only (see v8-0001, 001_ssltests.pl), but I must have misapplied it. (The ssltests change was lost completely when that file was dropped from the set.) So while we're at it, should we add a `backend_type = 'client backend'` filter to stop that from flaking in the future? That would further align this query with the wait_for_event() implementation. Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Wed, Mar 5, 2025 at 5:47 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > So while we're at it, should we add a > `backend_type = 'client backend'` filter to stop that from flaking in > the future? That would further align this query with the > wait_for_event() implementation. More concretely: here's a squashable patchset with that suggestion, for the CI to chew on. --Jacob
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-05 08:16:45 -0800, Jacob Champion wrote: > From efc9fc3b3993601e9611131f229fbcaf4daa46f1 Mon Sep 17 00:00:00 2001 > From: Michael Paquier <michael@paquier.xyz> > Date: Wed, 5 Mar 2025 13:30:43 +0900 > Subject: [PATCH 1/2] Fix race condition in pre-auth test > > --- > src/test/authentication/t/007_pre_auth.pl | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/test/authentication/t/007_pre_auth.pl b/src/test/authentication/t/007_pre_auth.pl > index a638226dbaf..90aaea4b5a6 100644 > --- a/src/test/authentication/t/007_pre_auth.pl > +++ b/src/test/authentication/t/007_pre_auth.pl > @@ -43,12 +43,14 @@ $psql->query_safe("SELECT injection_points_attach('init-pre-auth', 'wait')"); > # authentication. Use the $psql connection handle for server interaction. > my $conn = $node->background_psql('postgres', wait => 0); > > -# Wait for the connection to show up. > +# Wait for the connection to show up in pg_stat_activity, with the wait_event > +# of the injection point. > my $pid; > while (1) > { > $pid = $psql->query( > - "SELECT pid FROM pg_stat_activity WHERE state = 'starting';"); > + qq{SELECT pid FROM pg_stat_activity > + WHERE state = 'starting' and wait_event = 'init-pre-auth';}); > last if $pid ne ""; Unrelated to the change in this patch, but tests really shouldn't use while(1) loops without a termination condition. If something is wrong, the test will hang indefinitely, instead of timing out. On the buildfarm that can take out an animal if it hasn't configured a timeout (with autoconf at least, meson terminates tests after a timeout). I guess you can't use poll_query_until() here, but in that case you should copy some of the timeout logic. Or, perhaps better, add a poll_query_until() to BackgroundPsql.pm. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Wed, Mar 5, 2025 at 9:28 AM Andres Freund <andres@anarazel.de> wrote: > Unrelated to the change in this patch, but tests really shouldn't use while(1) > loops without a termination condition. If something is wrong, the test will > hang indefinitely, instead of timing out. On the buildfarm that can take out > an animal if it hasn't configured a timeout (with autoconf at least, meson > terminates tests after a timeout). With the current patchset, if I pull the PG_TEST_TIMEOUT_DEFAULT down low, and modify the backend so that either one of the two conditions never completes, the tests still stop due to BackgroundPsql's session timeout. This is true for Meson and Autoconf. Is there a different situation where I can't rely on that? Thanks! --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-05 16:19:04 -0800, Jacob Champion wrote: > On Wed, Mar 5, 2025 at 9:28 AM Andres Freund <andres@anarazel.de> wrote: > > Unrelated to the change in this patch, but tests really shouldn't use while(1) > > loops without a termination condition. If something is wrong, the test will > > hang indefinitely, instead of timing out. On the buildfarm that can take out > > an animal if it hasn't configured a timeout (with autoconf at least, meson > > terminates tests after a timeout). > > With the current patchset, if I pull the PG_TEST_TIMEOUT_DEFAULT down > low, and modify the backend so that either one of the two conditions > never completes, the tests still stop due to BackgroundPsql's session > timeout. This is true for Meson and Autoconf. Is there a different > situation where I can't rely on that? Oh, I see. I missed that it's relying on the timeout and that the timeout isn't reset in the loop. Sorry for the noise. FWIW, I still don't like open-coded poll loops, as I'd really like our tests to use some smarter retry/backoff logic than a single hardcoded usleep(100_000). The first few iterations that's too long, commonly the state isn't reached in the first iteration, but will be within a millisecond or two. Waiting 100ms is obviously way too long. Once we've slept for 10+ seconds without reaching the target, sleeping for 100us is way too short a sleep and just wastes CPU cycles. A decent portion of the CPU time when running under valgrind is wasted just due to way too tight retry loops. That's harder to do if we have many places polling. But anyway, I digress, that's really not related to your change. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Wed, Mar 05, 2025 at 08:55:55PM -0500, Andres Freund wrote: > Once we've slept for 10+ seconds without reaching the target, sleeping for > 100us is way too short a sleep and just wastes CPU cycles. A decent portion > of the CPU time when running under valgrind is wasted just due to way too > tight retry loops. > > That's harder to do if we have many places polling. > > But anyway, I digress, that's really not related to your change. Please let me agree with your previous argument, then. While looking at the test when reviewing the patch a couple of days ago, I was also wondering why we could not have a poll_query_until() in BackgroundPsql and gave up on the idea. Honestly, I don't see a reason not to introduce that, like in the attached. BackgroundPsql->query() does all the job already, and it is possible to rely on $PostgreSQL::Test::Utils::timeout_default in the loops, so that's simple, and it makes the test a bit easier to parse. -- Michael
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Wed, Mar 5, 2025 at 8:08 PM Michael Paquier <michael@paquier.xyz> wrote: > Honestly, I don't see a reason not to introduce that, like in the > attached. This new code races against the session timeout. I see this on timer expiration: [14:19:55.224](0.000s) # issuing query 34 via background psql: SELECT state FROM pg_stat_activity WHERE pid = ; [14:19:55.228](0.004s) # pump_until: process terminated unexpectedly when searching for "(?^:(^|\n)background_psql: QUERY_SEPARATOR 34:\r?\n)" with stream: "" process ended prematurely at /home/jacob/src/postgres/src/test/perl/PostgreSQL/Test/Utils.pm line 439. Which makes it seem like some sort of crash, IMO. I don't find that as easily debuggable as the previous log message, which was [14:21:33.104](0.001s) # issuing query 32 via background psql: SELECT pid FROM pg_stat_activity # WHERE state = 'starting' and wait_event = 'init-pre-auth'; IPC::Run: timeout on timer #1 at /home/jacob/perl5/lib/perl5/IPC/Run.pm line 3007. > + WHERE state = 'starting' and wait_event = 'init-pre-auth';}); Did you have thoughts on expanding the check to backend_type [1]? > + # Give up. The output of the last attempt is logged by query(), > + # so no need to do anything here. > + return 0; One of my primary complaints about the poll_query_until() implementation is that "giving up" in this case means continuing to run pieces of the test that have no business running after a timeout, and increasing the log noise after a failure. I'm not sure how loudly to complain in this particular case, since I know we use it elsewhere... Thanks! --Jacob [1] https://postgr.es/m/CAOYmi%2BnxNCQcTQE-tQ7Lwpe4cYc1u-yxwEe5kt2AVN%2BDXXVVbQ%40mail.gmail.com
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Thu, Mar 06, 2025 at 02:25:07PM -0800, Jacob Champion wrote: > On Wed, Mar 5, 2025 at 8:08 PM Michael Paquier <michael@paquier.xyz> wrote: >> + WHERE state = 'starting' and wait_event = 'init-pre-auth';}); > > Did you have thoughts on expanding the check to backend_type [1]? > >> + # Give up. The output of the last attempt is logged by query(), >> + # so no need to do anything here. >> + return 0; > > One of my primary complaints about the poll_query_until() > implementation is that "giving up" in this case means continuing to > run pieces of the test that have no business running after a timeout, > and increasing the log noise after a failure. I'm not sure how loudly > to complain in this particular case, since I know we use it > elsewhere... Indeed. The existing poll_query_until() is a bit more reliable in terms of error handling, even with a very low PG_TEST_TIMEOUT_DEFAULT. A second thing that was bugging me on a second lookup this morning is how we should handle error cases. A background psql process depends on what the caller defines for ON_ERROR_STOP. In the case of this test, we're OK to fail immediately because we expect the queries to always work. I'm not sure if this is fine by default, especially if callers of this routine expect to have the same properties as poll_query_until() in Cluster.pm. They would not, because a BackgroundPsql is an entire different object, except if given options when staring psql to act like that. I have applied the simplest patch for now, to silence the failures in the CI, and included your suggestion to add a check on the backend_type for the extra safety it offers. I'd like the addition of the poll_query_until() in the long-term, but I'm really not sure if the semantics would be right this way under a background psql. In the auth 007 test, they would be OK, but it could be surprising if we have other callers that begin relying on it. -- Michael
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Mar 6, 2025 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote: > I have applied the simplest patch for now, to silence the failures in > the CI, and included your suggestion to add a check on the > backend_type for the extra safety it offers. Thanks! Initial CI run looks green, so that's a good start. I've reattached the wait event patches, to get the cfbot back to where it was. > I'd like the addition of the poll_query_until() in the long-term, but > I'm really not sure if the semantics would be right this way under a > background psql. In the auth 007 test, they would be OK, but it could > be surprising if we have other callers that begin relying on it. Yeah, that API is definitely subtle. --Jacob
Attachment
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-06 15:39:44 -0800, Jacob Champion wrote: > I've reattached the wait event patches, to get the cfbot back to where it was. FWIW, I continue to think that this is a misuse of wait events. We shouldn't use them as a poor man's general purpose tracing framework. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Fri, Mar 7, 2025 at 8:38 AM Andres Freund <andres@anarazel.de> wrote: > FWIW, I continue to think that this is a misuse of wait events. We shouldn't > use them as a poor man's general purpose tracing framework. Well, okay. That's frustrating. If I return to the original design, but replace all of the high-level wait events with calls to pgstat_report_activity(), does that work? --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-07 09:03:18 -0800, Jacob Champion wrote: > On Fri, Mar 7, 2025 at 8:38 AM Andres Freund <andres@anarazel.de> wrote: > > FWIW, I continue to think that this is a misuse of wait events. We shouldn't > > use them as a poor man's general purpose tracing framework. > > Well, okay. That's frustrating. I should have clarified - there are a few that I think are ok, basically the places where we wrap syscalls, e.g. around the sendto, select and recvfrom in PerformRadiusTransaction(). OTOH that code is effectively completely broken. Doing a blocking select() is just a no-go, the code isn't interruptible, breaking authentication timeout. And using select() means that we theoretically could crash due to an fd that's above FD_SETSIZE. Most of the other places I'm not on board with, that's wrapping large amounts of code in a wait event, which pretty much means we're not waiting. I think some of the wrapped calls into library code might actually call back into our code (to receive/send data), and our code then will use wait events around lower level operations done as part of that. Which pretty much explains my main issue with this - either the code can't wait in those function calls, in which case it's wrong to use wait events, or the code is flat out broken. It's also IMO quite wrong to do something that can throw an error inside a wait event, because that means that the wait event will still be reported during error recovery. Probably not the only place doing so, but it's still wrong. > If I return to the original design, but replace all of the high-level > wait events with calls to pgstat_report_activity(), does that work? It'd be less wrong. But I really doubt that it's a good idea to encode all kinds of function calls happening during authentication into something SQL visible. Why stop with these functions and not just do that for *all* functions in postgres? I mean it'd not work and slow everything down, but how do you define that line? Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Fri, Mar 7, 2025 at 9:25 AM Andres Freund <andres@anarazel.de> wrote: > I should have clarified - there are a few that I think are ok, basically the > places where we wrap syscalls, e.g. around the sendto, select and recvfrom in > PerformRadiusTransaction(). Okay. > OTOH that code is effectively completely broken. Doing a blocking select() is > just a no-go, the code isn't interruptible, breaking authentication > timeout. And using select() means that we theoretically could crash due to an > fd that's above FD_SETSIZE. I think we're in agreement here; I'm just trying to improve things incrementally. If someone actually hits the broken case, I think it'd be helpful for them to see it. > I think some of the wrapped calls into library code might actually call back > into our code (to receive/send data), and our code then will use wait events > around lower level operations done as part of that. That would be a problem, agreed, but I didn't think I'd wrapped any callback APIs. (Admittedly I have little experience with the SSPI stuff.) But looking at the wrapped calls in the patch... which are you suspicious of? > It's also IMO quite wrong to do something that can throw an error inside a > wait event, because that means that the wait event will still be reported > during error recovery. Hm, okay. I can change that for the LookupAccountSid case. > Probably not the only place doing so, but it's still > wrong. It's definitely not the only place. :D > Why stop with > these functions and not just do that for *all* functions in postgres? I mean > it'd not work and slow everything down, (That seems like a good reason not to do it for all functions in Postgres, no? I hope the slope is not all that slippery in practice.) > but how do you define that line? Cost/benefit. In this case, authentication hanging in an unknown place in PAM and LDAP has caused tangible support problems. I suspect I'd have gotten complaints if I only focused on those two places, though, so I expanded it to the other blocking calls I could see. Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Fri, Mar 7, 2025 at 10:28 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > I think some of the wrapped calls into library code might actually call back > > into our code (to receive/send data), and our code then will use wait events > > around lower level operations done as part of that. > > That would be a problem, agreed, but I didn't think I'd wrapped any > callback APIs. (Admittedly I have little experience with the SSPI > stuff.) But looking at the wrapped calls in the patch... which are you > suspicious of? I missed PAM_CONV, sorry. I'm worried about the sendAuthRequest() being done there; it doesn't seem safe to potentially ereport(ERROR) and longjmp through a PAM call stack? But I'll switch those over to something safe or else drop that part of the patch. Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Wed, Mar 12, 2025 at 3:16 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > I missed PAM_CONV, sorry. I'm worried about the sendAuthRequest() > being done there; it doesn't seem safe to potentially ereport(ERROR) > and longjmp through a PAM call stack? But I'll switch those over to > something safe or else drop that part of the patch. PAM aside... Michael, what's your level of enthusiasm for the rest of this patch? I was confidently, embarrassingly wrong about how CheckPAMAuth worked, and it makes me think I need to put this down and take a completely new crack at it in 19. Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
Hi, On 2025-03-13 09:23:10 -0700, Jacob Champion wrote: > On Wed, Mar 12, 2025 at 3:16 PM Jacob Champion > <jacob.champion@enterprisedb.com> wrote: > > I missed PAM_CONV, sorry. I'm worried about the sendAuthRequest() > > being done there; it doesn't seem safe to potentially ereport(ERROR) > > and longjmp through a PAM call stack? That indeed doesn't seem safe. I am wondering if PAM is so fundamentally incompatible with handling interrupts / a non-blocking interface that we have little choice but to eventually remove it... > PAM aside... Michael, what's your level of enthusiasm for the rest of this > patch? I was confidently, embarrassingly wrong about how CheckPAMAuth > worked, and it makes me think I need to put this down and take a completely > new crack at it in 19. FWIW, I continue to think that it's better to invest in making more auth methods non-blocking, rather than adding wait events for code that could maybe sometimes wait on different things internally. Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Mar 13, 2025 at 9:56 AM Andres Freund <andres@anarazel.de> wrote: > I am wondering if PAM is so fundamentally incompatible with handling > interrupts / a non-blocking interface that we have little choice but to > eventually remove it... Given the choice between a usually-working PAM module with known architectural flaws, and not having PAM at all, I think many users would rather continue using what's working for them. > FWIW, I continue to think that it's better to invest in making more auth > methods non-blocking, rather than adding wait events for code that could maybe > sometimes wait on different things internally. I think we disagree on the either/or nature of that. If I can get proof that a certain thing is causing bugs in the wild, then I have ammunition to fix that thing. Right now there is no visibility, and my interest in rewriting old authentication methods without bug reports to motivate that work is pretty low. I'm not willing to sign up for that at the moment. (But I do really appreciate the review. I'm just feeling crispy about the overall result...) Thanks, --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Andres Freund
Date:
On 2025-03-13 10:29:49 -0700, Jacob Champion wrote: > On Thu, Mar 13, 2025 at 9:56 AM Andres Freund <andres@anarazel.de> wrote: > > I am wondering if PAM is so fundamentally incompatible with handling > > interrupts / a non-blocking interface that we have little choice but to > > eventually remove it... > > Given the choice between a usually-working PAM module with known > architectural flaws, and not having PAM at all, I think many users > would rather continue using what's working for them. authentication_timeout currently doesn't reliably work while in some auth methods, nor does pg_terminate_backend() etc. That's IMO is rather bad from a DOSability perspective. The fact that some auth methods are broken like that has had a sizable negative impact on postgres for a long time. Not just when those methods are used, but also architecturally. It's e.g. one of the main reasons we need the ugly escalating logic in postmaster shutdowns to send SIGQUITs and then SIGKILL after a while, because we don't have a reliable way of terminating backends normally. This used to be way worse because historically postgres considered it sane (why, I have no idea) to ereport() in timeout functions, which then occasionally lead to backends stuck in malloc locks etc. > > FWIW, I continue to think that it's better to invest in making more auth > > methods non-blocking, rather than adding wait events for code that could maybe > > sometimes wait on different things internally. > > I think we disagree on the either/or nature of that. If I can get > proof that a certain thing is causing bugs in the wild, then I have > ammunition to fix that thing. FWIW, I've have repeatedly seen production issues due to authentication timeout not working for some auth methods. It's not hard to see why - e.g. a non-resonsive radius server just leaves the backend hanging in select(). Even though it would get interrupted by signals, we'll just retry without even checking interrupts / timeouts :(. > Right now there is no visibility, and my interest in rewriting old > authentication methods without bug reports to motivate that work is pretty > low. I'm not willing to sign up for that at the moment. Fair enough. > (But I do really appreciate the review. I'm just feeling crispy about > the overall result...) Also fair enough :) Greetings, Andres Freund
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Jacob Champion
Date:
On Thu, Mar 13, 2025 at 10:56 AM Andres Freund <andres@anarazel.de> wrote: > > Given the choice between a usually-working PAM module with known > > architectural flaws, and not having PAM at all, I think many users > > would rather continue using what's working for them. > > authentication_timeout currently doesn't reliably work while in some auth > methods, nor does pg_terminate_backend() etc. That's IMO is rather bad from a > DOSability perspective. > > The fact that some auth methods are broken like that has had a sizable > negative impact on postgres for a long time. Not just when those methods are > used, but also architecturally. Right -- I just don't think end users are going to factor that into their choice of authentication method. If IT tells you "use this PAM module", then... that's it. If we remove PAM, maybe they change authentication methods... or maybe they just don't ever upgrade Postgres again. My money's on the latter. -- I looked into switching over to pgstat_report_activity(), but that wasn't designed to be called in the middle of backend initialization. It would take more work to make those calls safe/sane when `st_state == STATE_STARTING`. I plan to mark this patchset as Withdrawn for now. Thanks all! --Jacob
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
From
Michael Paquier
Date:
On Mon, Mar 17, 2025 at 10:22:47AM -0700, Jacob Champion wrote: > I looked into switching over to pgstat_report_activity(), but that > wasn't designed to be called in the middle of backend initialization. > It would take more work to make those calls safe/sane when `st_state > == STATE_STARTING`. I plan to mark this patchset as Withdrawn for now. Okay, fine by me. I had the impression that it would have been possible to salvage some of the wait event states, but at least the starting state showing up in pg_stat_activity will be able to provide some information, so it's better than none. Unfortunately, I don't have any room until the feature freeze for that. Outside the stat report activity calls, I've been wondering if we should add some dynamic tracking of which hba/ident entry a backend PID is working with. For example, if we knew the file and the entry line number, we would know on which auth method this backend is bumping into. That maybe of course limited if someone modifies and reloads the HBA file while a backend is stuck. Now, these files are mostly static, and we have system views that provide the contents of the ident and HBA files as SQL, so with a JOIN.. -- Michael