Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date
Msg-id 20240910205850.5d.nmisch@google.com
Whole thread Raw
In response to Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
List pgsql-hackers
On Tue, Sep 10, 2024 at 02:51:23PM -0400, Robert Haas wrote:
> On Tue, Sep 10, 2024 at 1:27 PM Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Sep 10, 2024 at 02:29:57PM +0900, Michael Paquier wrote:
> > > You are adding twelve event points with only 5
> > > new wait names.  Couldn't it be better to have a one-one mapping
> > > instead, adding twelve entries in wait_event_names.txt?
> >
> > No, I think the patch's level of detail is better.  One shouldn't expect the
> > two ldap_simple_bind_s() calls to have different-enough performance
> > characteristics to justify exposing that level of detail to the DBA.
> > ldap_search_s() and InitializeLDAPConnection() differ more, but the DBA mostly
> > just needs to know the scale of their LDAP responsiveness problem.
> >
> > (Someday, it might be good to expose the file:line and/or backtrace associated
> > with a wait, like we do for ereport().  As a way to satisfy rare needs for
> > more detail, I'd prefer that over giving every pgstat_report_wait_start() a
> > different name.)
> 
> I think unique names are a good idea. If a user doesn't care about the
> difference between sdgjsA and sdjgsB, they can easily ignore the
> trailing suffix, and IME, people typically do that without really
> stopping to think about it. If on the other hand the two are lumped
> together as sdjgs and a user needs to distinguish them, they can't. So
> I see unique names as having much more upside than downside.

I agree a person can ignore the distinction, but that requires the person to
be consuming the raw event list.  It's reasonable to tell your monitoring tool
to give you the top N wait events.  Individual AuthnLdap* events may all miss
the cut even though their aggregate would have made the cut.  Before you know
to teach that monitoring tool to group AuthnLdap* together, it won't show you
any of those names.

I felt commit c789f0f also chose sub-optimally in this respect, particularly
with the DblinkGetConnect/DblinkConnect pair.  I didn't feel strongly enough
to complain at the time, but a rule of "each wait event appears in one
pgstat_report_wait_start()" would be a rule I don't want.  One needs
familiarity with the dblink implementation internals to grasp the
DblinkGetConnect/DblinkConnect distinction, and a plausible refactor of dblink
would make those names cease to fit.  I see this level of fine-grained naming
as making the event name a sort of stable proxy for FILE:LINE.  I'd value
exposing such a proxy, all else being equal, but I don't think wait event
names like AuthLdapBindLdapbinddn/AuthLdapBindUser are the right way.  Wait
event names should be more independent of today's code-level details.



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Jargon and acronyms on this mailing list
Next
From: David Rowley
Date:
Subject: Re: Proposal to Enable/Disable Index using ALTER INDEX