Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Date
Msg-id CAOYmi+=N2U8T0zCwLigLfSj4kZ9pZwqo9eg=PxZjuUdeZhrxfQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible  (Andres Freund <andres@anarazel.de>)
Responses Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
List pgsql-hackers
On Thu, Nov 7, 2024 at 1:37 PM Andres Freund <andres@anarazel.de> wrote:
> > - the pre-auth step must always initialize the entire pgstat struct
>
> Correct. And that has to happen exactly once, not twice.

What goes wrong if it happens twice?

> > - two-step initialization requires two PGSTAT_BEGIN_WRITE_ACTIVITY()
> > calls for _every_ backend
>
> That's fine - PGSTAT_BEGIN_WRITE_ACTIVITY() is *extremely* cheap on the write
> side. That's the whole point of of the sequence-lock like mechanism.

Okay, cool. I'll retract that concern.

> > - two-step initialization requires us to couple against the order that
> > authentication information is being filled in (pre-auth, post-auth, or
> > both)
>
> Not sure what you mean with this?

In other words: if we split it, people who make changes to the order
that authentication information is determined during startup must know
to keep an eye on this code as well. Whereas with the current
patchset, the layers are decoupled and the order doesn't matter.
Quoting from above:

  Finally, if we're okay with all of that, future maintainers need to be
  careful about which fields get copied in the first (preauth) step, the
  second step, or both. GSS, for example, can be set up during transport
  negotiation (first step) or authentication (second step), so we have
  to duplicate the logic there. SSL is currently first-step-only, I
  think -- but are we sure we want to hardcode the assumption that cert
  auth can't change any of those parameters after the transport has been
  established? (I've been brainstorming ways we might use TLS 1.3's
  post-handshake CertificateRequest, for example.)

> If you increase the iteration count for whatever secret
> "hashing" method to be very high, it's not a wait, it's just CPU
> use.

I don't yet understand why this is a useful distinction to make. I
understand that they are different, but what are the bad consequences
if pg_stat_activity records a CPU busy wait in the same way it records
an I/O wait -- as long as they're not nested?

> My point is that you're redefining wait events to be "in some region of code"
> and that once you start doing that, there's a lot of other places you could
> suddenly use wait events.
>
> But wait events aren't actually suitable for that - they're a *single-depth*
> mechanism, which means that if you start waiting, the prior wait event is
> lost, and
> a) the nested region isn't attributed to the parent while active
> b) once the nested wait event is over, the parent isn't reset

I understand that they shouldn't be nested. But as long as they're
not, isn't that fine? And if the concern is that they'll accidentally
get nested, whether in this patch or in the future, can't we just
programmatically assert that we haven't?

Thanks,
--Jacob



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Alias of VALUES RTE in explain plan
Next
From: Andres Freund
Date:
Subject: Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible