Thread: System username in pg_stat_activity

System username in pg_stat_activity

From
Magnus Hagander
Date:
The attached patch adds a column "authuser" to pg_stat_activity which
contains the username of the externally authenticated user, being the
same value as the SYSTEM_USER keyword returns in a backend.

This overlaps with for example the values in pg_stat_gss, but it will
include values for authentication methods that don't have their own
view such as peer/ident. gss/ssl info will of course still be shown,
it is just in more than one place.

I was originally thinking this column should be "sysuser" to map to
the keyword, but since we already have "usesysid" as a column name in
pg_stat_activity I figured that could be confusing since it actually
means something completely different. But happy to change that back if
people think that's better.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Aleksander Alekseev
Date:
Hi,

Thanks for the patch.

> The attached patch adds a column "authuser" to pg_stat_activity which
> contains the username of the externally authenticated user, being the
> same value as the SYSTEM_USER keyword returns in a backend.

I believe what was meant is "authname", not "authuser".

> This overlaps with for example the values in pg_stat_gss, but it will
> include values for authentication methods that don't have their own
> view such as peer/ident. gss/ssl info will of course still be shown,
> it is just in more than one place.
>
> I was originally thinking this column should be "sysuser" to map to
> the keyword, but since we already have "usesysid" as a column name in
> pg_stat_activity I figured that could be confusing since it actually
> means something completely different. But happy to change that back if
> people think that's better.

This part of the documentation is wrong:

```
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>authname</structfield> <type>name</type>
+      </para>
```

Actually the type is `text`:

```
=# \d  pg_stat_activity ;
                      View "pg_catalog.pg_stat_activity"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 datid            | oid                      |           |          |
 datname          | name                     |           |          |
 pid              | integer                  |           |          |
 leader_pid       | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | name                     |           |          |
 authname         | text                     |           |          |
```

It hurts my sense of beauty that usename and authname are of different
types. But if I'm the only one, maybe we can close our eyes on this.
Also I suspect that placing usename and authname in a close proximity
can be somewhat confusing. Perhaps adding authname as the last column
of the view will solve both nitpicks?

```
+    /* Information about the authenticated user */
+    char        st_authuser[NAMEDATALEN];
```

Well, here it's called "authuser" and it looks like the intention was
to use `name` datatype... I suggest using "authname" everywhere for
consistency.

Since the patch affects pg_proc.dat I believe the commit message
should remind bumping the catalog version.

-- 
Best regards,
Aleksander Alekseev



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> Thanks for the patch.
>
> > The attached patch adds a column "authuser" to pg_stat_activity which
> > contains the username of the externally authenticated user, being the
> > same value as the SYSTEM_USER keyword returns in a backend.
>
> I believe what was meant is "authname", not "authuser".
>
> > This overlaps with for example the values in pg_stat_gss, but it will
> > include values for authentication methods that don't have their own
> > view such as peer/ident. gss/ssl info will of course still be shown,
> > it is just in more than one place.
> >
> > I was originally thinking this column should be "sysuser" to map to
> > the keyword, but since we already have "usesysid" as a column name in
> > pg_stat_activity I figured that could be confusing since it actually
> > means something completely different. But happy to change that back if
> > people think that's better.
>
> This part of the documentation is wrong:
>
> ```
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>authname</structfield> <type>name</type>
> +      </para>
> ```
>
> Actually the type is `text`:
>
> ```
> =# \d  pg_stat_activity ;
>                       View "pg_catalog.pg_stat_activity"
>       Column      |           Type           | Collation | Nullable | Default
> ------------------+--------------------------+-----------+----------+---------
>  datid            | oid                      |           |          |
>  datname          | name                     |           |          |
>  pid              | integer                  |           |          |
>  leader_pid       | integer                  |           |          |
>  usesysid         | oid                      |           |          |
>  usename          | name                     |           |          |
>  authname         | text                     |           |          |
> ```
>
> It hurts my sense of beauty that usename and authname are of different
> types. But if I'm the only one, maybe we can close our eyes on this.
> Also I suspect that placing usename and authname in a close proximity
> can be somewhat confusing. Perhaps adding authname as the last column
> of the view will solve both nitpicks?

But it should probably actually be name, given that's the underlying
datatype. I kept changing it around and ended up half way in
between...


> ```
> +    /* Information about the authenticated user */
> +    char        st_authuser[NAMEDATALEN];
> ```
>
> Well, here it's called "authuser" and it looks like the intention was
> to use `name` datatype... I suggest using "authname" everywhere for
> consistency.

Yeah, I flipped back and forth a few times and clearly got stuck in
the middle. They should absolutely be the same everywhere - whatever
name is used it should be consistent.


> Since the patch affects pg_proc.dat I believe the commit message
> should remind bumping the catalog version.

Yes.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Dagfinn Ilmari Mannsåker
Date:
Magnus Hagander <magnus@hagander.net> writes:

> On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
>>
>> It hurts my sense of beauty that usename and authname are of different
>> types. But if I'm the only one, maybe we can close our eyes on this.
>> Also I suspect that placing usename and authname in a close proximity
>> can be somewhat confusing. Perhaps adding authname as the last column
>> of the view will solve both nitpicks?
>
> But it should probably actually be name, given that's the underlying
> datatype. I kept changing it around and ended up half way in
> between...

https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE
(and pg_typeof(system_user)) says it's text.  Which makes sense, since
it can easily be longer than 63 bytes, e.g. in the case of a TLS client
certificate DN.

- ilmari



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Wed, Jan 10, 2024 at 2:27 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
>
> Magnus Hagander <magnus@hagander.net> writes:
>
> > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
> > <aleksander@timescale.com> wrote:
> >>
> >> It hurts my sense of beauty that usename and authname are of different
> >> types. But if I'm the only one, maybe we can close our eyes on this.
> >> Also I suspect that placing usename and authname in a close proximity
> >> can be somewhat confusing. Perhaps adding authname as the last column
> >> of the view will solve both nitpicks?
> >
> > But it should probably actually be name, given that's the underlying
> > datatype. I kept changing it around and ended up half way in
> > between...
>
> https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE
> (and pg_typeof(system_user)) says it's text.  Which makes sense, since
> it can easily be longer than 63 bytes, e.g. in the case of a TLS client
> certificate DN.

We already truncate all those to NAMEDATALEN in pg_stat_ssl for
example. so I think the truncation part of those should be OK. We'll
truncate "a little bit more" since we also have the 'cert:', but not
significantly so I think.

but yeah, conceptually it should probably be text because name is
supposedly a *postgres identifier*, which this is not.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote:
> On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > Hi,
> >
> > Thanks for the patch.

+1

> > > This overlaps with for example the values in pg_stat_gss, but it will
> > > include values for authentication methods that don't have their own
> > > view such as peer/ident. gss/ssl info will of course still be shown,
> > > it is just in more than one place.

Yeah, I think that's a good idea.

> > It hurts my sense of beauty that usename and authname are of different
> > types. But if I'm the only one, maybe we can close our eyes on this.
> > Also I suspect that placing usename and authname in a close proximity
> > can be somewhat confusing. Perhaps adding authname as the last column
> > of the view will solve both nitpicks?
> 
> But it should probably actually be name, given that's the underlying
> datatype. I kept changing it around and ended up half way in
> between...
> 
> 
> > ```
> > +    /* Information about the authenticated user */
> > +    char        st_authuser[NAMEDATALEN];
> > ```
> >
> > Well, here it's called "authuser" and it looks like the intention was
> > to use `name` datatype... I suggest using "authname" everywhere for
> > consistency.

I think it depends what we want the new field to reflect. If it is the exact
same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
is made of "auth_method:identity"). Now if we want it to be "only" the identity
part of it, then the `name` datatype would be fine. I'd vote for the exact same
thing as the SYSTEM_USER (means auth_method:identity).

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>authname</structfield> <type>name</type>
> +      </para>
> +      <para>
> +       The authentication method and identity (if any) that the user
> +       used to log in. It contains the same value as
> +       <xref linkend="system-user" /> returns in the backend.
> +      </para></entry>
> +     </row>

I'm fine with auth_method:identity.

> +            S.authname,

What about using system_user as the field name? (because if we keep
auth_method:identity it's not really the authname anyway).

Also, what about adding a test in say 003_peer.pl to check that the value matches
the SYSTEM_USER one?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote:
> > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev
> > <aleksander@timescale.com> wrote:
> > >
> > > Hi,
> > >
> > > Thanks for the patch.
>
> +1
>
> > > > This overlaps with for example the values in pg_stat_gss, but it will
> > > > include values for authentication methods that don't have their own
> > > > view such as peer/ident. gss/ssl info will of course still be shown,
> > > > it is just in more than one place.
>
> Yeah, I think that's a good idea.
>
> > > It hurts my sense of beauty that usename and authname are of different
> > > types. But if I'm the only one, maybe we can close our eyes on this.
> > > Also I suspect that placing usename and authname in a close proximity
> > > can be somewhat confusing. Perhaps adding authname as the last column
> > > of the view will solve both nitpicks?
> >
> > But it should probably actually be name, given that's the underlying
> > datatype. I kept changing it around and ended up half way in
> > between...
> >
> >
> > > ```
> > > +    /* Information about the authenticated user */
> > > +    char        st_authuser[NAMEDATALEN];
> > > ```
> > >
> > > Well, here it's called "authuser" and it looks like the intention was
> > > to use `name` datatype... I suggest using "authname" everywhere for
> > > consistency.
>
> I think it depends what we want the new field to reflect. If it is the exact
> same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
> is made of "auth_method:identity"). Now if we want it to be "only" the identity
> part of it, then the `name` datatype would be fine. I'd vote for the exact same
> thing as the SYSTEM_USER (means auth_method:identity).

I definitely think it should be the same. If it's not exactly the
same, then it should be *two* columns, one with auth method and one
with the name.

And thinking more about it maybe that's cleaner, because that makes it
easier to do things like filter based on auth method?


> > +     <row>
> > +      <entry role="catalog_table_entry"><para role="column_definition">
> > +       <structfield>authname</structfield> <type>name</type>
> > +      </para>
> > +      <para>
> > +       The authentication method and identity (if any) that the user
> > +       used to log in. It contains the same value as
> > +       <xref linkend="system-user" /> returns in the backend.
> > +      </para></entry>
> > +     </row>
>
> I'm fine with auth_method:identity.
>
> > +            S.authname,
>
> What about using system_user as the field name? (because if we keep
> auth_method:identity it's not really the authname anyway).

I was worried system_user or sysuser would both be confusing with the
fact that we have usesysid -- which would reference a *different*
sys...


> Also, what about adding a test in say 003_peer.pl to check that the value matches
> the SYSTEM_USER one?

Yeah, that's a good idea I think. I quickly looked over for tests and
couldn't really find any for pg_stat_activity, btu we can definitely
piggyback them in one or more of the auth tests.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote:
> On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
> I definitely think it should be the same. If it's not exactly the
> same, then it should be *two* columns, one with auth method and one
> with the name.
> 
> And thinking more about it maybe that's cleaner, because that makes it
> easier to do things like filter based on auth method?

Yeah, that's sounds even better.

> 
> > > +     <row>
> > > +      <entry role="catalog_table_entry"><para role="column_definition">
> > > +       <structfield>authname</structfield> <type>name</type>
> > > +      </para>
> > > +      <para>
> > > +       The authentication method and identity (if any) that the user
> > > +       used to log in. It contains the same value as
> > > +       <xref linkend="system-user" /> returns in the backend.
> > > +      </para></entry>
> > > +     </row>
> >
> > I'm fine with auth_method:identity.
> >
> > > +            S.authname,
> >
> > What about using system_user as the field name? (because if we keep
> > auth_method:identity it's not really the authname anyway).
> 
> I was worried system_user or sysuser would both be confusing with the
> fact that we have usesysid -- which would reference a *different*
> sys...

If we go the 2 fields way, then what about auth_identity and auth_method then?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Joe Conway
Date:
On 1/10/24 08:59, Magnus Hagander wrote:
> On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
>> I think it depends what we want the new field to reflect. If it is the exact
>> same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER
>> is made of "auth_method:identity"). Now if we want it to be "only" the identity
>> part of it, then the `name` datatype would be fine. I'd vote for the exact same
>> thing as the SYSTEM_USER (means auth_method:identity).
> 
> I definitely think it should be the same. If it's not exactly the
> same, then it should be *two* columns, one with auth method and one
> with the name.
> 
> And thinking more about it maybe that's cleaner, because that makes it
> easier to do things like filter based on auth method?

+1 for the overall feature and +1 for two columns

>> > +     <row>
>> > +      <entry role="catalog_table_entry"><para role="column_definition">
>> > +       <structfield>authname</structfield> <type>name</type>
>> > +      </para>
>> > +      <para>
>> > +       The authentication method and identity (if any) that the user
>> > +       used to log in. It contains the same value as
>> > +       <xref linkend="system-user" /> returns in the backend.
>> > +      </para></entry>
>> > +     </row>
>>
>> I'm fine with auth_method:identity.
>>
>> > +            S.authname,
>>
>> What about using system_user as the field name? (because if we keep
>> auth_method:identity it's not really the authname anyway).
> 
> I was worried system_user or sysuser would both be confusing with the
> fact that we have usesysid -- which would reference a *different*
> sys...


I think if it is exactly "system_user" it would be pretty clearly a 
match for SYSTEM_USER


-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote:
> > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot
> > I definitely think it should be the same. If it's not exactly the
> > same, then it should be *two* columns, one with auth method and one
> > with the name.
> >
> > And thinking more about it maybe that's cleaner, because that makes it
> > easier to do things like filter based on auth method?
>
> Yeah, that's sounds even better.
>
> >
> > > > +     <row>
> > > > +      <entry role="catalog_table_entry"><para role="column_definition">
> > > > +       <structfield>authname</structfield> <type>name</type>
> > > > +      </para>
> > > > +      <para>
> > > > +       The authentication method and identity (if any) that the user
> > > > +       used to log in. It contains the same value as
> > > > +       <xref linkend="system-user" /> returns in the backend.
> > > > +      </para></entry>
> > > > +     </row>
> > >
> > > I'm fine with auth_method:identity.
> > >
> > > > +            S.authname,
> > >
> > > What about using system_user as the field name? (because if we keep
> > > auth_method:identity it's not really the authname anyway).
> >
> > I was worried system_user or sysuser would both be confusing with the
> > fact that we have usesysid -- which would reference a *different*
> > sys...
>
> If we go the 2 fields way, then what about auth_identity and auth_method then?


Here is an updated patch based on this idea.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:
> On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > If we go the 2 fields way, then what about auth_identity and auth_method then?
> 
> 
> Here is an updated patch based on this idea.

Thanks!

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_method</structfield> <type>text</type>
+      </para>
+      <para>
+       The authentication method used for authenticating the connection, or
+       NULL for background processes.
+      </para></entry>

I'm wondering if it would make sense to populate it for parallel workers too.
I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
join based on the leader_pid though). OTOH that would be consistent with
how the SYSTEM_USER behaves with parallel workers (it's populated).

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auth_identity</structfield> <type>text</type>
+      </para>
+      <para>
+       The identity (if any) that the user presented during the authentication
+       cycle before they were assigned a database role.  Contains the same
+       value as <xref linkend="system-user" />

Same remark regarding the parallel workers case +:

- Would it be better to use the `name` datatype for auth_identity?
- what about "Contains the same value as the identity part in <xref linkend="system-user" />"?

+                       /*
+                        * Trust doesn't set_authn_id(), but we still need to store the
+                        * auth_method
+                        */
+                       MyClientConnectionInfo.auth_method = uaTrust;

+1, I think it is useful here to provide "trust" and not a NULL value in the
context of this patch.

+# pg_stat_activity shold contain trust and empty string for trust auth

typo: s/shold/should/

+# Users with md5 auth should show both auth method and name in pg_stat_activity

what about "show both auth method and identity"?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:
> > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > If we go the 2 fields way, then what about auth_identity and auth_method then?
> >
> >
> > Here is an updated patch based on this idea.
>
> Thanks!
>
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>auth_method</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       The authentication method used for authenticating the connection, or
> +       NULL for background processes.
> +      </para></entry>
>
> I'm wondering if it would make sense to populate it for parallel workers too.
> I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> join based on the leader_pid though). OTOH that would be consistent with
> how the SYSTEM_USER behaves with parallel workers (it's populated).

I guess one could conceptually argue that "authentication happens int
he leader". But we do populate it with the other user records, and
it'd be weird if this one was excluded.

The tricky thing is that pgstat_bestart() is called long before we
deserialize the data. But from what I can tell it should be safe to
change it per the attached? That should be AFAICT an extremely short
window of time longer before we report it, not enough to matter.


> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>auth_identity</structfield> <type>text</type>
> +      </para>
> +      <para>
> +       The identity (if any) that the user presented during the authentication
> +       cycle before they were assigned a database role.  Contains the same
> +       value as <xref linkend="system-user" />
>
> Same remark regarding the parallel workers case +:
>
> - Would it be better to use the `name` datatype for auth_identity?

I've been going back and forth. And I think my conclusion is that it's
not a postgres identifier, so it shouldn't be. See the earlier
discussion, and for example that that's what we do for cert names when
SSL is used.

> - what about "Contains the same value as the identity part in <xref linkend="system-user" />"?
>
> +                       /*
> +                        * Trust doesn't set_authn_id(), but we still need to store the
> +                        * auth_method
> +                        */
> +                       MyClientConnectionInfo.auth_method = uaTrust;
>
> +1, I think it is useful here to provide "trust" and not a NULL value in the
> context of this patch.

Yeah, that's probably "independently correct", but actually useful here.


> +# pg_stat_activity shold contain trust and empty string for trust auth
>
> typo: s/shold/should/

Ops.


> +# Users with md5 auth should show both auth method and name in pg_stat_activity
>
> what about "show both auth method and identity"?

Good spot, yeah, I changed it over to identity everywhere else so it
should be here as well.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote:
> On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > I'm wondering if it would make sense to populate it for parallel workers too.
> > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> > join based on the leader_pid though). OTOH that would be consistent with
> > how the SYSTEM_USER behaves with parallel workers (it's populated).
> 
> I guess one could conceptually argue that "authentication happens int
> he leader". But we do populate it with the other user records, and
> it'd be weird if this one was excluded.
> 
> The tricky thing is that pgstat_bestart() is called long before we
> deserialize the data. But from what I can tell it should be safe to
> change it per the attached? That should be AFAICT an extremely short
> window of time longer before we report it, not enough to matter.

Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the
"MyProcPort" test here then (looking at v3):

+       if (MyProcPort && MyClientConnectionInfo.authn_id)
+               strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN);
+       else
+               MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity));

to get the st_auth_identity propagated to the parallel workers.

> >
> > Same remark regarding the parallel workers case +:
> >
> > - Would it be better to use the `name` datatype for auth_identity?
> 
> I've been going back and forth. And I think my conclusion is that it's
> not a postgres identifier, so it shouldn't be. See the earlier
> discussion, and for example that that's what we do for cert names when
> SSL is used.

Yeah, Okay let's keep text then.

> 
> > - what about "Contains the same value as the identity part in <xref linkend="system-user" />"?

Not sure, but looks like you missed this comment?

> >
> > +                       /*
> > +                        * Trust doesn't set_authn_id(), but we still need to store the
> > +                        * auth_method
> > +                        */
> > +                       MyClientConnectionInfo.auth_method = uaTrust;
> >
> > +1, I think it is useful here to provide "trust" and not a NULL value in the
> > context of this patch.
> 
> Yeah, that's probably "independently correct", but actually useful here.

+1

> > +# Users with md5 auth should show both auth method and name in pg_stat_activity
> >
> > what about "show both auth method and identity"?
> 
> Good spot, yeah, I changed it over to identity everywhere else so it
> should be here as well.

Did you forget to share the new revision (aka v4)? I can only see the
"reorder_parallel_worker_bestart.patch" attached.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote:
> > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > I'm wondering if it would make sense to populate it for parallel workers too.
> > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> > > join based on the leader_pid though). OTOH that would be consistent with
> > > how the SYSTEM_USER behaves with parallel workers (it's populated).
> >
> > I guess one could conceptually argue that "authentication happens int
> > he leader". But we do populate it with the other user records, and
> > it'd be weird if this one was excluded.
> >
> > The tricky thing is that pgstat_bestart() is called long before we
> > deserialize the data. But from what I can tell it should be safe to
> > change it per the attached? That should be AFAICT an extremely short
> > window of time longer before we report it, not enough to matter.
>
> Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the
> "MyProcPort" test here then (looking at v3):
>
> +       if (MyProcPort && MyClientConnectionInfo.authn_id)
> +               strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN);
> +       else
> +               MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity));
>
> to get the st_auth_identity propagated to the parallel workers.

Yup, I had done that in v4 which as you noted further down, I forgot to post.


> > > - what about "Contains the same value as the identity part in <xref linkend="system-user" />"?
>
> Not sure, but looks like you missed this comment?

I did. Agree with your comment, and updated now.


> > > +# Users with md5 auth should show both auth method and name in pg_stat_activity
> > >
> > > what about "show both auth method and identity"?
> >
> > Good spot, yeah, I changed it over to identity everywhere else so it
> > should be here as well.
>
> Did you forget to share the new revision (aka v4)? I can only see the
> "reorder_parallel_worker_bestart.patch" attached.

I did. Here it is, and also including that suggested docs fix as well
as a rebase on current master.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote:
> On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot
> > Did you forget to share the new revision (aka v4)? I can only see the
> > "reorder_parallel_worker_bestart.patch" attached.
> 
> I did. Here it is, and also including that suggested docs fix as well
> as a rebase on current master.
> 

Thanks!

Just a few comments:

1 ===

+       The authentication method used for authenticating the connection, or
+       NULL for background processes.

What about? "NULL for background processes (except for parallel workers which
inherit this information from their leader process)"

2 ===

+       cycle before they were assigned a database role.  Contains the same
+       value as the identity part in <xref linkend="system-user" />, or NULL
+       for background processes.

Same comment about parallel workers.

3 ===

+# pg_stat_activity should contain trust and empty string for trust auth
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()",
+       connstr => "user=scram_role");
+is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity');
+
+# pg_stat_activity should contain NULL for auth of background processes
+# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are)
+$res = $node->safe_psql(
+       'postgres',
+       "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'",
+);
+is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity');

What do you think about testing the parallel workers cases too? (I'm not 100%
sure it's worth the extra complexity though).

Apart from those 3, it looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Aleksander Alekseev
Date:
Hi,

> > Did you forget to share the new revision (aka v4)? I can only see the
> > "reorder_parallel_worker_bestart.patch" attached.
>
> I did. Here it is, and also including that suggested docs fix as well
> as a rebase on current master.

```
+    lbeentry.st_auth_method = MyClientConnectionInfo.auth_method;
+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));
```

I believe using sizeof(lbeentry.st_auth_identity) instead of
NAMEDATALEN is generally considered a better practice.

Calling MemSet for a CString seems to be an overkill. I suggest
setting lbeentry.st_auth_identity[0] to zero.

Except for these nitpicks v4 LGTM.

-- 
Best regards,
Aleksander Alekseev



Re: System username in pg_stat_activity

From
Julien Rouhaud
Date:
Hi,

On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> I did. Here it is, and also including that suggested docs fix as well
> as a rebase on current master.

+    if (MyClientConnectionInfo.authn_id)
+        strlcpy(lbeentry.st_auth_identity,
MyClientConnectionInfo.authn_id, NAMEDATALEN);
+    else
+        MemSet(&lbeentry.st_auth_identity, 0,
sizeof(lbeentry.st_auth_identity));

Should we use pg_mbcliplen() here?  I don't think there's any strong
guarantee that no multibyte character can be used.  I also agree with
the nearby comment about MemSet being overkill.

+       value as the identity part in <xref linkend="system-user" />, or NULL
I was looking at
https://www.postgresql.org/docs/current/auth-username-maps.html and
noticed that this page is switching between system-user and
system-username.  Should we clean that up while at it?



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote:
> > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot
> > > Did you forget to share the new revision (aka v4)? I can only see the
> > > "reorder_parallel_worker_bestart.patch" attached.
> >
> > I did. Here it is, and also including that suggested docs fix as well
> > as a rebase on current master.
> >
>
> Thanks!
>
> Just a few comments:
>
> 1 ===
>
> +       The authentication method used for authenticating the connection, or
> +       NULL for background processes.
>
> What about? "NULL for background processes (except for parallel workers which
> inherit this information from their leader process)"

Ugh. That doesn't read very well at all to me. Maybe just "NULL for
background processes without a user"?


> 2 ===
>
> +       cycle before they were assigned a database role.  Contains the same
> +       value as the identity part in <xref linkend="system-user" />, or NULL
> +       for background processes.
>
> Same comment about parallel workers.
>
> 3 ===
>
> +# pg_stat_activity should contain trust and empty string for trust auth
> +$res = $node->safe_psql(
> +       'postgres',
> +       "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()",
> +       connstr => "user=scram_role");
> +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity');
> +
> +# pg_stat_activity should contain NULL for auth of background processes
> +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are)
> +$res = $node->safe_psql(
> +       'postgres',
> +       "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'",
> +);
> +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity');
>
> What do you think about testing the parallel workers cases too? (I'm not 100%
> sure it's worth the extra complexity though).

I'm leaning towards not worth that.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Jan 19, 2024 at 12:33 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi,
>
> > > Did you forget to share the new revision (aka v4)? I can only see the
> > > "reorder_parallel_worker_bestart.patch" attached.
> >
> > I did. Here it is, and also including that suggested docs fix as well
> > as a rebase on current master.
>
> ```
> +    lbeentry.st_auth_method = MyClientConnectionInfo.auth_method;
> +    if (MyClientConnectionInfo.authn_id)
> +        strlcpy(lbeentry.st_auth_identity,
> MyClientConnectionInfo.authn_id, NAMEDATALEN);
> +    else
> +        MemSet(&lbeentry.st_auth_identity, 0,
> sizeof(lbeentry.st_auth_identity));
> ```
>
> I believe using sizeof(lbeentry.st_auth_identity) instead of
> NAMEDATALEN is generally considered a better practice.

We use the NAMEDATALEN method in the rest of the function, so I did it
the same way for consistency. I think if we want to change that, we
should change the whole function at once to keep it consistent.


> Calling MemSet for a CString seems to be an overkill. I suggest
> setting lbeentry.st_auth_identity[0] to zero.

Fair enough. Will make that change.

//Magnus



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote:
> >
> > I did. Here it is, and also including that suggested docs fix as well
> > as a rebase on current master.
>
> +    if (MyClientConnectionInfo.authn_id)
> +        strlcpy(lbeentry.st_auth_identity,
> MyClientConnectionInfo.authn_id, NAMEDATALEN);
> +    else
> +        MemSet(&lbeentry.st_auth_identity, 0,
> sizeof(lbeentry.st_auth_identity));
>
> Should we use pg_mbcliplen() here?  I don't think there's any strong
> guarantee that no multibyte character can be used.  I also agree with
> the nearby comment about MemSet being overkill.

Hm. Good question. I don't think there is such a guarantee, no. So
something like attached v5?

Also, wouldn't that problem already exist a few lines down for the SSL parts?

> +       value as the identity part in <xref linkend="system-user" />, or NULL
> I was looking at
> https://www.postgresql.org/docs/current/auth-username-maps.html and
> noticed that this page is switching between system-user and
> system-username.  Should we clean that up while at it?

Seems like something we should clean up yes, but not as part of this patch.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote:
> The attached patch adds a column "authuser" to pg_stat_activity which
> contains the username of the externally authenticated user, being the
> same value as the SYSTEM_USER keyword returns in a backend.

I continue to think that it's a bad idea to make pg_stat_activity ever wider
with columns that do not actually describe properties that change across the
course of a session.  Yes, there's the argument that that ship has sailed, but
I don't think that's a good reason to continue ever further down that road.

It's not just a usability issue, it also makes it more expensive to query
pg_stat_activity. This is of course more pronounced with textual columns than
with integer ones.

Greetings,

Andres Freund



Re: System username in pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote:
> On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:
> > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > If we go the 2 fields way, then what about auth_identity and auth_method then?
> > >
> > >
> > > Here is an updated patch based on this idea.
> >
> > Thanks!
> >
> > +     <row>
> > +      <entry role="catalog_table_entry"><para role="column_definition">
> > +       <structfield>auth_method</structfield> <type>text</type>
> > +      </para>
> > +      <para>
> > +       The authentication method used for authenticating the connection, or
> > +       NULL for background processes.
> > +      </para></entry>
> >
> > I'm wondering if it would make sense to populate it for parallel workers too.
> > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> > join based on the leader_pid though). OTOH that would be consistent with
> > how the SYSTEM_USER behaves with parallel workers (it's populated).
> 
> I guess one could conceptually argue that "authentication happens int
> he leader". But we do populate it with the other user records, and
> it'd be weird if this one was excluded.
> 
> The tricky thing is that pgstat_bestart() is called long before we
> deserialize the data. But from what I can tell it should be safe to
> change it per the attached? That should be AFAICT an extremely short
> window of time longer before we report it, not enough to matter.

I don't like that one bit. The whole subsystem initialization dance already is
quite complicated, particularly for pgstat, we shouldn't make it more
complicated. Especially not when the initialization is moved quite a bit away
from all the other calls.

Besides just that, I also don't think delaying visibility of the worker in
pg_stat_activity until parallel worker initialization has completed is good,
that's not all cheap work.


Maybe I am missing something, but why aren't we just getting the value from
the leader's entry, instead of copying it?

Greetings,

Andres Freund



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote:
> > The attached patch adds a column "authuser" to pg_stat_activity which
> > contains the username of the externally authenticated user, being the
> > same value as the SYSTEM_USER keyword returns in a backend.
>
> I continue to think that it's a bad idea to make pg_stat_activity ever wider
> with columns that do not actually describe properties that change across the
> course of a session.  Yes, there's the argument that that ship has sailed, but
> I don't think that's a good reason to continue ever further down that road.
>
> It's not just a usability issue, it also makes it more expensive to query
> pg_stat_activity. This is of course more pronounced with textual columns than
> with integer ones.

That's a fair point, but I do think that has in most ways already sailed, yes.

I mean, we could split it into more than one view. But adding a new
view for every new thing we want to show is also not very good from
either a usability or performance perspective.  So where would we put
it?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote:
> On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote:
> > > The attached patch adds a column "authuser" to pg_stat_activity which
> > > contains the username of the externally authenticated user, being the
> > > same value as the SYSTEM_USER keyword returns in a backend.
> >
> > I continue to think that it's a bad idea to make pg_stat_activity ever wider
> > with columns that do not actually describe properties that change across the
> > course of a session.  Yes, there's the argument that that ship has sailed, but
> > I don't think that's a good reason to continue ever further down that road.
> >
> > It's not just a usability issue, it also makes it more expensive to query
> > pg_stat_activity. This is of course more pronounced with textual columns than
> > with integer ones.
> 
> That's a fair point, but I do think that has in most ways already sailed, yes.
> 
> I mean, we could split it into more than one view. But adding a new
> view for every new thing we want to show is also not very good from
> either a usability or performance perspective.  So where would we put
> it?

I think we should group new properties that don't change over the course of a
session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one
view per property, but I do think it makes sense to split information that
changes very frequently (like most pg_stat_activity contents) from information
that doesn't (like auth_method, auth_identity).

Greetings,

Andres Freund

[1]

Additionally I think something like pg_stat_session could also contain
per-session cumulative counters like the session's contribution to
pg_stat_database.{idle_in_transaction_time,active_time}



Re: System username in pg_stat_activity

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> I mean, we could split it into more than one view. But adding a new
> view for every new thing we want to show is also not very good from
> either a usability or performance perspective.  So where would we put
> it?

It'd have to be a new view with a row per session, showing static
(or at least mostly static?) properties of the session.

Could we move some existing fields of pg_stat_activity into such a
view?  In any case, there'd have to be a key column to use to join
it to pg_stat_activity.

I'm not sure that this is worth the trouble TBH.  If it can be shown
that pulling a few fields out of pg_stat_activity actually does make
for a useful speedup, then maybe OK ... but Andres hasn't provided
any evidence that there's a measurable issue.

            regards, tom lane



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 9:20 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote:
> > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote:
> > > > The attached patch adds a column "authuser" to pg_stat_activity which
> > > > contains the username of the externally authenticated user, being the
> > > > same value as the SYSTEM_USER keyword returns in a backend.
> > >
> > > I continue to think that it's a bad idea to make pg_stat_activity ever wider
> > > with columns that do not actually describe properties that change across the
> > > course of a session.  Yes, there's the argument that that ship has sailed, but
> > > I don't think that's a good reason to continue ever further down that road.
> > >
> > > It's not just a usability issue, it also makes it more expensive to query
> > > pg_stat_activity. This is of course more pronounced with textual columns than
> > > with integer ones.
> >
> > That's a fair point, but I do think that has in most ways already sailed, yes.
> >
> > I mean, we could split it into more than one view. But adding a new
> > view for every new thing we want to show is also not very good from
> > either a usability or performance perspective.  So where would we put
> > it?
>
> I think we should group new properties that don't change over the course of a
> session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one
> view per property, but I do think it makes sense to split information that
> changes very frequently (like most pg_stat_activity contents) from information
> that doesn't (like auth_method, auth_identity).

That would make sense in many ways, but ends up with "other level of
annoyances". E.g. the database name and oid don't change, but would we
want to move those out of pg_stat_activity? Same for username? Don't
we just end up in a grayzone about what belongs where?

Also - were you envisioning just another view, or actually replacing
the pg_stat_get_activity() part? As in where do you think the cost
comes?

(And as to Toms question about key column - the pid column can surely
be that? We already do that for pg_stat_ssl and pg_stat_gssapi, that
are both driven from pg_stat_get_activity() but shows a different set
of columns.


> Additionally I think something like pg_stat_session could also contain
> per-session cumulative counters like the session's contribution to
> pg_stat_database.{idle_in_transaction_time,active_time}

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 8:55 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote:
> > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:
> > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
> > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > >
> > > > > If we go the 2 fields way, then what about auth_identity and auth_method then?
> > > >
> > > >
> > > > Here is an updated patch based on this idea.
> > >
> > > Thanks!
> > >
> > > +     <row>
> > > +      <entry role="catalog_table_entry"><para role="column_definition">
> > > +       <structfield>auth_method</structfield> <type>text</type>
> > > +      </para>
> > > +      <para>
> > > +       The authentication method used for authenticating the connection, or
> > > +       NULL for background processes.
> > > +      </para></entry>
> > >
> > > I'm wondering if it would make sense to populate it for parallel workers too.
> > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> > > join based on the leader_pid though). OTOH that would be consistent with
> > > how the SYSTEM_USER behaves with parallel workers (it's populated).
> >
> > I guess one could conceptually argue that "authentication happens int
> > he leader". But we do populate it with the other user records, and
> > it'd be weird if this one was excluded.
> >
> > The tricky thing is that pgstat_bestart() is called long before we
> > deserialize the data. But from what I can tell it should be safe to
> > change it per the attached? That should be AFAICT an extremely short
> > window of time longer before we report it, not enough to matter.
>
> I don't like that one bit. The whole subsystem initialization dance already is
> quite complicated, particularly for pgstat, we shouldn't make it more
> complicated. Especially not when the initialization is moved quite a bit away
> from all the other calls.
>
> Besides just that, I also don't think delaying visibility of the worker in
> pg_stat_activity until parallel worker initialization has completed is good,
> that's not all cheap work.
>
>
> Maybe I am missing something, but why aren't we just getting the value from
> the leader's entry, instead of copying it?

The answer to that would be "because I didn't think of it" :)

Were you thinking of something like the attached?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Attachment

Re: System username in pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2024-02-16 15:22:16 -0500, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > I mean, we could split it into more than one view. But adding a new
> > view for every new thing we want to show is also not very good from
> > either a usability or performance perspective.  So where would we put
> > it?
>
> It'd have to be a new view with a row per session, showing static
> (or at least mostly static?) properties of the session.

Yep.


> Could we move some existing fields of pg_stat_activity into such a
> view?

I'd suspect that at least some of
 - leader_pid
 - datid
 - datname
 - usesysid
 - usename
 - backend_start
 - client_addr
 - client_hostname
 - client_port
 - backend_type

could be moved. Whether's worth breaking existing queries, I don't quite know.

One option would be to not return (some) of them from pg_stat_get_activity(),
but add them to the view in a way that the planner can elide the reference.


> I'm not sure that this is worth the trouble TBH.  If it can be shown
> that pulling a few fields out of pg_stat_activity actually does make
> for a useful speedup, then maybe OK ... but Andres hasn't provided
> any evidence that there's a measurable issue.

If I thought that the two columns proposed here were all that we wanted to
add, I'd not be worried. But there have been quite a few other fields
proposed, e.g. tracking idle/active time on a per-connection granularity.

We even already have a patch to add pg_stat_session
https://commitfest.postgresql.org/47/3405/

Greetings,

Andres Freund



Re: System username in pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2024-02-16 21:41:41 +0100, Magnus Hagander wrote:
> > Maybe I am missing something, but why aren't we just getting the value from
> > the leader's entry, instead of copying it?
>
> The answer to that would be "because I didn't think of it" :)

:)


> Were you thinking of something like the attached?

> @@ -435,6 +438,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
>                  {
>                      values[29] = Int32GetDatum(leader->pid);
>                      nulls[29] = false;
> +
> +                    /*
> +                     * The authenticated user in a parallel worker is the same as the one in
> +                     * the leader, so look it up there.
> +                     */
> +                    if (leader->backendId)
> +                    {
> +                        LocalPgBackendStatus *leaderstat =
pgstat_get_local_beentry_by_backend_id(leader->backendId);
> +
> +                        if (leaderstat->backendStatus.st_auth_method != uaReject &&
leaderstat->backendStatus.st_auth_method!= uaImplicitReject)
 
> +                        {
> +                            nulls[31] = nulls[32] = false;
> +                            values[31] =
CStringGetTextDatum(hba_authname(leaderstat->backendStatus.st_auth_method));
> +                            values[32] = CStringGetTextDatum(leaderstat->backendStatus.st_auth_identity);
> +                        }
> +                    }

Mostly, yes.

I only skimmed the patch, but it sure looks to me that we could end up with
none of the branches setting 31,32, so I think you'd have to make sure to
handle that case.

Greetings,

Andres Freund



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 9:51 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-02-16 21:41:41 +0100, Magnus Hagander wrote:
> > > Maybe I am missing something, but why aren't we just getting the value from
> > > the leader's entry, instead of copying it?
> >
> > The answer to that would be "because I didn't think of it" :)
>
> :)
>
>
> > Were you thinking of something like the attached?
>
> > @@ -435,6 +438,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
> >                               {
> >                                       values[29] = Int32GetDatum(leader->pid);
> >                                       nulls[29] = false;
> > +
> > +                                     /*
> > +                                      * The authenticated user in a parallel worker is the same as the one in
> > +                                      * the leader, so look it up there.
> > +                                      */
> > +                                     if (leader->backendId)
> > +                                     {
> > +                                             LocalPgBackendStatus *leaderstat =
pgstat_get_local_beentry_by_backend_id(leader->backendId);
> > +
> > +                                             if (leaderstat->backendStatus.st_auth_method != uaReject &&
leaderstat->backendStatus.st_auth_method!= uaImplicitReject) 
> > +                                             {
> > +                                                     nulls[31] = nulls[32] = false;
> > +                                                     values[31] =
CStringGetTextDatum(hba_authname(leaderstat->backendStatus.st_auth_method));
> > +                                                     values[32] =
CStringGetTextDatum(leaderstat->backendStatus.st_auth_identity);
> > +                                             }
> > +                                     }
>
> Mostly, yes.
>
> I only skimmed the patch, but it sure looks to me that we could end up with
> none of the branches setting 31,32, so I think you'd have to make sure to
> handle that case.

That case sets nulls[] for both of them to true I believe? And when
that is set I don't believe we need to set the values themselves.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Andres Freund
Date:
On 2024-02-16 21:56:25 +0100, Magnus Hagander wrote:
> On Fri, Feb 16, 2024 at 9:51 PM Andres Freund <andres@anarazel.de> wrote:
> > I only skimmed the patch, but it sure looks to me that we could end up with
> > none of the branches setting 31,32, so I think you'd have to make sure to
> > handle that case.
> 
> That case sets nulls[] for both of them to true I believe? And when
> that is set I don't believe we need to set the values themselves.

Seems I skimmed too quickly :) - you're right.



Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Fri, Feb 16, 2024 at 08:17:41PM +0100, Magnus Hagander wrote:
> On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote:
> > > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot
> > > > Did you forget to share the new revision (aka v4)? I can only see the
> > > > "reorder_parallel_worker_bestart.patch" attached.
> > >
> > > I did. Here it is, and also including that suggested docs fix as well
> > > as a rebase on current master.
> > >
> >
> > Thanks!
> >
> > Just a few comments:
> >
> > 1 ===
> >
> > +       The authentication method used for authenticating the connection, or
> > +       NULL for background processes.
> >
> > What about? "NULL for background processes (except for parallel workers which
> > inherit this information from their leader process)"
> 
> Ugh. That doesn't read very well at all to me. Maybe just "NULL for
> background processes without a user"?

Not sure, as I think it could be NULL for background processes that provided
a user in BackgroundWorkerInitializeConnection() too.

> > 2 ===
> >
> > +       cycle before they were assigned a database role.  Contains the same
> > +       value as the identity part in <xref linkend="system-user" />, or NULL
> > +       for background processes.
> >
> > Same comment about parallel workers.
> >
> > 3 ===
> >
> > +# pg_stat_activity should contain trust and empty string for trust auth
> > +$res = $node->safe_psql(
> > +       'postgres',
> > +       "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()",
> > +       connstr => "user=scram_role");
> > +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity');
> > +
> > +# pg_stat_activity should contain NULL for auth of background processes
> > +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are)
> > +$res = $node->safe_psql(
> > +       'postgres',
> > +       "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE
backend_type='checkpointer'",
> > +);
> > +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity');
> >
> > What do you think about testing the parallel workers cases too? (I'm not 100%
> > sure it's worth the extra complexity though).
> 
> I'm leaning towards not worth that.

Okay, I'm fine with that too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Fri, Feb 16, 2024 at 08:39:26PM +0100, Magnus Hagander wrote:
> On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > +       value as the identity part in <xref linkend="system-user" />, or NULL
> > I was looking at
> > https://www.postgresql.org/docs/current/auth-username-maps.html and
> > noticed that this page is switching between system-user and
> > system-username.  Should we clean that up while at it?
> 
> Seems like something we should clean up yes, but not as part of this patch.

Agree, done in [1].

[1]: https://www.postgresql.org/message-id/ZdMWux1HpIebkEmd%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Bertrand Drouvot
Date:
Hi,

On Fri, Feb 16, 2024 at 09:41:41PM +0100, Magnus Hagander wrote:
> On Fri, Feb 16, 2024 at 8:55 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote:
> > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote:
> > > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot
> > > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > > >
> > > > > > If we go the 2 fields way, then what about auth_identity and auth_method then?
> > > > >
> > > > >
> > > > > Here is an updated patch based on this idea.
> > > >
> > > > Thanks!
> > > >
> > > > +     <row>
> > > > +      <entry role="catalog_table_entry"><para role="column_definition">
> > > > +       <structfield>auth_method</structfield> <type>text</type>
> > > > +      </para>
> > > > +      <para>
> > > > +       The authentication method used for authenticating the connection, or
> > > > +       NULL for background processes.
> > > > +      </para></entry>
> > > >
> > > > I'm wondering if it would make sense to populate it for parallel workers too.
> > > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could
> > > > join based on the leader_pid though). OTOH that would be consistent with
> > > > how the SYSTEM_USER behaves with parallel workers (it's populated).
> > >
> > > I guess one could conceptually argue that "authentication happens int
> > > he leader". But we do populate it with the other user records, and
> > > it'd be weird if this one was excluded.
> > >
> > > The tricky thing is that pgstat_bestart() is called long before we
> > > deserialize the data. But from what I can tell it should be safe to
> > > change it per the attached? That should be AFAICT an extremely short
> > > window of time longer before we report it, not enough to matter.
> >
> > I don't like that one bit. The whole subsystem initialization dance already is
> > quite complicated, particularly for pgstat, we shouldn't make it more
> > complicated. Especially not when the initialization is moved quite a bit away
> > from all the other calls.
> >
> > Besides just that, I also don't think delaying visibility of the worker in
> > pg_stat_activity until parallel worker initialization has completed is good,
> > that's not all cheap work.
> >
> >
> > Maybe I am missing something, but why aren't we just getting the value from
> > the leader's entry, instead of copying it?

Good point!

> The answer to that would be "because I didn't think of it" :)

I'm in the same boat ;-) 

> Were you thinking of something like the attached?

Doing it that way looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 9:31 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Fri, Feb 16, 2024 at 9:20 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote:
> > > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote:
> > > > > The attached patch adds a column "authuser" to pg_stat_activity which
> > > > > contains the username of the externally authenticated user, being the
> > > > > same value as the SYSTEM_USER keyword returns in a backend.
> > > >
> > > > I continue to think that it's a bad idea to make pg_stat_activity ever wider
> > > > with columns that do not actually describe properties that change across the
> > > > course of a session.  Yes, there's the argument that that ship has sailed, but
> > > > I don't think that's a good reason to continue ever further down that road.
> > > >
> > > > It's not just a usability issue, it also makes it more expensive to query
> > > > pg_stat_activity. This is of course more pronounced with textual columns than
> > > > with integer ones.
> > >
> > > That's a fair point, but I do think that has in most ways already sailed, yes.
> > >
> > > I mean, we could split it into more than one view. But adding a new
> > > view for every new thing we want to show is also not very good from
> > > either a usability or performance perspective.  So where would we put
> > > it?
> >
> > I think we should group new properties that don't change over the course of a
> > session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one
> > view per property, but I do think it makes sense to split information that
> > changes very frequently (like most pg_stat_activity contents) from information
> > that doesn't (like auth_method, auth_identity).
>
> That would make sense in many ways, but ends up with "other level of
> annoyances". E.g. the database name and oid don't change, but would we
> want to move those out of pg_stat_activity? Same for username? Don't
> we just end up in a grayzone about what belongs where?
>
> Also - were you envisioning just another view, or actually replacing
> the pg_stat_get_activity() part? As in where do you think the cost
> comes?

Andres -- did you spot this question in the middle or did it get lost
in the flurry of others? :)

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: System username in pg_stat_activity

From
Magnus Hagander
Date:
On Fri, Feb 16, 2024 at 9:45 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-02-16 15:22:16 -0500, Tom Lane wrote:
> > Magnus Hagander <magnus@hagander.net> writes:
> > > I mean, we could split it into more than one view. But adding a new
> > > view for every new thing we want to show is also not very good from
> > > either a usability or performance perspective.  So where would we put
> > > it?
> >
> > It'd have to be a new view with a row per session, showing static
> > (or at least mostly static?) properties of the session.
>
> Yep.
>
>
> > Could we move some existing fields of pg_stat_activity into such a
> > view?
>
> I'd suspect that at least some of
>  - leader_pid
>  - datid
>  - datname
>  - usesysid
>  - usename
>  - backend_start
>  - client_addr
>  - client_hostname
>  - client_port
>  - backend_type
>
> could be moved. Whether's worth breaking existing queries, I don't quite know.

I think that's the big question. I think if we move all of those we
will break every single monitoring tool out there for postgres...
That's a pretty hefty price.


> One option would be to not return (some) of them from pg_stat_get_activity(),
> but add them to the view in a way that the planner can elide the reference.

Without having any numbers, I would think that the join to pg_authid
for exapmle is likely more costly than returning all the other fields.
But that one does get eliminated as long as one doesn't query that
column. But if we make more things "joined in from the view", isn't
that likely to just make it more expensive in most cases?


> > I'm not sure that this is worth the trouble TBH.  If it can be shown
> > that pulling a few fields out of pg_stat_activity actually does make
> > for a useful speedup, then maybe OK ... but Andres hasn't provided
> > any evidence that there's a measurable issue.
>
> If I thought that the two columns proposed here were all that we wanted to
> add, I'd not be worried. But there have been quite a few other fields
> proposed, e.g. tracking idle/active time on a per-connection granularity.
>
> We even already have a patch to add pg_stat_session
> https://commitfest.postgresql.org/47/3405/

In a way, that's yet another different type of values though -- it
contains accumulated stats. So we really have 3 types -- "info" that's
not really stats (username, etc), "current state" (query, wait events,
state) and "accumulated stats" (counters since start).If we don't want
to combine them all, we should perhaps not combine any and actually
have 3 views?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/