Thread: System username in pg_stat_activity
The attached patch adds a column "authuser" to pg_stat_activity which contains the username of the externally authenticated user, being the same value as the SYSTEM_USER keyword returns in a backend. This overlaps with for example the values in pg_stat_gss, but it will include values for authentication methods that don't have their own view such as peer/ident. gss/ssl info will of course still be shown, it is just in more than one place. I was originally thinking this column should be "sysuser" to map to the keyword, but since we already have "usesysid" as a column name in pg_stat_activity I figured that could be confusing since it actually means something completely different. But happy to change that back if people think that's better. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, Thanks for the patch. > The attached patch adds a column "authuser" to pg_stat_activity which > contains the username of the externally authenticated user, being the > same value as the SYSTEM_USER keyword returns in a backend. I believe what was meant is "authname", not "authuser". > This overlaps with for example the values in pg_stat_gss, but it will > include values for authentication methods that don't have their own > view such as peer/ident. gss/ssl info will of course still be shown, > it is just in more than one place. > > I was originally thinking this column should be "sysuser" to map to > the keyword, but since we already have "usesysid" as a column name in > pg_stat_activity I figured that could be confusing since it actually > means something completely different. But happy to change that back if > people think that's better. This part of the documentation is wrong: ``` + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>authname</structfield> <type>name</type> + </para> ``` Actually the type is `text`: ``` =# \d pg_stat_activity ; View "pg_catalog.pg_stat_activity" Column | Type | Collation | Nullable | Default ------------------+--------------------------+-----------+----------+--------- datid | oid | | | datname | name | | | pid | integer | | | leader_pid | integer | | | usesysid | oid | | | usename | name | | | authname | text | | | ``` It hurts my sense of beauty that usename and authname are of different types. But if I'm the only one, maybe we can close our eyes on this. Also I suspect that placing usename and authname in a close proximity can be somewhat confusing. Perhaps adding authname as the last column of the view will solve both nitpicks? ``` + /* Information about the authenticated user */ + char st_authuser[NAMEDATALEN]; ``` Well, here it's called "authuser" and it looks like the intention was to use `name` datatype... I suggest using "authname" everywhere for consistency. Since the patch affects pg_proc.dat I believe the commit message should remind bumping the catalog version. -- Best regards, Aleksander Alekseev
On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > Thanks for the patch. > > > The attached patch adds a column "authuser" to pg_stat_activity which > > contains the username of the externally authenticated user, being the > > same value as the SYSTEM_USER keyword returns in a backend. > > I believe what was meant is "authname", not "authuser". > > > This overlaps with for example the values in pg_stat_gss, but it will > > include values for authentication methods that don't have their own > > view such as peer/ident. gss/ssl info will of course still be shown, > > it is just in more than one place. > > > > I was originally thinking this column should be "sysuser" to map to > > the keyword, but since we already have "usesysid" as a column name in > > pg_stat_activity I figured that could be confusing since it actually > > means something completely different. But happy to change that back if > > people think that's better. > > This part of the documentation is wrong: > > ``` > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>authname</structfield> <type>name</type> > + </para> > ``` > > Actually the type is `text`: > > ``` > =# \d pg_stat_activity ; > View "pg_catalog.pg_stat_activity" > Column | Type | Collation | Nullable | Default > ------------------+--------------------------+-----------+----------+--------- > datid | oid | | | > datname | name | | | > pid | integer | | | > leader_pid | integer | | | > usesysid | oid | | | > usename | name | | | > authname | text | | | > ``` > > It hurts my sense of beauty that usename and authname are of different > types. But if I'm the only one, maybe we can close our eyes on this. > Also I suspect that placing usename and authname in a close proximity > can be somewhat confusing. Perhaps adding authname as the last column > of the view will solve both nitpicks? But it should probably actually be name, given that's the underlying datatype. I kept changing it around and ended up half way in between... > ``` > + /* Information about the authenticated user */ > + char st_authuser[NAMEDATALEN]; > ``` > > Well, here it's called "authuser" and it looks like the intention was > to use `name` datatype... I suggest using "authname" everywhere for > consistency. Yeah, I flipped back and forth a few times and clearly got stuck in the middle. They should absolutely be the same everywhere - whatever name is used it should be consistent. > Since the patch affects pg_proc.dat I believe the commit message > should remind bumping the catalog version. Yes. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > <aleksander@timescale.com> wrote: >> >> It hurts my sense of beauty that usename and authname are of different >> types. But if I'm the only one, maybe we can close our eyes on this. >> Also I suspect that placing usename and authname in a close proximity >> can be somewhat confusing. Perhaps adding authname as the last column >> of the view will solve both nitpicks? > > But it should probably actually be name, given that's the underlying > datatype. I kept changing it around and ended up half way in > between... https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE (and pg_typeof(system_user)) says it's text. Which makes sense, since it can easily be longer than 63 bytes, e.g. in the case of a TLS client certificate DN. - ilmari
On Wed, Jan 10, 2024 at 2:27 PM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote: > > Magnus Hagander <magnus@hagander.net> writes: > > > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > > <aleksander@timescale.com> wrote: > >> > >> It hurts my sense of beauty that usename and authname are of different > >> types. But if I'm the only one, maybe we can close our eyes on this. > >> Also I suspect that placing usename and authname in a close proximity > >> can be somewhat confusing. Perhaps adding authname as the last column > >> of the view will solve both nitpicks? > > > > But it should probably actually be name, given that's the underlying > > datatype. I kept changing it around and ended up half way in > > between... > > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE > (and pg_typeof(system_user)) says it's text. Which makes sense, since > it can easily be longer than 63 bytes, e.g. in the case of a TLS client > certificate DN. We already truncate all those to NAMEDATALEN in pg_stat_ssl for example. so I think the truncation part of those should be OK. We'll truncate "a little bit more" since we also have the 'cert:', but not significantly so I think. but yeah, conceptually it should probably be text because name is supposedly a *postgres identifier*, which this is not. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Hi, On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > <aleksander@timescale.com> wrote: > > > > Hi, > > > > Thanks for the patch. +1 > > > This overlaps with for example the values in pg_stat_gss, but it will > > > include values for authentication methods that don't have their own > > > view such as peer/ident. gss/ssl info will of course still be shown, > > > it is just in more than one place. Yeah, I think that's a good idea. > > It hurts my sense of beauty that usename and authname are of different > > types. But if I'm the only one, maybe we can close our eyes on this. > > Also I suspect that placing usename and authname in a close proximity > > can be somewhat confusing. Perhaps adding authname as the last column > > of the view will solve both nitpicks? > > But it should probably actually be name, given that's the underlying > datatype. I kept changing it around and ended up half way in > between... > > > > ``` > > + /* Information about the authenticated user */ > > + char st_authuser[NAMEDATALEN]; > > ``` > > > > Well, here it's called "authuser" and it looks like the intention was > > to use `name` datatype... I suggest using "authname" everywhere for > > consistency. I think it depends what we want the new field to reflect. If it is the exact same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER is made of "auth_method:identity"). Now if we want it to be "only" the identity part of it, then the `name` datatype would be fine. I'd vote for the exact same thing as the SYSTEM_USER (means auth_method:identity). > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>authname</structfield> <type>name</type> > + </para> > + <para> > + The authentication method and identity (if any) that the user > + used to log in. It contains the same value as > + <xref linkend="system-user" /> returns in the backend. > + </para></entry> > + </row> I'm fine with auth_method:identity. > + S.authname, What about using system_user as the field name? (because if we keep auth_method:identity it's not really the authname anyway). Also, what about adding a test in say 003_peer.pl to check that the value matches the SYSTEM_USER one? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > > <aleksander@timescale.com> wrote: > > > > > > Hi, > > > > > > Thanks for the patch. > > +1 > > > > > This overlaps with for example the values in pg_stat_gss, but it will > > > > include values for authentication methods that don't have their own > > > > view such as peer/ident. gss/ssl info will of course still be shown, > > > > it is just in more than one place. > > Yeah, I think that's a good idea. > > > > It hurts my sense of beauty that usename and authname are of different > > > types. But if I'm the only one, maybe we can close our eyes on this. > > > Also I suspect that placing usename and authname in a close proximity > > > can be somewhat confusing. Perhaps adding authname as the last column > > > of the view will solve both nitpicks? > > > > But it should probably actually be name, given that's the underlying > > datatype. I kept changing it around and ended up half way in > > between... > > > > > > > ``` > > > + /* Information about the authenticated user */ > > > + char st_authuser[NAMEDATALEN]; > > > ``` > > > > > > Well, here it's called "authuser" and it looks like the intention was > > > to use `name` datatype... I suggest using "authname" everywhere for > > > consistency. > > I think it depends what we want the new field to reflect. If it is the exact > same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER > is made of "auth_method:identity"). Now if we want it to be "only" the identity > part of it, then the `name` datatype would be fine. I'd vote for the exact same > thing as the SYSTEM_USER (means auth_method:identity). I definitely think it should be the same. If it's not exactly the same, then it should be *two* columns, one with auth method and one with the name. And thinking more about it maybe that's cleaner, because that makes it easier to do things like filter based on auth method? > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>authname</structfield> <type>name</type> > > + </para> > > + <para> > > + The authentication method and identity (if any) that the user > > + used to log in. It contains the same value as > > + <xref linkend="system-user" /> returns in the backend. > > + </para></entry> > > + </row> > > I'm fine with auth_method:identity. > > > + S.authname, > > What about using system_user as the field name? (because if we keep > auth_method:identity it's not really the authname anyway). I was worried system_user or sysuser would both be confusing with the fact that we have usesysid -- which would reference a *different* sys... > Also, what about adding a test in say 003_peer.pl to check that the value matches > the SYSTEM_USER one? Yeah, that's a good idea I think. I quickly looked over for tests and couldn't really find any for pg_stat_activity, btu we can definitely piggyback them in one or more of the auth tests. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Hi, On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot > I definitely think it should be the same. If it's not exactly the > same, then it should be *two* columns, one with auth method and one > with the name. > > And thinking more about it maybe that's cleaner, because that makes it > easier to do things like filter based on auth method? Yeah, that's sounds even better. > > > > + <row> > > > + <entry role="catalog_table_entry"><para role="column_definition"> > > > + <structfield>authname</structfield> <type>name</type> > > > + </para> > > > + <para> > > > + The authentication method and identity (if any) that the user > > > + used to log in. It contains the same value as > > > + <xref linkend="system-user" /> returns in the backend. > > > + </para></entry> > > > + </row> > > > > I'm fine with auth_method:identity. > > > > > + S.authname, > > > > What about using system_user as the field name? (because if we keep > > auth_method:identity it's not really the authname anyway). > > I was worried system_user or sysuser would both be confusing with the > fact that we have usesysid -- which would reference a *different* > sys... If we go the 2 fields way, then what about auth_identity and auth_method then? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 1/10/24 08:59, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot >> I think it depends what we want the new field to reflect. If it is the exact >> same thing as the SYSTEM_USER then I think it has to be text (as the SYSTEM_USER >> is made of "auth_method:identity"). Now if we want it to be "only" the identity >> part of it, then the `name` datatype would be fine. I'd vote for the exact same >> thing as the SYSTEM_USER (means auth_method:identity). > > I definitely think it should be the same. If it's not exactly the > same, then it should be *two* columns, one with auth method and one > with the name. > > And thinking more about it maybe that's cleaner, because that makes it > easier to do things like filter based on auth method? +1 for the overall feature and +1 for two columns >> > + <row> >> > + <entry role="catalog_table_entry"><para role="column_definition"> >> > + <structfield>authname</structfield> <type>name</type> >> > + </para> >> > + <para> >> > + The authentication method and identity (if any) that the user >> > + used to log in. It contains the same value as >> > + <xref linkend="system-user" /> returns in the backend. >> > + </para></entry> >> > + </row> >> >> I'm fine with auth_method:identity. >> >> > + S.authname, >> >> What about using system_user as the field name? (because if we keep >> auth_method:identity it's not really the authname anyway). > > I was worried system_user or sysuser would both be confusing with the > fact that we have usesysid -- which would reference a *different* > sys... I think if it is exactly "system_user" it would be pretty clearly a match for SYSTEM_USER -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot > > I definitely think it should be the same. If it's not exactly the > > same, then it should be *two* columns, one with auth method and one > > with the name. > > > > And thinking more about it maybe that's cleaner, because that makes it > > easier to do things like filter based on auth method? > > Yeah, that's sounds even better. > > > > > > > + <row> > > > > + <entry role="catalog_table_entry"><para role="column_definition"> > > > > + <structfield>authname</structfield> <type>name</type> > > > > + </para> > > > > + <para> > > > > + The authentication method and identity (if any) that the user > > > > + used to log in. It contains the same value as > > > > + <xref linkend="system-user" /> returns in the backend. > > > > + </para></entry> > > > > + </row> > > > > > > I'm fine with auth_method:identity. > > > > > > > + S.authname, > > > > > > What about using system_user as the field name? (because if we keep > > > auth_method:identity it's not really the authname anyway). > > > > I was worried system_user or sysuser would both be confusing with the > > fact that we have usesysid -- which would reference a *different* > > sys... > > If we go the 2 fields way, then what about auth_identity and auth_method then? Here is an updated patch based on this idea. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > If we go the 2 fields way, then what about auth_identity and auth_method then? > > > Here is an updated patch based on this idea. Thanks! + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>auth_method</structfield> <type>text</type> + </para> + <para> + The authentication method used for authenticating the connection, or + NULL for background processes. + </para></entry> I'm wondering if it would make sense to populate it for parallel workers too. I think it's doable thanks to d951052, but I'm not sure it's worth it (one could join based on the leader_pid though). OTOH that would be consistent with how the SYSTEM_USER behaves with parallel workers (it's populated). + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>auth_identity</structfield> <type>text</type> + </para> + <para> + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as <xref linkend="system-user" /> Same remark regarding the parallel workers case +: - Would it be better to use the `name` datatype for auth_identity? - what about "Contains the same value as the identity part in <xref linkend="system-user" />"? + /* + * Trust doesn't set_authn_id(), but we still need to store the + * auth_method + */ + MyClientConnectionInfo.auth_method = uaTrust; +1, I think it is useful here to provide "trust" and not a NULL value in the context of this patch. +# pg_stat_activity shold contain trust and empty string for trust auth typo: s/shold/should/ +# Users with md5 auth should show both auth method and name in pg_stat_activity what about "show both auth method and identity"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > If we go the 2 fields way, then what about auth_identity and auth_method then? > > > > > > Here is an updated patch based on this idea. > > Thanks! > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>auth_method</structfield> <type>text</type> > + </para> > + <para> > + The authentication method used for authenticating the connection, or > + NULL for background processes. > + </para></entry> > > I'm wondering if it would make sense to populate it for parallel workers too. > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > join based on the leader_pid though). OTOH that would be consistent with > how the SYSTEM_USER behaves with parallel workers (it's populated). I guess one could conceptually argue that "authentication happens int he leader". But we do populate it with the other user records, and it'd be weird if this one was excluded. The tricky thing is that pgstat_bestart() is called long before we deserialize the data. But from what I can tell it should be safe to change it per the attached? That should be AFAICT an extremely short window of time longer before we report it, not enough to matter. > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>auth_identity</structfield> <type>text</type> > + </para> > + <para> > + The identity (if any) that the user presented during the authentication > + cycle before they were assigned a database role. Contains the same > + value as <xref linkend="system-user" /> > > Same remark regarding the parallel workers case +: > > - Would it be better to use the `name` datatype for auth_identity? I've been going back and forth. And I think my conclusion is that it's not a postgres identifier, so it shouldn't be. See the earlier discussion, and for example that that's what we do for cert names when SSL is used. > - what about "Contains the same value as the identity part in <xref linkend="system-user" />"? > > + /* > + * Trust doesn't set_authn_id(), but we still need to store the > + * auth_method > + */ > + MyClientConnectionInfo.auth_method = uaTrust; > > +1, I think it is useful here to provide "trust" and not a NULL value in the > context of this patch. Yeah, that's probably "independently correct", but actually useful here. > +# pg_stat_activity shold contain trust and empty string for trust auth > > typo: s/shold/should/ Ops. > +# Users with md5 auth should show both auth method and name in pg_stat_activity > > what about "show both auth method and identity"? Good spot, yeah, I changed it over to identity everywhere else so it should be here as well. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote: > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > I'm wondering if it would make sense to populate it for parallel workers too. > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > > join based on the leader_pid though). OTOH that would be consistent with > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > I guess one could conceptually argue that "authentication happens int > he leader". But we do populate it with the other user records, and > it'd be weird if this one was excluded. > > The tricky thing is that pgstat_bestart() is called long before we > deserialize the data. But from what I can tell it should be safe to > change it per the attached? That should be AFAICT an extremely short > window of time longer before we report it, not enough to matter. Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the "MyProcPort" test here then (looking at v3): + if (MyProcPort && MyClientConnectionInfo.authn_id) + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN); + else + MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity)); to get the st_auth_identity propagated to the parallel workers. > > > > Same remark regarding the parallel workers case +: > > > > - Would it be better to use the `name` datatype for auth_identity? > > I've been going back and forth. And I think my conclusion is that it's > not a postgres identifier, so it shouldn't be. See the earlier > discussion, and for example that that's what we do for cert names when > SSL is used. Yeah, Okay let's keep text then. > > > - what about "Contains the same value as the identity part in <xref linkend="system-user" />"? Not sure, but looks like you missed this comment? > > > > + /* > > + * Trust doesn't set_authn_id(), but we still need to store the > > + * auth_method > > + */ > > + MyClientConnectionInfo.auth_method = uaTrust; > > > > +1, I think it is useful here to provide "trust" and not a NULL value in the > > context of this patch. > > Yeah, that's probably "independently correct", but actually useful here. +1 > > +# Users with md5 auth should show both auth method and name in pg_stat_activity > > > > what about "show both auth method and identity"? > > Good spot, yeah, I changed it over to identity everywhere else so it > should be here as well. Did you forget to share the new revision (aka v4)? I can only see the "reorder_parallel_worker_bestart.patch" attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote: > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > I'm wondering if it would make sense to populate it for parallel workers too. > > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > > > join based on the leader_pid though). OTOH that would be consistent with > > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > > > I guess one could conceptually argue that "authentication happens int > > he leader". But we do populate it with the other user records, and > > it'd be weird if this one was excluded. > > > > The tricky thing is that pgstat_bestart() is called long before we > > deserialize the data. But from what I can tell it should be safe to > > change it per the attached? That should be AFAICT an extremely short > > window of time longer before we report it, not enough to matter. > > Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the > "MyProcPort" test here then (looking at v3): > > + if (MyProcPort && MyClientConnectionInfo.authn_id) > + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN); > + else > + MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity)); > > to get the st_auth_identity propagated to the parallel workers. Yup, I had done that in v4 which as you noted further down, I forgot to post. > > > - what about "Contains the same value as the identity part in <xref linkend="system-user" />"? > > Not sure, but looks like you missed this comment? I did. Agree with your comment, and updated now. > > > +# Users with md5 auth should show both auth method and name in pg_stat_activity > > > > > > what about "show both auth method and identity"? > > > > Good spot, yeah, I changed it over to identity everywhere else so it > > should be here as well. > > Did you forget to share the new revision (aka v4)? I can only see the > "reorder_parallel_worker_bestart.patch" attached. I did. Here it is, and also including that suggested docs fix as well as a rebase on current master. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote: > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot > > Did you forget to share the new revision (aka v4)? I can only see the > > "reorder_parallel_worker_bestart.patch" attached. > > I did. Here it is, and also including that suggested docs fix as well > as a rebase on current master. > Thanks! Just a few comments: 1 === + The authentication method used for authenticating the connection, or + NULL for background processes. What about? "NULL for background processes (except for parallel workers which inherit this information from their leader process)" 2 === + cycle before they were assigned a database role. Contains the same + value as the identity part in <xref linkend="system-user" />, or NULL + for background processes. Same comment about parallel workers. 3 === +# pg_stat_activity should contain trust and empty string for trust auth +$res = $node->safe_psql( + 'postgres', + "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()", + connstr => "user=scram_role"); +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity'); + +# pg_stat_activity should contain NULL for auth of background processes +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are) +$res = $node->safe_psql( + 'postgres', + "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'", +); +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity'); What do you think about testing the parallel workers cases too? (I'm not 100% sure it's worth the extra complexity though). Apart from those 3, it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, > > Did you forget to share the new revision (aka v4)? I can only see the > > "reorder_parallel_worker_bestart.patch" attached. > > I did. Here it is, and also including that suggested docs fix as well > as a rebase on current master. ``` + lbeentry.st_auth_method = MyClientConnectionInfo.auth_method; + if (MyClientConnectionInfo.authn_id) + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN); + else + MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity)); ``` I believe using sizeof(lbeentry.st_auth_identity) instead of NAMEDATALEN is generally considered a better practice. Calling MemSet for a CString seems to be an overkill. I suggest setting lbeentry.st_auth_identity[0] to zero. Except for these nitpicks v4 LGTM. -- Best regards, Aleksander Alekseev
Hi, On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote: > > I did. Here it is, and also including that suggested docs fix as well > as a rebase on current master. + if (MyClientConnectionInfo.authn_id) + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo.authn_id, NAMEDATALEN); + else + MemSet(&lbeentry.st_auth_identity, 0, sizeof(lbeentry.st_auth_identity)); Should we use pg_mbcliplen() here? I don't think there's any strong guarantee that no multibyte character can be used. I also agree with the nearby comment about MemSet being overkill. + value as the identity part in <xref linkend="system-user" />, or NULL I was looking at https://www.postgresql.org/docs/current/auth-username-maps.html and noticed that this page is switching between system-user and system-username. Should we clean that up while at it?
On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote: > > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot > > > Did you forget to share the new revision (aka v4)? I can only see the > > > "reorder_parallel_worker_bestart.patch" attached. > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > > > Thanks! > > Just a few comments: > > 1 === > > + The authentication method used for authenticating the connection, or > + NULL for background processes. > > What about? "NULL for background processes (except for parallel workers which > inherit this information from their leader process)" Ugh. That doesn't read very well at all to me. Maybe just "NULL for background processes without a user"? > 2 === > > + cycle before they were assigned a database role. Contains the same > + value as the identity part in <xref linkend="system-user" />, or NULL > + for background processes. > > Same comment about parallel workers. > > 3 === > > +# pg_stat_activity should contain trust and empty string for trust auth > +$res = $node->safe_psql( > + 'postgres', > + "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()", > + connstr => "user=scram_role"); > +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity'); > + > +# pg_stat_activity should contain NULL for auth of background processes > +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are) > +$res = $node->safe_psql( > + 'postgres', > + "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'", > +); > +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity'); > > What do you think about testing the parallel workers cases too? (I'm not 100% > sure it's worth the extra complexity though). I'm leaning towards not worth that. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Fri, Jan 19, 2024 at 12:33 PM Aleksander Alekseev <aleksander@timescale.com> wrote: > > Hi, > > > > Did you forget to share the new revision (aka v4)? I can only see the > > > "reorder_parallel_worker_bestart.patch" attached. > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > ``` > + lbeentry.st_auth_method = MyClientConnectionInfo.auth_method; > + if (MyClientConnectionInfo.authn_id) > + strlcpy(lbeentry.st_auth_identity, > MyClientConnectionInfo.authn_id, NAMEDATALEN); > + else > + MemSet(&lbeentry.st_auth_identity, 0, > sizeof(lbeentry.st_auth_identity)); > ``` > > I believe using sizeof(lbeentry.st_auth_identity) instead of > NAMEDATALEN is generally considered a better practice. We use the NAMEDATALEN method in the rest of the function, so I did it the same way for consistency. I think if we want to change that, we should change the whole function at once to keep it consistent. > Calling MemSet for a CString seems to be an overkill. I suggest > setting lbeentry.st_auth_identity[0] to zero. Fair enough. Will make that change. //Magnus
On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander <magnus@hagander.net> wrote: > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > + if (MyClientConnectionInfo.authn_id) > + strlcpy(lbeentry.st_auth_identity, > MyClientConnectionInfo.authn_id, NAMEDATALEN); > + else > + MemSet(&lbeentry.st_auth_identity, 0, > sizeof(lbeentry.st_auth_identity)); > > Should we use pg_mbcliplen() here? I don't think there's any strong > guarantee that no multibyte character can be used. I also agree with > the nearby comment about MemSet being overkill. Hm. Good question. I don't think there is such a guarantee, no. So something like attached v5? Also, wouldn't that problem already exist a few lines down for the SSL parts? > + value as the identity part in <xref linkend="system-user" />, or NULL > I was looking at > https://www.postgresql.org/docs/current/auth-username-maps.html and > noticed that this page is switching between system-user and > system-username. Should we clean that up while at it? Seems like something we should clean up yes, but not as part of this patch. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > The attached patch adds a column "authuser" to pg_stat_activity which > contains the username of the externally authenticated user, being the > same value as the SYSTEM_USER keyword returns in a backend. I continue to think that it's a bad idea to make pg_stat_activity ever wider with columns that do not actually describe properties that change across the course of a session. Yes, there's the argument that that ship has sailed, but I don't think that's a good reason to continue ever further down that road. It's not just a usability issue, it also makes it more expensive to query pg_stat_activity. This is of course more pronounced with textual columns than with integer ones. Greetings, Andres Freund
Hi, On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote: > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > If we go the 2 fields way, then what about auth_identity and auth_method then? > > > > > > > > > Here is an updated patch based on this idea. > > > > Thanks! > > > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>auth_method</structfield> <type>text</type> > > + </para> > > + <para> > > + The authentication method used for authenticating the connection, or > > + NULL for background processes. > > + </para></entry> > > > > I'm wondering if it would make sense to populate it for parallel workers too. > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > > join based on the leader_pid though). OTOH that would be consistent with > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > I guess one could conceptually argue that "authentication happens int > he leader". But we do populate it with the other user records, and > it'd be weird if this one was excluded. > > The tricky thing is that pgstat_bestart() is called long before we > deserialize the data. But from what I can tell it should be safe to > change it per the attached? That should be AFAICT an extremely short > window of time longer before we report it, not enough to matter. I don't like that one bit. The whole subsystem initialization dance already is quite complicated, particularly for pgstat, we shouldn't make it more complicated. Especially not when the initialization is moved quite a bit away from all the other calls. Besides just that, I also don't think delaying visibility of the worker in pg_stat_activity until parallel worker initialization has completed is good, that's not all cheap work. Maybe I am missing something, but why aren't we just getting the value from the leader's entry, instead of copying it? Greetings, Andres Freund
On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > The attached patch adds a column "authuser" to pg_stat_activity which > > contains the username of the externally authenticated user, being the > > same value as the SYSTEM_USER keyword returns in a backend. > > I continue to think that it's a bad idea to make pg_stat_activity ever wider > with columns that do not actually describe properties that change across the > course of a session. Yes, there's the argument that that ship has sailed, but > I don't think that's a good reason to continue ever further down that road. > > It's not just a usability issue, it also makes it more expensive to query > pg_stat_activity. This is of course more pronounced with textual columns than > with integer ones. That's a fair point, but I do think that has in most ways already sailed, yes. I mean, we could split it into more than one view. But adding a new view for every new thing we want to show is also not very good from either a usability or performance perspective. So where would we put it? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Hi, On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote: > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote: > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > > The attached patch adds a column "authuser" to pg_stat_activity which > > > contains the username of the externally authenticated user, being the > > > same value as the SYSTEM_USER keyword returns in a backend. > > > > I continue to think that it's a bad idea to make pg_stat_activity ever wider > > with columns that do not actually describe properties that change across the > > course of a session. Yes, there's the argument that that ship has sailed, but > > I don't think that's a good reason to continue ever further down that road. > > > > It's not just a usability issue, it also makes it more expensive to query > > pg_stat_activity. This is of course more pronounced with textual columns than > > with integer ones. > > That's a fair point, but I do think that has in most ways already sailed, yes. > > I mean, we could split it into more than one view. But adding a new > view for every new thing we want to show is also not very good from > either a usability or performance perspective. So where would we put > it? I think we should group new properties that don't change over the course of a session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one view per property, but I do think it makes sense to split information that changes very frequently (like most pg_stat_activity contents) from information that doesn't (like auth_method, auth_identity). Greetings, Andres Freund [1] Additionally I think something like pg_stat_session could also contain per-session cumulative counters like the session's contribution to pg_stat_database.{idle_in_transaction_time,active_time}
Magnus Hagander <magnus@hagander.net> writes: > I mean, we could split it into more than one view. But adding a new > view for every new thing we want to show is also not very good from > either a usability or performance perspective. So where would we put > it? It'd have to be a new view with a row per session, showing static (or at least mostly static?) properties of the session. Could we move some existing fields of pg_stat_activity into such a view? In any case, there'd have to be a key column to use to join it to pg_stat_activity. I'm not sure that this is worth the trouble TBH. If it can be shown that pulling a few fields out of pg_stat_activity actually does make for a useful speedup, then maybe OK ... but Andres hasn't provided any evidence that there's a measurable issue. regards, tom lane
On Fri, Feb 16, 2024 at 9:20 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote: > > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote: > > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > > > The attached patch adds a column "authuser" to pg_stat_activity which > > > > contains the username of the externally authenticated user, being the > > > > same value as the SYSTEM_USER keyword returns in a backend. > > > > > > I continue to think that it's a bad idea to make pg_stat_activity ever wider > > > with columns that do not actually describe properties that change across the > > > course of a session. Yes, there's the argument that that ship has sailed, but > > > I don't think that's a good reason to continue ever further down that road. > > > > > > It's not just a usability issue, it also makes it more expensive to query > > > pg_stat_activity. This is of course more pronounced with textual columns than > > > with integer ones. > > > > That's a fair point, but I do think that has in most ways already sailed, yes. > > > > I mean, we could split it into more than one view. But adding a new > > view for every new thing we want to show is also not very good from > > either a usability or performance perspective. So where would we put > > it? > > I think we should group new properties that don't change over the course of a > session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one > view per property, but I do think it makes sense to split information that > changes very frequently (like most pg_stat_activity contents) from information > that doesn't (like auth_method, auth_identity). That would make sense in many ways, but ends up with "other level of annoyances". E.g. the database name and oid don't change, but would we want to move those out of pg_stat_activity? Same for username? Don't we just end up in a grayzone about what belongs where? Also - were you envisioning just another view, or actually replacing the pg_stat_get_activity() part? As in where do you think the cost comes? (And as to Toms question about key column - the pid column can surely be that? We already do that for pg_stat_ssl and pg_stat_gssapi, that are both driven from pg_stat_get_activity() but shows a different set of columns. > Additionally I think something like pg_stat_session could also contain > per-session cumulative counters like the session's contribution to > pg_stat_database.{idle_in_transaction_time,active_time} -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Fri, Feb 16, 2024 at 8:55 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote: > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > > <bertranddrouvot.pg@gmail.com> wrote: > > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > If we go the 2 fields way, then what about auth_identity and auth_method then? > > > > > > > > > > > > Here is an updated patch based on this idea. > > > > > > Thanks! > > > > > > + <row> > > > + <entry role="catalog_table_entry"><para role="column_definition"> > > > + <structfield>auth_method</structfield> <type>text</type> > > > + </para> > > > + <para> > > > + The authentication method used for authenticating the connection, or > > > + NULL for background processes. > > > + </para></entry> > > > > > > I'm wondering if it would make sense to populate it for parallel workers too. > > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > > > join based on the leader_pid though). OTOH that would be consistent with > > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > > > I guess one could conceptually argue that "authentication happens int > > he leader". But we do populate it with the other user records, and > > it'd be weird if this one was excluded. > > > > The tricky thing is that pgstat_bestart() is called long before we > > deserialize the data. But from what I can tell it should be safe to > > change it per the attached? That should be AFAICT an extremely short > > window of time longer before we report it, not enough to matter. > > I don't like that one bit. The whole subsystem initialization dance already is > quite complicated, particularly for pgstat, we shouldn't make it more > complicated. Especially not when the initialization is moved quite a bit away > from all the other calls. > > Besides just that, I also don't think delaying visibility of the worker in > pg_stat_activity until parallel worker initialization has completed is good, > that's not all cheap work. > > > Maybe I am missing something, but why aren't we just getting the value from > the leader's entry, instead of copying it? The answer to that would be "because I didn't think of it" :) Were you thinking of something like the attached? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Attachment
Hi, On 2024-02-16 15:22:16 -0500, Tom Lane wrote: > Magnus Hagander <magnus@hagander.net> writes: > > I mean, we could split it into more than one view. But adding a new > > view for every new thing we want to show is also not very good from > > either a usability or performance perspective. So where would we put > > it? > > It'd have to be a new view with a row per session, showing static > (or at least mostly static?) properties of the session. Yep. > Could we move some existing fields of pg_stat_activity into such a > view? I'd suspect that at least some of - leader_pid - datid - datname - usesysid - usename - backend_start - client_addr - client_hostname - client_port - backend_type could be moved. Whether's worth breaking existing queries, I don't quite know. One option would be to not return (some) of them from pg_stat_get_activity(), but add them to the view in a way that the planner can elide the reference. > I'm not sure that this is worth the trouble TBH. If it can be shown > that pulling a few fields out of pg_stat_activity actually does make > for a useful speedup, then maybe OK ... but Andres hasn't provided > any evidence that there's a measurable issue. If I thought that the two columns proposed here were all that we wanted to add, I'd not be worried. But there have been quite a few other fields proposed, e.g. tracking idle/active time on a per-connection granularity. We even already have a patch to add pg_stat_session https://commitfest.postgresql.org/47/3405/ Greetings, Andres Freund
Hi, On 2024-02-16 21:41:41 +0100, Magnus Hagander wrote: > > Maybe I am missing something, but why aren't we just getting the value from > > the leader's entry, instead of copying it? > > The answer to that would be "because I didn't think of it" :) :) > Were you thinking of something like the attached? > @@ -435,6 +438,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > { > values[29] = Int32GetDatum(leader->pid); > nulls[29] = false; > + > + /* > + * The authenticated user in a parallel worker is the same as the one in > + * the leader, so look it up there. > + */ > + if (leader->backendId) > + { > + LocalPgBackendStatus *leaderstat = pgstat_get_local_beentry_by_backend_id(leader->backendId); > + > + if (leaderstat->backendStatus.st_auth_method != uaReject && leaderstat->backendStatus.st_auth_method!= uaImplicitReject) > + { > + nulls[31] = nulls[32] = false; > + values[31] = CStringGetTextDatum(hba_authname(leaderstat->backendStatus.st_auth_method)); > + values[32] = CStringGetTextDatum(leaderstat->backendStatus.st_auth_identity); > + } > + } Mostly, yes. I only skimmed the patch, but it sure looks to me that we could end up with none of the branches setting 31,32, so I think you'd have to make sure to handle that case. Greetings, Andres Freund
On Fri, Feb 16, 2024 at 9:51 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2024-02-16 21:41:41 +0100, Magnus Hagander wrote: > > > Maybe I am missing something, but why aren't we just getting the value from > > > the leader's entry, instead of copying it? > > > > The answer to that would be "because I didn't think of it" :) > > :) > > > > Were you thinking of something like the attached? > > > @@ -435,6 +438,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > > { > > values[29] = Int32GetDatum(leader->pid); > > nulls[29] = false; > > + > > + /* > > + * The authenticated user in a parallel worker is the same as the one in > > + * the leader, so look it up there. > > + */ > > + if (leader->backendId) > > + { > > + LocalPgBackendStatus *leaderstat = pgstat_get_local_beentry_by_backend_id(leader->backendId); > > + > > + if (leaderstat->backendStatus.st_auth_method != uaReject && leaderstat->backendStatus.st_auth_method!= uaImplicitReject) > > + { > > + nulls[31] = nulls[32] = false; > > + values[31] = CStringGetTextDatum(hba_authname(leaderstat->backendStatus.st_auth_method)); > > + values[32] = CStringGetTextDatum(leaderstat->backendStatus.st_auth_identity); > > + } > > + } > > Mostly, yes. > > I only skimmed the patch, but it sure looks to me that we could end up with > none of the branches setting 31,32, so I think you'd have to make sure to > handle that case. That case sets nulls[] for both of them to true I believe? And when that is set I don't believe we need to set the values themselves. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On 2024-02-16 21:56:25 +0100, Magnus Hagander wrote: > On Fri, Feb 16, 2024 at 9:51 PM Andres Freund <andres@anarazel.de> wrote: > > I only skimmed the patch, but it sure looks to me that we could end up with > > none of the branches setting 31,32, so I think you'd have to make sure to > > handle that case. > > That case sets nulls[] for both of them to true I believe? And when > that is set I don't believe we need to set the values themselves. Seems I skimmed too quickly :) - you're right.
Hi, On Fri, Feb 16, 2024 at 08:17:41PM +0100, Magnus Hagander wrote: > On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > Hi, > > > > On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote: > > > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot > > > > Did you forget to share the new revision (aka v4)? I can only see the > > > > "reorder_parallel_worker_bestart.patch" attached. > > > > > > I did. Here it is, and also including that suggested docs fix as well > > > as a rebase on current master. > > > > > > > Thanks! > > > > Just a few comments: > > > > 1 === > > > > + The authentication method used for authenticating the connection, or > > + NULL for background processes. > > > > What about? "NULL for background processes (except for parallel workers which > > inherit this information from their leader process)" > > Ugh. That doesn't read very well at all to me. Maybe just "NULL for > background processes without a user"? Not sure, as I think it could be NULL for background processes that provided a user in BackgroundWorkerInitializeConnection() too. > > 2 === > > > > + cycle before they were assigned a database role. Contains the same > > + value as the identity part in <xref linkend="system-user" />, or NULL > > + for background processes. > > > > Same comment about parallel workers. > > > > 3 === > > > > +# pg_stat_activity should contain trust and empty string for trust auth > > +$res = $node->safe_psql( > > + 'postgres', > > + "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE pid=pg_backend_pid()", > > + connstr => "user=scram_role"); > > +is($res, 'trust|t', 'Users with trust authentication should show correctly in pg_stat_activity'); > > + > > +# pg_stat_activity should contain NULL for auth of background processes > > +# (test is a bit out of place here, but this is where the other pg_stat_activity.auth* tests are) > > +$res = $node->safe_psql( > > + 'postgres', > > + "SELECT auth_method IS NULL, auth_identity IS NULL FROM pg_stat_activity WHERE backend_type='checkpointer'", > > +); > > +is($res, 't|t', 'Background processes should show NULL for auth in pg_stat_activity'); > > > > What do you think about testing the parallel workers cases too? (I'm not 100% > > sure it's worth the extra complexity though). > > I'm leaning towards not worth that. Okay, I'm fine with that too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Feb 16, 2024 at 08:39:26PM +0100, Magnus Hagander wrote: > On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > + value as the identity part in <xref linkend="system-user" />, or NULL > > I was looking at > > https://www.postgresql.org/docs/current/auth-username-maps.html and > > noticed that this page is switching between system-user and > > system-username. Should we clean that up while at it? > > Seems like something we should clean up yes, but not as part of this patch. Agree, done in [1]. [1]: https://www.postgresql.org/message-id/ZdMWux1HpIebkEmd%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Feb 16, 2024 at 09:41:41PM +0100, Magnus Hagander wrote: > On Fri, Feb 16, 2024 at 8:55 PM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote: > > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > > > > <bertranddrouvot.pg@gmail.com> wrote: > > > > > > > > > > > > If we go the 2 fields way, then what about auth_identity and auth_method then? > > > > > > > > > > > > > > > Here is an updated patch based on this idea. > > > > > > > > Thanks! > > > > > > > > + <row> > > > > + <entry role="catalog_table_entry"><para role="column_definition"> > > > > + <structfield>auth_method</structfield> <type>text</type> > > > > + </para> > > > > + <para> > > > > + The authentication method used for authenticating the connection, or > > > > + NULL for background processes. > > > > + </para></entry> > > > > > > > > I'm wondering if it would make sense to populate it for parallel workers too. > > > > I think it's doable thanks to d951052, but I'm not sure it's worth it (one could > > > > join based on the leader_pid though). OTOH that would be consistent with > > > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > > > > > I guess one could conceptually argue that "authentication happens int > > > he leader". But we do populate it with the other user records, and > > > it'd be weird if this one was excluded. > > > > > > The tricky thing is that pgstat_bestart() is called long before we > > > deserialize the data. But from what I can tell it should be safe to > > > change it per the attached? That should be AFAICT an extremely short > > > window of time longer before we report it, not enough to matter. > > > > I don't like that one bit. The whole subsystem initialization dance already is > > quite complicated, particularly for pgstat, we shouldn't make it more > > complicated. Especially not when the initialization is moved quite a bit away > > from all the other calls. > > > > Besides just that, I also don't think delaying visibility of the worker in > > pg_stat_activity until parallel worker initialization has completed is good, > > that's not all cheap work. > > > > > > Maybe I am missing something, but why aren't we just getting the value from > > the leader's entry, instead of copying it? Good point! > The answer to that would be "because I didn't think of it" :) I'm in the same boat ;-) > Were you thinking of something like the attached? Doing it that way looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Feb 16, 2024 at 9:31 PM Magnus Hagander <magnus@hagander.net> wrote: > > On Fri, Feb 16, 2024 at 9:20 PM Andres Freund <andres@anarazel.de> wrote: > > > > Hi, > > > > On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote: > > > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > > > > The attached patch adds a column "authuser" to pg_stat_activity which > > > > > contains the username of the externally authenticated user, being the > > > > > same value as the SYSTEM_USER keyword returns in a backend. > > > > > > > > I continue to think that it's a bad idea to make pg_stat_activity ever wider > > > > with columns that do not actually describe properties that change across the > > > > course of a session. Yes, there's the argument that that ship has sailed, but > > > > I don't think that's a good reason to continue ever further down that road. > > > > > > > > It's not just a usability issue, it also makes it more expensive to query > > > > pg_stat_activity. This is of course more pronounced with textual columns than > > > > with integer ones. > > > > > > That's a fair point, but I do think that has in most ways already sailed, yes. > > > > > > I mean, we could split it into more than one view. But adding a new > > > view for every new thing we want to show is also not very good from > > > either a usability or performance perspective. So where would we put > > > it? > > > > I think we should group new properties that don't change over the course of a > > session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one > > view per property, but I do think it makes sense to split information that > > changes very frequently (like most pg_stat_activity contents) from information > > that doesn't (like auth_method, auth_identity). > > That would make sense in many ways, but ends up with "other level of > annoyances". E.g. the database name and oid don't change, but would we > want to move those out of pg_stat_activity? Same for username? Don't > we just end up in a grayzone about what belongs where? > > Also - were you envisioning just another view, or actually replacing > the pg_stat_get_activity() part? As in where do you think the cost > comes? Andres -- did you spot this question in the middle or did it get lost in the flurry of others? :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Fri, Feb 16, 2024 at 9:45 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2024-02-16 15:22:16 -0500, Tom Lane wrote: > > Magnus Hagander <magnus@hagander.net> writes: > > > I mean, we could split it into more than one view. But adding a new > > > view for every new thing we want to show is also not very good from > > > either a usability or performance perspective. So where would we put > > > it? > > > > It'd have to be a new view with a row per session, showing static > > (or at least mostly static?) properties of the session. > > Yep. > > > > Could we move some existing fields of pg_stat_activity into such a > > view? > > I'd suspect that at least some of > - leader_pid > - datid > - datname > - usesysid > - usename > - backend_start > - client_addr > - client_hostname > - client_port > - backend_type > > could be moved. Whether's worth breaking existing queries, I don't quite know. I think that's the big question. I think if we move all of those we will break every single monitoring tool out there for postgres... That's a pretty hefty price. > One option would be to not return (some) of them from pg_stat_get_activity(), > but add them to the view in a way that the planner can elide the reference. Without having any numbers, I would think that the join to pg_authid for exapmle is likely more costly than returning all the other fields. But that one does get eliminated as long as one doesn't query that column. But if we make more things "joined in from the view", isn't that likely to just make it more expensive in most cases? > > I'm not sure that this is worth the trouble TBH. If it can be shown > > that pulling a few fields out of pg_stat_activity actually does make > > for a useful speedup, then maybe OK ... but Andres hasn't provided > > any evidence that there's a measurable issue. > > If I thought that the two columns proposed here were all that we wanted to > add, I'd not be worried. But there have been quite a few other fields > proposed, e.g. tracking idle/active time on a per-connection granularity. > > We even already have a patch to add pg_stat_session > https://commitfest.postgresql.org/47/3405/ In a way, that's yet another different type of values though -- it contains accumulated stats. So we really have 3 types -- "info" that's not really stats (username, etc), "current state" (query, wait events, state) and "accumulated stats" (counters since start).If we don't want to combine them all, we should perhaps not combine any and actually have 3 views? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/