Thread: Addition of authenticated ID to pg_stat_activity

Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
Hi all,

9afffcb has added the concept of authenticated identity to the
information provided in log_connections for audit purposes, with this
data stored in each backend's port.  One extra thing that can be
really useful for monitoring is the capability to track this
information directly in pg_stat_activity.

Please find attached a patch to do that, with the following choices
made:
- Like query strings, authenticated IDs could be rather long, so we
need a GUC to control the maximum size allocated for these in shared
memory.  The attached uses 128 bytes by default, that should be enough
in most cases even for DNs, LDAP or krb5.
- Multi-byte strings need to be truncated appropriately.  As a matter
of consistency with the query string code, I have made things so as
the truncation is done each time a string is requested, with
PgBackendStatus storing the raw information truncated depending on the
maximum size allowed at the GUC level.
- Some tests are added within the SSL and LDAP code paths.  We could
add more of that within the authentication and kerberos tests but that
did not strike me as mandatory either as the backend logs are checked
everywhere already.
- The new field has been added at the end of pg_stat_end_activity()
mainly as a matter of readability.  I'd rather move that just after
the application_name, now only pg_stat_activity does that.

I am adding that to the next CF.

Thanks,
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Justin Pryzby
Date:
On Mon, Apr 26, 2021 at 11:34:16AM +0900, Michael Paquier wrote:
> +++ b/doc/src/sgml/config.sgml
> @@ -7596,6 +7596,24 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="guc-track-activity-authn-size" xreflabel="track_activity_authn_size">
> +      <term><varname>track_activity_authn_size</varname> (<type>integer</type>)
> +      <indexterm>
> +       <primary><varname>track_activity_authn_size</varname> configuration parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +       Specifies the amount of memory reserved to store the text of the
> +       currently executing command for each active session, for the

That part looks to be a copy+paste error.

> +       <structname>pg_stat_activity</structname>.<structfield>authenticated_id</structfield> field.
> +       If this value is specified without units, it is taken as bytes.
> +       The default value is 128 bytes.
> +       This parameter can only be set at server start.
> +       </para>
> +      </listitem>
> +     </varlistentry>

I think many/most things in log/CSV should also go in PSA, and vice versa.

It seems like there should be a comment about this - in both places - to avoid
forgetting it in the future.

-- 
Justin



Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Sun, Apr 25, 2021 at 10:14:43PM -0500, Justin Pryzby wrote:
> That part looks to be a copy+paste error.

Sorry about that.  I have fixed that on my own branch.

>> +       <structname>pg_stat_activity</structname>.<structfield>authenticated_id</structfield> field.
>> +       If this value is specified without units, it is taken as bytes.
>> +       The default value is 128 bytes.
>> +       This parameter can only be set at server start.
>> +       </para>
>> +      </listitem>
>> +     </varlistentry>
>
> I think many/most things in log/CSV should also go in PSA, and vice versa.
>
> It seems like there should be a comment about this - in both places - to avoid
> forgetting it in the future.

I am not sure what you mean here, neither do I see in what this is
related to what is proposed on this thread.
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2021-04-26 11:34:16 +0900, Michael Paquier wrote:
> 9afffcb has added the concept of authenticated identity to the
> information provided in log_connections for audit purposes, with this
> data stored in each backend's port.  One extra thing that can be
> really useful for monitoring is the capability to track this
> information directly in pg_stat_activity.

I'm getting a bit worried about the incremental increase in
pg_stat_activity width - it's probably by far the view that's most
viewed interactively. I think we need to be careful not to add too niche
things to it. This is especially true for columns that may be wider.

It'd be bad for discoverability, but perhaps something like this, that's
not that likely to be used interactively, would be better done as a
separate function that would need to be used explicitly?


A select * from pg_stat_activity on a plain installation, connecting
over unix socket, with nothing running, is 411 chars wide for me...

Greetings,

Andres Freund



Re: Addition of authenticated ID to pg_stat_activity

From
Stephen Frost
Date:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2021-04-26 11:34:16 +0900, Michael Paquier wrote:
> > 9afffcb has added the concept of authenticated identity to the
> > information provided in log_connections for audit purposes, with this
> > data stored in each backend's port.  One extra thing that can be
> > really useful for monitoring is the capability to track this
> > information directly in pg_stat_activity.
>
> I'm getting a bit worried about the incremental increase in
> pg_stat_activity width - it's probably by far the view that's most
> viewed interactively. I think we need to be careful not to add too niche
> things to it. This is especially true for columns that may be wider.
>
> It'd be bad for discoverability, but perhaps something like this, that's
> not that likely to be used interactively, would be better done as a
> separate function that would need to be used explicitly?

I mean.. we already have separate functions and views for this, though
they're auth-method-specific currently, but also provide more details,
since it isn't actually a "one size fits all" kind of thing like this
entire approach is imagining it to be.

Thanks,

Stephen

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Mon, Apr 26, 2021 at 03:21:46PM -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
>> I'm getting a bit worried about the incremental increase in
>> pg_stat_activity width - it's probably by far the view that's most
>> viewed interactively. I think we need to be careful not to add too niche
>> things to it. This is especially true for columns that may be wider.
>>
>> It'd be bad for discoverability, but perhaps something like this, that's
>> not that likely to be used interactively, would be better done as a
>> separate function that would need to be used explicitly?
>
> I mean.. we already have separate functions and views for this, though
> they're auth-method-specific currently, but also provide more details,
> since it isn't actually a "one size fits all" kind of thing like this
> entire approach is imagining it to be.

Referring to pg_stat_ssl and pg_stat_gssapi here, right?  Yes, that
would be very limited as this leads to no visibility for LDAP, all
password-based authentications and more.

I am wondering if we should take this as an occasion to move some data
out of pg_stat_activity into a separate biew, dedicated to the data
related to the connection that remains set to the same value for the
duration of a backend's life, as of the following set:
- the backend PID
- client_addr
- client_hostname
- client_port
- authenticated ID
- application_name?  (well, this one could change on reload, so I am
lying).

It would be tempting to move the database name and the username but
these are popular fields with monitoring.  Maybe we could name that
pg_stat_connection_status, pg_stat_auth_status or just
pg_stat_connection?
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Julien Rouhaud
Date:
On Tue, Apr 27, 2021 at 09:59:18AM +0900, Michael Paquier wrote:
> 
> I am wondering if we should take this as an occasion to move some data
> out of pg_stat_activity into a separate biew, dedicated to the data
> related to the connection that remains set to the same value for the
> duration of a backend's life, as of the following set:
> - the backend PID

-1.  It's already annoying enough to have to type "WHERE pid !=
pg_backend_pid()" to exclude my own backend, and I usually need it quite often.
Unless we add some new view which integrate that, something like
pg_stat_activity_except_me with a better name.  I also don't see how we could
join a new dedicated view with the old one without that information.

> - application_name?  (well, this one could change on reload, so I am
> lying).

No, it can change at any time.  And the fact that it's not transactional makes
it quite convenient for poor man progress reporting.  For instance in powa I
use that to report what the bgworker is currently working on, and this has
already proven to be useful.



Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Tue, Apr 27, 2021 at 09:26:11AM +0800, Julien Rouhaud wrote:
> -1.  It's already annoying enough to have to type "WHERE pid !=
> pg_backend_pid()" to exclude my own backend, and I usually need it quite often.
> Unless we add some new view which integrate that, something like
> pg_stat_activity_except_me with a better name.  I also don't see how we could
> join a new dedicated view with the old one without that information.

Err, sorry for the confusion.  What I meant here is to also keep the
PID in pg_stat_activity, but also add it to this new view to be able
to join things across the board.
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Mon, Apr 26, 2021 at 03:21:46PM -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> >> I'm getting a bit worried about the incremental increase in
> >> pg_stat_activity width - it's probably by far the view that's most
> >> viewed interactively. I think we need to be careful not to add too niche
> >> things to it. This is especially true for columns that may be wider.
> >>
> >> It'd be bad for discoverability, but perhaps something like this, that's
> >> not that likely to be used interactively, would be better done as a
> >> separate function that would need to be used explicitly?
> >
> > I mean.. we already have separate functions and views for this, though
> > they're auth-method-specific currently, but also provide more details,
> > since it isn't actually a "one size fits all" kind of thing like this
> > entire approach is imagining it to be.
>
> Referring to pg_stat_ssl and pg_stat_gssapi here, right?  Yes, that
> would be very limited as this leads to no visibility for LDAP, all
> password-based authentications and more.

Yes, of course.  The point being made was that we could do the same for
the other auth methods rather than adding something to pg_stat_activity.

> I am wondering if we should take this as an occasion to move some data
> out of pg_stat_activity into a separate biew, dedicated to the data
> related to the connection that remains set to the same value for the
> duration of a backend's life, as of the following set:
> - the backend PID
> - client_addr
> - client_hostname
> - client_port
> - authenticated ID
> - application_name?  (well, this one could change on reload, so I am
> lying).

application_name certainly changes, as pointed out elsewhere.

> It would be tempting to move the database name and the username but
> these are popular fields with monitoring.  Maybe we could name that
> pg_stat_connection_status, pg_stat_auth_status or just
> pg_stat_connection?

I don't know that there's really any of the existing fields that
aren't "popular fields with monitoring"..  The issue that Andres brought
up wasn't about monitoring though- it was about users looking
interactively.  Monitoring systems can adjust their queries for the new
major version to do whatever joins, et al, they need and that's a
once-per-major-version to do.  On the other hand, people doing:

table pg_stat_activity;

Would like to get the info they really want out of that and not anything
else.  If we're going to adjust the fields returned from that then
that's really the lens we should use.

So, what fields are people really looking at when querying
pg_stat_activity interactively?  User, database, pid, last query,
transaction start, query start, state, wait event info, maybe backend
xmin/xid?  I doubt most people looking at pg_stat_activity interactively
actually care about the non-user backends (autovacuum, et al).

Thanks,

Stephen

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Andres Freund
Date:
Hi,

On 2021-04-27 12:40:29 -0400, Stephen Frost wrote:
> So, what fields are people really looking at when querying
> pg_stat_activity interactively?  User, database, pid, last query,
> transaction start, query start, state, wait event info, maybe backend
> xmin/xid?  I doubt most people looking at pg_stat_activity interactively
> actually care about the non-user backends (autovacuum, et al).

Not representative, but I personally am about as often interested in one
of the non-connection processes as the connection
ones. E.g. investigating what is autovacuum's bottleneck, are
checkpointer / wal writer / bgwriter io bound or keeping up, etc.

Greetings,

Andres Freund



Re: Addition of authenticated ID to pg_stat_activity

From
Magnus Hagander
Date:
On Tue, Apr 27, 2021 at 8:25 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-04-27 12:40:29 -0400, Stephen Frost wrote:
> > So, what fields are people really looking at when querying
> > pg_stat_activity interactively?  User, database, pid, last query,
> > transaction start, query start, state, wait event info, maybe backend
> > xmin/xid?  I doubt most people looking at pg_stat_activity interactively
> > actually care about the non-user backends (autovacuum, et al).
>
> Not representative, but I personally am about as often interested in one
> of the non-connection processes as the connection
> ones. E.g. investigating what is autovacuum's bottleneck, are
> checkpointer / wal writer / bgwriter io bound or keeping up, etc.

I definitely use it all the time to monitor autovacuum all the time.
The others as well regularly, but autovacuum continuously. I also see
a lot of people doing things like "from pg_stat_activity where query
like '%mytablename%'" where they'd want both any regular queries and
any autovacuums currently processing the table.

I'd say client address is also pretty common to identify which set of
app servers connections are coming in from -- but client port and
client hostname are a lot less interesting. But it'd be kind of weird
to split those out.

For *interactive use* I'd find pretty much all other columns
interesting and commonly used. Probably not that interested in the
oids of the database and user, but again they are the cheap ones. We
could get rid of the joints if we only showed the oids, but in
interactive use it's really the names that are interesting. But if
we're just trying to save column count, I'd say get rid of datid and
usesysid.

I'd hold everything else as interesting.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Thu, Apr 29, 2021 at 04:56:42PM +0200, Magnus Hagander wrote:
> I definitely use it all the time to monitor autovacuum all the time.
> The others as well regularly, but autovacuum continuously. I also see
> a lot of people doing things like "from pg_stat_activity where query
> like '%mytablename%'" where they'd want both any regular queries and
> any autovacuums currently processing the table.

When it comes to development work, I also look at things different
than backend connections, checkpointer and WAL writer included.

> I'd say client address is also pretty common to identify which set of
> app servers connections are coming in from -- but client port and
> client hostname are a lot less interesting. But it'd be kind of weird
> to split those out.

Yes, I agree that it would be confusing to split the client_* fields
across multiple views.

> For *interactive use* I'd find pretty much all other columns
> interesting and commonly used. Probably not that interested in the
> oids of the database and user, but again they are the cheap ones. We
> could get rid of the joints if we only showed the oids, but in
> interactive use it's really the names that are interesting. But if
> we're just trying to save column count, I'd say get rid of datid and
> usesysid.
>
> I'd hold everything else as interesting.

Yes, you have an argument here about the removal of usesysid and
datid.  Now I find joins involving OIDs to be much more natural than
the object names, because that's the base of what we use in the
catalogs.

Not sure if we would be able to agree on something here, but the
barrier to what a session and a connection hold is thin when it comes
to roles and application_name.  Thinking more about that, I would be
really tempted to get to do a more straight split with data that's
associated to a session, to a transaction and to a connection, say:
1) pg_stat_session, data that may change.
- PID
- leader PID
- the role name
- role ID
- application_name
- wait_event_type
- wait_event
2) pg_stat_connection, static data associated to a connection.
- PID
- database name
- database OID
- client_addr
- client_hostname
- client_port
- backend_start
- authn ID
- backend_type
3) pg_stat_transaction, or pg_stat_activity, for the transactional
activity.
- PID
- xact_start
- query_start
- backend_xid
- state_change
- query string
- query ID
- state

Or I could just drop a new function that fetches the authn ID for a
given PID, even if this makes things potentially less consistent when
it comes to the lookup of PgBackendStatus, guarantee given now by
pg_stat_get_activity().
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Magnus Hagander
Date:
On Mon, May 17, 2021 at 6:35 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Apr 29, 2021 at 04:56:42PM +0200, Magnus Hagander wrote:
> > I definitely use it all the time to monitor autovacuum all the time.
> > The others as well regularly, but autovacuum continuously. I also see
> > a lot of people doing things like "from pg_stat_activity where query
> > like '%mytablename%'" where they'd want both any regular queries and
> > any autovacuums currently processing the table.
>
> When it comes to development work, I also look at things different
> than backend connections, checkpointer and WAL writer included.

While I think we should optimize these primarily for users and not
developers, I definitely do those things as well. In particular wait
events for the background processes.


> > I'd say client address is also pretty common to identify which set of
> > app servers connections are coming in from -- but client port and
> > client hostname are a lot less interesting. But it'd be kind of weird
> > to split those out.
>
> Yes, I agree that it would be confusing to split the client_* fields
> across multiple views.
>
> > For *interactive use* I'd find pretty much all other columns
> > interesting and commonly used. Probably not that interested in the
> > oids of the database and user, but again they are the cheap ones. We
> > could get rid of the joints if we only showed the oids, but in
> > interactive use it's really the names that are interesting. But if
> > we're just trying to save column count, I'd say get rid of datid and
> > usesysid.
> >
> > I'd hold everything else as interesting.
>
> Yes, you have an argument here about the removal of usesysid and
> datid.  Now I find joins involving OIDs to be much more natural than
> the object names, because that's the base of what we use in the
> catalogs.

Agreed. And I'm not sure the actual gain is that big if we can just
remove oid columns...


> Not sure if we would be able to agree on something here, but the
> barrier to what a session and a connection hold is thin when it comes
> to roles and application_name.  Thinking more about that, I would be
> really tempted to get to do a more straight split with data that's
> associated to a session, to a transaction and to a connection, say:
> 1) pg_stat_session, data that may change.
> - PID
> - leader PID
> - the role name
> - role ID
> - application_name
> - wait_event_type
> - wait_event
> 2) pg_stat_connection, static data associated to a connection.
> - PID
> - database name
> - database OID
> - client_addr
> - client_hostname
> - client_port
> - backend_start
> - authn ID
> - backend_type
> 3) pg_stat_transaction, or pg_stat_activity, for the transactional
> activity.
> - PID
> - xact_start
> - query_start
> - backend_xid
> - state_change
> - query string
> - query ID
> - state

This seems extremely user-unfriendly to me.

I mean. Timestamps are nso split out between different views so you
can't track the process iwthout it. And surely wait_event info is
*extremely* related to things like what query is running and what
state the session is in. And putting backend_type off in a separate
view means most queries are going to have to join that in anyway. Or
include it in all views. And if we're forcing the majority of queries
to join multiple views, what have we actually gained?

Based on your list above, I'd definitely want at least (1) and (2) to
be in the same one, but they'd have to also gain at least the database
oid/name and backend_type, and maybe also backend_start.

So basically, it would be moving out client_*, and authn_id. If we're
doing that then as you say maybe pg_stat_connection is a good name and
could then *also* gain the information that's currently in the ssl and
gss views for a net simplification.

tld;dr; I think we have to be really careful here or the cure is going
to be way worse than the disease.

> Or I could just drop a new function that fetches the authn ID for a
> given PID, even if this makes things potentially less consistent when
> it comes to the lookup of PgBackendStatus, guarantee given now by
> pg_stat_get_activity().

Well, the authnid will never change so I'm not sure the consistency
part is a big problem? Or maybe I'm misunderstanding the risk you're
referring to?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Mon, May 17, 2021 at 10:28:49AM +0200, Magnus Hagander wrote:
> On Mon, May 17, 2021 at 6:35 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Not sure if we would be able to agree on something here, but the
>> barrier to what a session and a connection hold is thin when it comes
>> to roles and application_name.  Thinking more about that, I would be
>> really tempted to get to do a more straight split with data that's
>> associated to a session, to a transaction and to a connection, say:
>> 1) pg_stat_session, data that may change.
>> - PID
>> - leader PID
>> - the role name
>> - role ID
>> - application_name
>> - wait_event_type
>> - wait_event
>> 2) pg_stat_connection, static data associated to a connection.
>> - PID
>> - database name
>> - database OID
>> - client_addr
>> - client_hostname
>> - client_port
>> - backend_start
>> - authn ID
>> - backend_type
>> 3) pg_stat_transaction, or pg_stat_activity, for the transactional
>> activity.
>> - PID
>> - xact_start
>> - query_start
>> - backend_xid
>> - state_change
>> - query string
>> - query ID
>> - state
>
> This seems extremely user-unfriendly to me.
>
> I mean. Timestamps are nso split out between different views so you
> can't track the process iwthout it. And surely wait_event info is
> *extremely* related to things like what query is running and what
> state the session is in. And putting backend_type off in a separate
> view means most queries are going to have to join that in anyway. Or
> include it in all views. And if we're forcing the majority of queries
> to join multiple views, what have we actually gained?
>
> Based on your list above, I'd definitely want at least (1) and (2) to
> be in the same one, but they'd have to also gain at least the database
> oid/name and backend_type, and maybe also backend_start.

Okay.

> So basically, it would be moving out client_*, and authn_id.

So that would mean the addition of one new catalog view, called
pg_stat_connection, with the following fields:
- PID
- all three client_*
- authn ID
I can live with this split.  Thoughts from others?

> If we're
> doing that then as you say maybe pg_stat_connection is a good name and
> could then *also* gain the information that's currently in the ssl and
> gss views for a net simplification.

I am less enthutiastic about this addition.  SSL and GSSAPI have no
fields in common, so that would bloat the view for connection data
with mostly NULL fields most of the time.

> tld;dr; I think we have to be really careful here or the cure is going
> to be way worse than the disease.

Agreed.

>> Or I could just drop a new function that fetches the authn ID for a
>> given PID, even if this makes things potentially less consistent when
>> it comes to the lookup of PgBackendStatus, guarantee given now by
>> pg_stat_get_activity().
>
> Well, the authnid will never change so I'm not sure the consistency
> part is a big problem? Or maybe I'm misunderstanding the risk you're
> referring to?

I just mean to keep the consistency we have now with one single call
of pg_stat_get_activity() for each catalog view, so as we still fetch
once a consistent copy of all PgBackendStatus entries in this code
path.
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Tue, May 18, 2021 at 11:20:49AM +0900, Michael Paquier wrote:
> So that would mean the addition of one new catalog view, called
> pg_stat_connection, with the following fields:
> - PID
> - all three client_*
> - authn ID
> I can live with this split.  Thoughts from others?

Just to make the discussion move on, attached is an updated version
doing that.
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Aleksander Alekseev
Date:
Hi Michael,

> Just to make the discussion move on, attached is an updated version
> doing that.

The code seems OK, but I have mixed feelings about the way that the
VIEW currently works.

Here is what I get when a single user is connected via a UNIX socket:

43204 (master) =# select * from pg_stat_connection;
  pid  | authenticated_id | client_addr | client_hostname | client_port
-------+------------------+-------------+-----------------+-------------
 25806 |                  |             |                 |
 25808 |                  |             |                 |
 43204 |                  |             |                 |          -1
 25804 |                  |             |                 |
 25803 |                  |             |                 |
 25805 |                  |             |                 |
(6 rows)

I bet we could be more user-friendly than this. To begin with, the
documentation says:

+  <para>
+   The <structname>pg_stat_connection</structname> view will have one row
+   per server process, showing information related to
+   the current connection of that process.
+  </para>

[...]

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>client_addr</structfield> <type>inet</type>
+      </para>
+      <para>
+       IP address of the client connected to this backend.
+       If this field is null, it indicates either that the client is
+       connected via a Unix socket on the server machine or that this is an
+       internal process such as autovacuum.
+      </para></entry>
+     </row>

Any reason why we shouldn't simply exclude internal processes from the
view? Do they have a connection that the VIEW could show?

Secondly, maybe instead of magic constants like -1, we could use an
additional text column, e.g. connection_type: "unix"? Thirdly, not
sure if client_hostname is really needed, isn't address:port pair
enough to identify the client? Lastly, introducing a new GUC for
truncating values in a single view, which can only be set at server
start, doesn't strike me as a great idea. What is the worst-case
scenario if instead we will always allocate
`strlen(MyProcPort->authn_id)` and the user will truncate the result
manually if needed?

-- 
Best regards,
Aleksander Alekseev



Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Mon, Jul 19, 2021 at 04:56:24PM +0300, Aleksander Alekseev wrote:
> Any reason why we shouldn't simply exclude internal processes from the
> view? Do they have a connection that the VIEW could show?

Yeah, they don't really have any useful information.  Still, I kept
that mainly as a matter of consistency with pg_stat_activity, and this
can be useful to find out about internal processes without relying on
an extra check based on pg_stat_activity.backend_type.  Perhaps we
could just add a check in system_views.sql based on the NULL-ness of
client_port.

> Secondly, maybe instead of magic constants like -1, we could use an
> additional text column, e.g. connection_type: "unix"?

I am not really incline to break that more, as told by
pg_stat_get_activity() there are cases where this looks useful:
/*
 * Unix sockets always reports NULL for host and -1 for
 * port, so it's possible to tell the difference to
 * connections we have no permissions to view, or with
 * errors.
 */

> Thirdly, not
> sure if client_hostname is really needed, isn't address:port pair
> enough to identify the client?

It seems to me that this is still useful to know about
Port->remote_hostname.

> Lastly, introducing a new GUC for
> truncating values in a single view, which can only be set at server
> start, doesn't strike me as a great idea. What is the worst-case
> scenario if instead we will always allocate
> `strlen(MyProcPort->authn_id)` and the user will truncate the result
> manually if needed?

The authenticated ID could be a SSL DN longer than the default of
128kB that this patch is proposing.  I think that it is a good idea to
provide some way to the user to be able to control that without a
recompilation.
--
Michael

Attachment

Re: Addition of authenticated ID to pg_stat_activity

From
Michael Paquier
Date:
On Wed, Jul 21, 2021 at 01:21:17PM +0900, Michael Paquier wrote:
> The authenticated ID could be a SSL DN longer than the default of
> 128kB that this patch is proposing.  I think that it is a good idea to
> provide some way to the user to be able to control that without a
> recompilation.

I got to think about this patch more for the last couple of days, and
I'd still think that having a GUC to control how much shared memory we
need for the authenticated ID in each BackendStatusArray.  Now, the
thread has been idle for two months now, and it does not seem to
attract much attention.  This also includes a split of
pg_stat_activity for client_addr, client_hostname and client_port into
a new catalog, which may be hard to justify for this feature.  So I am
dropping the patch.
--
Michael

Attachment