Thread: Added columns to pg_stat_activity

Added columns to pg_stat_activity

From
"Magnus Hagander"
Date:
This patch ticks off the following TODO items:

* Add session start time and last statement time to pg_stat_activity
 (we already had last-statement-time, provided command string stats were
enabled)
* Add the client IP address and port to pg_stat_activity

I have tested on Linux (both IPV4 and Unix sockets for the address/port)
and Win32 (naturally only IPV4). I don't have a machine around with IPV6
support, so I haven't tested that. The code to handle it is stolen from
inet_client_addr() though, so it should hopefully be ok.


//Magnus

Attachment

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Magnus Hagander wrote:
> * Add session start time and last statement time to pg_stat_activity
>  (we already had last-statement-time, provided command string stats were
> enabled)
> * Add the client IP address and port to pg_stat_activity

Looks pretty good -- barring any objections I'll apply this to HEAD
tomorrow or the day after (there's a room for a bit of cleanup to the
docs, but I'll do that before committing).

> I have tested on Linux (both IPV4 and Unix sockets for the address/port)
> and Win32 (naturally only IPV4).

Do you mean we only support IPv4 under Win32? (There are certainly
versions of Win32 which support IPv6.)

-Neil

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Looks pretty good -- barring any objections I'll apply this to HEAD
> tomorrow or the day after (there's a room for a bit of cleanup to the
> docs, but I'll do that before committing).

Couple of minor gripes:

1.  Can we reduce the amount of stuff #include'd by pgstat.h?  I
particularly dislike including builtins.h there; that surely should
be unnecessary, since there's nothing but externs in builtins.h.

2.  WRT the inet.h change: the project standard is that extern
declarations are spelled with "extern".

            regards, tom lane

Re: Added columns to pg_stat_activity

From
"Magnus Hagander"
Date:
> > Looks pretty good -- barring any objections I'll apply this to HEAD
> > tomorrow or the day after (there's a room for a bit of
> cleanup to the
> > docs, but I'll do that before committing).
>
> Couple of minor gripes:
>
> 1.  Can we reduce the amount of stuff #include'd by pgstat.h?
>  I particularly dislike including builtins.h there; that
> surely should be unnecessary, since there's nothing but
> externs in builtins.h.

Right. Only one needs to be there, the other two can be moved to
pgstatfuncs.c. The last one is needed to get the SockAddr structure.


> 2.  WRT the inet.h change: the project standard is that
> extern declarations are spelled with "extern".

Fixed.

Updated patch attached.

//Magnus

Attachment

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Magnus Hagander wrote:
> Updated patch attached.

A few more comments:

- why did you add the client address to PgStat_MsgHdr, rather than
PgStat_MsgBestart? It would be nice to avoid sending the client address
for each and every stats message, as it shouldn't change over the life
of the session.

- I think the backend_client_addr and backend_client_port view columns
should be named "client_addr" and "client_port", respectively.

- Is there a reason you need to expose and then call the private
network_in() function, rather than using inet_in(), which is already public?

- Is the backend startup time sensitive information? Considering this
information is available via ps(1), perhaps it would be better to allow
any user to see any backend's startup time, rather than providing a
false sense of security.

- We return (NULL, -1) for the client address/port if connected via a
unix domain socket, and (NULL, NULL) for the address/port if the
selecting user isn't a superuser / the owner of that client connection.
It seems a little ugly to return NULL for the address in both cases,
since the same value is used for two completely different meanings. Not
sure of a better convention, though -- just idly complaining :)

You needn't respin the patch, as I can make the necessary changes myself
-- I'm just looking for comments on the above before applying.

-Neil

Re: Added columns to pg_stat_activity

From
"Magnus Hagander"
Date:
>> Updated patch attached.
>
>A few more comments:
>
>- why did you add the client address to PgStat_MsgHdr, rather than
>PgStat_MsgBestart? It would be nice to avoid sending the
>client address
>for each and every stats message, as it shouldn't change over the life
>of the session.

I guess that's from not reading things carefully enough. I looked for
where "backend pid" was transmitted, because I thought that would be a
good place to put it. Which might bring up the question, why are things
like the backend pid sent in msghdr and not in bestart? (seems bestart
is nothing more than the header, which is why I missed it completely)


>- I think the backend_client_addr and backend_client_port view columns
>should be named "client_addr" and "client_port", respectively.

Seems reasonable.


>- Is there a reason you need to expose and then call the private
>network_in() function, rather than using inet_in(), which is
>already public?

Not really, that part of the code was a copy from client_addr().


>- Is the backend startup time sensitive information? Considering this
>information is available via ps(1), perhaps it would be better
>to allow
>any user to see any backend's startup time, rather than providing a
>false sense of security.

Well, ps(1) will show you the client address as well, so the argument
can be taken there too. For ps(1) access, you need shell access to the
machine. Something at least I would *never* give to people who just "do
stuff in the database".
That said, I'm not sure that the startup time specifically is sensitive,
no. No real rason to hide that, but I'm not sure I buy the argument
"considering since it's available via ps(1)" ;-)


>- We return (NULL, -1) for the client address/port if connected via a
>unix domain socket, and (NULL, NULL) for the address/port if the
>selecting user isn't a superuser / the owner of that client
>connection.
>It seems a little ugly to return NULL for the address in both cases,
>since the same value is used for two completely different
>meanings. Not
>sure of a better convention, though -- just idly complaining :)

I first did it with NULL,NULL, before I realised I could not make just
that distinction. I guess in theory we could return 0.0.0.0 or
255.255.255.255 (similar for IPv6 of course, don't know offhand what
that would be), btu that seems at least as ugly.
Another way would be to have yet another column that would be "client
address type", but given the options I think the current way is probably
the least ugly.


>You needn't respin the patch, as I can make the necessary
>changes myself
>-- I'm just looking for comments on the above before applying.

Ok.

//Magnus

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> - Is the backend startup time sensitive information? Considering this
> information is available via ps(1), perhaps it would be better to allow
> any user to see any backend's startup time, rather than providing a
> false sense of security.

Remote database users don't have access to ps(1).  I think that hiding
the startup time is reasonable.

> - We return (NULL, -1) for the client address/port if connected via a
> unix domain socket, and (NULL, NULL) for the address/port if the
> selecting user isn't a superuser / the owner of that client connection.
> It seems a little ugly to return NULL for the address in both cases,
> since the same value is used for two completely different meanings. Not
> sure of a better convention, though -- just idly complaining :)

Perhaps 0.0.0.0/0 or some such for the address in the unix domain case?
Not sure that this is an improvement over NULL though.

Other comments seem on-target to me.

            regards, tom lane

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> I guess that's from not reading things carefully enough. I looked for
> where "backend pid" was transmitted, because I thought that would be a
> good place to put it. Which might bring up the question, why are things
> like the backend pid sent in msghdr and not in bestart?

I think the PID is used as the lookup key to associate messages from the
same backend.  You need some such key, though BackendId (sinval slot
number) is another workable candidate.

            regards, tom lane

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Magnus Hagander wrote:
> I guess that's from not reading things carefully enough. I looked for
> where "backend pid" was transmitted, because I thought that would be a
> good place to put it.

On adapting the code to just send the client address in the BE start
message, I'm not actually sure this is the right way to go. The problem
is that stats messages are unordered -- so we might receive, say, an
activity message for a backend before we receive its BE startup message.
  That means pg_stat_get_backend_client_addr() might be invoked for a
backend the stats collector knows about, but doesn't know its client
address. We could invent a sentinel value for this case and return SQL
NULL, but it seems quite ugly (since the backend entry struct stores
SockAddr, not SockAddr *)-- simpler to just send the client address with
every message, I guess.

> Which might bring up the question, why are things like the backend
> pid sent in msghdr and not in bestart?

If not for the situation outlined above, we could move at least m_userid
out of the message header.

> That said, I'm not sure that the startup time specifically is sensitive,
> no. No real rason to hide that, but I'm not sure I buy the argument
> "considering since it's available via ps(1)" ;-)

Yeah, that's true, the ps(1) argument is invalid. May as well limit
access to it...

> I first did it with NULL,NULL, before I realised I could not make just
> that distinction. I guess in theory we could return 0.0.0.0 or
> 255.255.255.255 (similar for IPv6 of course, don't know offhand what
> that would be), btu that seems at least as ugly.
> Another way would be to have yet another column that would be "client
> address type", but given the options I think the current way is probably
> the least ugly.

Yeah, I agree -- I guess we should stick with the current convention.

-Neil

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On adapting the code to just send the client address in the BE start
> message, I'm not actually sure this is the right way to go. The problem
> is that stats messages are unordered -- so we might receive, say, an
> activity message for a backend before we receive its BE startup message.

I think this argument is a red herring ... or at least it leads in a
direction I find unacceptable.  We are already transmitting three more
fields than necessary in every MsgHdr --- PID, database OID, and user ID
--- and it will only get worse if we go down this path.  I would have to
ask what is the point of the BESTART message at all, if the design
requires retransmitting everything it could possibly carry in every
subsequent message.

It is true that the pgstat message mechanism is unreliable, and
therefore you will sometimes not find out things you would like to find
out.  But adding more overhead onto every message is not the way to
improve that situation; it will make it worse by increasing the
bandwidth demand.  I think we should just accept that we will sometimes
have to return NULL for these items because we missed a message.

(And if it wasn't clear, I'm in favor of stripping the msghdr down.
The header should be about 8 bytes not 24, let alone 40+ which is
what it would have to be to handle IPv6 client addresses the way
you suggest.)

            regards, tom lane

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Tom Lane wrote:
> I think this argument is a red herring ... or at least it leads in a
> direction I find unacceptable.

I agree -- I was just pointing out the reason that, in the current
design, there is cause to do things as Magnus' original patch did.

> We are already transmitting three more fields than necessary in every
> MsgHdr --- PID, database OID, and user ID --- and it will only get
> worse if we go down this path.

How about changing the statistics collector so that we only include a
row in the statistics view when we receive the BESTART message? That
would mean the BESTART message could include backend-start metadata
(user ID, database ID, client address), and all other messages would
only need enough header data to identify the backend process they are
associated with (so perhaps backend id and process id).

Using the existing dead-backend hash table, we should be able to
distinguish between the cases of "seen a message for a new backend
before receiving its BESTART" and "seen a message for a dead backend
after we've seen its BETERM".

-Neil

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> How about changing the statistics collector so that we only include a
> row in the statistics view when we receive the BESTART message?

I'd vote against that.  The mechanism is lossy by design and so we
must design on the assumption that we will sometimes lose the BESTART
message.  I don't think it's acceptable to refuse to display information
we do have (queries, access counts) just because we don't have every
element of a rather-arbitrarily-chosen view row.

> would mean the BESTART message could include backend-start metadata
> (user ID, database ID, client address), and all other messages would
> only need enough header data to identify the backend process they are
> associated with (so perhaps backend id and process id).

Looking at the code, backendid seems sufficient since that is the
lookup key.

            regards, tom lane

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Tom Lane wrote:
> I'd vote against that.  The mechanism is lossy by design

Is it _lossy_, or merely unordered? While UDP doesn't guarantee message
delivery, I wonder what kind of extreme circumstances would need to
exist for you to lose UDP packets outright over the loopback interface.

 > I don't think it's acceptable to refuse to display information
> we do have (queries, access counts) just because we don't have every
> element of a rather-arbitrarily-chosen view row.

Is there really any point in returning such incomplete statistics data?
ISTM it would mostly serve to complicate the lives of people writing
automated tools to query statistics data, for example.

> Looking at the code, backendid seems sufficient since that is the
> lookup key.

Not if you want to distinguish between dead and live backends, it's not
(see pgstat_add_backend).

-Neil

Re: Added columns to pg_stat_activity

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> I'd vote against that.  The mechanism is lossy by design

> Is it _lossy_, or merely unordered? While UDP doesn't guarantee message
> delivery, I wonder what kind of extreme circumstances would need to
> exist for you to lose UDP packets outright over the loopback interface.

If the stats collector gets sufficiently backed up, you will lose
messages.  The most obvious case is the "statistics buffer is full"
error in pgstat_recvbuffer() (and you can find two or three reports
of that message in just the last week or so of the mailing list
archives).  But the entire messaging mechanism is *specifically*
designed to drop messages rather than block backends if the stats buffer
process gets backed up to the point where it can't accept messages fast
enough.  For starters, we'd have used TCP not UDP if we didn't want it
to do that.

>> I don't think it's acceptable to refuse to display information
>> we do have (queries, access counts) just because we don't have every
>> element of a rather-arbitrarily-chosen view row.

> Is there really any point in returning such incomplete statistics data?

Absolutely.  Given the lossiness of the mechanism, the total stats data
is incomplete by definition, and so if we were insistent on returning
only 100% guaranteed data, we might as well shut down the collector and
return nothing.  The entire facility is designed on the assumption that
partial data is better than no data (and also better than taking the
performance hit that would ensue if we tried to guarantee full data).
Please do not try to make piecemeal changes in that basic design
decision.

>> Looking at the code, backendid seems sufficient since that is the
>> lookup key.

> Not if you want to distinguish between dead and live backends, it's not
> (see pgstat_add_backend).

Ah, good point.  But still we need not send more than backend PID plus
BackendId.  (Actually, we could skinny it down to backend PID only ...
but that would add a hashtable lookup to every message reception in
the stats collector, instead of just indexing into a BackendId array,
so it'd probably be a net loss.)

            regards, tom lane

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Tom Lane wrote:
> If the stats collector gets sufficiently backed up, you will lose
> messages. [...] Please do not try to make piecemeal changes in that
> basic design decision.

Fair enough.

 > (Actually, we could skinny it down to backend PID only ...
> but that would add a hashtable lookup to every message reception in
> the stats collector, instead of just indexing into a BackendId array,
> so it'd probably be a net loss.)

Yeah, I agree -- backend ID and database ID isn't too bad.

I've applied Magnus patch and bumped the catalog version. I'll take a
look at reorganizing the statistics collector so that the message header
is smaller shortly.

-Neil

Re: Added columns to pg_stat_activity

From
Andrew Dunstan
Date:
Is this what broke the regression tests on HEAD?

cheers

andrew

Neil Conway wrote:

> Tom Lane wrote:
>
>> If the stats collector gets sufficiently backed up, you will lose
>> messages. [...] Please do not try to make piecemeal changes in that
>> basic design decision.
>
>
> Fair enough.
>
> > (Actually, we could skinny it down to backend PID only ...
>
>> but that would add a hashtable lookup to every message reception in
>> the stats collector, instead of just indexing into a BackendId array,
>> so it'd probably be a net loss.)
>
>
> Yeah, I agree -- backend ID and database ID isn't too bad.
>
> I've applied Magnus patch and bumped the catalog version. I'll take a
> look at reorganizing the statistics collector so that the message
> header is smaller shortly.
>
> -


Re: Added columns to pg_stat_activity

From
"Magnus Hagander"
Date:
Certainly seems that way. Missed a .diff for the rules output there...
Should be a simple replace, from what I can tell what's in the output on
the buildfarm boxes I checked is correct, the expected file shuold
changed.

Oops.

//Magnus


>
> Is this what broke the regression tests on HEAD?
>
> cheers
>
> andrew
>
> Neil Conway wrote:
>
> > Tom Lane wrote:
> >
> >> If the stats collector gets sufficiently backed up, you will lose
> >> messages. [...] Please do not try to make piecemeal
> changes in that
> >> basic design decision.
> >
> >
> > Fair enough.
> >
> > > (Actually, we could skinny it down to backend PID only ...
> >
> >> but that would add a hashtable lookup to every message
> reception in
> >> the stats collector, instead of just indexing into a
> BackendId array,
> >> so it'd probably be a net loss.)
> >
> >
> > Yeah, I agree -- backend ID and database ID isn't too bad.
> >
> > I've applied Magnus patch and bumped the catalog version.
> I'll take a
> > look at reorganizing the statistics collector so that the message
> > header is smaller shortly.
> >
> > -
>
>

Re: Added columns to pg_stat_activity

From
Neil Conway
Date:
Andrew Dunstan wrote:
> Is this what broke the regression tests on HEAD?

Yes, it is -- fix checked in to HEAD. My apologies for missing this.

-Neil