Thread: Re: [COMMITTERS] pgsql: Use LEFT JOINs in some system views in case referenced row doesn

On Sat, Aug 20, 2016 at 6:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Use LEFT JOINs in some system views in case referenced row doesn't exist.
>
> In particular, left join to pg_authid so that rows in pg_stat_activity
> don't disappear if the session's owning user has been dropped.
> Also convert a few joins to pg_database to left joins, in the same spirit,
> though that case might be harder to hit.  We were doing this in other
> views already, so it was a bit inconsistent that these views didn't.

This change causes pg_stat_activity to report the "bogus" information about
walsenders as follows.

$ ps aux | grep 75076
postgres        75076   0.0  0.0  2608252   1364   ??  Ss    2:41AM
0:00.07 postgres: wal sender process postgres [local] streaming
0/3000140

=# SELECT * FROM pg_stat_activity;
-[ RECORD 1 ]----+--------------------------------
datid            | 0
datname          | (null)
pid              | 75076
usesysid         | 10
usename          | postgres
application_name | sby1
client_addr      | (null)
client_hostname  | (null)
client_port      | -1
backend_start    | 2016-09-10 02:41:24.568313+09
xact_start       | (null)
query_start      | (null)
state_change     | 2016-09-10 02:41:24.570135+09
wait_event_type  | (null)
wait_event       | (null)
state            | idle
backend_xid      | (null)
backend_xmin     | (null)
query            |

I think that this information is confusing and should not exist in
pg_stat_activity.

Regards,

-- 
Fujii Masao



Fujii Masao <masao.fujii@gmail.com> writes:
> On Sat, Aug 20, 2016 at 6:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Use LEFT JOINs in some system views in case referenced row doesn't exist.

> This change causes pg_stat_activity to report the "bogus" information about
> walsenders as follows.

Hmm ... but if we want to exclude walsenders from that view, surely we
should do so explicitly, not have it happen as a hidden side-effect
of not being able to join them to pg_database.
        regards, tom lane



I wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> On Sat, Aug 20, 2016 at 6:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Use LEFT JOINs in some system views in case referenced row doesn't exist.

>> This change causes pg_stat_activity to report the "bogus" information about
>> walsenders as follows.

> Hmm ... but if we want to exclude walsenders from that view, surely we
> should do so explicitly, not have it happen as a hidden side-effect
> of not being able to join them to pg_database.

I looked into this a little.  There are at least three things we could
do here:

1. Decide that showing walsenders is a good thing.  I'm not really
sure why it isn't -- for example, we could take the trouble to display
the current replication command as the walsender's activity.

2. Suppress walsenders by the expedient of ignoring PGPROCs that have
zero database ID.  This would also ignore other process types that aren't
connected to a specific database.  This is pretty backwards-compatible
because it's equivalent to what used to happen implicitly via the inner
join in the view.

3. Suppress walsenders by adding some new field to PgBackendStatus that
identifies them, and having pg_stat_get_activity() use that to filter.


It looks to me like even before the change Fujii-san complains of, it was
only generic walsenders that were hidden in the view.  Database-specific
walsenders (am_db_walsender) would have a valid database OID in
PgBackendStatus and therefore would pass the join.  This makes me think
that option #1 is really the sanest alternative.  While option #2 is
the most backwards-compatible, it's pretty hard to see why we would think
that showing only database-specific walsenders is a desirable behavior.
If we want to exclude walsenders altogether, we need option #3.

Thoughts?
        regards, tom lane



I wrote:
> I looked into this a little.  There are at least three things we could
> do here:
> 1. Decide that showing walsenders is a good thing.  I'm not really
> sure why it isn't -- for example, we could take the trouble to display
> the current replication command as the walsender's activity.
> 2. Suppress walsenders by the expedient of ignoring PGPROCs that have
> zero database ID.  This would also ignore other process types that aren't
> connected to a specific database.  This is pretty backwards-compatible
> because it's equivalent to what used to happen implicitly via the inner
> join in the view.
> 3. Suppress walsenders by adding some new field to PgBackendStatus that
> identifies them, and having pg_stat_get_activity() use that to filter.

BTW, I had been thinking we could implement #2 or #3 relatively painlessly
by adding filter logic inside pg_stat_get_activity().  However, I now
notice that that function is also used by the pg_stat_replication view,
which joins its output to pg_stat_get_wal_senders().  That means we can
*not* simply suppress walsenders in the function's output; if we want to
filter them away in pg_stat_activity, we have to change the view
definition somehow, and probably the function's API as well.  And that
means we can't change the behavior without an initdb, which is kind of
annoying to have to do after rc1.

The fact that the pg_stat_replication view does show walsender processes
seems like possibly a reasonable argument for not showing them in
pg_stat_activity.  But we'd have to do some rejiggering of the view
definition to make that happen.

Don't have a strong opinion which way it ought to work --- I could go with
either the "walsenders are a perfectly good activity" position or the
"they should appear only in pg_stat_replication" one.  But the latter will
take an initdb to do, and I'm inclined to the position that it's too late
for that in 9.6.  As I noted, it was already inconsistent for per-db
walsenders long before 9.6, without field complaints, so I'm having a hard
time seeing this as a critical bug.
        regards, tom lane



On Sun, Sep 11, 2016 at 5:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The fact that the pg_stat_replication view does show walsender processes
> seems like possibly a reasonable argument for not showing them in
> pg_stat_activity.  But we'd have to do some rejiggering of the view
> definition to make that happen.

We may actually had better show WAL sender processes in
pg_stat_activity. An argument in favor of that is the tracking of
WaitEventSet events (or latches if you want).
-- 
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Sun, Sep 11, 2016 at 5:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The fact that the pg_stat_replication view does show walsender processes
>> seems like possibly a reasonable argument for not showing them in
>> pg_stat_activity.  But we'd have to do some rejiggering of the view
>> definition to make that happen.

> We may actually had better show WAL sender processes in
> pg_stat_activity. An argument in favor of that is the tracking of
> WaitEventSet events (or latches if you want).

Also, walsenders count against MaxBackends don't they?  So not showing
them could contribute to confusion about why an installation is hitting
the connection limit.

If we do keep them in the view, I would definitely vote for having them
set their "query" fields to something that shows they're walsenders.
It's awfully late to be doing anything complicated there for 9.6,
but we could just set the query to "walsender" and plan to improve
on that in future releases.
        regards, tom lane



* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > On Sun, Sep 11, 2016 at 5:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> The fact that the pg_stat_replication view does show walsender processes
> >> seems like possibly a reasonable argument for not showing them in
> >> pg_stat_activity.  But we'd have to do some rejiggering of the view
> >> definition to make that happen.
>
> > We may actually had better show WAL sender processes in
> > pg_stat_activity. An argument in favor of that is the tracking of
> > WaitEventSet events (or latches if you want).
>
> Also, walsenders count against MaxBackends don't they?  So not showing
> them could contribute to confusion about why an installation is hitting
> the connection limit.
>
> If we do keep them in the view, I would definitely vote for having them
> set their "query" fields to something that shows they're walsenders.
> It's awfully late to be doing anything complicated there for 9.6,
> but we could just set the query to "walsender" and plan to improve
> on that in future releases.

+1

Thanks!

Stephen

On Mon, Sep 12, 2016 at 10:16 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>> > On Sun, Sep 11, 2016 at 5:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> The fact that the pg_stat_replication view does show walsender processes
>> >> seems like possibly a reasonable argument for not showing them in
>> >> pg_stat_activity.  But we'd have to do some rejiggering of the view
>> >> definition to make that happen.
>>
>> > We may actually had better show WAL sender processes in
>> > pg_stat_activity. An argument in favor of that is the tracking of
>> > WaitEventSet events (or latches if you want).
>>
>> Also, walsenders count against MaxBackends don't they?  So not showing
>> them could contribute to confusion about why an installation is hitting
>> the connection limit.

Yes they are counted in MaxBackends. So that would help as well in
looking at connection limit issues.

>> If we do keep them in the view, I would definitely vote for having them
>> set their "query" fields to something that shows they're walsenders.
>> It's awfully late to be doing anything complicated there for 9.6,
>> but we could just set the query to "walsender" and plan to improve
>> on that in future releases.
>
> +1

Indeed, and the query field does not have much more meaning for a WAL
sender. So I'd propose the attached for 9.6 and HEAD. I have thought
about reporting that to pgstat in StartReplication(), but as there is
some error handling there I'd think that WalSndLoop() is a better
place to call pgstat_report_activity, as per the attached.

We could have far more fancy verbose information to offer to the user,
but that's another story..
--
Michael

Attachment
Michael Paquier <michael.paquier@gmail.com> writes:
> Indeed, and the query field does not have much more meaning for a WAL
> sender. So I'd propose the attached for 9.6 and HEAD. I have thought
> about reporting that to pgstat in StartReplication(), but as there is
> some error handling there I'd think that WalSndLoop() is a better
> place to call pgstat_report_activity, as per the attached.

As long as it's a fixed string there's no reason to set it repeatedly,
so this placement looks fine for now.  We can reconsider when/if we
make it variable and decide what is going to drive it.

On reflection, maybe s/walsender/WAL sender/?  It doesn't look like
we really use the word "walsender" in user-facing docs.
        regards, tom lane



On Mon, Sep 12, 2016 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Indeed, and the query field does not have much more meaning for a WAL
>> sender. So I'd propose the attached for 9.6 and HEAD. I have thought
>> about reporting that to pgstat in StartReplication(), but as there is
>> some error handling there I'd think that WalSndLoop() is a better
>> place to call pgstat_report_activity, as per the attached.
>
> As long as it's a fixed string there's no reason to set it repeatedly,
> so this placement looks fine for now.  We can reconsider when/if we
> make it variable and decide what is going to drive it.
>
> On reflection, maybe s/walsender/WAL sender/?  It doesn't look like
> we really use the word "walsender" in user-facing docs.

Indeed, that may be better for clarity.

Except from the release notes, walsender is mentioned a couple of
times in the protocol docs, as *walsender mode*:
src/sgml/high-availability.sgml:    a corresponding walsender process
in the primary.
src/sgml/protocol.sgml:    Copy-both mode is initiated when a backend
in walsender mode
src/sgml/protocol.sgml:of <literal>true</> tells the backend to go
into walsender mode, wherein a
src/sgml/protocol.sgml:the simple query protocol can be used in walsender mode.
src/sgml/protocol.sgml:Passing <literal>database</> as the value
instructs walsender to connect to
src/sgml/protocol.sgml:The commands accepted in walsender mode are:
So that looked adapted to me at first sight. Actually, the text had
better be "WAL sender" and not "walsender" in high-availability, no?
That refers to the process, and not the backend mode.
-- 
Michael



On 12 September 2016 at 03:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> On reflection, maybe s/walsender/WAL sender/?  It doesn't look like
> we really use the word "walsender" in user-facing docs.

There are already
* 3 user messages referring to walsender and 2 referring to walreceiver
* multiple references in the docs and release notes referring to walsender

so usage has already been set and I vote to just leave things that way.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 12 September 2016 at 03:27, Michael Paquier
<michael.paquier@gmail.com> wrote:

> So I'd propose the attached for 9.6 and HEAD.

The $OP commit was against HEAD only, not against 9.6

Why would we need to backpatch this commit?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Mon, Sep 12, 2016 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 12 September 2016 at 03:27, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>
>> So I'd propose the attached for 9.6 and HEAD.
>
> The $OP commit was against HEAD only, not against 9.6
>
> Why would we need to backpatch this commit?

You are right there is no need to get that into 9.6. Sorry for the mistake.
-- 
Michael



On 12 September 2016 at 08:28, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 12, 2016 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 12 September 2016 at 03:27, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>
>>> So I'd propose the attached for 9.6 and HEAD.
>>
>> The $OP commit was against HEAD only, not against 9.6
>>
>> Why would we need to backpatch this commit?
>
> You are right there is no need to get that into 9.6. Sorry for the mistake.

Committed to HEAD only.


[No longer a 9.6 issue.]

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Mon, Sep 12, 2016 at 4:59 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 12 September 2016 at 08:28, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Mon, Sep 12, 2016 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 12 September 2016 at 03:27, Michael Paquier
>>> <michael.paquier@gmail.com> wrote:
>>>
>>>> So I'd propose the attached for 9.6 and HEAD.
>>>
>>> The $OP commit was against HEAD only, not against 9.6
>>>
>>> Why would we need to backpatch this commit?
>>
>> You are right there is no need to get that into 9.6. Sorry for the mistake.
>
> Committed to HEAD only.

Thanks. Using walsender here is fine for me.
-- 
Michael



Michael Paquier <michael.paquier@gmail.com> writes:
> On Mon, Sep 12, 2016 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Why would we need to backpatch this commit?

> You are right there is no need to get that into 9.6. Sorry for the mistake.

Oh, that's my fault, I'd incorrectly remembered this commit as having been
further back than it was.  HEAD-only is correct so far as fixing
Fujii-san's original complaint is concerned.  However, don't we have the
problem that am_db_walsender processes will show up in pg_stat_activity
anyway?  Do we want to do something about those further back?
        regards, tom lane



On Mon, Sep 12, 2016 at 9:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Mon, Sep 12, 2016 at 4:19 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Why would we need to backpatch this commit?
>
>> You are right there is no need to get that into 9.6. Sorry for the mistake.
>
> Oh, that's my fault, I'd incorrectly remembered this commit as having been
> further back than it was.  HEAD-only is correct so far as fixing
> Fujii-san's original complaint is concerned.  However, don't we have the
> problem that am_db_walsender processes will show up in pg_stat_activity
> anyway?

Yes, those show up..

> Do we want to do something about those further back?

Hiding them is not something that we should do, and changing the query
field to show something that we think is helpful may impact existing
applications that rely on the fact that this field is NULL. So I'd
vote for doing nothing.
-- 
Michael