Re: empty pg_stat_replication when replication works fine? - Mailing list pgsql-general

From Andrej Vanek
Subject Re: empty pg_stat_replication when replication works fine?
Date
Msg-id CAFNFRyGFO8nERP6CDPScWWa0-QNrUTifpLVYY1dHywUWY5pHzw@mail.gmail.com
Whole thread Raw
In response to empty pg_stat_replication when replication works fine?  (Andrej Vanek <andrej.vanek.sk@gmail.com>)
List pgsql-general
I've found the reason: inconsistent catalog data. 
This state did not disappear after restart of all postgres instances.

pg_authid.oid does not match the one in pg_stat_get_activity(NULL::integer).

I wonder how this inconsistency could happen.. Maybe some error during cloning database (binary database copy, or some older WAL logs left over either in archive or pg_xlog, or some corrupted index in pg_catalog?) during several test-cycles..

Any other idea how can oid of users could be different from the one appearing in pg_stat_get_activity()? (see details below)

Andrej
--------------details:
postgres=# \d+ pg_stat_replication;
                      View "pg_catalog.pg_stat_replication"
      Column      |           Type           | Modifiers | Storage  | Description
------------------+--------------------------+-----------+----------+-------------
 pid              | integer                  |           | plain    |
 usesysid         | oid                      |           | plain    |
 usename          | name                     |           | plain    |
 application_name | text                     |           | extended |
 client_addr      | inet                     |           | main     |
 client_hostname  | text                     |           | extended |
 client_port      | integer                  |           | plain    |
 backend_start    | timestamp with time zone |           | plain    |
 state            | text                     |           | extended |
 sent_location    | text                     |           | extended |
 write_location   | text                     |           | extended |
 flush_location   | text                     |           | extended |
 replay_location  | text                     |           | extended |
 sync_priority    | integer                  |           | plain    |
 sync_state       | text                     |           | extended |
View definition:
 SELECT s.pid,
    s.usesysid,
    u.rolname AS usename,
    s.application_name,
    s.client_addr,
    s.client_hostname,
    s.client_port,
    s.backend_start,
    w.state,
    w.sent_location,
    w.write_location,
    w.flush_location,
    w.replay_location,
    w.sync_priority,
    w.sync_state
   FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port),
    pg_authid u,
    pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state)
  WHERE s.usesysid = u.oid AND s.pid = w.pid;

postgres=# \dv+ pg_stat_replication;
                             List of relations
   Schema   |        Name         | Type |  Owner   |  Size   | Description
------------+---------------------+------+----------+---------+-------------
 pg_catalog | pg_stat_replication | view | postgres | 0 bytes |
(1 row)

postgres=# select * from pg_stat_get_wal_senders();
 pid |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 707 | streaming | 0/B5000090    | 0/B5000090     | 0/B5000090     | 0/B5000090      |             0 | async
(1 row)

postgres=# select * from pg_stat_get_activity(NULL::integer);
 datid | pid | usesysid | application_name | state  |                       query                        | waiting |          xact_start           |          query_star
t          |         backend_start         |         state_change          |  client_addr   | client_hostname | client_port
-------+-----+----------+------------------+--------+----------------------------------------------------+---------+-------------------------------+--------------------
-----------+-------------------------------+-------------------------------+----------------+-----------------+-------------
     0 | 707 |    34456 | l2amain          | idle   |                                                    | f       |                               |
           | 2016-05-31 13:19:04.875468+02 | 2016-05-31 13:19:04.877894+02 | 192.168.101.11 |                 |       33329
 12896 | 709 |       10 | psql             | active | select * from pg_stat_get_activity(NULL::integer); | f       | 2016-05-31 13:22:23.090373+02 | 2016-05-31 13:22:23
.090373+02 | 2016-05-31 13:19:08.719215+02 | 2016-05-31 13:22:23.090382+02 |                |                 |          -1
(2 rows)

postgres=# select * from pg_authid where oid = 34456;
 rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
---------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------+---------------
(0 rows)


postgres=#  select oid, * from pg_authid;
  oid  |  rolname  | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
-------+-----------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------+---------------
    10 | postgres  | t        | t          | t             | t           | t            | t           | t              |           -1 |             |
 29732 | xxusrrole | f        | t          | f             | f           | f            | f           | f              |           -1 |             |
 29733 | xxadmrole | f        | t          | f             | f           | f            | f           | f              |           -1 |             |
 29734 | xxschema  | f        | t          | f             | f           | f            | t           | f              |           -1 | xxxxxxxx    |
 29735 | xxadm     | f        | t          | f             | f           | f            | t           | f              |           -1 | xxxxx       |
 29737 | pgbackup  | f        | t          | f             | f           | f            | t           | f              |           -1 | xxxxxxxx    |
 29738 | pgmonitor | f        | t          | f             | f           | f            | t           | f              |           -1 | xxxxxxxxx   |
 29739 | pgreplic  | f        | t          | f             | f           | f            | t           | t              |           -1 | xxxxxxxx    |
 29736 | CTSYSTEM  | f        | t          | f             | f           | f            | t           | f              |           -1 | xxxxxx      |
(9 rows)





2016-05-25 23:22 GMT+02:00 Andrej Vanek <andrej.vanek.sk@gmail.com>:
Streaming replication set-up,

one master, 3 slaves connecting to it.
I expected ps -ef gets all wal-sender processes and SAME information I'll get via select * from pg_stat_replication. 
Instead I observed: 
- pg_stat_replication is empty
- 3 wal-sender processes up and running
- each slave has wal-receiver process running
- replication works (tried to create a table- it appears in all databases)
Question:
- why is pg_stat_replication empty?

Andrej
---------------details
[root@l2bmain ~]# tail /opt/pg_data/postgresql.conf
max_wal_senders = 5
hot_standby = on
wal_keep_segments = 128
archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f'
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on
wal_sender_timeout = 1min
[root@l2bmain ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres 10797     1  0 15:53 ?        S      0:20 /usr/pgsql-9.3/bin/postgres -D /opt/pg_data -c config_file=/opt/pg_data//postgresql.conf
postgres 10820 10797  0 15:53 ?        Ss     0:00  \_ postgres: logger process
postgres 10823 10797  0 15:53 ?        Ss     0:00  \_ postgres: checkpointer process
postgres 10824 10797  0 15:53 ?        Ss     0:00  \_ postgres: writer process
postgres 10825 10797  0 15:53 ?        Ss     0:00  \_ postgres: wal writer process
postgres 10826 10797  0 15:53 ?        Ss     0:01  \_ postgres: autovacuum launcher process
postgres 10827 10797  0 15:53 ?        Ss     0:00  \_ postgres: archiver process   last was 0000000100000000000000A3.00000028.backup
postgres 10828 10797  0 15:53 ?        Ss     0:03  \_ postgres: stats collector process
postgres 11286 10797  0 15:54 ?        Ss     0:08  \_ postgres: wal sender process pgreplic 192.168.204.12(55231) streaming 0/A401BED8
postgres 11287 10797  0 15:54 ?        Ss     0:06  \_ postgres: wal sender process pgreplic 192.168.204.11(42937) streaming 0/A401BED8
postgres 19322 10797  0 15:58 ?        Ss     0:08  \_ postgres: wal sender process pgreplic 192.168.101.11(52379) streaming 0/A401BED8
postgres 28704 10797  0 18:44 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.13(58245) idle
postgres  7256 10797  0 18:52 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.23(55190) idle
postgres  8667 10797  0 18:53 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.13(58287) idle
[root@l2bmain ~]# psql -U postgres -c "select * from pg_stat_replication;"
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | r
eplay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+-------+---------------+----------------+----------------+--
---------------+---------------+------------
(0 rows)

[root@l2bmain ~]# tail /opt/pg_data/pg_log/postgresql-Wed.log
2016-05-25 15:53:56 CEST:@:[8603] LOG:  database system is shut down
2016-05-25 15:53:58 CEST:@:[10821] LOG:  database system was shut down in recovery at 2016-05-25 15:53:56 CEST
2016-05-25 15:53:58 CEST:@:[10821] LOG:  database system was not properly shut down; automatic recovery in progress
2016-05-25 15:53:58 CEST:@:[10821] LOG:  consistent recovery state reached at 0/A2000090
2016-05-25 15:53:58 CEST:@:[10821] LOG:  record with zero length at 0/A2000090
2016-05-25 15:53:58 CEST:@:[10821] LOG:  redo is not required
2016-05-25 15:53:58 CEST:@:[10821] LOG:  MultiXact member wraparound protections are now enabled
2016-05-25 15:53:58 CEST:@:[10797] LOG:  database system is ready to accept connections
2016-05-25 15:53:58 CEST:@:[10826] LOG:  autovacuum launcher started
`
[root@l2bmain ~]# ssh 192.168.101.11
Last login: Wed May 25 22:48:18 2016 from 192.168.101.12
[root@l2amain ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres  5730     1  0 15:58 ?        S      0:04 /usr/pgsql-9.3/bin/postgres -D /opt/pg_data -c config_file=/opt/pg_data//postgresql.conf
postgres  5754  5730  0 15:58 ?        Ss     0:00  \_ postgres: logger process
postgres  5755  5730  0 15:58 ?        Ss     0:00  \_ postgres: startup process   recovering 0000000100000000000000A4
postgres  5773  5730  0 15:58 ?        Ss     0:12  \_ postgres: wal receiver process   streaming 0/A401C030
postgres  5774  5730  0 15:58 ?        Ss     0:00  \_ postgres: checkpointer process
postgres  5775  5730  0 15:58 ?        Ss     0:00  \_ postgres: writer process
postgres  5776  5730  0 15:58 ?        Ss     0:00  \_ postgres: stats collector process
[root@l2amain ~]# psql -U postgres -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
[root@l2bmain ~]# ssh 192.168.204.11
Warning: Permanently added '192.168.204.11' (RSA) to the list of known hosts.
Last login: Wed May 25 16:28:49 2016 from 192.168.200.254
[root@l2abrnch ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres  8174     1  0 15:48 ?        S      0:04 /usr/pgsql-9.3/bin/postgres -D /opt/geo_stdby_data -c config_file=/opt/geo_stdby_data//postgresql.conf
postgres  8195  8174  0 15:48 ?        Ss     0:00  \_ postgres: logger process
postgres  8197  8174  0 15:48 ?        Ss     0:00  \_ postgres: startup process   recovering 0000000100000000000000A4
postgres  8206  8174  0 15:48 ?        Ss     0:00  \_ postgres: checkpointer process
postgres  8207  8174  0 15:48 ?        Ss     0:00  \_ postgres: writer process
postgres  8208  8174  0 15:48 ?        Ss     0:00  \_ postgres: stats collector process
postgres 11414  8174  0 15:53 ?        Ss     0:13  \_ postgres: wal receiver process   streaming 0/A401C0D0

[root@l2bmain ~]# ssh 192.168.204.11
Warning: Permanently added '192.168.204.11' (RSA) to the list of known hosts.
Last login: Wed May 25 16:28:49 2016 from 192.168.200.254
[root@l2abrnch ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres  8174     1  0 15:48 ?        S      0:04 /usr/pgsql-9.3/bin/postgres -D /opt/geo_stdby_data -c config_file=/opt/geo_stdby_data//postgresql.conf
postgres  8195  8174  0 15:48 ?        Ss     0:00  \_ postgres: logger process
postgres  8197  8174  0 15:48 ?        Ss     0:00  \_ postgres: startup process   recovering 0000000100000000000000A4
postgres  8206  8174  0 15:48 ?        Ss     0:00  \_ postgres: checkpointer process
postgres  8207  8174  0 15:48 ?        Ss     0:00  \_ postgres: writer process
postgres  8208  8174  0 15:48 ?        Ss     0:00  \_ postgres: stats collector process
postgres 11414  8174  0 15:53 ?        Ss     0:13  \_ postgres: wal receiver process   streaming 0/A401C0D0
[root@l2abrnch ~]# psql -U postgres -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)

[root@l2abrnch ~]# logout
Connection to 192.168.204.11 closed.
[root@l2bmain ~]# ssh 192.168.204.12
Warning: Permanently added '192.168.204.12' (RSA) to the list of known hosts.
Last login: Wed May 25 14:58:03 2016 from 192.168.200.254
[root@l2bbrnch ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres 22885     1  0 15:48 ?        S      0:00 /usr/pgsql-9.3/bin/postgres -D /opt/geo_stdby_data -c config_file=/opt/geo_stdby_data//postgresql.conf
postgres 22913 22885  0 15:48 ?        Ss     0:00  \_ postgres: logger process
postgres 22914 22885  0 15:48 ?        Ss     0:00  \_ postgres: startup process   recovering 0000000100000000000000A4
postgres 22917 22885  0 15:48 ?        Ss     0:00  \_ postgres: checkpointer process
postgres 22918 22885  0 15:48 ?        Ss     0:00  \_ postgres: writer process
postgres 22919 22885  0 15:48 ?        Ss     0:00  \_ postgres: stats collector process
postgres 26163 22885  0 15:54 ?        Ss     0:13  \_ postgres: wal receiver process   streaming 0/A401C0D0


pgsql-general by date:

Previous
From: Stefan Keller
Date:
Subject: Re: Slides for PGCon2016; "FTS is dead ? Long live FTS !"
Next
From: Francisco Olarte
Date:
Subject: Re: Deleting a table file does not raise an error when the table is touched afterwards, why?