WAL replay stuck in hot standby when performing the backup - Mailing list pgsql-admin

From Andrea Moretto
Subject WAL replay stuck in hot standby when performing the backup
Date
Msg-id 92CEF44C-A3DD-433D-8CD7-B96EA01A2AA5@gmail.com
Whole thread Raw
List pgsql-admin
Hi there,

  I have noticed that WAL replay blocks in a hot standby slave when performing the backup.
I am using on both PostgreSQL 9.4.23 on Redhat.

During the whole backup procedure (which takes more than 24 hours), the data onto the hot standby node is not updated.

Is this behaviour correct?

My idea is to confine batch workloads, backups and less critical OLTP queries on the hot standby.
If during the backup live data is not updated in the hot standby node, I have to re-plan the environment.

Details:

- Replication statistics show that replay location is lagging far behind the flush location (on master side):

  select * from pg_stat_replication;

    pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start
       | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority |
sync_state 

-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   15477 |       10 | postgres | walreceiver      | 127.0.0.1   |                 |       50612 | 2019-07-15
17:34:36.759306+00|   1152223785 | streaming | 274D/804CBCB8 | 274D/804CBCB8  | 274D/804CBCB8  | 2744/30F84028   |
      0 | async 
  (1 row)

- Replay lag is about 19 hours and running (on hot standby side):

  select now()-pg_last_xact_replay_timestamp() as replication_lag;

   replication_lag
  -----------------
   19:12:11.2289
  (1 row)

  SELECT
  pg_last_xlog_receive_location() receive,
  pg_last_xlog_replay_location() replay,
  (
   extract(epoch FROM now()) -
   extract(epoch FROM pg_last_xact_replay_timestamp())
  )::int lag;

      receive    |    replay     |  lag
  ---------------+---------------+-------
   274E/BAF0B610 | 2744/30F84028 | 69157
  (1 row)

  The only query currently running (not in idle state) on the slave is a COPY:

  select * from pg_stat_activity;

   datid |   datname   |  pid  | usesysid |    usename     |    application_name    | client_addr  | client_hostname |
client_port|         backend_start         |          xact_start           |          query_start          |
state_change         | waiting | state  | backend_xid | backend_xmin |
                                                           query
                                                     

-------+-------------+-------+----------+----------------+------------------------+--------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   16993 | mydb        | 15604 |  3226933 | rousr          | PostgreSQL JDBC Driver | 10.192.87.91 |                 |
    41491 | 2019-07-15 17:35:57.01458+00  |                               | 2019-07-16 12:55:58.21023+00  | 2019-07-16
12:55:58.210251+00| f       | idle   |             |              | COMMIT 
   16993 | mydb        | 18868 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |
    54106 | 2019-07-16 12:59:30.122865+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16
13:00:18.807893+00| f       | active |             |   1152301732 | select * from pg_stat_activity; 
   16993 | mydb        |  2559 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |
    52338 | 2019-07-16 10:57:03.118782+00 |                               | 2019-07-16 10:58:27.278095+00 | 2019-07-16
10:58:27.279752+00| f       | idle   |             |              | select * from pg_stat_replication; 
   16993 | mydb        | 15736 |  3226603 | amoretto       | pg_dump                | 10.192.87.90 |                 |
     4281 | 2019-07-15 17:36:42.540157+00 | 2019-07-15 17:36:42.566704+00 | 2019-07-16 12:51:04.488305+00 | 2019-07-16
12:51:04.488307+00| f       | active |             |   1152223785 | COPY public.c_box_hw_dev_app_hist (box_id,
hardware_id,device_id, application_id, sequence, history_time, history_info, usr_id, usr_info, archived,
is_history_info_zip,description, parent_application_name, iid) TO stdout; 
(4 rows)

  select * from pg_stat_database_conflicts;
   datid |   datname   | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
  -------+-------------+------------------+------------+----------------+-----------------+----------------
       1 | template1   |                0 |          0 |              0 |               0 |              0
   12998 | template0   |                0 |          0 |              0 |               0 |              0
   13003 | postgres    |                0 |          0 |              0 |               0 |              0
   16993 | mydb        |                0 |          0 |              0 |               0 |              0
  (4 rows)

  select * from pg_stat_database;
   datid |   datname   | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched
|tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time |
blk_write_time|          stats_reset           

-------+-------------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+---------------+----------------+-------------------------------
       1 | template1   |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   12998 | template0   |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   13003 | postgres    |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   16993 | mydb        |           3 |        2710 |             2 |      2499 |   333071 |       213588 |       72658
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 | 2019-07-15 17:31:51.600354+00 
  (4 rows)

  select * from pg_stat_bgwriter;
   checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint |
buffers_clean| maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset
 

-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
                4636 |               1 |               1107064 |                    9 |             378395 |
0 |                0 |          460498 |                     0 |       1432887 | 2019-07-15 17:31:22.319099+00 
  (1 row)

I hope that someone can give me some hint on this topic, it would be appreciated!

Thanks in advance,

  Andrea

Andrea Moretto
moretto.andrea@gmail.com
-------------------------------------------------------
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received
the message in error, be informed that any use of the content hereof
is prohibited. Please return it immediately to the sender and delete
the message.
-------------------------------------------------------




pgsql-admin by date:

Previous
From: Venkataramana Aitla
Date:
Subject: Re: Unable to connect DB...?
Next
From: Ron
Date:
Subject: Re: Unable to connect DB...?