Thread: Lag clarification with Sync Replication
I have made a synchronous replication setup with PG12.
During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.Because commit never happens in master before updating the slave.
log_delay
------------
871.549671
(1 row) log_delay
------------
871.557246
(1 row) log_delay
-----------
0
(1 row) log_delay
-----------
0
(1 row)
log_delay
------------
871.549671
(1 row) log_delay
------------
871.557246
(1 row) log_delay
-----------
0
(1 row) log_delay
-----------
0
(1 row)
Query Used for checking Standby Replication Lag:
psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"
Setting Used in PG12
cat postgresql.conf |grep synchronous_
synchronous_commit = remote_write # synchronization level;
synchronous_standby_names = 'ANY 1 (standby1,standby2)' # standby servers that provide sync rep
cat postgresql.conf |grep primary_conn
primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '
Thanks,
Raj Kumar
Your query is unrelated to whether or not the commit has completed in the master.
On Fri, May 22, 2020 at 12:36 AM Raj kumar <rajkumar820999@gmail.com> wrote:
I have made a synchronous replication setup with PG12.During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.Because commit never happens in master before updating the slave.
log_delay
------------
871.549671
(1 row) log_delay
------------
871.557246
(1 row) log_delay
-----------
0
(1 row) log_delay
-----------
0
(1 row)Query Used for checking Standby Replication Lag:psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"Setting Used in PG12cat postgresql.conf |grep synchronous_
synchronous_commit = remote_write # synchronization level;
synchronous_standby_names = 'ANY 1 (standby1,standby2)' # standby servers that provide sync rep
cat postgresql.conf |grep primary_conn
primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '
Thanks,Raj Kumar
For synchronous replication, and any replication really, I would recommend monitoring byte lag from the primary. This is much more accurate and, if the count of replicas goes down from an expected number, you'll know it's been disconnected as well. Monitoring replication from the replica can be done, but it has caveats and shouldn't be the only way it's monitored
Wrote a short post on this a while ago. See relavant updates further down - https://www.keithf4.com/monitoring_streaming_slave_lag/
On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote:. But, ideally for sync replication, lag should be always 0 right.
Incorrect. Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica. That means the transaction is written to WAL file on both primary and replica. On the primary, the transaction is also visible to transactions with a later xmin. On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.
Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL. What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.
I would look at pg_stat_replication view on the primary/upstream provider. Here’s a query that make it easier.
Sent_lag — data that still needs to be sent to the replica
Flush_lag —data that still has not been fsync’d to WAL files yet
Replay_lag — data that still need to be applied to the database so that it can be queried
select pg_stat_replication.client_addr
, pg_stat_replication.application_name
, pg_stat_replication.sync_priority
, pg_stat_replication.sync_state
, pg_stat_replication.state
, case pg_is_in_recovery()
when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))
else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))
end as sent_lag
, case pg_is_in_recovery()
when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))
else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn))
end as flush_lag
, case pg_is_in_recovery()
when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn))
else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))
end as replay_lag
from pg_stat_replication
;
Thanks Rui, Keith and Scott for nice explanation and blog :)
Thanks,
Raj
On Fri, May 22, 2020 at 10:18 PM Rui DeSousa <rui@crazybean.net> wrote:
On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote:. But, ideally for sync replication, lag should be always 0 right.Incorrect. Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica. That means the transaction is written to WAL file on both primary and replica. On the primary, the transaction is also visible to transactions with a later xmin. On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL. What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.I would look at pg_stat_replication view on the primary/upstream provider. Here’s a query that make it easier.Sent_lag — data that still needs to be sent to the replicaFlush_lag —data that still has not been fsync’d to WAL files yetReplay_lag — data that still need to be applied to the database so that it can be queriedselect pg_stat_replication.client_addr, pg_stat_replication.application_name, pg_stat_replication.sync_priority, pg_stat_replication.sync_state, pg_stat_replication.state, case pg_is_in_recovery()when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))end as sent_lag, case pg_is_in_recovery()when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn))end as flush_lag, case pg_is_in_recovery()when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn))else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))end as replay_lagfrom pg_stat_replication;
On Fri, 2020-05-22 at 12:48 -0400, Rui DeSousa wrote: > > On May 22, 2020, at 12:36 AM, Raj kumar <rajkumar820999@gmail.com> wrote: > > > > . But, ideally for sync replication, lag should be always 0 right. > > Incorrect. Synchronous replication means that a commit will not return until it has been safely written to disk on theprimary and the replica. That means the transaction is written to WAL file on > both primary and replica. On the primary, the transaction is also visible to transactions with a later xmin. On the replica,the transaction has been recorded in the WAL; it still needs to get > applied to the database for it to become visible to read transactions. If you set "synchronous_commit = remote_apply", the commit will only return when the change has been replayed on the synchronous standby server. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com