I am occasionally seeing replication lag in PostgreSQL 16 and cannot find the cause.
Configuration:
1) Two replication slots for two servers (one in the same data center, the other remote).
2) Once or twice a week, the remote replication slot lags (around 2GB in a 1k TPS environment), while the slot for the server in the same data center shows 0 lag.
My observation:
1) The total_txns rate in pg_stat_replication_slots dropped from about 1k to 5 or 6 for the lagging slot, while the other slot stayed at about 1k TPS.
2) The remote slot lags, but the slot in the same data center is fine.
3) Most importantly, there is plenty of network bandwidth available: about 2GB of the 4GB network is still free.
4) There are no I/O issues on the servers.

I am not able to prove whether this is caused by the network. Can you help me with how to proceed? At the least, how does logical decoding count total_txns in the pg_stat_replication_slots view?
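
For reference, this is a minimal sketch of how the per-slot numbers above could be sampled and compared. It assumes the query in SLOT_QUERY is run on the primary at a fixed interval by some client (not shown). The helper names (lsn_to_bytes, slot_lag_bytes, txn_rate) are hypothetical and only illustrate the arithmetic: byte lag is the distance between the current WAL position and the slot's confirmed_flush_lsn, and the transaction rate is the change in the cumulative total_txns counter between two samples.

```python
# Query to run on the primary (an assumption about how the data is collected;
# the views and columns are standard in PostgreSQL 14 and later).
SLOT_QUERY = """
SELECT s.slot_name,
       pg_current_wal_lsn() AS current_lsn,
       s.confirmed_flush_lsn,
       st.total_txns,      -- cumulative top-level transactions decoded for the slot
       st.total_bytes
FROM pg_replication_slots s
JOIN pg_stat_replication_slots st USING (slot_name)
WHERE s.slot_type = 'logical';
"""


def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the subscriber has not yet confirmed for this slot."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(confirmed_flush_lsn)


def txn_rate(total_txns_before: int, total_txns_after: int, seconds: float) -> float:
    """Decoded transactions per second between two samples of total_txns."""
    return (total_txns_after - total_txns_before) / seconds


if __name__ == "__main__":
    # Made-up sample values, only to show the calculation.
    print(slot_lag_bytes("16/B374D848", "16/B374D000"))
    print(txn_rate(1_000, 7_000, 6.0))
```

Sampling both slots at the same moments shows whether the remote slot's decoded-transaction rate falls before its byte lag starts to grow, or only afterwards.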