Thread: sql query for postgres replication check
We would like to check the Postgres SYNC streaming replication status with Nagios using the same query on all servers (master + standby) and versions (9.6, 10, 12) for simplicity.
I came up with the following query which should return any apply lag in seconds.
select coalesce(replay_delay, 0) replication_delay_in_sec
from (
select datname,
(
select case
when received_lsn = latest_end_lsn then 0
else extract(epoch
from now() - latest_end_time)
end
from pg_stat_wal_receiver
) replay_delay
from pg_database
where datname = current_database()
) xview;
I would expect delays > 0 whenever SYNC or ASYNC replication falls behind. We will raise a warning at 120 seconds and a critical alert at 300 seconds.
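For illustration only, this shows how the two thresholds could be applied to a delay value. The sample values are made up. The sketch assumes both limits are inclusive, so a delay of exactly 120 s already counts as a warning. In a real setup the comparison is usually done by the Nagios check itself rather than in SQL.

```sql
-- Hypothetical classification of a delay (in seconds) against the thresholds
-- from this thread: WARNING from 120 s, CRITICAL from 300 s (both assumed inclusive).
SELECT delay_sec,
       CASE
         WHEN delay_sec >= 300 THEN 'CRITICAL'
         WHEN delay_sec >= 120 THEN 'WARNING'
         ELSE 'OK'
       END AS nagios_state
FROM (VALUES (0), (119.5), (120), (299), (300)) AS samples(delay_sec);  -- made-up samples
```

To classify a real measurement, replace the VALUES list with a subquery that returns the delay.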
Would this do the job or am I missing something here?
Thanks, Markus
On Fri, Nov 22, 2019 at 01:20:59PM +0000, Zwettler Markus (OIZ) wrote:
> I came up with the following query which should return any apply lag in seconds.
>
> select coalesce(replay_delay, 0) replication_delay_in_sec
> from (
> select datname,
> (
> select case
> when received_lsn = latest_end_lsn then 0
> else extract(epoch
> from now() - latest_end_time)
> end
> from pg_stat_wal_receiver
> ) replay_delay
> from pg_database
> where datname = current_database()
> ) xview;
>
>
> I would expect delays >0 in case SYNC or ASYNC replication is
> somehow behind. We will do a warning at 120 secs and critical at 300
> secs.

pg_stat_wal_receiver is available only on the receiver, aka the standby, so it would not really be helpful on a primary. On top of that, streaming replication is system-wide, so there is no actual point in looking at databases either.

> Would this do the job or am I missing something here?

Here is a suggestion for Nagios: hot_standby_delay, as told in
https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl
--
Michael
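On the primary, the sending side of streaming replication appears in pg_stat_replication instead. Below is a minimal sketch for PostgreSQL 10 and 12. It does not run on 9.6, because the write_lag, flush_lag and replay_lag columns were added in 10. The query returns one row per connected standby with its synchronous state and its replay lag in seconds, and the choice of columns is illustrative. On these versions, only superusers and members of pg_monitor see the lag columns. For other roles they read as NULL, and this sketch would turn that NULL into 0.

```sql
-- Minimal sketch for PostgreSQL 10 and 12, run on the primary.
-- One row per connected standby; empty result if no standby is connected.
SELECT application_name,
       client_addr,
       state,
       sync_state,                                          -- e.g. 'sync', 'async', 'potential'
       -- replay_lag is NULL when the standby has caught up and no WAL is flowing;
       -- it is also NULL for roles without pg_monitor/superuser rights.
       coalesce(extract(epoch FROM replay_lag), 0) AS replay_lag_sec
FROM pg_stat_replication;
```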
> On Fri, Nov 22, 2019 at 01:20:59PM +0000, Zwettler Markus (OIZ) wrote:
> > I came up with the following query which should return any apply lag in seconds.
> >
> > [...]
> >
> > I would expect delays >0 in case SYNC or ASYNC replication is somehow
> > behind. We will do a warning at 120 secs and critical at 300 secs.
>
> pg_stat_wal_receiver is available only on the receiver, aka the standby so it would
> not really be helpful on a primary. On top of that streaming replication is
> system-wide, so there is no actual point to look at databases either.
>
> > Would this do the job or am I missing something here?
>
> Here is a suggestion for Nagios: hot_standby_delay, as told in
> https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl
> --
> Michael

I don't want to use check_hot_standby_delay, as I would have to configure every streaming replication setup separately in Nagios. I want a generic routine that I can load on any Postgres server, regardless of whether it uses streaming replication and regardless of its database role. The query would return > 0 if streaming replication falls behind and 0 in all other cases (replication or not). Checking streaming replication per database doesn't make any sense to me.

Markus
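Below is a sketch of such a generic, cluster-wide check for PostgreSQL 10 and 12. It assumes the check runs as a superuser or as a member of pg_monitor. Otherwise pg_stat_replication hides the lag columns and the primary branch would always report 0. The query uses pg_is_in_recovery() to decide which side it is on:

* On a primary, it reports the worst replay_lag across all connected standbys.
* On a standby, it reports 0 when everything received has been replayed, and otherwise the time since the last replayed transaction.

Note that pg_stat_wal_receiver is not used, since it has no replay position. This is an approximation, not a drop-in replacement for check_postgres:

* After a long idle period on the primary, the standby value can spike briefly when new WAL arrives.
* A standby without a WAL receiver (archive recovery only) always takes the time-based branch.

On 9.6 the standby functions are named pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), and replay_lag does not exist. So a single query text cannot cover 9.6, 10 and 12.

```sql
-- Minimal sketch for PostgreSQL 10 and 12 (function/column names differ on 9.6).
-- Returns one number for the whole cluster, whatever the server's role.
SELECT CASE
         -- Primary or standalone server: worst replay lag of any connected
         -- standby; max() over no rows is NULL, so no standbys => 0.
         WHEN NOT pg_is_in_recovery() THEN
           coalesce((SELECT max(extract(epoch FROM replay_lag))
                     FROM pg_stat_replication), 0)
         -- Standby that has replayed everything it has received: no apply lag.
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
         -- Standby with unreplayed WAL (or no WAL receiver): seconds since the
         -- commit time of the last replayed transaction; NULL before any replay => 0.
         ELSE coalesce(extract(epoch FROM now() - pg_last_xact_replay_timestamp()), 0)
       END AS replication_delay_in_sec;
```

The result can then be compared against the 120 s warning and 300 s critical thresholds mentioned earlier in the thread.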