Thread: sql query for postgres replication check

sql query for postgres replication check

From
"Zwettler Markus (OIZ)"
Date:

We would like to check the Postgres SYNC streaming replication status with Nagios using the same query on all servers (master + standby) and versions (9.6, 10, 12) for simplicity.

I came up with the following query which should return any apply lag in seconds.

select coalesce(replay_delay, 0) replication_delay_in_sec
from (
       select datname,
              (
                select case
                         when received_lsn = latest_end_lsn then 0
                         else extract(epoch from now() - latest_end_time)
                       end
                from pg_stat_wal_receiver
              ) replay_delay
       from pg_database
       where datname = current_database()
     ) xview;

I would expect delays >0 in case SYNC or ASYNC replication is somehow behind. We will do a warning at 120 secs and critical at 300 secs.
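
As a rough illustration of the thresholds just mentioned, the mapping from a delay in seconds to a Nagios state could be done in SQL itself. This is only a sketch: the sample delays in the VALUES list and the names delay_sec and nagios_state are made up for the example, and the codes 0/1/2 follow the usual Nagios plugin convention for OK/WARNING/CRITICAL.

```sql
-- Hypothetical sample delays; in a real check this column would be
-- the result of the replication delay query.
select delay_sec,
       case
         when delay_sec >= 300 then 2  -- CRITICAL
         when delay_sec >= 120 then 1  -- WARNING
         else 0                        -- OK
       end as nagios_state
from (values (0), (150), (400)) as sample(delay_sec);
```

With these sample values, 0 maps to state 0, 150 to state 1, and 400 to state 2. Doing the comparison in the plugin instead of in SQL would work equally well; the point is only that a single numeric result is enough to drive both thresholds.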

Would this do the job or am I missing something here?

Thanks, Markus

Re: sql query for postgres replication check

From
Michael Paquier
Date:
On Fri, Nov 22, 2019 at 01:20:59PM +0000, Zwettler Markus (OIZ) wrote:
> I came up with the following query which should return any apply lag in seconds.
>
> select coalesce(replay_delay, 0) replication_delay_in_sec
> from (
>        select datname,
>               (
>                 select case
>                          when received_lsn = latest_end_lsn then 0
>                          else extract(epoch
>                 from now() - latest_end_time)
>                        end
>                 from pg_stat_wal_receiver
>               ) replay_delay
>        from pg_database
>        where datname = current_database()
>      ) xview;
>
>
> I would expect delays >0 in case SYNC or ASYNC replication is
> somehow behind. We will do a warning at 120 secs and critical at 300
> secs.

pg_stat_wal_receiver is available only on the receiver, aka the
standby, so it would not really be helpful on a primary.  On top of
that, streaming replication is system-wide, so there is no actual point
in looking at databases either.
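
To make these two points concrete, here is a minimal sketch of the same check without the detour through pg_database. It assumes the column names used in the original query (received_lsn, latest_end_lsn, latest_end_time), which is what versions 9.6 to 12 provide. On a server with no running WAL receiver, such as a primary, the view should contain no row, so the scalar subquery yields NULL and coalesce turns that into 0.

```sql
-- Sketch only: same logic as the original query, minus pg_database.
-- pg_stat_wal_receiver has at most one row, so a scalar subquery is safe.
select coalesce(
         (
           select case
                    when received_lsn = latest_end_lsn then 0
                    else extract(epoch from now() - latest_end_time)
                  end
           from pg_stat_wal_receiver
         ),
         0
       ) as replication_delay_in_sec;
```

Note that a result of 0 here cannot distinguish "standby is caught up" from "no WAL receiver is running at all". Whether the monitoring role is allowed to see these columns is also worth checking separately.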

> Would this do the job or am I missing something here?

Here is a suggestion for Nagios: hot_standby_delay, as described in
https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl
--
Michael

AW: sql query for postgres replication check

From
"Zwettler Markus (OIZ)"
Date:
> On Fri, Nov 22, 2019 at 01:20:59PM +0000, Zwettler Markus (OIZ) wrote:
> > I came up with the following query which should return any apply lag in seconds.
> >
> > select coalesce(replay_delay, 0) replication_delay_in_sec from (
> >        select datname,
> >               (
> >                 select case
> >                          when received_lsn = latest_end_lsn then 0
> >                          else extract(epoch
> >                 from now() - latest_end_time)
> >                        end
> >                 from pg_stat_wal_receiver
> >               ) replay_delay
> >        from pg_database
> >        where datname = current_database()
> >      ) xview;
> >
> >
> > I would expect delays >0 in case SYNC or ASYNC replication is somehow
> > behind. We will do a warning at 120 secs and critical at 300 secs.
>
> pg_stat_wal_receiver is available only on the receiver, aka the standby so it would
> not really be helpful on a primary.  On top of that streaming replication is system-
> wide, so there is no actual point to look at databases either.
>
> > Would this do the job or am I missing something here?
>
> Here is a suggestion for Nagios: hot_standby_delay, as told in
> https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl
> --
> Michael


I don't want to use check_hot_standby_delay as I would have to configure every streaming replication setup
separately with Nagios.

I want a generic routine which I can load on any postgres server regardless of streaming replication or database role.

The query would return >0 if streaming replication falls behind and 0 in all other cases (replication or not).
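
One way such a generic routine might look, as a sketch rather than a tested check: pg_is_in_recovery() and pg_last_xact_replay_timestamp() exist under the same names in 9.6, 10 and 12, so the query below should run unchanged on a primary, a standby, or a standalone server. It is a different technique from the pg_stat_wal_receiver query quoted above: it measures the time since the last transaction was replayed on the standby.

```sql
-- Returns 0 on any server that is not in recovery (primary or standalone).
-- On a standby, returns the seconds since the last replayed transaction,
-- or 0 if nothing has been replayed yet (timestamp is NULL).
select case
         when not pg_is_in_recovery() then 0
         else coalesce(
                extract(epoch from now() - pg_last_xact_replay_timestamp()),
                0
              )
       end as replication_delay_in_sec;
```

The known weakness of this approach is that the value keeps growing on a standby whose primary is simply idle, even though nothing is missing, so the 120 and 300 second thresholds could fire on a quiet system. Comparing received and replayed LSNs would avoid that, but the function names for those differ between 9.6 and 10, which works against the goal of a single query.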

Checking streaming replication per database doesn't make any sense to me.

Markus