Nagios plugin to check slony replication - Mailing list pgsql-general

From	John Sidney-Woollett
Subject	Nagios plugin to check slony replication
Date	February 27, 2005 17:54:18
Msg-id	4221EE0D.5010502@wardbrook.com
Responses	Re: [Slony1-general] Nagios plugin to check slony replication
List	pgsql-general

I've finally got around to writing the two nagios plugins which I am
using to check our slony cluster (on our linux servers). I'm posting
them in case anyone else wants them or to use them as a basis for
something else. These are based on Christopher Browne's scripts that
ship with slony.

The two scripts perform different tasks.

check_slon checks that the slon daemon is in the process list and
optionally checks whether the last entry in the slon log file is a WARN
or FATAL message. It is called with two or three parameters: the
clustername, the dbname and (optionally) the location of the log file.
This script is to be executed on each node in the cluster (both master
and slaves).
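
The process check relies on a small grep trick: the pattern is written as
[s]lon so that the grep command itself never shows up as a match. The
sketch below illustrates this on made-up ps output; the PIDs, user names,
cluster name (mycluster) and database name (mydb) are all hypothetical,
and grep -E is used in place of egrep.

```shell
#!/bin/sh
# Hypothetical, shortened ps output: one slon daemon and the grep
# command that is searching for it. All values are made up.
sample_ps='postgres 1234 0.0 0.1 5000 1200 ? S 10:00 0:00 /usr/local/pgsql/bin/slon mycluster dbname=mydb user=postgres
nagios 5678 0.0 0.0 1500 500 ? S 10:05 0:00 egrep [s]lon mycluster'

# The pattern [s]lon matches the text "slon", but the grep command line
# contains the literal text "[s]lon", which the pattern does not match.
slon_pids=$(printf '%s\n' "$sample_ps" |
    grep -E "[s]lon mycluster" |
    grep -E "dbname=mydb" |
    awk '{print $2}')

echo "slon pid(s): $slon_pids"
```

Only the daemon's line survives both filters, so the second column (the
PID) of that line is what ends up in the variable.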

check_sloncluster checks that active receiver nodes are confirming syncs
within 10 seconds of the master. I'm not entirely sure that this is the
best strategy, and if you know otherwise, I'd love to hear. It requires
two parameters: the clustername and the dbname. This script is executed
on the master database only.
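
If a single fixed 10 second cut-off turns out to be too strict, one
possible variation is to use separate warning and critical thresholds.
The sketch below is not part of the posted scripts: the function name
classify_lag and both thresholds are assumptions, and it expects the lag
to already be available as a whole number of seconds.

```shell
#!/bin/sh
# classify_lag LAG_SECONDS WARN_SECONDS CRIT_SECONDS
# Hypothetical helper: prints a status line and returns the matching
# Nagios exit status (0 = OK, 1 = Warning, 2 = Critical).
classify_lag() {
    lag=$1
    warn=$2
    crit=$3
    if [ "$lag" -ge "$crit" ]; then
        echo "CRITICAL - replication lag ${lag}s (limit ${crit}s)"
        return 2
    elif [ "$lag" -ge "$warn" ]; then
        echo "WARNING - replication lag ${lag}s (limit ${warn}s)"
        return 1
    fi
    echo "OK - replication lag ${lag}s"
    return 0
}

# Example: a 4 second lag with thresholds of 10 and 60 seconds
classify_lag 4 10 60
```

A plugin built this way would finish with the function's return value as
its exit status, so Nagios can tell a slow subscriber from a stalled one.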

These scripts are designed to run on the host that they are
checking. With a little modification, they could check remote servers on
the network. They are quite simplistic and may not be suitable for your
environment. You are free to modify the code to suit your own needs.
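
As a rough idea of what that modification might look like for
check_sloncluster, the sketch below adds an optional host parameter and
passes it to psql with -h. The function name psql_args and the host name
are made up for illustration. The process check in check_slon cannot be
done this way, since it reads the local process list; it would need
something like NRPE or ssh instead.

```shell
#!/bin/sh
# psql_args DBNAME [HOST]
# Hypothetical helper: prints the psql options for a local check, or for
# a remote one when HOST is given (psql's -h option selects the server).
psql_args() {
    dbname=$1
    host=${2:-}
    args="--tuples-only -U postgres"
    if [ -n "$host" ]; then
        args="$args -h $host"
    fi
    echo "$args $dbname"
}

psql_args mydb                   # local check
psql_args mydb db1.example.com   # remote check, host name is made up
```

The remote server would also have to accept the connection in its
pg_hba.conf, which is outside the scope of this sketch.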

John Sidney-Woollett

check_slon
==========

#!/bin/sh

# nagios plugin that checks whether the slon daemon is running
# if the 3rd parameter (LOGFILE) is specified then the log file is
# checked to see if the last entry is a WARN or FATAL message
#
# three possible exit statuses:
#  0 = OK
#  1 = Warning (warning in slon log file)
#  2 = Fatal Error (slon not running, or error in log file)
#
# script requires two or three parameters:
# CLUSTERNAME - name of slon cluster to be checked
# DBNAME - name of database being replicated
# LOGFILE - (optional) location of the slon log file
#
# Author:  John Sidney-Woollett
# Created: 26-Feb-2005
# Copyright 2005

# check parameters are valid
if [ $# -lt 2 ] || [ $# -gt 3 ]
then
   echo "Invalid parameters: need CLUSTERNAME DBNAME [LOGFILE]"
   exit 2
fi

# assign parameters
CLUSTERNAME=$1
DBNAME=$2
LOGFILE=$3

# check to see whether the slon daemon is running
SLONPROCESS=`ps -auxww | egrep "[s]lon $CLUSTERNAME" | egrep "dbname=$DBNAME" | awk '{print $2}'`

if [ ! -n "$SLONPROCESS" ]
then
   echo "no slon process active"
   exit 2
fi

# if the logfile is specified, check it exists
# and check whether the last line starts with FATAL or WARN
if [ -n "$LOGFILE" ]
then
   # check for log file
   if [ -f "$LOGFILE" ]
   then
     # quote the variables so an empty log or a path with spaces
     # does not break the tests
     LOGLINE=`tail -n 1 "$LOGFILE"`
     LOGSTATUS=`echo "$LOGLINE" | awk '{print $1}'`
     if [ "$LOGSTATUS" = "FATAL" ]
     then
       echo "$LOGLINE"
       exit 2
     elif [ "$LOGSTATUS" = "WARN" ]
     then
       echo "$LOGLINE"
       exit 1
     fi
   else
     echo "$LOGFILE not found"
     exit 2
   fi
fi

# otherwise all looks to be OK
echo "OK - slon process $SLONPROCESS"
exit 0



check_sloncluster
=================

#!/bin/sh

# nagios plugin that checks whether the slave nodes in a slony cluster
# are being updated from the master
#
# possible exit statuses:
#  0 = OK
#  2 = Error, one or more slave nodes are not sync'ing with the master
#
# script requires two parameters:
# CLUSTERNAME - name of slon cluster to be checked
# DBNAME - name of master database
#
# Author:  John Sidney-Woollett
# Created: 26-Feb-2005
# Copyright 2005

# check parameters are valid
if [ $# -ne 2 ]
then
   echo "Invalid parameters: need CLUSTERNAME DBNAME"
   exit 2
fi

# assign parameters
CLUSTERNAME=$1
DBNAME=$2

# setup the query to check the replication status
SQL="select case
   when ttlcount = okcount then 'OK - '||okcount||' nodes in sync'
   else 'ERROR - '||ttlcount-okcount||' of '||ttlcount||' nodes not in sync'
end as syncstatus
from (
-- determine total active receivers
select (select count(distinct sub_receiver)
     from _$CLUSTERNAME.sl_subscribe
     where sub_active = true) as ttlcount,
(
-- determine active nodes syncing within 10 seconds
  select count(*) from (
   select st_received, st_last_received_ts - st_last_event_ts as cfmdelay
   from _$CLUSTERNAME.sl_status
   where st_received in (
     select distinct sub_receiver
     from _$CLUSTERNAME.sl_subscribe
     where sub_active = true
   )
) as t1
where cfmdelay < interval '10 secs') as okcount
) as t2"

# query the master database
CHECK=`/usr/local/pgsql/bin/psql -c "$SQL" --tuples-only -U postgres $DBNAME`

if [ ! -n "$CHECK" ]
then
   echo "ERROR querying $DBNAME"
   exit 2
fi

# echo the result of the query
echo $CHECK

# and check the return status
STATUS=`echo $CHECK | awk '{print $1}'`
if [ "$STATUS" = "OK" ]
then
   exit 0
else
   exit 2
fi
