Thread: how to tell if a replication server has stopped replicating

how to tell if a replication server has stopped replicating

From
Bill MacArthur
Date:
Hello,

We recently discovered, quite by accident, that our streaming replication server was no longer replicating. We noticed
this in our master server log file:
2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]LOG:  replication connection authorized: user=postgres host=192.168.17.4 port=53542
2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]FATAL:  requested WAL segment 00000001000001D10000006B has already been removed

As it turned out, this had been going on for at least a week, as every day's log files were crammed with these messages.
Whatever caused the replication server to end up needing the WAL file is a mystery for another day. What I would like to
do is set up a simple method of alerting us if replication stops. We could do a simple grep of log files on the
replication side, but I am guessing that there is some SQL command that could be run against the postgres internals that
would be cleaner. Is there such an animal?

Thank you,
Bill MacArthur

Re: how to tell if a replication server has stopped replicating

From
"mark"
Date:

> -----Original Message-----
> From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-
> owner@postgresql.org] On Behalf Of Bill MacArthur
> Sent: Friday, August 26, 2011 10:21 AM
> To: pgsql-admin@postgresql.org
> Subject: [ADMIN] how to tell if a replication server has stopped
> replicating
>
> Hello,
>
> We recently discovered, quite by accident, that our streaming
> replication server was no longer replicating. We noticed this in our
> master server log file:
> 2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]LOG:
> replication connection authorized: user=postgres host=192.168.17.4
> port=53542
> 2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]FATAL:
> requested WAL segment 00000001000001D10000006B has already been removed
>
> As it turned out this has been going on for at least a week as
> everyday's log files were crammed with these messages. Whatever caused
> the replication server to end up needing the WAL file is a mystery for
> another day. What I would like to do is setup a simple method of
> alerting us if replication stops. We could do a simple grep of log
> files on the replication side, but I am guessing that there is some SQL
> command that could be run against the postgres internals that would be
> cleaner. Is there such an animal?
>
> Thank you,
> Bill MacArthur
>


* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00198.php

* http://archives.postgresql.org/pgsql-hackers/2010-11/msg00252.php


Those two posts should cover the basics. There are other ways some people use to do it, but this seems to be the
generally accepted way.
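
For illustration only (this is a rough sketch of one common approach, not a summary of the linked posts): on 9.0 you can
run SELECT pg_current_xlog_location() on the master and SELECT pg_last_xlog_receive_location() on the standby, then
compare the two values from a cron job. The helper names and the max_lag_bytes threshold below are made up for the
example, and the 0xFF000000 bytes-per-log-id factor is an assumption that matches the 9.0-era WAL layout with the default
16 MB segments; later releases may need a different factor.

```python
# Sketch: compare WAL locations fetched separately from master and standby.
# Inputs are the text values returned by pg_current_xlog_location() (master)
# and pg_last_xlog_receive_location() (standby), e.g. '1D1/6B000000'.

def parse_xlog_location(text):
    """Split a location such as '1D1/6B000000' into (log id, offset) integers."""
    log_id, offset = text.strip().split("/")
    return int(log_id, 16), int(offset, 16)


def lag_bytes(master_location, standby_location, bytes_per_log_id=0xFF000000):
    """Approximate number of WAL bytes the standby is behind the master.

    bytes_per_log_id is an assumption: 255 segments of 16 MB per log id.
    """
    master_id, master_off = parse_xlog_location(master_location)
    standby_id, standby_off = parse_xlog_location(standby_location)
    return (master_id - standby_id) * bytes_per_log_id + (master_off - standby_off)


def replication_looks_stalled(master_location, standby_location,
                              max_lag_bytes=16 * 1024 * 1024):
    """True when the standby trails by more than the (hypothetical) threshold."""
    return lag_bytes(master_location, standby_location) > max_lag_bytes
```

A single large reading can just mean a burst of writes, so it is probably safer to alert only when the standby's
location has not moved between two consecutive checks while the master's has.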

I think 9.1 has some additions in the works (a pg_stat_replication view on the master, for one) that should make this far
easier to monitor.

-Mark