Thread: Streaming Recovery - Automated Monitoring

Streaming Recovery - Automated Monitoring

From

Karl Denninger

Date:

03 October 2010, 03:50:03

 I'm trying to come up with an automated monitoring system to watch the
WAL log progress and sound appropriate alarms if it gets too far behind
for some reason (e.g. communications problems, etc.) - so far without
success.

What I need is some sort of way to compute a difference between the
master and slave(s) position in the WAL stream.  It appears that there
is no function set specifically to do this, and one of the things I
thought I'd try (using pg_xlogfile_name_offset) doesn't work on the
slave - it returns:

ticker=# select pg_xlogfile_name_offset(pg_last_xlog_receive_location());
ERROR:  recovery is in progress
HINT:  pg_xlogfile_name_offset() cannot be executed during recovery.

Any ideas?

Looking in on the system is ok, but I want to come up with an automated
tool for letting me know if something goes haywire.

Thanks in advance.

-- Karl

Attachment

karl.vcf

Re: Streaming Recovery - Automated Monitoring

From

Rajesh Kumar Mallah

Date:

03 October 2010, 04:40:34

I hope u checked point #11 http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use

11. You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby, respectively.

$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)pg_current_xlog_location 
--------------------------0/2000000
(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)pg_last_xlog_receive_location 
-------------------------------0/2000000
(1 row)

$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)pg_last_xlog_replay_location 
------------------------------0/2000000
(1 row)

Regds
Rajesh Kumar Mallah.

Re: Streaming Recovery - Automated Monitoring

From

Karl Denninger

Date:

03 October 2010, 05:09:47

On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:

I hope u checked point #11 http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
11. You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby, respectively.
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)pg_current_xlog_location 
--------------------------0/2000000
(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)pg_last_xlog_receive_location 
-------------------------------0/2000000
(1 row)

$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)pg_last_xlog_replay_location 
------------------------------0/2000000
(1 row)
Regds
Rajesh Kumar Mallah.

Yes, I did.

Now how do I get an arithmetic difference between the two? There will (usually) be a small difference between the master and slave on a busy system - what I want to do is query both and if the difference in their locations is greater than some defined size, start raising hell (e.g. sending SMS to people, etc)

I can SEE the difference, but I don't see a way to COMPUTE a difference, and there does not appear to be a function that will accept the log file location as an argument for conversion - the one documented for offsets (which might otherwise work) does not work on the slave as I noted.

With Slony there was a set of system tables that would tell me how many unapplied changes were in the queue. From this I could determine health - if the number was more than some reasonably-small amount, something was broken and alarms were to be raised.

I'm looking for a way to implement the same sort of functionality here.

ticker=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
37A/327D1888
(1 row)

ticker=# select pg_current_xlog_location();
pg_current_xlog_location
--------------------------
37A/3280DCB8
(1 row)

How do I get an arithmetic difference between these two programmatically, and will such always be monoatomically increasing (that is, will they ever roll over, thereby giving me a potential NEGATIVE difference?)

The offset function doesn't work on the slave, but that probably doesn't help me anyway since it appears to be file-relative (that is, if the prefix is different its useless anyway.)

If there is no internal Postgres functionality that can do this, then I need to know the computational rules for how to get an absolute offset between two different values returned by these functions.

-- Karl

Attachment

karl.vcf

Re: Streaming Recovery - Automated Monitoring

From

Magnus Hagander

Date:

03 October 2010, 06:28:39

On Sun, Oct 3, 2010 at 07:07, Karl Denninger <karl@denninger.net> wrote:
> Now how do I get an arithmetic difference between the two?  There will
> (usually) be a small difference between the master and slave on a busy
> system - what I want to do is query both and if the difference in their
> locations is greater than some defined size, start raising hell (e.g.
> sending SMS to people, etc)

Take a look at this code for an example:

http://munin-monitoring.org/browser/trunk/plugins/node.d/postgres_streaming_.in?rev=3905

I *think* it's complete :-)


> With Slony there was a set of system tables that would tell me how many
> unapplied changes were in the queue.  From this I could determine health -
> if the number was more than some reasonably-small amount, something was
> broken and alarms were to be raised.

Yeah, streaming can't do it quite so conveniently for you, since the
view can only talk to one database at a time.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Streaming Recovery - Automated Monitoring

From

Guillaume Lelarge

Date:

03 October 2010, 06:35:07

Le 03/10/2010 07:07, Karl Denninger a écrit :
>  On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
>>
>> I hope u checked point #11
>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
>>
>>     * *11.* You can calculate the replication lag by comparing the
>>       current WAL write location on the primary with the last WAL
>>       location received/replayed by the standby. They can be retrieved
>>       using /pg_current_xlog_location/ on the primary and the
>>       /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
>>       on the standby, respectively.
>>
>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
>>  pg_current_xlog_location
>> --------------------------
>>  0/2000000
>> (1 row)
>>
>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)
>>  pg_last_xlog_receive_location
>> -------------------------------
>>  0/2000000
>> (1 row)
>>
>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)
>>  pg_last_xlog_replay_location
>> ------------------------------
>>  0/2000000
>> (1 row)
>>
>> Regds
>> Rajesh Kumar Mallah.
>
> Yes, I did.
>
> Now how do I get an arithmetic difference between the two?  There will
> (usually) be a small difference between the master and slave on a busy
> system - what I want to do is query both and if the difference in their
> locations is greater than some defined size, start raising hell (e.g.
> sending SMS to people, etc)
>
> I can SEE the difference, but I don't see a way to COMPUTE a difference,
> and there does not appear to be a function that will accept the log file
> location as an argument for conversion - the one documented for offsets
> (which might otherwise work) does not work on the slave as I noted.
>
> With Slony there was a set of system tables that would tell me how many
> unapplied changes were in the queue.  From this I could determine health
> - if the number was more than some reasonably-small amount, something
> was broken and alarms were to be raised.
>
> I'm looking for a way to implement the same sort of functionality here.
>
> ticker=# select pg_last_xlog_replay_location();
>  pg_last_xlog_replay_location
> ------------------------------
>  37A/327D1888
> (1 row)
>
> ticker=# select pg_current_xlog_location();
>  pg_current_xlog_location
> --------------------------
>  37A/3280DCB8
> (1 row)
>
> How do I get an arithmetic difference between these two
> programmatically, and will such always be monoatomically increasing
> (that is, will they ever roll over, thereby giving me a potential
> NEGATIVE difference?)
>

pgPool-II does that computation. You should check that in its source
code. File pool_worker_child.c, function check_replication_time_lag().
It creates a long value from the LSN returned by these functions. Here
is the computation:

  lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;

In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
3280DCB8 on the master. The hexadecimal values were first converted to
their unsigned decimal integer representation (same file, function
text_to_lsn()).

You should really get a look at this file. It's probably a better
explanation than what I'm trying to do :)


--
Guillaume
 http://www.postgresql.fr
 http://dalibo.com

Re: Streaming Recovery - Automated Monitoring

From

Karl Denninger

Date:

03 October 2010, 20:47:15

 On 10/3/2010 1:34 AM, Guillaume Lelarge wrote:
> Le 03/10/2010 07:07, Karl Denninger a écrit :
>>  On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
>>> I hope u checked point #11
>>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
>>>
>>>     * *11.* You can calculate the replication lag by comparing the
>>>       current WAL write location on the primary with the last WAL
>>>       location received/replayed by the standby. They can be retrieved
>>>       using /pg_current_xlog_location/ on the primary and the
>>>       /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
>>>       on the standby, respectively.
>>>
>>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
>>>  pg_current_xlog_location
>>> --------------------------
>>>  0/2000000
>>> (1 row)
>>>
>>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)
>>>  pg_last_xlog_receive_location
>>> -------------------------------
>>>  0/2000000
>>> (1 row)
>>>
>>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)
>>>  pg_last_xlog_replay_location
>>> ------------------------------
>>>  0/2000000
>>> (1 row)
>>>
>>> Regds
>>> Rajesh Kumar Mallah.
>> Yes, I did.
>>
>> Now how do I get an arithmetic difference between the two?  There will
>> (usually) be a small difference between the master and slave on a busy
>> system - what I want to do is query both and if the difference in their
>> locations is greater than some defined size, start raising hell (e.g.
>> sending SMS to people, etc)
>>
>> I can SEE the difference, but I don't see a way to COMPUTE a difference,
>> and there does not appear to be a function that will accept the log file
>> location as an argument for conversion - the one documented for offsets
>> (which might otherwise work) does not work on the slave as I noted.
>>
>> With Slony there was a set of system tables that would tell me how many
>> unapplied changes were in the queue.  From this I could determine health
>> - if the number was more than some reasonably-small amount, something
>> was broken and alarms were to be raised.
>>
>> I'm looking for a way to implement the same sort of functionality here.
>>
>> ticker=# select pg_last_xlog_replay_location();
>>  pg_last_xlog_replay_location
>> ------------------------------
>>  37A/327D1888
>> (1 row)
>>
>> ticker=# select pg_current_xlog_location();
>>  pg_current_xlog_location
>> --------------------------
>>  37A/3280DCB8
>> (1 row)
>>
>> How do I get an arithmetic difference between these two
>> programmatically, and will such always be monoatomically increasing
>> (that is, will they ever roll over, thereby giving me a potential
>> NEGATIVE difference?)
>>
> pgPool-II does that computation. You should check that in its source
> code. File pool_worker_child.c, function check_replication_time_lag().
> It creates a long value from the LSN returned by these functions. Here
> is the computation:
>
>   lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;
>
> In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
> 3280DCB8 on the master. The hexadecimal values were first converted to
> their unsigned decimal integer representation (same file, function
> text_to_lsn()).
>
> You should really get a look at this file. It's probably a better
> explanation than what I'm trying to do :)
>

Here's a little program to do it - change "MAX_OFFSET" and modify
(particularly the "notify" function) to suit and place a file called
"CHECK.cluster" in the following form in the home directory of whatever
this runs as (it will look in the home directory of the euid of the
process):

master-name connection-string-to-connect-to-master
slave1-name slave-string-to-connect
slave2-name slave-string-to-connect
......

It will emit a notification for each slave that is more than
"MAX_OFFSET" behind, or if any slave is not in recovery mode (bad; it
thinks it's a master!) or worse, if the MASTER is in recovery mode, or
if it can't talk to any of the declared nodes.

Note that "CHECK.cluster" contains connection information so make sure
you run this as someone and with a home directory for that someone
appropriately permitted to prevent leakage of the connection credentials.

Ain't gonna claim it's elegant, but it was something I was able to
quickly bang out in a few minutes and it works.  Stuck in the CRON if
you have the cron's email going somewhere that will get noticed fast
(e.g. a Blackberry email, etc) you can run this on whatever interval you
need and it'll do the job.

Postgresql's development team can consider this a contribution to the
codebase if someone wants to clean it up a bit and include it.  You
could obviously pick up the config file as an argument rather than using
the running euid
of the process to grab it; I'm using it in an environment where the
latter works better for me - YMMV.


/*
 * check-replication - Checks the replication state of a Postgresql
master/slave cluster, with multiple slaves and
 * one master.  Each slave's streaming log location is checked against
the master in turn, and if any are out of
 * sync by more than the allowed amount, or if we can't connect to the
slave (or master), we emit a message
 * to that effect.  Suitable for running out of cron as it's silent in
the absence of something to report.
 *
 * Public domain - created 2010/10/3 by Karl Denninger (karl@denninger.net)
 *
 * No claim of elegance is made - it's quick, dirty, and it works.
 */

#include        <stdio.h>
#include        <stdlib.h>
#include        <string.h>
#include        <syslog.h>
#include        <unistd.h>
#include        <sys/types.h>
#include        <pwd.h>
#include        <signal.h>
#include        <libpq-fe.h>

#include        "defs.h"

#define MAX_OFFSET      1024000         /* Can be behind by 1MB */

PGconn  *conn;
PGconn  *conn_slave;

unsigned        long    convert_long(str)
char    *str;

{
        char    tmp[512];
        char    tmp2[512];

        unsigned        long    offset;

        unsigned        long    c1, c2;

        sscanf(str, "%lx/%lx", &c1, &c2);

        offset = (c1 * 16 * 1024 * 1024 * 255) + c2;
        return(offset);
}


notify(string, user)
char    *string;
{
        FILE    *fid;
        char    tmp[512];

        printf("%s\n", string);
        return;
}

int     main(argc, argv)
int     argc;
char    *argv[];

{

        PGresult        *result;
        PGresult        *result2;

        FILE    *fid;


        struct  passwd  *pwd;
        char    tmp[512];
        char    tmp2[512];
        char    master[512];
        char    slave[512];

        unsigned        long    posmaster;
        unsigned        long    posclient;

        pwd = getpwuid(geteuid());
        chdir(pwd->pw_dir);

        fid = fopen("CHECK.cluster", "r");
        if (!fid) {
                printf("Error: Cannot open CHECK.cluster in base
directory\n");
                exit(1);
        }


        fscanf(fid, "%[!-~] %[!-~ ]", master, tmp);     /* Get master */

        conn = PQconnectdb(tmp);
        if (PQstatus(conn) == CONNECTION_BAD) {
                sprintf(tmp, "%s - %s", master, PQerrorMessage(conn));
                notify(tmp);
                PQfinish(conn);
                exit(0);
        }
        result = PQexec(conn, "select pg_is_in_recovery()");
        if (!PQntuples(result)) {
                sprintf(tmp, "ERROR: Master [%s] fails to give us
recovery status - cannot continue", master);
                notify(tmp);
                exit(1);
        }
        strcpy(tmp, PQgetvalue(result, 0, 0));
        PQclear(result);
        if (tmp[0] == 't')  {
                sprintf(tmp, "ERROR: Master [%s] is in recovery - cannot
continue", master);
                notify(tmp, pwd->pw_name);
                notify(tmp);
                exit(1);
        }

        while (fscanf(fid, " %[!-~] %[!-~ ]", slave, tmp) == 2) {
                conn_slave = PQconnectdb(tmp);
                if (PQstatus(conn) == CONNECTION_BAD) {
                        sprintf(tmp, "%s - %s", slave,
PQerrorMessage(conn));
                        notify(tmp);
                        PQfinish(conn_slave);
                        exit(0);
                }
                result = PQexec(conn_slave, "select pg_is_in_recovery()");
                if (!PQntuples(result)) {
                        sprintf(tmp, "ERROR: Slave [%s] fails to give us
recovery status - cannot continue", slave);
                        notify(tmp);
                        exit(1);
                }
                strcpy(tmp, PQgetvalue(result, 0, 0));
                PQclear(result);
                if (tmp[0] != 't')  {
                        sprintf(tmp, "ERROR: Slave [%s] is NOT in
recovery - cannot continue", slave);
                        notify(tmp);
                        exit(1);
                }

                sprintf(tmp2, "select pg_last_xlog_replay_location()");
                result = PQexec(conn_slave, tmp2);
                if (!PQntuples(result))  {
                        sprintf(tmp, "ERROR: Slave [%s] returned no
result on replay location", slave);
                        notify(tmp);
                        exit(1);
                }
                strcpy(tmp, PQgetvalue(result, 0, 0));
                PQclear(result);
                posclient = convert_long(tmp);

                sprintf(tmp2, "select pg_current_xlog_location()");
                result = PQexec(conn, tmp2);
                if (!PQntuples(result))  {
                        sprintf(tmp, "ERROR: Master [%s] returned no
result on replay location", master);
                        notify(tmp);
                        exit(1);
                }
                strcpy(tmp, PQgetvalue(result, 0, 0));
                PQclear(result);
                posmaster = convert_long(tmp);
                if (posmaster > (posclient + MAX_OFFSET)) {
                        sprintf(tmp, "ERROR: Client [%s] lags master by
%ld", slave, posmaster - posclient);
                        notify(tmp);
                }
                PQfinish(conn_slave);
        }
        PQfinish(conn);
        fclose(fid);
        exit(0);
}

Attachment

karl.vcf

Re: Streaming Recovery - Automated Monitoring - CODE LINK

From

Karl Denninger

Date:

03 October 2010, 21:27:34

 On 10/3/2010 3:44 PM, Karl Denninger wrote:
>  On 10/3/2010 1:34 AM, Guillaume Lelarge wrote:
>> Le 03/10/2010 07:07, Karl Denninger a écrit :
>>>  On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
>>>> I hope u checked point #11
>>>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
>>>>
>>>>     * *11.* You can calculate the replication lag by comparing the
>>>>       current WAL write location on the primary with the last WAL
>>>>       location received/replayed by the standby. They can be retrieved
>>>>       using /pg_current_xlog_location/ on the primary and the
>>>>       /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
>>>>       on the standby, respectively.
>>>>
>>>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
>>>>  pg_current_xlog_location
>>>> --------------------------
>>>>  0/2000000
>>>> (1 row)
>>>>
>>>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)
>>>>  pg_last_xlog_receive_location
>>>> -------------------------------
>>>>  0/2000000
>>>> (1 row)
>>>>
>>>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)
>>>>  pg_last_xlog_replay_location
>>>> ------------------------------
>>>>  0/2000000
>>>> (1 row)
>>>>
>>>> Regds
>>>> Rajesh Kumar Mallah.
>>> Yes, I did.
>>>
>>> Now how do I get an arithmetic difference between the two?  There will
>>> (usually) be a small difference between the master and slave on a busy
>>> system - what I want to do is query both and if the difference in their
>>> locations is greater than some defined size, start raising hell (e.g.
>>> sending SMS to people, etc)
>>>
>>> I can SEE the difference, but I don't see a way to COMPUTE a difference,
>>> and there does not appear to be a function that will accept the log file
>>> location as an argument for conversion - the one documented for offsets
>>> (which might otherwise work) does not work on the slave as I noted.
>>>
>>> With Slony there was a set of system tables that would tell me how many
>>> unapplied changes were in the queue.  From this I could determine health
>>> - if the number was more than some reasonably-small amount, something
>>> was broken and alarms were to be raised.
>>>
>>> I'm looking for a way to implement the same sort of functionality here.
>>>
>>> ticker=# select pg_last_xlog_replay_location();
>>>  pg_last_xlog_replay_location
>>> ------------------------------
>>>  37A/327D1888
>>> (1 row)
>>>
>>> ticker=# select pg_current_xlog_location();
>>>  pg_current_xlog_location
>>> --------------------------
>>>  37A/3280DCB8
>>> (1 row)
>>>
>>> How do I get an arithmetic difference between these two
>>> programmatically, and will such always be monoatomically increasing
>>> (that is, will they ever roll over, thereby giving me a potential
>>> NEGATIVE difference?)
>>>
>> pgPool-II does that computation. You should check that in its source
>> code. File pool_worker_child.c, function check_replication_time_lag().
>> It creates a long value from the LSN returned by these functions. Here
>> is the computation:
>>
>>   lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;
>>
>> In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
>> 3280DCB8 on the master. The hexadecimal values were first converted to
>> their unsigned decimal integer representation (same file, function
>> text_to_lsn()).
>>
>> You should really get a look at this file. It's probably a better
>> explanation than what I'm trying to do :)
>>
> Here's a little program to do it - change "MAX_OFFSET" and modify
> (particularly the "notify" function) to suit and place a file called
> "CHECK.cluster" in the following form in the home directory of whatever
> this runs as (it will look in the home directory of the euid of the
> process):
>
> master-name connection-string-to-connect-to-master
> slave1-name slave-string-to-connect
> slave2-name slave-string-to-connect
> ......
>
> It will emit a notification for each slave that is more than
> "MAX_OFFSET" behind, or if any slave is not in recovery mode (bad; it
> thinks it's a master!) or worse, if the MASTER is in recovery mode, or
> if it can't talk to any of the declared nodes.
>
> Note that "CHECK.cluster" contains connection information so make sure
> you run this as someone and with a home directory for that someone
> appropriately permitted to prevent leakage of the connection credentials.
>
> Ain't gonna claim it's elegant, but it was something I was able to
> quickly bang out in a few minutes and it works.  Stuck in the CRON if
> you have the cron's email going somewhere that will get noticed fast
> (e.g. a Blackberry email, etc) you can run this on whatever interval you
> need and it'll do the job.
>
> Postgresql's development team can consider this a contribution to the
> codebase if someone wants to clean it up a bit and include it.  You
> could obviously pick up the config file as an argument rather than using
> the running euid
> of the process to grab it; I'm using it in an environment where the
> latter works better for me - YMMV.
>
I've cleaned this up a bit more and it is now at
http://www.denninger.net/check-replication.c

Feel free to grab and use it if you think it would be helpful.

-- Karl

Attachment

karl.vcf