Re: Logical replication lag in seconds - Mailing list pgsql-general

From Klaus Darilion
Subject Re: Logical replication lag in seconds
Date
Msg-id 264f1acc-f79b-961f-72b4-dda3ea6c9834@pernau.at
Whole thread Raw
In response to Re: Logical replication lag in seconds  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
For the records - with a simple script I hacked a solution which is
purely based on the server.

1. Create a table to track the timestamp of an lsn:

CREATE TABLE lsn2date(
   lsn pg_lsn PRIMARY KEY,
   seen timestamp NOT NULL DEFAULT NOW()
);
CREATE ROLE replication_lag_user WITH LOGIN PASSWORD 'xxx';
GRANT ALL ON TABLE lsn2date TO replication_lag_user;


2. Create a script which populates the table:

# cat /etc/systemd/system/calculate_logical_replication_lag.service
[Unit]
Description=Start and auto restart service

[Install]
WantedBy=multi-user.target

[Service]
ExecStart=/usr/bin/php /path/to/calculate_logical_replication_lag.php
Restart=always
RestartSec=10


# cat calculate_logical_replication_lag.php
<?php

$dbuser="replication_lag_user";
$dbpass="xxx";
if (!$dbconn = pg_pconnect('host=127.0.0.1 dbname=mydb user='.$dbuser.'
password='.$dbpass)) {
        print "Sorry, database connection failed";
        exit;
}

$accuracy = 10; // in seconds

while (1) {
        $dbq = pg_query("INSERT INTO lsn2date (lsn) VALUES
(pg_current_wal_lsn())");
        if ($dbq === FALSE) {
                mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
                exit(1);
        }

        $dbq = pg_query("DELETE FROM lsn2date WHERE lsn < (".
                          "SELECT lsn FROM lsn2date WHERE lsn < (".
                            "SELECT confirmed_flush_lsn FROM
pg_replication_slots ORDER BY confirmed_flush_lsn ASC LIMIT 1".
                          ") ORDER BY lsn DESC LIMIT 1".
                        ")"
        );
        if ($dbq === FALSE) {
                mylog(LOG_ERROR, "SQL query error: ".pg_last_error()."\n");
                exit(1);
        }
        sleep($accuracy);
}




3. Get the lag, using a function which compares the lsn of the
replication_slots with the lsn/timestamp in the lsn2date table:



CREATE OR REPLACE FUNCTION get_replication_lag() RETURNS TABLE
(subscriber name, lag bigint) AS
$BODY$
DECLARE
    subscriber name;
BEGIN
    FOR subscriber IN
        SELECT slot_name FROM pg_replication_slots
    LOOP
        RETURN QUERY SELECT slot_name, EXTRACT(EPOCH FROM
NOW()-seen)::bigint lag from lsn2date,pg_replication_slots WHERE
slot_name=subscriber AND lsn < confirmed_flush_lsn ORDER BY lsn DESC
LIMIT 1;
    END LOOP;
    RETURN;
END
$BODY$
LANGUAGE plpgsql;


# SELECT * FROM get_replication_lag() ;
 subscriber | lag
------------+-----
 reg_sjc1   |   0
 reg_ffm1   |   0
 reg_tst2   |   0
 reg_mia1   |   0
 reg_jbg1   |   0
 reg_ams1   |   0
 reg_syy1   |   0
 reg_wie1   |   0
 reg_hkg1   |   0
 reg_gnf1   |   0
 reg_tor1   |   0
 reg_sea1   |   0
 reg_chi1   |   0
 reg_dfw1   |   0
 reg_sgp1   |   0
 reg_lhr1   |   0



regards
Klaus




pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: pg_dump and public schema
Next
From: John DeSoi
Date:
Subject: Re: Postgres on macOS 10