[GENERAL] pg_last_xact_replay_timestamp() sometimes reports unlikely, very large delays - Mailing list pgsql-general

From Toby Corkindale
Subject [GENERAL] pg_last_xact_replay_timestamp() sometimes reports unlikely, very large delays
Date
Msg-id 2853232.ondRSBYBj0@adonai
Whole thread Raw
Responses Re: [GENERAL] pg_last_xact_replay_timestamp() sometimes reports unlikely, very large delays
browser interface to forums please?
List pgsql-general
Hi all,
I wondered if any experts can help me out?

I currently monitor Postgresql's replication status by looking at two metrics.
First I check to see if the current slave xlog replay is equal to the master
-- if so, it's up to date.
If it's not equal, then I look at pg_last_xact_replay_timestamp().

I can compare this with now() to get a duration, which I believe should
indicate how far behind the slave is tracking.

Most of the time this works quite well -- the slave might fall behind by some
seconds under heavy load, but that's fine.

However, occasionally this replay timestamp will report times many hours or
days behind! This goes on for a few minutes, then suddenly recovers.


My best guess for what is going on is:
 - There has been no activity for hours or days, and so the oldest replayed
transaction on the slave is genuinely quite old.
 - Something has happened on the master that causes its
pg_current_xlog_location() to be updated, but not in a way that is sent to the
slave until the end of a long-running transaction.


Could anyone suggest how to do this in a manner that avoids the problem?


It's annoying because when it happens, because for 5-10 minutes monitoring
alerts get fired off about catastrophic amounts of lag on the read-only slave!

Cheers
Toby


pgsql-general by date:

Previous
From: rakeshkumar464
Date:
Subject: Re: [GENERAL] Incremental / Level -1 backup in PG
Next
From: Glen Huang
Date:
Subject: Re: [GENERAL] How to create unique index on multiple columns wherethe combination doesn't matter?