On 2013-04-22 16:36:38 -0600, Quentin Hartman wrote: > I'm using this script to check my replication lag on my streaming > replication pairs with Nagios: > > https://gist.github.com/jacobian/743942 > > It generally works fine, but will occasionally return a negative lag value > (-37kb for example) which of course causes it to throw an alarm, but is > total nonsense. I've been working on the assumption that it is some sort of > bug in the script, but in taking a quick look at it nothing jumps out at me. > > Is there something in Postgres itself that could cause this to happen once > in awhile? Is it something to be concerned about? Is there a better way to > monitor this state?
Well, between the time pg_current_xlog_location() is run on the primary and pg_last_xlog_replay_location() on the standby some time passes, so its not all that unlikely that wal has been generated, streamed *and* applied in that time. Given the short timeframe it only happens every now and then.
Did you check the pg_stat_replication view on the primary?
Greetings,
Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services