On Fri, Nov 5, 2010 at 5:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Nov 5, 2010 at 2:46 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I'm continuing in my efforts now to document how to deploy and manage
>> replication on our wiki. One of the things a DBA needs to do is to use
>> pg_current_xlog_location() (and related functions) to check how far
>> behind the master the standby is.
>>
>> However, there are some serious problems with that:
>>
>> (1) comparing these numbers is quite mathematically complex -- and, for
>> that matter, undocumented.
>>
>> (2) pg_switch_xlog() and/or archive_timeout will create a "gap" in the
>> xlog positions, quite a large one if it happens near the beginning of a
>> file. There is no way for any monitoring on the standby to tell the
>> difference between a gap created by forced rotation as opposed to being
>> most of a file behind, until the next record shows up. Hello, nagios
>> false alerts!
>>
>> (3) There is no easy way to relate a difference in log positions to an
>> amount of time.
>>
>> I'll work on some tools to make this a bit more palatable, but I
>> disagree with earlier assertions that we have the replication monitoring
>> "done". There's still a *lot* of work to do.
>
> I've heard the same complaint, and I agree with your concerns.
"All this has happened before, and all of it will happen again."
At this point pg has the equivalent of MySQL's "show slave status" in
4.0. The output of that command changed significantly over time:
http://dev.mysql.com/doc/refman/4.1/en/show-slave-status.html
http://dev.mysql.com/doc/refman/5.5/en/show-slave-status.html
Also of interest:
http://dev.mysql.com/doc/refman/4.1/en/show-binary-logs.html
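
On Josh's point (1), here is a rough sketch of the arithmetic involved in
comparing two xlog positions. It assumes the current on-disk layout, where
a location prints as "logid/offset" in hex and each logid covers 255
segments of 16 MB (0xFF000000 bytes); a build with a different segment
size would need a different constant. The function names are made up for
illustration and are not part of PostgreSQL.

```python
# Sketch only. Assumes each "logid" spans 255 segments of 16 MB,
# i.e. 0xFF000000 usable bytes; adjust for non-default builds.
BYTES_PER_LOGID = 0xFF000000


def xlog_location_to_bytes(location: str) -> int:
    """Convert a 'logid/offset' hex string to an absolute byte position."""
    logid, offset = location.strip().split("/")
    return int(logid, 16) * BYTES_PER_LOGID + int(offset, 16)


def replication_lag_bytes(master_location: str, standby_location: str) -> int:
    """Bytes of WAL the standby is behind (hypothetical helper)."""
    return (xlog_location_to_bytes(master_location)
            - xlog_location_to_bytes(standby_location))
```

The inputs would be the text returned by pg_current_xlog_location() on the
master and pg_last_xlog_replay_location() (or
pg_last_xlog_receive_location()) on the standby. This does nothing about
points (2) and (3): a forced switch still shows up as a large byte gap, and
bytes still do not translate into time.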
--
Rob Wultsch
wultsch@gmail.com