Hi,
On Tue, Sep 15, 2009 at 12:47 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> Putting on my DBA hat for a minute, the first question I see people asking
> is "how do I measure how far behind the slaves are?". Presumably you can
> get that out of pg_controldata; my first question is whether that's complete
> enough information? If not, what else should be monitored?
Currently, the progress of replication is shown only in the ps display, so
the following three steps are needed to measure the gap between the servers:
1. execute pg_current_xlog_location() to check how far the primary has written WAL.
2. execute 'ps' to check how far the standby has written WAL.
3. compare the above results.
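Step 3 could be sketched roughly as below, in Python. It only assumes that
both locations have already been obtained as "logid/offset" hex strings, as
pg_current_xlog_location() prints them. The function names are made up for
illustration, and the 0xFF000000 bytes per log id is my assumption for
current servers (255 segments of 16 MB), so it is left as a parameter.

```python
# Assumption: each WAL "log id" spans 0xFF000000 bytes (255 segments of
# 16 MB). Pass a different value if your server lays out WAL differently.
BYTES_PER_LOGID = 0xFF000000


def xlog_location_to_bytes(location, bytes_per_logid=BYTES_PER_LOGID):
    """Convert a 'logid/offset' string (both hex) to an absolute byte position."""
    logid_hex, offset_hex = location.strip().split("/")
    return int(logid_hex, 16) * bytes_per_logid + int(offset_hex, 16)


def replication_gap_bytes(primary_location, standby_location,
                          bytes_per_logid=BYTES_PER_LOGID):
    """Bytes of WAL written on the primary but not yet written on the standby."""
    return (xlog_location_to_bytes(primary_location, bytes_per_logid)
            - xlog_location_to_bytes(standby_location, bytes_per_logid))


if __name__ == "__main__":
    # Example values only; real ones come from steps 1 and 2.
    print(replication_gap_bytes("0/2000000", "0/1000000"))
```

A positive result means the standby is behind the primary by that many bytes.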
This is very messy. A more user-friendly monitoring feature is necessary,
and developing it is one of the TODO items for a later CommitFest.
I'm thinking of something like pg_standbys_xlog_location(), which returns
one row per standby server, showing the pid of the walsender, the host
name, port number and user OID of the standby, and the locations up to
which the standby has written and flushed WAL. A DBA could then measure the
gap by combining pg_current_xlog_location() and pg_standbys_xlog_location()
in one query on the primary. Thoughts?
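To make the idea a bit more concrete, here is a rough Python sketch of how a
monitoring script might consume such rows. pg_standbys_xlog_location() does
not exist yet, so the column names below are hypothetical, and the bytes per
log id value is an assumption as well.

```python
from dataclasses import dataclass


@dataclass
class StandbyRow:
    # Hypothetical columns, following the description of the proposed function.
    walsender_pid: int
    host: str
    port: int
    user_oid: int
    write_location: str   # how far the standby has written WAL
    flush_location: str   # how far the standby has flushed WAL


def to_bytes(location, bytes_per_logid=0xFF000000):
    """Convert 'logid/offset' (hex) to a byte position; log id size is assumed."""
    logid_hex, offset_hex = location.split("/")
    return int(logid_hex, 16) * bytes_per_logid + int(offset_hex, 16)


def gaps_per_standby(primary_location, rows):
    """Return {(host, port): (write_gap_bytes, flush_gap_bytes)}."""
    primary = to_bytes(primary_location)
    return {
        (row.host, row.port): (
            primary - to_bytes(row.write_location),
            primary - to_bytes(row.flush_location),
        )
        for row in rows
    }
```

In SQL the same comparison would be done in a single query on the primary;
the sketch only shows which columns would be needed for it.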
But the problem might be what happens after the primary has gone
down. The current write location of the primary can no longer be checked
via pg_current_xlog_location(), and might need to be calculated from the
WAL files on the primary. Is a tool that performs such a calculation
necessary?
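As a starting point, such a tool could map WAL file names to locations. The
Python sketch below assumes the usual 24 hex digit segment names (timeline,
log id, segment number) and the default 16 MB segment size; the function
names are made up. It only yields the start location of a segment, not the
exact write location, and since recycled segments can be renamed to future
names, the highest-named file is not proof that WAL was written there. A
real tool would have to read the records inside the files.

```python
import re

SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments

# A WAL segment file name is 24 upper-case hex digits:
# 8 for the timeline, 8 for the log id, 8 for the segment number.
_SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def segment_start_location(file_name, segment_size=SEGMENT_SIZE):
    """Return (timeline, 'logid/offset') for the first byte of a segment."""
    if not _SEGMENT_NAME.fullmatch(file_name):
        raise ValueError("not a WAL segment file name: %r" % file_name)
    timeline = int(file_name[0:8], 16)
    logid = int(file_name[8:16], 16)
    segment = int(file_name[16:24], 16)
    return timeline, "%X/%X" % (logid, segment * segment_size)


def latest_segment(file_names):
    """Pick the highest-named segment, skipping .history and .backup files."""
    segments = [name for name in file_names if _SEGMENT_NAME.fullmatch(name)]
    return max(segments) if segments else None
```

The file names would come from listing the pg_xlog directory of the failed
primary.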
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center