Re: Streaming replication status - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Streaming replication status
Date
Msg-id 4B4FEA18.5080705@2ndquadrant.com
Whole thread Raw
In response to Re: Streaming replication status  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Streaming replication status  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Streaming replication status  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Re: Streaming replication status  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Fujii Masao wrote:<br /><blockquote cite="mid:3f0b79eb1001140033p707862f4yabece81301ca609c@mail.gmail.com"
type="cite"><blockquotetype="cite"><pre wrap="">"I'm thinking something like pg_standbys_xlog_location() [on the
primary]which returns
 
one row per standby servers, showing pid of walsender, host name/
port number/user OID of the standby, the location where the standby
has written/flushed WAL. DBA can measure the gap from the
combination of pg_current_xlog_location() and pg_standbys_xlog_location()
via one query on the primary."   </pre></blockquote><pre wrap="">
This function is useful but not essential for troubleshooting, I think.
So I'd like to postpone it. </pre></blockquote><br /> Sure; in a functional system where primary and secondary are both
up,you can assemble the info using the new functions you just added, so this other one is certainly optional.  I just
tooka brief look at the code of the features you added, and it looks like it exposes the minimum necessary to make this
wholething possible to manage.  I think it's OK if you postpone this other bit, more important stuff for you to work
on.<br/><br /> So:  the one piece of information I though was most important to expose here at an absolute minimum is
therenow.  Good progress.  The other popular request that keeps popping up here is  providing an easy way to see how
backloggedthe archive_command is, to make it easier to monitor for out of disk errors that might prove catastrophic to
replication.<br/><br /> I just spent some time looking through the WAL/archiving code in that context.  It looks to me
thatthat this information isn't really stored anywhere right now.  The only thing that knows what segment is currently
queuedup to copy over is pgarch_ArchiverCopyLoop via its call to pgarch_readyXlog.  Now, this is a pretty brute-force
pieceof code:  it doesn't remember its previous work at all, it literally walks the archive_status directory looking
for*.ready files that have names that look like xlog files, then returns the earliest.  That unfortunately means that
it'snot even thinking in the same terms as all these other functions, which are driven by the xlog_location advancing,
andthen the filename is computed from that.  All you've got is the filename at this point, and it's not even guaranteed
tobe real--you could easily fool this code if you dropped an inappropriately named file into that directory.<br /><br
/>I could easily update this code path to save the name of the last archived file in memory while all this directory
scanningis going on anyway, and then provide a UDF to expose that bit of information.  The result would need to have
documentationthat disclaims it like this:<br /><br /> pg_last_archived_xlogfile() text:  Get the name of the last file
thearchive_command [tried to|successfully] archived since the server was started.  If archiving is disabled or no xlog
fileshave become ready to archive since startup, a blank line will be returned.  It is possible for this function to
returna result that does not reflect an actual xlogfile if files are manually added to the server's archive_status
directory.<br/><br /> I'd find this extremely handy as a hook for monitoring scripts that want to watch the server but
don'thave access to the filesystem directly, even given those limitations.  I'd prefer to have the "tried to" version,
becauseit will populate with the name of the troublesome file it's stuck on even if archiving never gets its first
segmentdelivered.<br /><br /> I'd happily write a patch to handle all that if I thought it would be accepted.  I fear
thatthe whole approach will be considered a bit too hackish and get rejected on that basis though.  Not really sure of
a"right" way to handle this though.  Anything better is going to be more complicated because it requires passing more
informationinto the archiver, with little gain for that work beyond improving the quality of this diagnostic routine. 
AndI think most people would find what I described above useful enough.<br /><br /><pre class="moz-signature"
cols="72">--
 
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
<a class="moz-txt-link-abbreviated" href="mailto:greg@2ndQuadrant.com">greg@2ndQuadrant.com</a>  <a
class="moz-txt-link-abbreviated"href="http://www.2ndQuadrant.com">www.2ndQuadrant.com</a>
 
</pre>

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: plpython3
Next
From: Simon Riggs
Date:
Subject: Re: Streaming replication status