Hi all,
While monitoring a replication setup on a stable branch, I was
surprised to find that we have never tracked WAL statistics for the
WAL receiver in pg_stat_wal, because its code was never updated to
report WAL stats. This matters for the write and sync counts and
timings. On HEAD, this information has been moved to pg_stat_io, but
the stats are still reported by the same routine (pgstat_report_wal
in 15~, and pgstat_send_wal in ~14).
As of f4694e0f35b2, the situation is better thanks to the addition of
a pgstat_report_wal() in the WAL receiver main loop, so we have some
data. However, we are only able to gather the data for segment syncs
and initializations, not the writes themselves as these are managed by
an independent code path, XLogWalRcvWrite().
A second thing missing from XLogWalRcvWrite() is a wait event around
the pg_pwrite() call, which would be useful as the WAL receiver is
listed in pg_stat_activity. Note that it is possible to reuse the
same wait event as XLogWrite() for the WAL receiver, WAL_WRITE,
because the WAL receiver does not rely on the write and flush calls
from xlog.c when doing its work, and both have the same meaning, i.e.
they write WAL.
The fsync calls use issue_xlog_fsync() and the segment inits happen in
XLogFileInit().
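
For illustration only, here is a minimal standalone sketch of the
pattern described above: flag a wait event for the duration of the
positional write, then count the write once it has returned. This is
not the attached patch. Every sketch_* name and SKETCH_* constant is a
stand-in invented for this example, and plain pwrite() is used in
place of pg_pwrite(); the real code would rely on the backend's own
wait event and stats reporting routines.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700		/* for pwrite() */
#endif
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for this sketch, not PostgreSQL definitions. */
#define SKETCH_WAIT_EVENT_NONE		0
#define SKETCH_WAIT_EVENT_WAL_WRITE	1

static int	sketch_current_wait_event = SKETCH_WAIT_EVENT_NONE;
static long	sketch_wal_write_count = 0;
static long	sketch_wal_write_bytes = 0;

/* Advertise what the process is about to block on. */
static void
sketch_wait_start(int wait_event)
{
	sketch_current_wait_event = wait_event;
}

/* Clear the advertised wait event once the call has returned. */
static void
sketch_wait_end(void)
{
	sketch_current_wait_event = SKETCH_WAIT_EVENT_NONE;
}

/*
 * Write nbytes from buf at the given offset.  The wait event is set
 * only around the pwrite() call, and the write is counted when at
 * least one byte has been written.
 */
static ssize_t
sketch_walrcv_write(int fd, const char *buf, size_t nbytes, off_t offset)
{
	ssize_t		written;

	sketch_wait_start(SKETCH_WAIT_EVENT_WAL_WRITE);
	written = pwrite(fd, buf, nbytes, offset);
	sketch_wait_end();

	if (written > 0)
	{
		sketch_wal_write_count++;
		sketch_wal_write_bytes += (long) written;
	}

	return written;
}
```

A caller would invoke sketch_walrcv_write() for each chunk of WAL
received, and a monitoring view would then see the wait event only
while the process is blocked inside the write.
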
Perhaps there is a point in backpatching a portion of the attached
patch (the wait event?), but I am not planning to bother much with the
stable branches given the lack of complaints. If you have an opinion
about that, please feel free to share it.
Thoughts?
--
Michael