Thread: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver
Hi all, While doing some monitoring of a replication setup for a stable branch, I have been surprised by the fact that we have never tracked WAL statistics for the WAL receiver in pg_stat_wal because we have never bothered to update its code so as WAL stats are reported. This is relevant for the write and sync counts and timings. On HEAD, this information has been moved to pg_stat_io, but the stats reports happen with the same routine (pgstat_report_wal in 15~, and pgstat_send_wal in ~14). As of f4694e0f35b2, the situation is better thanks to the addition of a pgstat_report_wal() in the WAL receiver main loop, so we have some data. However, we are only able to gather the data for segment syncs and initializations, not the writes themselves as these are managed by an independent code path, XLogWalRcvWrite(). A second thing that lacks in XLogWalRcvWrite() is a wait event around the pg_pwrite() call, which is useful as the WAL receiver is listed in pg_stat_activity. Note that it is possible to re-use the same wait event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL receiver does not rely on the write and flush calls from xlog.c when doing its work, and both have the same meaning, aka they write WAL. The fsync calls use issue_xlog_fsync() and the segment inits happen in XLogFileInit(). Perhaps there's a point in backpatching a portion of what's in the attached patch (the wait event?), but I am not planning to bother much with the stable branches based on the lack of complaints. If you have an opinion about that, please feel free. Thoughts? -- Michael
Attachment
Hi, On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote: > Hi all, > > While doing some monitoring of a replication setup for a stable > branch, I have been surprised by the fact that we have never tracked > WAL statistics for the WAL receiver in pg_stat_wal because we have > never bothered to update its code so as WAL stats are reported. Nice catch! > This > is relevant for the write and sync counts and timings. Also for sync? sync looks fine as issue_xlog_fsync() is being called in XLogWalRcvFlush(), no? > As of f4694e0f35b2, the situation is better thanks to the addition of > a pgstat_report_wal() in the WAL receiver main loop, so we have some > data. However, we are only able to gather the data for segment syncs > and initializations, not the writes themselves as these are managed by > an independent code path, XLogWalRcvWrite(). > > A second thing that lacks in XLogWalRcvWrite() is a wait event around > the pg_pwrite() call, which is useful as the WAL receiver is listed in > pg_stat_activity. Note that it is possible to re-use the same wait > event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL > receiver does not rely on the write and flush calls from xlog.c when > doing its work, and both have the same meaning, aka they write WAL. > The fsync calls use issue_xlog_fsync() and the segment inits happen in > XLogFileInit(). > > Perhaps there's a point in backpatching a portion of what's in the > attached patch (the wait event?), but I am not planning to bother much > with the stable branches based on the lack of complaints. We're not emitting some statistics, so I think that it's hard for users to complain about something they don't/can't see. > If you > have an opinion about that, please feel free. I'm tempted to say that the wal receiver part of f4694e0f35b2 should be backpatched as well as what you're doing here. + /* + * Measure I/O timing to write WAL data, for pg_stat_io. + */ + start = pgstat_prepare_io_time(track_wal_io_timing); + + pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE); byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff); + pgstat_report_wait_end(); Same logic as in XLogWrite() and I don't think there is a need for a dedicated wait event, so LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Mar 05, 2025 at 08:04:44AM +0000, Bertrand Drouvot wrote: > On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote: > Also for sync? sync looks fine as issue_xlog_fsync() is being called in > XLogWalRcvFlush(), no? Yes, we're OK for the sync data aggregated in the WAL receiver on HEAD, as mentioned below, not in the back-branches. > We're not emitting some statistics, so I think that it's hard for users to > complain about something they don't/can't see. One would see idle data in pg_stat_wal on a standby, so the lack of data could be annoying, but I'm perhaps the only one who noticed that.. > Same logic as in XLogWrite() and I don't think there is a need for a > dedicated wait event, so LGTM. Thanks. -- Michael
Attachment
On Wed, Mar 05, 2025 at 08:04:44AM +0000, Bertrand Drouvot wrote: > On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote: >> Perhaps there's a point in backpatching a portion of what's in the >> attached patch (the wait event?), but I am not planning to bother much >> with the stable branches based on the lack of complaints. > > We're not emitting some statistics, so I think that it's hard for users to > complain about something they don't/can't see. Hmm, not exactly actually. I've missed that ff99918c625a mentions that WAL receiver was discarded on purpose, but this was still when pgstats was not in shared memory and still file-based. We scale much better now. It is not that difficult to make XLogWrite() hot enough that it would trigger a lot of calls of pgstat_count_io_op_time() per ms, either, like the WAL receiver, so as long as the timings are behind track_wal_io_timing we're fine. Let's do this at the end, without a backpatch. At least we'll be anle to get better IO metrics for replication setups. -- Michael