Thread: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

From: Michael Paquier
Date: Wed, 05 Mar 2025 12:35:26 +0900
Hi all,

While doing some monitoring of a replication setup on a stable
branch, I was surprised to find that we have never tracked WAL
statistics for the WAL receiver in pg_stat_wal, because its code was
never updated so that WAL stats are reported.  This is relevant for
the write and sync counts and timings.  On HEAD, this information has
been moved to pg_stat_io, but the stats are still reported through
the same routine (pgstat_report_wal in 15~, and pgstat_send_wal in
~14).

As of f4694e0f35b2, the situation is better thanks to the addition of
a pgstat_report_wal() in the WAL receiver main loop, so we have some
data.  However, we are only able to gather the data for segment syncs
and initializations, not the writes themselves as these are managed by
an independent code path, XLogWalRcvWrite().
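
For context, the write itself is a plain pg_pwrite() with none of the
instrumentation that XLogWrite() has.  A simplified sketch of the
loop in XLogWalRcvWrite() (not the exact code, which does more around
segment switches):

    while (nbytes > 0)
    {
        int     segbytes;

        /* Open or switch to the right segment, via XLogFileInit() */
        [...]

        startoff = XLogSegmentOffset(recptr, wal_segment_size);
        segbytes = Min(nbytes, wal_segment_size - startoff);

        /* Direct write: no wait event, no WAL write stats counted */
        byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);

        recptr += byteswritten;
        nbytes -= byteswritten;
        buf += byteswritten;
    }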

A second thing lacking in XLogWalRcvWrite() is a wait event around
the pg_pwrite() call, which is useful as the WAL receiver is listed in
pg_stat_activity.  Note that it is possible to re-use the same wait
event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL
receiver does not rely on the write and flush calls from xlog.c when
doing its work, and both have the same meaning, aka they write WAL.
The fsync calls use issue_xlog_fsync() and the segment inits happen in
XLogFileInit().
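
To be concrete, the change around the write boils down to this
(paraphrased excerpt; see the attached patch for the exact hunk):

    /*
     * Measure I/O timing to write WAL data, for pg_stat_io.
     */
    start = pgstat_prepare_io_time(track_wal_io_timing);

    /* Same wait event as XLogWrite(), as both write WAL */
    pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
    byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
    pgstat_report_wait_end();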

Perhaps there's a point in backpatching a portion of what's in the
attached patch (the wait event?), but I am not planning to bother much
with the stable branches, given the lack of complaints.  If you have
an opinion about that, please feel free to share it.

Thoughts?
--
Michael

Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

From: Bertrand Drouvot
Date: Wed, 05 Mar 2025 08:04:44 +0000
Hi,

On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
> Hi all,
> 
> While doing some monitoring of a replication setup on a stable
> branch, I was surprised to find that we have never tracked WAL
> statistics for the WAL receiver in pg_stat_wal, because its code was
> never updated so that WAL stats are reported.

Nice catch!

> This
> is relevant for the write and sync counts and timings.

Also for sync?  The sync part looks fine, as issue_xlog_fsync() is
already called in XLogWalRcvFlush(), no?
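
For reference, a simplified sketch of that flush path (the actual
XLogWalRcvFlush() does more bookkeeping):

    if (LogstreamResult.Flush < LogstreamResult.Write)
    {
        /* The common sync routine counts the sync for the WAL stats */
        issue_xlog_fsync(recvFile, recvSegNo, recvFileTLI);
        LogstreamResult.Flush = LogstreamResult.Write;
    }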

> As of f4694e0f35b2, the situation is better thanks to the addition of
> a pgstat_report_wal() in the WAL receiver main loop, so we have some
> data.  However, we are only able to gather the data for segment syncs
> and initializations, not the writes themselves as these are managed by
> an independent code path, XLogWalRcvWrite().
> 
> A second thing lacking in XLogWalRcvWrite() is a wait event around
> the pg_pwrite() call, which is useful as the WAL receiver is listed in
> pg_stat_activity.  Note that it is possible to re-use the same wait
> event as XLogWrite() for the WAL receiver, WAL_WRITE, because the WAL
> receiver does not rely on the write and flush calls from xlog.c when
> doing its work, and both have the same meaning, aka they write WAL.
> The fsync calls use issue_xlog_fsync() and the segment inits happen in
> XLogFileInit().
> 
> Perhaps there's a point in backpatching a portion of what's in the
> attached patch (the wait event?), but I am not planning to bother much
> with the stable branches, given the lack of complaints.

We're not emitting these statistics, so I think that it's hard for
users to complain about something they don't/can't see.

> If you have an opinion about that, please feel free to share it.

I'm tempted to say that the WAL receiver part of f4694e0f35b2 should
be backpatched, as well as what you're doing here.

+               /*
+                * Measure I/O timing to write WAL data, for pg_stat_io.
+                */
+               start = pgstat_prepare_io_time(track_wal_io_timing);
+
+               pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
                byteswritten = pg_pwrite(recvFile, buf, segbytes, (off_t) startoff);
+               pgstat_report_wait_end();

Same logic as in XLogWrite() and I don't think there is a need for a 
dedicated wait event, so LGTM.
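
One note for readers of the excerpt: the "start" captured above is
consumed after the write, presumably with something like the
following to feed pg_stat_io (sketch only, the exact arguments are in
the patch):

    /* Accumulate the write count, bytes and optional timing */
    pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE,
                            start, 1, byteswritten);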

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

From: Michael Paquier
Date:
On Wed, Mar 05, 2025 at 08:04:44AM +0000, Bertrand Drouvot wrote:
> On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
> Also for sync?  The sync part looks fine, as issue_xlog_fsync() is
> already called in XLogWalRcvFlush(), no?

Yes, we're OK for the sync data aggregated by the WAL receiver on
HEAD, as mentioned upthread, but not in the back-branches.

> We're not emitting these statistics, so I think that it's hard for
> users to complain about something they don't/can't see.

One would see pg_stat_wal sitting idle on a standby, so the lack of
data could be annoying, but perhaps I'm the only one who has noticed
that..

> Same logic as in XLogWrite() and I don't think there is a need for a
> dedicated wait event, so LGTM.

Thanks.
--
Michael

Re: Monitoring gaps in XLogWalRcvWrite() for the WAL receiver

From: Michael Paquier
Date:
On Wed, Mar 05, 2025 at 08:04:44AM +0000, Bertrand Drouvot wrote:
> On Wed, Mar 05, 2025 at 12:35:26PM +0900, Michael Paquier wrote:
>> Perhaps there's a point in backpatching a portion of what's in the
>> attached patch (the wait event?), but I am not planning to bother much
>> with the stable branches, given the lack of complaints.
>
> We're not emitting these statistics, so I think that it's hard for
> users to complain about something they don't/can't see.

Hmm, not exactly, actually.  I had missed that ff99918c625a mentions
that the WAL receiver was discarded on purpose, but that was back
when pgstats was still file-based, before it moved to shared memory.
We scale much better now.  It is not that difficult to make
XLogWrite() hot enough to trigger a lot of calls of
pgstat_count_io_op_time() per millisecond either, just like the WAL
receiver, so as long as the timings are gated behind
track_wal_io_timing we're fine.
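
That gating works because pgstat_prepare_io_time() reads the clock
only when the GUC is enabled, roughly:

    instr_time
    pgstat_prepare_io_time(bool track_io_guc)
    {
        instr_time  io_start;

        if (track_io_guc)
            INSTR_TIME_SET_CURRENT(io_start);
        else
            INSTR_TIME_SET_ZERO(io_start);

        return io_start;
    }

So a disabled track_wal_io_timing keeps the write path cheap.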

Let's do this in the end, without a backpatch.  At least we'll be
able to get better I/O metrics for replication setups.
--
Michael
