Thread: pg_stat_replication lag fields return non-NULL values even with NULLLSNs

pg_stat_replication lag fields return non-NULL values even with NULLLSNs

From
Michael Paquier
Date:
Hi all,
(Thomas in CC as per 6912acc0)

I got surprised by the following behavior from pg_stat_get_wal_senders
when connecting for example pg_receivewal to a primary:
=# select application_name, flush_lsn, replay_lsn, flush_lag,
replay_lag from pg_stat_replication;
 application_name | flush_lsn | replay_lsn |    flush_lag    |    replay_lag
------------------+-----------+------------+-----------------+-----------------
 receivewal       | null      | null       | 00:09:13.578185 | 00:09:13.578185
(1 row)

It makes little sense to me, as we are reporting a replay lag on a
position which has never been reported yet, so it cannot actually be
used as a comparison base for the lag.  Am I missing something or
should we return NULL for those fields if we have no write, flush or
apply LSNs like in the attached?

Thoughts?
--
Michael

Attachment

Re: pg_stat_replication lag fields return non-NULL values even withNULL LSNs

From
Thomas Munro
Date:
On Wed, Jul 17, 2019 at 1:52 PM Michael Paquier <michael@paquier.xyz> wrote:
> I got surprised by the following behavior from pg_stat_get_wal_senders
> when connecting for example pg_receivewal to a primary:
> =# select application_name, flush_lsn, replay_lsn, flush_lag,
> replay_lag from pg_stat_replication;
>  application_name | flush_lsn | replay_lsn |    flush_lag    |    replay_lag
> ------------------+-----------+------------+-----------------+-----------------
>  receivewal       | null      | null       | 00:09:13.578185 | 00:09:13.578185
> (1 row)
>
> It makes little sense to me, as we are reporting a replay lag on a
> position which has never been reported yet, so it cannot actually be
> used as a comparison base for the lag.  Am I missing something or
> should we return NULL for those fields if we have no write, flush or
> apply LSNs like in the attached?

Hmm.  It's working as designed, but indeed it's not very newsworthy
information in this case.  If you run pg_receivewal --synchronous then
you get sensible looking flush_lag times.  Without that, flush_lag
only goes up, and of course replay_lag only goes up, so although it's
telling the truth, I think your proposal makes sense.

One question I had is what would happen with your patch without
--synchronous, once it flushes a whole file and opens a new one; I
wondered if your new boring-information-hiding behaviour would stop
working after one segment file because of that.  I tested that and the
column remains NULL when we move to a new file, so that's good.

One thing I noticed in passing is that you always get the same times
in the write_lag and flush_lag columns, in --synchronous mode, and the
times updates infrequently.  That's not the case with regular
replicas; I suspect there is a difference in the time and frequency of
replies sent to the server, which I guess might make synchronous
commit a bit "lumpier", but I didn't dig further today.

-- 
Thomas Munro
https://enterprisedb.com



Re: pg_stat_replication lag fields return non-NULL values even withNULL LSNs

From
Michael Paquier
Date:
On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> Hmm.  It's working as designed, but indeed it's not very newsworthy
> information in this case.  If you run pg_receivewal --synchronous then
> you get sensible looking flush_lag times.  Without that, flush_lag
> only goes up, and of course replay_lag only goes up, so although it's
> telling the truth, I think your proposal makes sense.

Thanks!

> One question I had is what would happen with your patch without
> --synchronous, once it flushes a whole file and opens a new one; I
> wondered if your new boring-information-hiding behaviour would stop
> working after one segment file because of that.

Indeed.

> I tested that and the column remains NULL when we move to a new
> file, so that's good.

Thanks for looking.

> One thing I noticed in passing is that you always get the same times
> in the write_lag and flush_lag columns, in --synchronous mode, and the
> times updates infrequently.  That's not the case with regular
> replicas; I suspect there is a difference in the time and frequency of
> replies sent to the server, which I guess might make synchronous
> commit a bit "lumpier", but I didn't dig further today.

The messages are sent by pg_receivewal via sendFeedback() in
receivelog.c.  It gets triggered for the --synchronous case once a
flush is done (but you are not surprised by my reply here, right!),
and most likely the matches you are seeing some from the messages sent
at the beginning of HandleCopyStream() where the flush and write
LSNs are equal.  This code behaves as I would expect based on your
description and a read of the code I have just done to refresh my
mind, but we may of course have some issues or potential
improvements.
--
Michael

Attachment

Re: pg_stat_replication lag fields return non-NULL values even withNULL LSNs

From
Thomas Munro
Date:
On Tue, Aug 13, 2019 at 2:20 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> > One thing I noticed in passing is that you always get the same times
> > in the write_lag and flush_lag columns, in --synchronous mode, and the
> > times updates infrequently.  That's not the case with regular
> > replicas; I suspect there is a difference in the time and frequency of
> > replies sent to the server, which I guess might make synchronous
> > commit a bit "lumpier", but I didn't dig further today.
>
> The messages are sent by pg_receivewal via sendFeedback() in
> receivelog.c.  It gets triggered for the --synchronous case once a
> flush is done (but you are not surprised by my reply here, right!),
> and most likely the matches you are seeing some from the messages sent
> at the beginning of HandleCopyStream() where the flush and write
> LSNs are equal.  This code behaves as I would expect based on your
> description and a read of the code I have just done to refresh my
> mind, but we may of course have some issues or potential
> improvements.

Right.  For a replica server we call XLogWalRcvSendReply() after
writing, and then again inside XLogWalRcvFlush().  So the primary gets
to measure write_lag and flush_lag separately.  If pg_receivewal just
sends one reply after flushing, then turning on --synchronous has the
effect of showing the flush lag in both write_lag and flush_lag
columns.

Of course those things aren't quite as independent as they should be
anyway, since the flush is blocking and therefore delays the next
write.  <mind-reading-mode>That's why Simon probably wants to move the
flush to the WAL writer process, and Andres probably wants to change
the whole thing to use some kind of async IO[1].</mind-reading-mode>

[1] https://lwn.net/Articles/789024/

-- 
Thomas Munro
https://enterprisedb.com