Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs
Date
Msg-id CA+hUKGKaB12rqG_WNpTwOB_-==v2Gs9ahbsVSsHOrMvmNh1Y4g@mail.gmail.com
In response to Re: pg_stat_replication lag fields return non-NULL values even with NULL LSNs (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Tue, Aug 13, 2019 at 2:20 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> > One thing I noticed in passing is that you always get the same times
> > in the write_lag and flush_lag columns, in --synchronous mode, and the
> > times update infrequently.  That's not the case with regular
> > replicas; I suspect there is a difference in the time and frequency of
> > replies sent to the server, which I guess might make synchronous
> > commit a bit "lumpier", but I didn't dig further today.
>
> The messages are sent by pg_receivewal via sendFeedback() in
> receivelog.c.  It gets triggered for the --synchronous case once a
> flush is done (but you are not surprised by my reply here, right!),
> and most likely the matches you are seeing come from the messages sent
> at the beginning of HandleCopyStream() where the flush and write
> LSNs are equal.  This code behaves as I would expect based on your
> description and a read of the code I have just done to refresh my
> mind, but we may of course have some issues or potential
> improvements.

Right.  For a replica server we call XLogWalRcvSendReply() after
writing, and then again inside XLogWalRcvFlush().  So the primary gets
to measure write_lag and flush_lag separately.  If pg_receivewal just
sends one reply after flushing, then turning on --synchronous has the
effect of showing the flush lag in both write_lag and flush_lag
columns.
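
To make that concrete, here is a toy model of the effect. It is not
PostgreSQL code: the Reply type, measure_lags() and the timings are all
made up for illustration, and network latency is ignored. It only assumes
that the primary computes each lag as the time between sending an LSN and
receiving the first reply that reports that LSN as written or flushed.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    at: float       # time (ms) at which the primary receives the reply
    write_lsn: int  # highest LSN reported as written
    flush_lsn: int  # highest LSN reported as flushed


def measure_lags(sent_at, lsn, replies):
    """Return (write_lag, flush_lag) for `lsn` as the primary would see it.

    Each lag is the delay until the first reply that covers `lsn`.
    """
    write_lag = flush_lag = None
    for r in sorted(replies, key=lambda r: r.at):
        if write_lag is None and r.write_lsn >= lsn:
            write_lag = r.at - sent_at
        if flush_lag is None and r.flush_lsn >= lsn:
            flush_lag = r.at - sent_at
    return write_lag, flush_lag


# Hypothetical timings: WAL up to LSN 100 is sent at t=0, written by the
# receiver at t=1 and flushed at t=5.
SENT_AT, LSN = 0.0, 100
WRITE_DONE, FLUSH_DONE = 1.0, 5.0

# Replica server: one reply after the write, another after the flush.
replica = [Reply(WRITE_DONE, LSN, 0), Reply(FLUSH_DONE, LSN, LSN)]

# Receiver that sends a single reply once the flush is done.
single_reply = [Reply(FLUSH_DONE, LSN, LSN)]

replica_lags = measure_lags(SENT_AT, LSN, replica)
single_reply_lags = measure_lags(SENT_AT, LSN, single_reply)

print(replica_lags)       # (1.0, 5.0)
print(single_reply_lags)  # (5.0, 5.0)
```

With two replies the primary can tell the write and flush times apart;
with a single post-flush reply both columns end up holding the flush lag.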

Of course those things aren't quite as independent as they should be
anyway, since the flush is blocking and therefore delays the next
write.  <mind-reading-mode>That's why Simon probably wants to move the
flush to the WAL writer process, and Andres probably wants to change
the whole thing to use some kind of async IO[1].</mind-reading-mode>

[1] https://lwn.net/Articles/789024/

-- 
Thomas Munro
https://enterprisedb.com


