On Tue, Aug 13, 2019 at 2:20 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> > One thing I noticed in passing is that you always get the same times
> > in the write_lag and flush_lag columns, in --synchronous mode, and the
> > times update infrequently. That's not the case with regular
> > replicas; I suspect there is a difference in the time and frequency of
> > replies sent to the server, which I guess might make synchronous
> > commit a bit "lumpier", but I didn't dig further today.
>
> The messages are sent by pg_receivewal via sendFeedback() in
> receivelog.c. It gets triggered for the --synchronous case once a
> flush is done (but you are not surprised by my reply here, right!),
> and most likely the matches you are seeing come from the messages sent
> at the beginning of HandleCopyStream() where the flush and write
> LSNs are equal. This code behaves as I would expect based on your
> description and a read of the code I have just done to refresh my
> mind, but we may of course have some issues or potential
> improvements.
Right. For a replica server we call XLogWalRcvSendReply() after
writing, and then again inside XLogWalRcvFlush(). So the primary gets
to measure write_lag and flush_lag separately. If pg_receivewal just
sends one reply after flushing, then turning on --synchronous has the
effect of showing the flush lag in both write_lag and flush_lag
columns.
Of course those things aren't quite as independent as they should be
anyway, since the flush is blocking and therefore delays the next
write. <mind-reading-mode>That's why Simon probably wants to move the
flush to the WAL writer process, and Andres probably wants to change
the whole thing to use some kind of async IO[1].</mind-reading-mode>
[1] https://lwn.net/Articles/789024/
--
Thomas Munro
https://enterprisedb.com