Re: [HACKERS] Measuring replay lag - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [HACKERS] Measuring replay lag
Msg-id CAHGQGwGANKWsH4jETZpucK7K0FZ8P70=9NEwgOJHPUzGxN0Z9A@mail.gmail.com
In response to Re: [HACKERS] Measuring replay lag  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] Measuring replay lag  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Mon, Dec 19, 2016 at 8:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 11/22/16 4:27 AM, Thomas Munro wrote:
>>> Thanks very much for testing!  New version attached.  I will add this
>>> to the next CF.
>>
>> I don't see it there yet.
>
> Thanks for the reminder.  Added here:  https://commitfest.postgresql.org/12/920/
>
> Here's a rebased patch.

I agree that the ability to measure the remote_apply lag is very useful.
I would also like to measure the remote_write and remote_flush lags, for
example to diagnose the cause of replication lag.

For that, how about maintaining the pairs of send timestamp and LSN on the
*sender side* instead of the receiver side? That is, walsender adds a pair
of send timestamp and LSN to a buffer every sampling period.
Whenever walsender receives the write, flush and apply locations from
walreceiver, it calculates the write, flush and apply lags by matching each
received location against the stored LSNs and subtracting the corresponding
stored send timestamp from the current timestamp.
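
A minimal sketch of the bookkeeping described above, written in Python for
brevity rather than in the C that walsender uses. All names here (LagTracker,
record_send, on_feedback) are hypothetical, and the bounded buffer and pruning
policy are assumptions made for illustration, not part of any posted patch.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class LagTracker:
    """Hypothetical sender-side buffer of (LSN, send timestamp) samples."""

    def __init__(self, max_samples: int = 1024) -> None:
        # Assumption: a bounded buffer; the oldest sample is dropped when full.
        self.samples: Deque[Tuple[int, float]] = deque(maxlen=max_samples)

    def record_send(self, lsn: int, now: float) -> None:
        """Store one sample; meant to be called once per sampling period."""
        self.samples.append((lsn, now))

    def _lag_for(self, reported_lsn: int, now: float) -> Optional[float]:
        """Lag relative to the newest sample whose LSN <= reported_lsn."""
        send_time = None
        for lsn, ts in self.samples:  # samples are in increasing LSN order
            if lsn > reported_lsn:
                break
            send_time = ts
        return None if send_time is None else now - send_time

    def on_feedback(
        self, write_lsn: int, flush_lsn: int, apply_lsn: int, now: float
    ) -> Dict[str, Optional[float]]:
        """Compute write, flush and apply lags from one receiver report."""
        lags = {
            "write": self._lag_for(write_lsn, now),
            "flush": self._lag_for(flush_lsn, now),
            "apply": self._lag_for(apply_lsn, now),
        }
        # Drop samples that every reported location has already passed,
        # keeping the newest such sample so later reports still match it.
        oldest = min(write_lsn, flush_lsn, apply_lsn)
        while len(self.samples) >= 2 and self.samples[1][0] <= oldest:
            self.samples.popleft()
        return lags
```

With samples recorded at LSN 100, 200 and 300, a report of write=300,
flush=200, apply=100 yields three different lags from a single message,
without the receiver sending any timestamp back.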

As a bonus of this approach, we don't need to add a field to the reply
message that walreceiver can send back very frequently, which might be
helpful in terms of network overhead.

Regards,

-- 
Fujii Masao