On Wed, Apr 7, 2022 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Apr 6, 2022 at 6:30 PM wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> >
> > On Wed, Apr 6, 2022 at 1:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Apr 6, 2022 at 4:32 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Also, let's try to evaluate how it impacts lag functionality for large
> > > transactions?
> > I think this patch will not affect lag functionality. It will only update the
> > lag fields of the pg_stat_replication view more frequently.
> > IIUC, when WalSndUpdateProgress is invoked, it stores the LSN of the change
> > and the invocation time in lag_tracker. Then, when
> > ProcessStandbyReplyMessage is invoked, it calculates the lag fields from the
> > subscriber's reply message and the information in lag_tracker. This patch
> > does not modify this logic; it only makes these calls more frequent.
> > Please let me know if I have misunderstood.
> >
>
> No, your understanding seems correct to me. But what I want to check
> is whether calling the progress function more often has any impact on
> the lag-related fields in pg_stat_replication. I think you need to check
> the impact of replaying large transactions.
Thanks for the explanation.
After doing some checks, I found that the v13 patch makes the lag calculation
inaccurate.
In short, the v13 patch tries to track lag more frequently and to send
keepalive messages to subscribers. But to avoid flooding the lag tracker, we
cannot track lag more than once within WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
(see WalSndUpdateProgress). This means we may lose information that needs to
be tracked.
For example, suppose there is a large transaction spanning LSNs lsn1 to lsn3.
In HEAD, when we calculate the lag for lsn3, it is (now - lsn3.time).
But with the v13 patch, lag_tracker may have no sample for lsn3 (e.g. because
lsn3 was reached within WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS of lsn2) but
still have one for lsn2, so the lag for lsn3 becomes (now - lsn2.time) (see
LagTrackerRead).
Therefore, if we lose samples that need to be tracked, the reported lag becomes
larger than the real lag, i.e. inaccurate.
So, in the new patch, I skip lag tracking during a transaction, just as the
current HEAD does.
Attached is the new patch.
Regards,
Wang wei