Re: Logical replication keepalive flood - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical replication keepalive flood
Msg-id CAA4eK1L2LUbx5Oo=+5tk34YgZWjNQjCp1xDrztA8m9KD=BS96w@mail.gmail.com
In response to Re: Logical replication keepalive flood  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Fri, Jun 11, 2021 at 7:07 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 10 Jun 2021 12:18:00 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > Good analysis. I think this analysis has shown that walsender is
> > sending messages at top speed as soon as they are generated. So, I am
> > wondering why there is any need to wait/sleep in such a workload. One
> > possibility that occurred to me is that RecentFlushPtr is not updated,
> > or we are not checking it aggressively. To investigate along those
> > lines, can you check the behavior with the attached patch? This is
> > just a quick hack patch to test whether we really need to wait for
> > WAL, by checking it a bit more aggressively.
>
> Yeah, anyway the comment at the caller site of WalSndKeepalive
> suggests that exiting the function *after* that point is somewhat
> wrong.
>
> > * possibly are waiting for a later location. So, before sleeping, we
> > * send a ping containing the flush location. If the receiver is
>
> But nothing changed by moving the keepalive check to after the exit
> check (loc <= RecentFlushPtr).
>
> And the patch also doesn't change the situation so much. The average
> number of loops is reduced from 3 to 2 per call but the ratio between
> total records and keepalives doesn't change.
>
> previous:  A=#total-rec = 19476, B=#keepalive=3006, B/A = 0.154
> this time: A=#total-rec = 13208, B=#keepalive=1988, B/A = 0.151
>
> Total records: 13208
> reqsz: #sent/ #!sent/ #call: wr lag  / fl lag
>    8:     4 /     4 /     4:    6448 /  268148
>   16:     1 /     1 /     1:    8688 /  387320
>   24:  1988 /  1987 /  1999:    6357 /  226163
>  195:     1 /     0 /    20:     408 /    1647
> 7477:     2 /     0 /   244:      68 /     847
> 8225:     1 /     1 /     1:    7208 /    7208
>
> So I checked how many bytes RecentFlushPtr is behind the requested
> loc when it has not advanced enough.
>
> Total records: 15128
> reqsz:  #sent/ #!sent/ #call: wr lag  / fl lag  / RecentFlushPtr lag
>     8:     2 /     2 /     2:     520 /   60640 /       8
>    16:     1 /     1 /     1:    8664 /   89336 /      16
>    24:  2290 /  2274 /  2302:    5677 /  230583 /      23
>   187:     1 /     0 /    40:       1 /    6118 /       1
>  7577:     1 /     0 /    69:     120 /    3733 /      65
>  8177:     1 /     1 /     1:    8288 /    8288 /    2673
>

Does this data indicate that when the request size is 187 or 7577,
even though we called WalSndWaitForWal() 40 and 69 times
respectively, a keepalive is sent just once? Why should such
behavior depend upon the request size?

> So it's not a matter of the RecentFlushPtr check. (Almost) always,
> when WalSndWakeupRequest decides that a keepalive needs to be sent,
> the function is called before the record begins to be written.
>

I think we always wake up walsender after we have flushed the WAL via
WalSndWakeupProcessRequests(). I think the reason we are seeing
keepalives here is that we always send one before sleeping. So, it
seems each time we try to read a new page, we call WalSndWaitForWal,
which sends at least one keepalive message. I am not sure what an
appropriate way to reduce the frequency of these keepalive messages
would be. Andres might have some ideas?

-- 
With Regards,
Amit Kapila.


