RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB6275C67F14954E05CE5D04399E139@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
Thanks for your comments.

> On Thu, Mar 17, 2022 at 7:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Mar 17, 2022 at 12:27 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 7:38 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > > >
> > > > After more thought, can we check only wal_sender_timeout without
> > > > skip-count? That is, in WalSndUpdateProgress(), if we have
> > > > received any reply from the subscriber in last (wal_sender_timeout
> > > > / 2), we don't need to do anything in terms of keep-alive. If not,
> > > > we do
> > > > ProcessRepliesIfAny() (and probably WalSndCheckTimeOut()?) then
> > > > WalSndKeepalivesIfNecessary(). That way, we can send keep-alive
> > > > messages every (wal_sender_timeout / 2). And since we don't call
> > > > them for every change, we would not need to worry about the overhead
> much.
> > > >
> > >
> > > But won't that lead to a call to GetCurrentTimestamp() for each
> > > change we skip? IIUC from previous replies that lead to a slight
> > > slowdown in previous tests of Wang-San.
> > >
> > If the above is true then I think we can use a lower skip_count say 10
> > along with a timeout mechanism to send keepalive message. This will
> > help us to alleviate the overhead Wang-San has shown.
> 
> Using both sounds reasonable to me. I'd like to see how much the overhead is
> alleviated by using skip_count 10 (or 100).
> 
> > BTW, I think there could be one other advantage of using
> > ProcessRepliesIfAny() (as you are suggesting) is that it can help to
> > release sync waiters if there are any. I feel that would be the case
> > for the skip_empty_transactions patch [1] which uses
> > WalSndUpdateProgress to send keepalive messages after skipping empty
> > transactions.
> 
> +1
I modified the patch according to your and Amit-San's suggestions.
In addition, after testing, I found that when the threshold is 10, it brings
slight overhead.
So I try to change it to 100, after testing, the results look good to me.
10  : 1.22%--UpdateProgress
100 : 0.16%--UpdateProgress

Please refer to attachment.

Attach the new patch.
1. Refactor the way to send keepalive messages.
   [suggestion by Sawada-San, Amit-San.]
2. Modify the value of flag is_send initialization to make it look more
   reasonable. [suggestion by Kuroda-San.]
3. Improve new function names.
   (From SendKeepaliveIfNecessary to UpdateProgress.)

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Proposal: Support custom authentication methods using hooks
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Logical replication timeout problem