RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB62756599B78FA7D4C908109A9E179@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Logical replication timeout problem
List pgsql-hackers
On Mon, Mar 21, 2022 at 1:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
Thanks for your comments.

> On Fri, Mar 18, 2022 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Mar 18, 2022 at 10:43 AM wangw.fnst@fujitsu.com
> > <wangw.fnst@fujitsu.com> wrote:
> > >
> > > On Thu, Mar 17, 2022 at 7:52 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > > >
> > >
> > > Attach the new patch.
> > >
> >
> > *
> >   case REORDER_BUFFER_CHANGE_INVALIDATION:
> > - /* Execute the invalidation messages locally */
> > - ReorderBufferExecuteInvalidations(
> > -   change->data.inval.ninvalidations,
> > -   change->data.inval.invalidations);
> > - break;
> > + {
> > + LogicalDecodingContext *ctx = rb->private_data;
> > +
> > + /* Try to send a keepalive message. */
> > + UpdateProgress(ctx, true);
> >
> > Calling UpdateProgress() here appears adhoc to me especially because
> > it calls OutputPluginUpdateProgress which appears to be called only
> > from plugin API. Am, I missing something? Also why the same handling
> > is missed in other similar messages like
> > REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID where we don't call
> any
> > plug-in API?
Yes, you are right.
And I invoke in case REORDER_BUFFER_CHANGE_INVALIDATION because I think every
DDL will modify the catalog then get into this case. So I only invoke function
UpdateProgress here to handle DDL.

> > I am not sure what is a good way to achieve this but one idea that
> > occurred to me was shall we invent a new callback
> > ReorderBufferSkipChangeCB similar to ReorderBufferApplyChangeCB and
> > then pgoutput can register its API where we can have the logic similar
> > to what you have in UpdateProgress()? If we do so, then all the
> > cuurent callers of UpdateProgress in pgoutput can also call that API.
> > What do you think?
> >
> Another idea could be that we leave the DDL case for now as anyway
> there is very less chance of timeout for skipping DDLs and we may
> later need to even backpatch this bug-fix which would be another
> reason to not make such invasive changes. We can handle the DDL case
> if required separately.
Yes, I think a new callback function would be nice.
Yes, as you said, maybe we could fix the usecase that found the problem in the
first place. Then make further modifications on the master branch.
Modify the patch. Currently only DML related code remains.

> > * Why don't you have a quick exit like below code in WalSndWriteData?
> > /* Try taking fast path unless we get too close to walsender timeout. */ if (now
> > < TimestampTzPlusMilliseconds(last_reply_timestamp,
> >   wal_sender_timeout / 2) &&
> > !pq_is_send_pending())
> > {
> > return;
> > }
Fixed. I missed this so adding it in the new patch.

> > *  Can we rename variable 'is_send' to 'change_sent'?
Improve the the name of this variable.(From 'is_send' to 'change_sent')

Attach the new patch. [suggestion by Amit-San.]
1. Remove DDL related code. Handle the DDL case later separately if need.
2. Fix a missing.(In function WalSndUpdateProgress)
3. Improve variable names. (From 'is_send' to 'change_sent')
4. Fix some comments.(Above and inside the function WalSndUpdateProgress.)

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: logical replication empty transactions
Next
From: Tom Lane
Date:
Subject: Re: Frontend error logging style