RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB62751097A6FA5F3187A0EEFA9ED29@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Logical replication timeout problem
List pgsql-hackers
On Fri, Jan 27, 2023 at 19:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Jan 27, 2023 at 5:18 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wednesday, January 25, 2023 7:26 PM Amit Kapila
> <amit.kapila16@gmail.com>
> > >
> > > On Tue, Jan 24, 2023 at 8:15 AM wangw.fnst@fujitsu.com
> > > <wangw.fnst@fujitsu.com> wrote:
> > > >
> > > > Attach the new patch.
> > > >
> > >
> > > I think the patch missed to handle the case of non-transactional messages
> which
> > > was previously getting handled. I have tried to address that in the attached.
> Is
> > > there a reason that shouldn't be handled?
> >
> > Thanks for updating the patch!
> >
> > I thought about the non-transactional message. I think it seems fine if we
> > don’t handle it for timeout because such message is decoded via:
> >
> > WalSndLoop
> > -XLogSendLogical
> > --LogicalDecodingProcessRecord
> > ---logicalmsg_decode
> > ----ReorderBufferQueueMessage
> > -----rb->message() -- //maybe send the message or do nothing here.
> >
> > After invoking rb->message(), we will directly return to the main
> > loop(WalSndLoop) where we will get a chance to call
> > WalSndKeepaliveIfNecessary() to avoid the timeout.
> >
> 
> Valid point. But this means the previous handling of non-transactional
> messages was also redundant.

Thanks for the analysis, I think it makes sense. So I removed the handling of
non-transactional messages.

> > This is a bit different from transactional changes, because for transactional
> changes, we
> > will buffer them and then send every buffered change one by one(via
> > ReorderBufferProcessTXN) without going back to the WalSndLoop, so we
> don't get
> > a chance to send keepalive message if necessary, which is more likely to cause
> the
> > timeout problem.
> >
> > I will also test the non-transactional message for timeout in case I missed
> something.
> >
> 
> Okay, thanks. Please see if we can test a mix of transactional and
> non-transactional messages as well.

I tested a mix transaction of transactional and non-transactional messages on
the current HEAD and reproduced the timeout problem. I think this result is OK.
Because when decoding a transaction, non-transactional changes are processed
directly and the function WalSndKeepaliveIfNecessary is called, while
transactional changes are cached and processed after decoding. After decoding,
only transactional changes will be processed (in the function
ReorderBufferProcessTXN), so the timeout problem will still be reproduced.

After applying the v8 patch, the test mentioned above didn't reproduce the
timeout problem (Attach this test script 'test_with_nontransactional.sh').

Attach the new patch.

Regards,
Wang Wei

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: run pgindent on a regular basis / scripted manner
Next
From: Dean Rasheed
Date:
Subject: Bug #17759: GENERATED columns not computed during MERGE