RE: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Takamichi Osumi (Fujitsu)
Subject RE: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id TYCPR01MB83738F2BEF83DE525410E3ACEDD49@TYCPR01MB8373.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Takamichi Osumi (Fujitsu)" <osumi.takamichi@fujitsu.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)
List pgsql-hackers
Hi,


On Friday, February 3, 2023 3:35 PM I wrote:
> On Friday, February 3, 2023 2:21 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> > > On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu)
> > > <osumi.takamichi@fujitsu.com> wrote:
> > > ...
> > > > > Besides, I am not sure it's a stable test to check the log. Is
> > > > > it possible that there's no such log on a slow machine? I
> > > > > modified the code to sleep 1s at the beginning of
> > > > > apply_dispatch(), then the new added test failed because the server
> log cannot match.
> > > > To get the log by itself is necessary to ensure that the delay is
> > > > conducted by the apply worker, because we emit the diffms only if
> > > > it's bigger than 0 in maybe_apply_delay(). If we omit the step, we
> > > > are not sure the delay is caused by other reasons or the
> > > > time-delayed
> > feature.
> > > >
> > > > As you mentioned, it's possible that no log is emitted on slow
> > > > machine. Then, the idea to make the test safer for such machines
> > > > should
> > be to make the delayed time longer.
> > > > But we shortened the delay time to 1 second to mitigate the long
> > > > test
> > execution time of this TAP test.
> > > > So, I'm not sure if it's a good idea to make it longer again.
> > >
> > > I think there are a couple of things that can be done about this problem:
> > >
> > > 1. If you need the code/test to remain as-is then at least the test
> > > message could include some comforting text like "(this can fail on
> > > slow machines when the delay time is already exceeded)" so then a
> > > test failure will not cause undue alarm.
> > >
> > > 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so
> > > that it will *always* log the remaining wait time even if that wait
> > > time becomes negative. Then I think the test cases can be made
> > > deterministic instead of relying on good luck. This seems like the
> > > better option.
> > >
> >
> > I don't understand why we have to do any of this instead of using 3s
> > as min_apply_delay similar to what we are doing in
> > src/test/recovery/t/005_replay_delay. Also, I think we should use
> > exactly the same way to verify the test even though we want to keep
> > the log level as
> > DEBUG2 to check logs in case of any failures.
> OK, will try to make our tests similar to the tests in 005_replay_delay as much
> as possible.
I've updated the TAP test and made it aligned with 005_reply_delay.pl.

For coverage, I have the stream of in-progress transaction test case
and ALTER SUBSCRIPTION DISABLE behavior, which is unique to logical replication.
Also, conducted pgindent and pgperltidy. Note that the latter half of the
005_reply_delay.pl doesn't seem to match with the test for time-delayed logical replication
(e.g. promotion). So, I don't have those points.

Kindly have a look at the attached v27.


Best Regards,
    Takamichi Osumi


Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: run pgindent on a regular basis / scripted manner
Next
From: "Takamichi Osumi (Fujitsu)"
Date:
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)