Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id CAA4eK1K-JESJ_AfUtvNy+=rA4eXXRTz3o2O8TdVch9=r-7g1Pw@mail.gmail.com
Whole thread Raw
In response to Re: Time delayed LR (WAS Re: logical replication restrictions)  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)
RE: Time delayed LR (WAS Re: logical replication restrictions)
List pgsql-hackers
On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu)
> <osumi.takamichi@fujitsu.com> wrote:
> >
> ...
> >
> >
> > > Besides, I am not sure it's a stable test to check the log. Is it possible that there's
> > > no such log on a slow machine? I modified the code to sleep 1s at the beginning
> > > of apply_dispatch(), then the new added test failed because the server log
> > > cannot match.
> > To get the log by itself is necessary to ensure
> > that the delay is conducted by the apply worker, because we emit the diffms
> > only if it's bigger than 0 in maybe_apply_delay(). If we omit the step,
> > we are not sure the delay is caused by other reasons or the time-delayed feature.
> >
> > As you mentioned, it's possible that no log is emitted on slow machine. Then,
> > the idea to make the test safer for such machines should be to make the delayed time longer.
> > But we shortened the delay time to 1 second to mitigate the long test execution time of this TAP test.
> > So, I'm not sure if it's a good idea to make it longer again.
>
> I think there are a couple of things that can be done about this problem:
>
> 1. If you need the code/test to remain as-is then at least the test
> message could include some comforting text like "(this can fail on
> slow machines when the delay time is already exceeded)" so then a test
> failure will not cause undue alarm.
>
> 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that
> it will *always* log the remaining wait time even if that wait time
> becomes negative. Then I think the test cases can be made
> deterministic instead of relying on good luck. This seems like the
> better option.
>

I don't understand why we have to do any of this instead of using 3s
as min_apply_delay similar to what we are doing in
src/test/recovery/t/005_replay_delay. Also, I think we should use
exactly the same way to verify the test even though we want to keep
the log level as DEBUG2 to check logs in case of any failures.

Also, I don't see the need to add more tests like the ones below:
+# Test whether ALTER SUBSCRIPTION changes the delayed time of the apply worker
+# (1 day 5 minutes). Note that the extra 5 minute is to account for any
+# decoding/network overhead.

Let's try to add tests similar to what we have for
recovery_min_apply_delay unless there is some functionality in this
patch that is not there in the recovery_min_apply_delay feature.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Deadlock between logrep apply worker and tablesync worker
Next
From: Nathan Bossart
Date:
Subject: Re: Weird failure with latches in curculio on v15