RE: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers
From | Takamichi Osumi (Fujitsu) |
---|---|
Subject | RE: Time delayed LR (WAS Re: logical replication restrictions) |
Date | |
Msg-id | TYCPR01MB837337242FF0F5C50B997E76EDD79@TYCPR01MB8373.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Time delayed LR (WAS Re: logical replication restrictions) (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
RE: Time delayed LR (WAS Re: logical replication restrictions)
|
List | pgsql-hackers |
Hi, On Friday, February 3, 2023 2:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu) > > <osumi.takamichi@fujitsu.com> wrote: > > > > > ... > > > > Besides, I am not sure it's a stable test to check the log. Is it > > > > possible that there's no such log on a slow machine? I modified > > > > the code to sleep 1s at the beginning of apply_dispatch(), then > > > > the new added test failed because the server log cannot match. > > > To get the log by itself is necessary to ensure that the delay is > > > conducted by the apply worker, because we emit the diffms only if > > > it's bigger than 0 in maybe_apply_delay(). If we omit the step, we > > > are not sure the delay is caused by other reasons or the time-delayed > feature. > > > > > > As you mentioned, it's possible that no log is emitted on slow > > > machine. Then, the idea to make the test safer for such machines should > be to make the delayed time longer. > > > But we shortened the delay time to 1 second to mitigate the long test > execution time of this TAP test. > > > So, I'm not sure if it's a good idea to make it longer again. > > > > I think there are a couple of things that can be done about this problem: > > > > 1. If you need the code/test to remain as-is then at least the test > > message could include some comforting text like "(this can fail on > > slow machines when the delay time is already exceeded)" so then a test > > failure will not cause undue alarm. > > > > 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that > > it will *always* log the remaining wait time even if that wait time > > becomes negative. Then I think the test cases can be made > > deterministic instead of relying on good luck. This seems like the > > better option. > > > > I don't understand why we have to do any of this instead of using 3s as > min_apply_delay similar to what we are doing in > src/test/recovery/t/005_replay_delay. Also, I think we should use exactly the > same way to verify the test even though we want to keep the log level as > DEBUG2 to check logs in case of any failures. OK, will try to make our tests similar to the tests in 005_replay_delay as much as possible. > Also, I don't see the need to add more tests like the ones below: > +# Test whether ALTER SUBSCRIPTION changes the delayed time of the apply > +worker # (1 day 5 minutes). Note that the extra 5 minute is to account > +for any # decoding/network overhead. > > Let's try to add tests similar to what we have for recovery_min_apply_delay > unless there is some functionality in this patch that is not there in the > recovery_min_apply_delay feature. The above command is a preparation part to check a behavior unique to time-delayed logical replication, which is to DISABLE a subscription causes the apply worker not to apply the suspended (delayed) transaction. So, it will be OK to have this test. Best Regards, Takamichi Osumi
pgsql-hackers by date: