Home > mailing lists

Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From	Peter Smith
Subject	Re: Time delayed LR (WAS Re: logical replication restrictions)
Date	February 3, 2023 08:42:27
Msg-id	CAHut+PuctLJtkBO5MKntH6nzgF1b5AqeEJj6uOvxnQSQZ2pA3g@mail.gmail.com Whole thread Raw
In response to	Re: Time delayed LR (WAS Re: logical replication restrictions) (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Time delayed LR (WAS Re: logical replication restrictions)
List	pgsql-hackers

Tree view

On Fri, Feb 3, 2023 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu)
> > <osumi.takamichi@fujitsu.com> wrote:
> > >
> > ...
> > >
> > >
> > > > Besides, I am not sure it's a stable test to check the log. Is it possible that there's
> > > > no such log on a slow machine? I modified the code to sleep 1s at the beginning
> > > > of apply_dispatch(), then the new added test failed because the server log
> > > > cannot match.
> > > To get the log by itself is necessary to ensure
> > > that the delay is conducted by the apply worker, because we emit the diffms
> > > only if it's bigger than 0 in maybe_apply_delay(). If we omit the step,
> > > we are not sure the delay is caused by other reasons or the time-delayed feature.
> > >
> > > As you mentioned, it's possible that no log is emitted on slow machine. Then,
> > > the idea to make the test safer for such machines should be to make the delayed time longer.
> > > But we shortened the delay time to 1 second to mitigate the long test execution time of this TAP test.
> > > So, I'm not sure if it's a good idea to make it longer again.
> >
> > I think there are a couple of things that can be done about this problem:
> >
> > 1. If you need the code/test to remain as-is then at least the test
> > message could include some comforting text like "(this can fail on
> > slow machines when the delay time is already exceeded)" so then a test
> > failure will not cause undue alarm.
> >
> > 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that
> > it will *always* log the remaining wait time even if that wait time
> > becomes negative. Then I think the test cases can be made
> > deterministic instead of relying on good luck. This seems like the
> > better option.
> >
>
> I don't understand why we have to do any of this instead of using 3s
> as min_apply_delay similar to what we are doing in
> src/test/recovery/t/005_replay_delay. Also, I think we should use
> exactly the same way to verify the test even though we want to keep
> the log level as DEBUG2 to check logs in case of any failures.
>

IIUC the reasons are due to conflicting requirements. e.g.
- A longer delay like 3s might work better for testing this feature, but OTOH
- A longer delay will also cause the whole BF execution to take longer

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

pgsql-hackers by date:

From: Amit Kapila
Date: 03 February 2023, 08:37:53
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Nathan Bossart
Date: 03 February 2023, 08:48:08
Subject: Re: O(n) tasks cause lengthy startups and checkpoints

Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

Previous

Next