On Sun, Aug 15, 2021 at 8:16 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Aug 13, 2021 at 05:59:21PM -0700, Soumyadeep Chakraborty wrote:
> > and passes with the code change, as expected. I can't explain why the
> > test doesn't freeze up in v3 in wait_for_catchup() at the end.
>
> It took me some time to understand why. If I am right, that's because
> of the intermediate test block working on $standby_2 and the two
> INSERT queries of the primary. In v1 and v4, we have no activity on
> the primary between the first set of tests and yours, meaning that
> $standby has nothing to do. In v2 and v3, the two INSERT queries run
> on the primary for the purpose of the recovery pause make $standby_1
> wait for the default value of recovery_min_apply_delay, aka 3s, in
> parallel. If the set of tests for $standby_2 is faster than that,
> we'd bump on the phase where the code still waited for 3s, not the 2
> hours set, visibly.

I see, thanks a lot for the explanation. Thanks to your investigation, I
can now kind of reuse some of the test mechanisms for the other patch that
I am working on [1]. There, we don't have multiple standbys getting in the
way, thankfully.

> After considering this stuff, the order dependency we'd introduce in
> this test makes the whole thing more brittle than it should be. And such
> an edge case does not seem worth spending extra cycles testing anyway,
> as if things break we'd finish with a test stuck for an unnecessarily
> long time by relying on wait_for_catchup("replay"). We could use
> something else, say based on a lookup of pg_stat_activity but this
> still requires extra run time for the wait phases needed. So at the
> end I have dropped the test, but backpatched the fix.
Fair.
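
For the record, the trade-off with a lookup-based wait can be seen in a
minimal sketch of a bounded poll. This is only an illustration in Python,
not code from the patch or from the TAP framework: poll_until, its
parameters, and the check callable are hypothetical names. The point is
that a failure costs at most the timeout instead of leaving the test
stuck, but every run still pays for the sleep intervals.

```python
import time


def poll_until(check, timeout_s=5.0, interval_s=0.1):
    """Call check() repeatedly until it returns True or timeout_s elapses.

    Hypothetical helper for illustration only. Returns True if check()
    succeeded before the deadline, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            # Give up instead of blocking forever.
            return False
        # Each retry adds to the run time of the test.
        time.sleep(interval_s)
```

In a real test, check would wrap whatever query reports the state being
waited for (for instance a lookup of pg_stat_activity), which is an
assumption here and not something I have written.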
Regards,
Soumyadeep (VMware)
[1] https://www.postgresql.org/message-id/flat/CANXE4Tc3FNvZ_xAimempJWv_RH9pCvsZH7Yq93o1VuNLjUT-mQ%40mail.gmail.com