On 2024-12-12 Th 9:31 PM, Robins Tharakan wrote:
REL_17_STABLE failed on misc-recovery. One piece of context I can add here is
that I triggered both REL_16_STABLE and REL_17_STABLE together and they
were running neck-and-neck: v16 got past this test (in ~4 minutes),
while v17 got stuck for ~10 min and then failed.
Is it possible that 2 concurrent runs (of different branches) could step on each other?
These are the logs that I captured, and v16 [2] / v17 [1] literally ran at the same
time (seconds apart).
v16 log:
alligator:REL_16_STABLE [12:24:42] running bin test scripts ...
alligator:REL_16_STABLE [12:25:28] running test misc-recovery ...
alligator:REL_16_STABLE [12:29:40] running test misc-subscription ...
v17 log:
alligator:REL_17_STABLE [12:25:09] running bin test psql ...
alligator:REL_17_STABLE [12:25:19] running bin test scripts ...
alligator:REL_17_STABLE [12:26:00] running test misc-recovery ...
alligator:REL_17_STABLE [12:37:15] failed at stage recoveryCheck
$
We actually have a good deal of protection against concurrent runs clobbering each other.
It's not clear to me whether you're using "run_branches.pl --run-parallel" or not. If not, you might like to consider switching to it - it's the recommended way of doing concurrent runs. Apart from anything else, it removes the need for a lot of redundant git fetches. By default it staggers concurrent build starts by 60 seconds.
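
For what it's worth, the overlap between the two runs can be read straight off the quoted log lines. Below is a minimal sketch, assuming the lines keep the "animal:BRANCH [HH:MM:SS] message" shape shown above; the function names are hypothetical helpers for illustration and are not part of the buildfarm client.

```python
import re
from datetime import datetime

# Log lines copied from the quoted message (only the ones around misc-recovery).
LOG = """\
alligator:REL_16_STABLE [12:25:28] running test misc-recovery ...
alligator:REL_16_STABLE [12:29:40] running test misc-subscription ...
alligator:REL_17_STABLE [12:26:00] running test misc-recovery ...
alligator:REL_17_STABLE [12:37:15] failed at stage recoveryCheck
"""

# Assumed line shape: animal:BRANCH [HH:MM:SS] message
LINE_RE = re.compile(r"^[^:\s]+:(\S+) \[(\d\d:\d\d:\d\d)\] (.*)$")


def stage_windows(log_text, stage="misc-recovery"):
    """Return {branch: (start, end)} for the given stage.

    The end of a stage is taken to be the timestamp of the next
    log line for the same branch.
    """
    events = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        branch, stamp, msg = m.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        events.setdefault(branch, []).append((when, msg))

    windows = {}
    for branch, evs in events.items():
        for (start, msg), (end, _next_msg) in zip(evs, evs[1:]):
            if stage in msg:
                windows[branch] = (start, end)
    return windows


def overlap_seconds(a, b):
    """Number of seconds during which two (start, end) windows overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, int((end - start).total_seconds()))


windows = stage_windows(LOG)
durations = {b: int((e - s).total_seconds()) for b, (s, e) in windows.items()}
overlap = overlap_seconds(windows["REL_16_STABLE"], windows["REL_17_STABLE"])

for branch, secs in sorted(durations.items()):
    print(f"{branch}: misc-recovery window {secs // 60}m{secs % 60:02d}s")
print(f"both branches in misc-recovery at once for {overlap}s")
```

This only confirms that the two misc-recovery steps did run at the same time for several minutes; it says nothing about whether they actually interfered with each other.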
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com