Hi,
On 2022-04-01 10:21:50 -0700, Andres Freund wrote:
> right now I am looking at a test added in the shmstats patch that's slow on
> CI, on windows only. Unfortunately the regress_log_* output is useless as-is
> to figure out where things hang.
<two hours of debugging later>
This turns out to be a problem somewhere in the tap testing infrastructure,
rather than the test itself. The slow thing wasn't anything the test did. All
the time is spent in an is(). To verify that, I added a bunch of
ok(1, "this is some long output to test a theory");
print_time();
a few tests before the slow test. And:
ok 7 - this is some long output to test a theory
# test theory 1: 0.000 sec
ok 8 - this is some long output to test a theory
# test theory 2: 0.000 sec
ok 9 - this is some long output to test a theory
# test theory 3: 40.484 sec
ok 10 - this is some long output to test a theory
# test theory 4: 0.001 sec
ok 11 - this is some long output to test a theory
# test theory 5: 0.000 sec
The problem also vanishes when running tests without PROVE_FLAGS=-j$something
What this looks like to me is that when running tests concurrently, the buffer
of the file descriptor used to report tap test output fills up. The blocked
test can only progress once prove gets around to reading from that fd,
presumably when another test finishes.
Gah. I want my time back.
I can't reproduce a similar issue on linux. But of course I'm using a newer
perl, and it's likely a timing dependent issue, so it's not guaranteed to be a
windows problem. But ...
Greetings,
Andres Freund