Hi,
On 2022-11-08 01:16:09 +1300, Thomas Munro wrote:
> So [1] on its own didn't fix this. My next guess is that the attached
> might help.
>
> Hmm. Following Michael's clue that this might involve log files and
> pg_ctl, I noticed one thing: pg_ctl implements
> wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and
> our kill emulation does CallNamedPipe(). If the server is in the
> process of exiting and the kernel is cleaning up all the handles we
> didn't close, is there any reason to expect the signal pipe to be
> closed after the log file?
What is our plan here? This afaict is the most common "false positive" for
cfbot in the last few weeks.
E.g.:
https://api.cirrus-ci.com/v1/artifact/task/5462686092230656/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade
...
[00:02:58.761](93.859s) ok 10 - run of pg_upgrade for new instance
[00:02:58.808](0.047s) not ok 11 - pg_upgrade_output.d/ removed after pg_upgrade success
[00:02:58.815](0.007s) # Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success'
# at C:/cirrus/src/bin/pg_upgrade/t/002_pg_upgrade.pl line 288.
Michael:
Why does 002_pg_upgrade.pl try to filter the list of files in
pg_upgrade_output.d for files ending in .log? And why does it print those
only after starting the new node?
How about moving the iteration through pg_upgrade_output.d to before the
->start and printing all the files, but only slurp_file() if the filename ends
in .log?
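Roughly what I have in mind, as an untested sketch rather than a patch. The
helper name dump_upgrade_output is made up for illustration, and I use plain
open/print here only to keep the sketch self-contained; the real test would
presumably use note() and slurp_file() instead:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: report every file below $dir, but only print the
# contents of files whose name ends in .log.  Returns the list of files.
sub dump_upgrade_output
{
	my ($dir) = @_;

	# Nothing to report if the directory was already removed.
	return () unless -d $dir;

	my @files;
	find(
		{
			# With no_chdir, $_ holds the full path, same as $File::Find::name.
			wanted => sub { push @files, $File::Find::name if -f; },
			no_chdir => 1,
		},
		$dir);

	foreach my $file (sort @files)
	{
		print "# leftover file: $file\n";
		next unless $file =~ /\.log$/;

		# Stand-in for slurp_file() in the real test.
		open(my $fh, '<', $file) or next;
		local $/;
		my $contents = <$fh>;
		close($fh);
		print "=== contents of $file ===\n$contents\n";
	}
	return @files;
}
```

The call would go before the ->start of the new node, so that we see what is
still in the directory at the point where the check fails.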
Minor nit: It seems off to have quite so many copies of
$newnode->data_dir . "/pg_upgrade_output.d"
particularly where the test defines $log_path, but then still builds
it from scratch afterwards (line 304).
Greetings,
Andres Freund