On Sun, Apr 17, 2022 at 08:56:33AM +1200, Thomas Munro wrote:
> Under valgrind I got "Undefined subroutine &main::usleep called at
> t/002_archiving.pl line 103" so I added "use Time::HiRes qw(usleep);",
> and now I get past the first 4 tests with your patch, but then
> promotion times out, not sure why:
>
> +++ tap check in src/test/recovery +++
> t/002_archiving.pl ..
> ok 1 - check content from archives
> ok 2 - archive_cleanup_command executed on checkpoint
> ok 3 - recovery_end_command not executed yet
> # found 00000002.history after 14 attempts
> ok 4 - recovery_end_command executed after promotion
> Bailout called. Further testing stopped: command "pg_ctl -D
> /home/tmunro/projects/postgresql/src/test/recovery/tmp_check/t_002_archiving_standby2_data/pgdata
> -l /home/tmunro/projects/postgresql/src/test/recovery/tmp_check/log/002_archiving_standby2.log
> promote" exited with value 1

Hmm. As far as I can see, aren't you just hitting the 60s timeout of
pg_ctl here due to the slowness of valgrind?

> Since it's quite painful to run TAP tests under valgrind, I found a
> place to stick a plain old sleep to repro these problems:

Actually, I am wondering how you are patching Cluster.pm to do that.

> Soon I'll push the fix to the slowness that xlogprefetcher.c
> accidentally introduced to continuous archive recovery, ie the problem
> of calling a failing restore_command repeatedly as we approach the end
> of a WAL segment instead of just once every 5 seconds after we run out
> of data, and after that you'll probably need to revert that fix
> locally to repro this.

Okay, thanks. Anyway, I'll do something about that tomorrow (no
room to look at the buildfarm today). I was thinking about replacing
the while loop I had in the last version of the patch with a
poll_query_until() that does a pg_stat_file() with an absolute path to
the history file, to avoid the dependency on usleep() in the test. I
would also split the fix into two commits, as there is more than one
problem and each one applies to different branches.
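
Roughly, the poll_query_until() idea could look like the untested sketch
below. The names $node_standby2 and $archive_dir are placeholders for
whatever the test already has at hand (a running node, and the absolute
path of the directory where the history file is expected to show up);
they are assumptions and not taken from the patch.

```perl
# Untested sketch: wait for the timeline history file without usleep().
# Assumes $node_standby2 is a running PostgreSQL::Test::Cluster node and
# $archive_dir is the absolute path where 00000002.history should appear.
my $history_file = "$archive_dir/00000002.history";

# With missing_ok = true, pg_stat_file() returns NULLs instead of failing
# while the file is absent, so the query yields 'f' until the file exists.
# poll_query_until() retries until the result is 't' or it times out.
$node_standby2->poll_query_until('postgres',
	"SELECT size IS NOT NULL FROM pg_stat_file('$history_file', true)")
  or die "timed out waiting for $history_file";
```

This leaves the retry and timeout handling to the existing test
infrastructure instead of a hand-written loop.
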
--
Michael