I wrote:
> Hmm ... desmoxytes has failed this test once, out of four runs since
> it went in:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-06-19%2003%3A06%3A04
I studied this failure a bit more, and I think the test itself has
a race condition. It's doing
# freeze walsender and walreceiver. Slot will still be active, but walreceiver
# won't get anything anymore.
kill 'STOP', $senderpid, $receiverpid;
$logstart = get_log_size($node_primary3);
advance_wal($node_primary3, 4);
ok(find_in_log($node_primary3, "to release replication slot", $logstart),
"walreceiver termination logged");
The string it's looking for does show up in node_primary3's log, but
not for another second or so; we can see instances of the following
poll_query_until query before that happens. So the problem is that
there is no interlock to ensure that the walreceiver terminates
before this find_in_log check looks for it.
You should be able to fix this by adding a retry loop around the
find_in_log check (which would likely mean that you don't need
to do multiple advance_wal iterations here).
However, I agree with reverting the test for now and then trying
again after beta2.
regards, tom lane