Hi,
On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> buffer pin conflict, but so far once only, on CI:
>
> https://cirrus-ci.com/task/5956804860444672
>
> timed out waiting for match: (?^:User was holding shared buffer pin
> for too long) at t/031_recovery_conflict.pl line 367.
>
> Hrmph. Still trying to reproduce that, which may be a bug in this
> patch, a bug in the test or a pre-existing problem. Note that
> recovery didn't say something like:
>
> 2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
> after 11.197 ms: recovery conflict on buffer pin
>
> (That's what I'd expect to see in
>
> https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> if the startup process had decided to send the signal).
>
> ... so it seems like the problem in that run is upstream of the interrupt stuff.

Odd. The only theory I have so far is that the manual vacuum on the primary
somehow decided to skip the page, and thus didn't trigger a conflict, since
replay clearly progressed past the VACUUM's records. Perhaps we should use
VACUUM VERBOSE? Unlike in pg_regress tests, that should be unproblematic here.

Greetings,
Andres Freund