On Tue, Mar 03, 2026 at 09:17:16AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Thanks for the info. So I can provide the patch after the issue for 009_twophase.pl
> is fixed. For better understanding, we may be able to fork a new
> thread.
Regarding your posted v4, I am actually not convinced that there is a
need for injection points and disabling standby snapshots, for the
three sequences of tests proposed.
While the first wait_for_replay_catchup() can be useful before the
teardown_node() of the primary in the "Check that prepared
transactions can be committed on promoted standby" sequence, it still
has a limited impact. It looks like we could have other stray
records as well, depending on how slowly the primary is stopped? I
think that we should switch to a plain stop() of the primary: the test
wants to check that prepared transactions can be committed on a
standby, and stopping the primary abruptly does not matter for this
sequence.
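To make the difference concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how the server works internally: WAL is a list of illustrative record names (not real record types), the walsender has streamed only a prefix of it, stop() is modeled as a fast shutdown where the walsender drains everything up to the shutdown checkpoint, and teardown_node() as an immediate shutdown where pending records are never sent. ToyPrimary and standby_view() are hypothetical names.

```python
from dataclasses import dataclass, field

# Toy model only: WAL is a list of illustrative record names, and "sent"
# counts how many of them the walsender has streamed to the standby so far.


@dataclass
class ToyPrimary:
    wal: list = field(default_factory=list)
    sent: int = 0

    def log(self, record: str) -> None:
        # Records are generated locally; streaming lags behind.
        self.wal.append(record)

    def stream(self, n: int) -> None:
        # The walsender ships at most n of the pending records.
        self.sent = min(len(self.wal), self.sent + n)

    def shutdown(self, mode: str) -> list:
        # Assumption: "fast" (stop()) writes a shutdown checkpoint and lets
        # the walsender drain everything up to it before exiting, while
        # "immediate" (teardown_node()) never sends the pending records.
        if mode == "fast":
            self.log("CHECKPOINT_SHUTDOWN")
            self.sent = len(self.wal)
        elif mode != "immediate":
            raise ValueError(f"unknown shutdown mode: {mode!r}")
        # What the standby has received once the primary is gone.
        return self.wal[: self.sent]


def standby_view(mode: str) -> list:
    primary = ToyPrimary()
    primary.log("INSERT")
    primary.log("PREPARE xact_009_10")
    primary.stream(2)
    primary.log("RUNNING_XACTS")  # e.g. a standby snapshot logged late
    return primary.shutdown(mode)


print(standby_view("immediate"))  # ['INSERT', 'PREPARE xact_009_10']
print(standby_view("fast"))
# ['INSERT', 'PREPARE xact_009_10', 'RUNNING_XACTS', 'CHECKPOINT_SHUTDOWN']
```

In the immediate case, the late record exists only on the old primary, which is the kind of stray record that can make it diverge from the promoted standby.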
For the second wait_for_replay_catchup(), after the PREPARE of
xact_009_11: I may be missing something, but how does it change
things? A plain stop() of the primary means that the standby would
have received all the WAL records from the primary and stored them on
disk in its pg_wal/, no? Upon restart, the standby should replay
everything it finds in pg_wal/. I don't see a change required here.
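A similarly simplified sketch of the restart case, assuming the standby's pg_wal/ already holds every record streamed before the clean stop: replaying the whole local WAL rebuilds the set of prepared transactions, so xact_009_11 is still there after the restart. The "KIND gid" record format and the replay() helper are hypothetical illustrations, not PostgreSQL's redo code.

```python
# Toy model only: records are "KIND gid" strings, not real WAL records, and
# replay() is a hypothetical stand-in for recovery rebuilding 2PC state.


def replay(pg_wal: list) -> set:
    prepared = set()
    for record in pg_wal:
        kind, _, gid = record.partition(" ")
        if kind == "PREPARE":
            prepared.add(gid)
        elif kind in ("COMMIT_PREPARED", "ROLLBACK_PREPARED"):
            prepared.discard(gid)
    return prepared


# Assumption: everything streamed before the primary's clean stop is already
# on disk in the standby's pg_wal/.
standby_pg_wal = ["INSERT", "PREPARE xact_009_11", "CHECKPOINT_SHUTDOWN"]
print(replay(standby_pg_wal))  # {'xact_009_11'}
```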
For the third wait_for_replay_catchup(), after the PREPARE of
xact_009_12, same dance. The primary is cleanly stopped first. All
the WAL records of the primary should have been flushed to the
standby.
As a whole, it looks like we should just switch the teardown_node()
call to a stop() call in the first test with xact_009_10, backpatch
it, and call it a day. No need for injection points and no need for
GUC tweaks. I have not looked at 004_timeline_switch yet.
> I guess so. Cluster::stop runs the `pg_ctl stop -m fast` command. In this case
> the walsender waits until there is nothing left to send, see WalSndLoop().
> Do let me know if you have observed a similar failure here.
Exactly. Doing a clean stop of the primary offers a strong guarantee
here: we are sure that the standby will have received all the records
from the primary. A timeline fork is impossible in
012_subtransactions.pl given how the switchover from the primary to
the standby happens. I don't see a need for tweaking this test at
all. Or perhaps you did see a failure of some kind in this test,
Alexander?
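The timeline argument can be reduced to a toy check, assuming the old primary can follow the promoted standby without a rewind only if its WAL does not extend past the point where the standby was promoted. LSNs are plain integers here, and switchover() and can_follow() are hypothetical helpers, not test or server APIs.

```python
# Toy model only: LSNs are plain integers, and the switch point is taken to
# be the end of the WAL the standby had received when it was promoted.


def can_follow(old_primary_end_lsn: int, switchpoint_lsn: int) -> bool:
    # The old primary must not have WAL beyond the switch point; otherwise
    # its history has forked from the new timeline.
    return old_primary_end_lsn <= switchpoint_lsn


def switchover(shutdown_mode: str, primary_end: int, streamed_up_to: int) -> bool:
    # Assumption: a "fast" stop lets the walsender drain all pending WAL, so
    # the standby receives everything up to the primary's end of WAL.
    # An "immediate" stop leaves the standby at whatever was streamed so far.
    if shutdown_mode == "fast":
        received = primary_end
    elif shutdown_mode == "immediate":
        received = streamed_up_to
    else:
        raise ValueError(f"unknown shutdown mode: {shutdown_mode!r}")
    return can_follow(primary_end, received)


print(switchover("fast", 0x3000060, 0x3000028))       # True
print(switchover("immediate", 0x3000060, 0x3000028))  # False
```

With a clean stop, the standby's received position matches the old primary's end of WAL, so there is nothing left to fork; with an immediate stop, whatever was not streamed ends up past the switch point.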
--
Michael