Re: BUG: Former primary node might stuck when started as a standby - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: BUG: Former primary node might stuck when started as a standby
Msg-id 9d4abbe2-95aa-47e0-9ce2-842196a662a7@gmail.com
In response to Re: BUG: Former primary node might stuck when started as a standby  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG: Former primary node might stuck when started as a standby
List pgsql-hackers
Hello Michael,

04.03.2026 07:31, Michael Paquier wrote:
>> I guess so. cluster::stop runs the `pg_ctl stop -m fast` command. In this case
>> the walsender waits until there is nothing left to send; see WalSndLoop().
>> Do let me know if you have observed a similar failure here.
> Exactly.  Doing a clean stop of the primary offers a strong guarantee
> here.  We are sure that the standby will have received all the records
> from the primary.  Timeline forking is an impossible thing in
> 012_subtransactions.pl based on how the switchover from the primary to
> the standby happens.  I don't see a need for tweaking this test at
> all.  Or perhaps you did see a failure of some kind in this test,
> Alexander?


Yes, 012_subtransactions doesn't fail with an aggressive bgwriter, as I noted
before. I mentioned it precisely to show that the way the node is stopped does
matter here. But if we recognize teardown_node as risky in this context, it
might make sense to also review other tests in recovery/. I already wrote about
004_timeline_switch, but there are probably more. E.g., 028_pitr_timelines
(I haven't tested it intensively yet) stops the primary in immediate mode, as
teardown_node does, before promoting the standby:
$node_primary->stop('immediate');

# Promote the standby, and switch WAL so that it archives a WAL segment
# that contains all the INSERTs, on a new timeline.
$node_standby->promote;

Best regards,
Alexander
