Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Exit walsender before confirming remote flush in logical replication
Msg-id CAHGQGwGoZos=7G5eRUs3JyFqYhCNLuZMmDmxS-cWjS0R56Jvcg@mail.gmail.com
In response to Re: Exit walsender before confirming remote flush in logical replication  (Andres Freund <andres@anarazel.de>)
Responses Re: Exit walsender before confirming remote flush in logical replication
List pgsql-hackers
On Tue, Apr 7, 2026 at 12:32 AM Andres Freund <andres@anarazel.de> wrote:
> Failed on CI just now:
>
> https://cirrus-ci.com/task/6745359004729344?logs=test_world#L410
>
> https://api.cirrus-ci.com/v1/artifact/task/6745359004729344/testrun/build/testrun/subscription/038_walsnd_shutdown_timeout/log/regress_log_038_walsnd_shutdown_timeout
>
> [14:58:26.146](0.066s) ok 3 - have walreceiver pid 13796
> ### Stopping node "publisher" using mode fast
> # Running: pg_ctl --pgdata /home/postgres/postgres/build/testrun/subscription/038_walsnd_shutdown_timeout/data/t_038_walsnd_shutdown_timeout_publisher_data/pgdata --mode fast stop
> waiting for server to shut down........................................................................................................................... failed
> pg_ctl: server does not shut down
> # pg_ctl stop failed: 256
> # Postmaster PID for node "publisher" is 3679
> [15:00:38.178](132.032s) Bail out!  pg_ctl stop failed

Thanks for reporting this!

From the CI results [1], the failure in 038_walsnd_shutdown_timeout.pl appears
to occur intermittently on FreeBSD. The failing case tests that, when both
physical and logical replication are in use with slotsync enabled and both are
stalled (walreceiver on the standby and the logical apply worker on
the subscriber are blocked), shutting down the primary completes due to
wal_sender_shutdown_timeout.
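
As a rough illustration of the behavior this test expects, here is a
simplified Python model of a sender that waits for the remote side to confirm
a flush position but gives up once a shutdown timeout elapses. This is only a
sketch, not the walsender code: the function wait_for_remote_flush, its
parameters, and the integer "LSN" values are all hypothetical.

```python
import time


def wait_for_remote_flush(get_flushed_lsn, target_lsn, shutdown_timeout,
                          poll_interval=0.01,
                          clock=time.monotonic, sleep=time.sleep):
    """Hypothetical model: wait until the peer confirms target_lsn.

    Returns True if the peer reported a flush position >= target_lsn,
    or False if shutdown_timeout seconds passed first.
    """
    deadline = clock() + shutdown_timeout
    while True:
        if get_flushed_lsn() >= target_lsn:
            return True       # remote flush confirmed, normal exit
        if clock() >= deadline:
            return False      # timeout reached, exit without confirmation
        sleep(poll_interval)


# A stalled receiver never advances, so the wait ends via the timeout.
stalled = wait_for_remote_flush(lambda: 0, target_lsn=100,
                                shutdown_timeout=0.05)

# A receiver that has already caught up is confirmed immediately.
caught_up = wait_for_remote_flush(lambda: 100, target_lsn=100,
                                  shutdown_timeout=0.05)
```

In this model, the reported failure would correspond to the loop never
reaching either return for one of the senders, so the overall shutdown never
finishes.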

On FreeBSD, however, it seems that after the shutdown request, the physical
walsender can occasionally keep running, preventing shutdown from completing.
As a result, pg_ctl stop times out and the test fails.

I’ll investigate the cause. If it takes time to identify, I may temporarily
disable just this test case so it doesn’t block other development and testing,
then re-enable it once the issue is fixed.

Regards,

[1]
https://cirrus-ci.com/build/5134823678803968
https://cirrus-ci.com/build/5735329598013440
https://cirrus-ci.com/build/5917696627310592
https://cirrus-ci.com/build/5742460250357760

--
Fujii Masao


