Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Vitaly Davydov
Subject Re: Exit walsender before confirming remote flush in logical replication
Msg-id d31063db-ba90-4ce6-b6a4-cb9d92da7096@postgrespro.ru
In response to Re: Exit walsender before confirming remote flush in logical replication  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi Fujii-san,

Thank you for the testing.

On 3/25/26 15:39, Fujii Masao wrote:
 > I tested wal_sender_shutdown_timeout under several configurations and
 > encountered a case where the primary shutdown got stuck, even with the patch
 > and wal_sender_shutdown_timeout = 1. I'm not sure yet whether this is a bug in
 > the patch or an issue with my test setup, but anyway I'd like to share
 > the reproduction steps for reference.

It seems that the problem lies in how the WalSndComputeSleeptime function
calculates the sleep time. If wal_sender_timeout is set to one hour and
WalSndWait is called with sleeptime = 1h, then the variable
shutdown_request_timestamp is only updated one hour later, at the next call
of WalSndCheckShutdownTimeout, which runs right after the wait completes.

Maybe we could use the minimum of the timeouts in WalSndComputeSleeptime, or
use the timeout mechanism (timeout.c), but then WalSndWait would have to wake
up on the latch.
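
To illustrate the first option, here is a rough standalone sketch of what I
mean by taking the minimal timeout. It is not the patch code and not the real
walsender.c: the function name compute_sleeptime_ms, its parameters, and the
conventions "-1 means no shutdown requested yet" and "0 means the shutdown
timeout is disabled" are all assumptions made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch, times in milliseconds.
 *
 * now                    current time
 * regular_wakeup_at      when the wal_sender_timeout logic wants to wake up
 * shutdown_requested_at  when shutdown was requested, or -1 if not yet
 * shutdown_timeout_ms    assumed shutdown timeout, 0 = disabled
 */
int64_t
compute_sleeptime_ms(int64_t now,
                     int64_t regular_wakeup_at,
                     int64_t shutdown_requested_at,
                     int64_t shutdown_timeout_ms)
{
    int64_t sleeptime;
    int64_t cap;

    assert(shutdown_timeout_ms >= 0);

    /* Sleep time derived from the regular (wal_sender_timeout) wakeup. */
    sleeptime = regular_wakeup_at - now;
    if (sleeptime < 0)
        sleeptime = 0;

    /* Shutdown timeout disabled: keep the regular sleep time. */
    if (shutdown_timeout_ms == 0)
        return sleeptime;

    if (shutdown_requested_at < 0)
    {
        /*
         * No shutdown recorded yet: never sleep longer than the shutdown
         * timeout, so a request arriving during the wait is noticed soon.
         */
        cap = shutdown_timeout_ms;
    }
    else
    {
        /* Shutdown already recorded: sleep only until its deadline. */
        cap = shutdown_requested_at + shutdown_timeout_ms - now;
        if (cap < 0)
            cap = 0;
    }

    return (cap < sleeptime) ? cap : sleeptime;
}
```

With wal_sender_timeout = 1h and a shutdown timeout of 1s, this returns 1000
instead of 3600000, so the check would run about once per second. The cost is
extra wakeups while idle, which is why the timeout.c approach may still be
preferable.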

With best regards,
Vitaly


