Dear Hackers,
I think I have reproduced the test failure. The test fails because the
walsender is stuck waiting in WalSndDoneImmediate -> ereport, with the stack
shown below. It seems to be trying to send the message to the replica and
flush it, but the replica is hung, so the flush never completes.
#0 0x00007a4b37f2a037 in epoll_wait
#1 0x000056855317a2e8 in WaitEventSetWaitBlock
#2 WaitEventSetWait
#3 0x0000568552feea8e in secure_write
#4 0x0000568552ff5666 in internal_flush_buffer
#5 0x0000568552ff5966 in internal_flush
#6 socket_flush ()
#7 socket_flush ()
#8 0x00005685532ff1b3 in send_message_to_frontend (edata=<optimized out>)
#9 EmitErrorReport ()
#10 0x00005685532ff6dd in errfinish
#11 0x000056855312cc9c in WalSndDoneImmediate () at walsender.c:3625
I propose removing the ereport call from WalSndDoneImmediate, so that the
walsender does not block on a flush to an unresponsive replica.
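
To illustrate the failure mode outside PostgreSQL, here is a minimal
standalone sketch. It is not walsender code: the function name
peer_hang_blocks_writer is hypothetical, and a Unix socketpair stands in for
the replication connection. One end never reads (the "hung replica"), and the
other end writes in non-blocking mode until the kernel refuses more data. That
is the point at which a blocking flush, like the one in the stack above, would
have to wait for the peer.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code.
 *
 * Fill the send buffer of a socket whose peer never reads, using
 * non-blocking writes.  Returns 1 if the socket eventually refuses more
 * data with EAGAIN/EWOULDBLOCK (where a blocking flush would have to wait
 * for the peer), and 0 on any other outcome.
 */
static int
peer_hang_blocks_writer(void)
{
    int     sv[2];
    int     flags;
    int     result = 0;
    long    i;
    char    chunk[4096];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    /* Non-blocking mode lets us observe the "would block" condition. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1)
    {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    memset(chunk, 'x', sizeof(chunk));

    /* sv[1] plays the hung replica: it stays open but never calls read(). */
    for (i = 0; i < 1000000; i++)
    {
        ssize_t n = send(sv[0], chunk, sizeof(chunk), 0);

        if (n < 0)
        {
            /* Buffer is full; a blocking writer would sleep here. */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                result = 1;
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A process that must exit without waiting can either skip the write entirely
(as proposed above) or attempt it only in a way that gives up once the socket
would block.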
With best regards,
Vitaly
On 1/19/26 15:41, Fujii Masao wrote:
> On Sun, Jan 18, 2026 at 1:20 AM Andrey Silitskiy
> <a.silitskiy@postgrespro.ru> wrote:
>>
>> On Jan 9, 2026 at 10:04 AM Fujii Masao
>> <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Why do we need to send a "done" message to the receiver here?
>>> Since delivery isn't guaranteed in immediate mode, it seems of limited
>>> value.
>>
>> It seems to me that it is better to send a message in cases where it is
>> possible, so as not to raise errors on the subscriber during a clean shutdown.
>> And when this is not possible, exit the process without waiting.
>>
>>> For the immediate mode, would it make sense to log that the walsender is
>>> terminating in immediate mode and that WAL replication may be incomplete,
>>> so users can more easily understand what happened?
>>
>> Added to the latest patch.
>
> Thanks for updating the patch!
>
> cfbot is reporting a test failure. Could you please look into it and
> fix the issue?
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6234
>
> Regards,
>