Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From: Vitaly Davydov
Subject: Re: Exit walsender before confirming remote flush in logical replication
Msg-id: e25567b4-9893-48bf-ac17-0e884f1acef9@postgrespro.ru
In response to: Re: Exit walsender before confirming remote flush in logical replication (Fujii Masao <masao.fujii@gmail.com>)
Responses: Re: Exit walsender before confirming remote flush in logical replication
List: pgsql-hackers

Dear Hackers,

I think I have reproduced the test failure. The test fails because the
walsender is stuck waiting in WalSndDoneImmediate -> ereport, with the stack
shown below. It appears to be trying to send the message to the replica and
flush it, but the replica is hung.

#0  0x00007a4b37f2a037 in epoll_wait
#1  0x000056855317a2e8 in WaitEventSetWaitBlock
#2  WaitEventSetWait
#3  0x0000568552feea8e in secure_write
#4  0x0000568552ff5666 in internal_flush_buffer
#5  0x0000568552ff5966 in internal_flush
#6  socket_flush ()
#7  socket_flush ()
#8  0x00005685532ff1b3 in send_message_to_frontend (edata=<optimized out>)
#9  EmitErrorReport ()
#10 0x00005685532ff6dd in errfinish
#11 0x000056855312cc9c in WalSndDoneImmediate () at walsender.c:3625

I propose removing the ereport call from WalSndDoneImmediate.
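
For anyone who wants to see the blocking behavior in isolation, here is a
minimal standalone sketch. It is not PostgreSQL code and not part of the patch.
Both function names are hypothetical, and a Unix socket pair stands in for the
walsender/replica connection. The peer never reads, so the kernel accepts data
only until the send buffer is full. A non-blocking sender then gets EAGAIN; a
blocking sender would wait at that point, which is my reading of what the
flush in the stack above is doing.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): keep sending on a
 * non-blocking socket until the kernel refuses more data.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
static long
fill_until_would_block(int fd)
{
    char    buf[4096] = {0};
    long    total = 0;

    for (;;)
    {
        ssize_t n = send(fd, buf, sizeof(buf), 0);

        if (n > 0)
        {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;       /* send buffer full: a blocking send would hang here */
        return -1;              /* unexpected error */
    }
}

/*
 * Hypothetical demo: sv[1] plays the hung replica and is never read.
 * Returns how many bytes the sender could queue before it would have
 * blocked, or -1 on failure.
 */
static long
demo_hung_peer(void)
{
    int     sv[2];
    int     flags;
    long    queued;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    queued = fill_until_would_block(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

Calling demo_hung_peer() returns a positive byte count whose value depends on
the platform's socket buffer size. A short message normally fits in the buffer
and is sent without waiting, so a hang like the one above would require the
buffer to be full already.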

With best regards,
Vitaly

On 1/19/26 15:41, Fujii Masao wrote:
> On Sun, Jan 18, 2026 at 1:20 AM Andrey Silitskiy
> <a.silitskiy@postgrespro.ru> wrote:
>>
>> On Jan 9, 2026 at 10:04 AM Fujii Masao
>> <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Why do we need to send a "done" message to the receiver here?
>>> Since delivery isn't guaranteed in immediate mode, it seems of limited
>>> value.
>>
>> It seems to me that it is better to send a message in cases where it is
>> possible, so as not to raise errors on the subscriber during a clean shutdown.
>> And when this is not possible, exit the process without waiting.
>>
>>> For the immediate mode, would it make sense to log that the walsender is
>>> terminating in immediate mode and that WAL replication may be incomplete,
>>> so users can more easily understand what happened?
>>
>> Added to the latest patch.
> 
> Thanks for updating the patch!
> 
> cfbot is reporting a test failure. Could you please look into it and
> fix the issue?
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6234
> 
> Regards,
> 