Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Chao Li
Subject Re: Exit walsender before confirming remote flush in logical replication
Date
Msg-id 750545C3-A04C-4A62-9CF3-62AD91BD5104@gmail.com
In response to Re: Exit walsender before confirming remote flush in logical replication  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers

> On Apr 8, 2026, at 16:11, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Apr 8, 2026 at 4:05 PM Chao Li <li.evan.chao@gmail.com> wrote:
>> I have some CF entries failed on this test case as well, so I tried to look into the problem.
>
> Thanks for working on this, much appreciated!
>
>
>> Once entering WalSndDone(), it might call pq_flush() and get stuck:
>> ```
>>        if (WalSndCaughtUp && sentPtr == replicatedPtr &&
>>                !pq_is_send_pending())
>>        {
>>                QueryCompletion qc;
>>
>>                /* Inform the standby that XLOG streaming is done */
>>                SetQueryCompletion(&qc, CMDTAG_COPY, 0);
>>                EndCommand(&qc, DestRemote, false);
>>                pq_flush();
>>
>>                proc_exit(0);
>> ```
>>
>> And once stuck, it will never get back to WalSndCheckShutdownTimeout(), so the new GUC timeout cannot rescue it.
>
> pq_flush() is called when WalSndCaughtUp && sentPtr == replicatedPtr
> && !pq_is_send_pending().
> Under these conditions, I was thinking that we can assume the kernel send
> buffer isn't full, so pq_flush() (i.e., the send() call) can copy the data
> without blocking and return immediately.
>
> I'm not very familiar with FreeBSD, but based on [1], I wonder if this
> assumption may not hold there, and pq_flush() could still block....
>
> Regards,
>
> [1] https://man.freebsd.org/cgi/man.cgi?unix(4)#BUFFERING
>
>> Due to the local nature of the Unix-domain sockets, they do not implement
>> send buffers.  The send(2) and write(2) families of system calls
>> attempt to write data to the receive buffer of the destination socket.
>
> --
> Fujii Masao

I don’t have a FreeBSD box to verify that directly. But the document you pointed out seems to state explicitly that, on
Unix-domain sockets, writes go directly to the peer’s receive buffer. If so, the assumption that “the kernel send buffer
isn’t full” no longer really holds on FreeBSD. From this perspective, changing to non-blocking pq_flush_if_writable()
makes sense to me.
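
To illustrate the blocking vs. non-blocking difference, here is a minimal standalone sketch. It is not PostgreSQL code: set_nonblocking() and fill_until_would_block() are hypothetical helpers I made up for the demo, and it only assumes a connected Unix-domain stream socket whose peer never reads. With O_NONBLOCK set, send() reports EAGAIN/EWOULDBLOCK once the kernel accepts no more data, instead of sleeping:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not a PostgreSQL function): make fd non-blocking. */
int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: keep sending on a non-blocking socket until the
 * kernel refuses more data.  Returns the number of bytes accepted before
 * send() reported EAGAIN/EWOULDBLOCK, or -1 on any other error.
 */
long
fill_until_would_block(int fd)
{
	char		chunk[4096] = {0};
	long		total = 0;

	assert(fd >= 0);

	for (;;)
	{
		ssize_t		n = send(fd, chunk, sizeof(chunk), 0);

		if (n > 0)
			total += n;			/* full or partial write, keep going */
		else if (n < 0 && errno == EINTR)
			continue;			/* interrupted by a signal, retry */
		else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;		/* a blocking socket would sleep here */
		else
			return -1;			/* unexpected error */
	}
}
```

A caller could create the pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call set_nonblocking(sv[0]), and then fill_until_would_block(sv[0]) without ever reading from sv[1]. On a blocking socket the same send() would simply sleep at the point where this sketch returns, which is the kind of hang discussed above. Returning instead of sleeping is what would let the caller get back to the shutdown timeout check.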

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






