Re: Exit walsender before confirming remote flush in logical replication - Mailing list pgsql-hackers

From Chao Li
Subject Re: Exit walsender before confirming remote flush in logical replication
Msg-id DF779135-64BA-421A-B835-8E815399BEC3@gmail.com
In response to Re: Exit walsender before confirming remote flush in logical replication  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Exit walsender before confirming remote flush in logical replication
List pgsql-hackers

> On Apr 23, 2026, at 12:51, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Wed, Apr 22, 2026 at 3:32 AM Fujii Masao <masao.fujii@gmail.com> wrote:
>> Therefore, since replacing pq_flush() with pq_flush_if_writable() seems to
>> change behavior only in a limited and acceptable way, I'm thinking to create
>> the patch doing that replacement.
>
> On second thought, replacing pq_flush() with pq_flush_if_writable() is not
> sufficient. EndCommand(), which WalSndDone() calls before pq_flush(), can also
> block when the send buffer is full. That happens because EndCommand() uses
> pq_putmessage() rather than pq_putmessage_noblock().
>
> Also, replacing pq_flush() with pq_flush_if_writable() would cause walsender to
> give up sending pending messages, including CommandComplete, even before
> wal_sender_shutdown_timeout expires. That seems a bit odd. I think it is better
> for walsender to continue honoring wal_sender_shutdown_timeout while attempting
> to send the final CommandComplete message.
>
> I've attached a patch that addresses both issues. For the first, it introduces
> EndCommandExtended(), which allows CommandComplete to be queued with
> pq_putmessage_noblock(). For the second, it updates WalSndDone() to use
> ProcessPendingWrites() instead of pq_flush(), so the walsender write loop can
> continue processing replies and checking replication and shutdown timeouts
> while pending output is being flushed.
>
> Thoughts?
>
> Regards,
>
> --
> Fujii Masao
> <v1-0001-Avoid-blocking-indefinitely-while-finishing-walse.patch>

```
-        EndCommand(&qc, DestRemote, false);
-        pq_flush();
+        EndCommandExtended(&qc, DestRemote, false, true);
+        shutdown_stream_done_queued = true;
+
+        /*
+         * Don't call pq_flush() here. It can block indefinitely waiting for
+         * the socket to become writeable, which would prevent
+         * wal_sender_shutdown_timeout from being enforced. Use the regular
+         * walsender non-blocking flush path so shutdown and replication
+         * timeouts continue to be checked while waiting for the send buffer
+         * to drain.
+         */
+        ProcessPendingWrites();
```

I think adding EndCommandExtended() with a “nonblock” parameter is good. However, I have a concern about replacing
pq_flush() with ProcessPendingWrites().

ProcessPendingWrites() calls ProcessRepliesIfAny() first, so is it possible that a new COPY message gets appended
after the already-queued CommandComplete? That seems to violate the protocol, but I am not sure whether it would
lead to any trouble.

So, maybe we need a new helper, say ProcessPendingWritesForShutdown(), that loops while pq_is_send_pending() is true.
In each iteration it would call WalSndCheckShutdownTimeout(), wait only for WL_SOCKET_WRITEABLE, and then call
pq_flush_if_writable(); on flush failure, it would maybe call WalSndShutdown().
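
A rough standalone model of such a loop might look like the following. This is only a sketch and not PostgreSQL code: MockSender, the mock_* functions, and the result codes are hypothetical stand-ins for pq_is_send_pending(), WalSndCheckShutdownTimeout(), a WL_SOCKET_WRITEABLE-only wait, and pq_flush_if_writable(), and the 10 ms wait slice is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of walsender output state; none of this is real PostgreSQL API. */
typedef struct MockSender
{
    size_t pending_bytes;        /* bytes still queued in the send buffer */
    size_t drain_per_flush;      /* bytes the peer accepts per writable event */
    bool   flush_fails;          /* simulate a broken connection */
    int    now_ms;               /* simulated clock */
    int    shutdown_deadline_ms; /* models wal_sender_shutdown_timeout */
} MockSender;

typedef enum
{
    SHUTDOWN_FLUSH_DRAINED,
    SHUTDOWN_FLUSH_TIMED_OUT,
    SHUTDOWN_FLUSH_FAILED
} ShutdownFlushResult;

/* Stand-in for pq_is_send_pending(). */
bool mock_is_send_pending(const MockSender *s)
{
    return s->pending_bytes > 0;
}

/* Stand-in for the shutdown timeout check; here it only reports expiry. */
bool mock_shutdown_timeout_reached(const MockSender *s)
{
    return s->now_ms >= s->shutdown_deadline_ms;
}

/* Stand-in for waiting on WL_SOCKET_WRITEABLE only: just advances the clock. */
void mock_wait_socket_writeable(MockSender *s, int ms)
{
    assert(ms > 0);
    s->now_ms += ms;
}

/* Stand-in for pq_flush_if_writable(): 0 on success, -1 on failure. */
int mock_flush_if_writable(MockSender *s)
{
    size_t n;

    if (s->flush_fails)
        return -1;
    n = s->pending_bytes < s->drain_per_flush ? s->pending_bytes : s->drain_per_flush;
    s->pending_bytes -= n;
    return 0;
}

/*
 * Model of the proposed shutdown-only flush loop. It never reads client
 * replies and never queues keepalives, so nothing new can be appended
 * after the already-queued CommandComplete.
 */
ShutdownFlushResult process_pending_writes_for_shutdown(MockSender *s)
{
    while (mock_is_send_pending(s))
    {
        if (mock_shutdown_timeout_reached(s))
            return SHUTDOWN_FLUSH_TIMED_OUT;

        mock_wait_socket_writeable(s, 10);

        if (mock_flush_if_writable(s) != 0)
            return SHUTDOWN_FLUSH_FAILED;
    }
    return SHUTDOWN_FLUSH_DRAINED;
}
```

In a real patch the caller would decide what each result means, for example calling WalSndShutdown() on failure or timeout. The only point of the model is that the loop touches the write side alone, so the CommandComplete stays the last queued message.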

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






