Network failure may prevent promotion - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Network failure may prevent promotion
Date
Msg-id 20231231.200741.1078989336605759878.horikyota.ntt@gmail.com
In response to libpqsrv_connect_params should call ProcessWalRcvInterrupts  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Network failure may prevent promotion
List pgsql-hackers
(Apologies for resubmitting; the previous mail had a poor subject line.)
---
Hello.

We've noticed that while walreceiver is waiting for a connection to
complete, the standby does not respond promptly to promotion
requests. In PG14, walreceiver terminates immediately upon receiving a
promotion request, but in PG16 it waits for the connection attempt to
time out. This change in behavior comes from commit 728f86fec65, which
simply replaced part of libpqrcv_connect with a call to
libpqsrv_connect_params. It can be reproduced by dropping packets
sent from the standby to the primary.

The simplest fix would be for libpqsrv_connect_internal to call
ProcessWalRcvInterrupts() instead of CHECK_FOR_INTERRUPTS() when
running in walreceiver, but that approach is quite ugly. Since
ProcessWalRcvInterrupts() itself calls CHECK_FOR_INTERRUPTS(), and
there are no standalone calls to CHECK_FOR_INTERRUPTS() within
walreceiver, I think it might be better to use ProcDiePending instead
of ShutdownRequestPending. In the crude patch attached, I added a
function implementing a subset of die() as the SIGTERM handler in
walreceiver.
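
To illustrate the idea, here is a minimal standalone model, not the
attached patch and not backend code. The flag names mirror the
PostgreSQL ones, but the handlers, check_for_interrupts(), and
wait_for_connection() are hypothetical stand-ins, and the "connection
wait" is just a bounded polling loop. It assumes the wait loop checks
only the ProcDiePending-style flag, as CHECK_FOR_INTERRUPTS() does.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for the backend flags of the same names (model only). */
static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t ProcDiePending = 0;
static volatile sig_atomic_t ShutdownRequestPending = 0;

static void
reset_flags(void)
{
    InterruptPending = 0;
    ProcDiePending = 0;
    ShutdownRequestPending = 0;
}

/* Models the current handler: sets only ShutdownRequestPending. */
static void
handler_shutdown_request(int signo)
{
    (void) signo;
    ShutdownRequestPending = 1;
}

/* Models the proposed die()-subset handler: sets ProcDiePending. */
static void
handler_die_subset(int signo)
{
    (void) signo;
    InterruptPending = 1;
    ProcDiePending = 1;
}

/* Models CHECK_FOR_INTERRUPTS(): it never looks at ShutdownRequestPending. */
static bool
check_for_interrupts(void)
{
    return InterruptPending && ProcDiePending;
}

/*
 * Models the connection wait loop. SIGTERM is delivered on poll number
 * signal_at_poll using the given handler. Returns the poll number on
 * which the interrupt was noticed, or 0 if the loop ran until timeout.
 */
static int
wait_for_connection(int max_polls, int signal_at_poll, void (*handler) (int))
{
    assert(max_polls > 0);

    for (int poll = 1; poll <= max_polls; poll++)
    {
        if (poll == signal_at_poll)
        {
            signal(SIGTERM, handler);
            raise(SIGTERM);     /* handler runs before raise() returns */
        }
        if (check_for_interrupts())
            return poll;
    }
    return 0;                   /* nobody noticed the request */
}
```

In this model, wait_for_connection(10, 3, handler_shutdown_request)
runs to the timeout even though ShutdownRequestPending is set, while
wait_for_connection(10, 3, handler_die_subset) exits on the third
poll.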

What do you think about the issue and this approach?

If there are no issues or objections with this method, I will continue
to refine this patch. For now, I plan to register it for the upcoming
commitfest.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


