Re: Logical replication keepalive flood - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: Logical replication keepalive flood |
| Date | |
| Msg-id | 20210609.171751.1579873424296912837.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Re: Logical replication keepalive flood (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
| Responses |
Re: Logical replication keepalive flood
|
| List | pgsql-hackers |
At Wed, 9 Jun 2021 11:21:55 +0900, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> The issue - if actually it is - we send a keep-alive packet before a
> quite short sleep.
>
> We really want to send it if the sleep gets long but we cannot predict
> that before entering a sleep.
>
> Let me think a little more on this..
After some investigation, I find out that the keepalives are sent
almost always after XLogSendLogical requests for the *next* record. In
most of the cases the record is not yet inserted at the request time
but insertd very soon (in 1-digit milliseconds). It doesn't seem to be
expected that that happens with such a high frequency when
XLogSendLogical is keeping up-to-date with the bleeding edge of WAL
records.
It is completely unpredictable when the next record comes, so we
cannot decide whether to send a keepalive or not at the current
timing.
Since we want to send a keepalive when we have nothing to send for a
while, it is a bit different to keep sending keepalives at some
intervals while the loop is busy.
As a possible solution, the attached patch splits the sleep into two
pieces. If the first sleep reaches the timeout then send a keepalive
then sleep for the remaining time. The first timeout is quite
arbitrary but keepalive of 4Hz at maximum doesn't look so bad to me.
Is it acceptable?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 109c723f4e..49b3c0d4e2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -105,6 +105,9 @@
*/
#define MAX_SEND_SIZE (XLOG_BLCKSZ * 16)
+/* Minimum idle time for sending an idle-time keepalive in milliseconds */
+#define KEEPALIVE_TIMEOUT 250
+
/* Array of WalSnds in shared memory */
WalSndCtlData *WalSndCtl = NULL;
@@ -244,7 +247,7 @@ static void WalSndKeepalive(bool requestReply);
static void WalSndKeepaliveIfNecessary(void);
static void WalSndCheckTimeOut(void);
static long WalSndComputeSleeptime(TimestampTz now);
-static void WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
+static int WalSndWait(uint32 socket_events, long timeout, uint32 wait_event);
static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
static void WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid);
@@ -1428,19 +1431,6 @@ WalSndWaitForWal(XLogRecPtr loc)
if (got_STOPPING)
break;
- /*
- * We only send regular messages to the client for full decoded
- * transactions, but a synchronous replication and walsender shutdown
- * possibly are waiting for a later location. So, before sleeping, we
- * send a ping containing the flush location. If the receiver is
- * otherwise idle, this keepalive will trigger a reply. Processing the
- * reply will update these MyWalSnd locations.
- */
- if (MyWalSnd->flush < sentPtr &&
- MyWalSnd->write < sentPtr &&
- !waiting_for_ping_response)
- WalSndKeepalive(false);
-
/* check whether we're done */
if (loc <= RecentFlushPtr)
break;
@@ -1483,6 +1473,39 @@ WalSndWaitForWal(XLogRecPtr loc)
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
+ /*
+ * We only send regular messages to the client for full decoded
+ * transactions, but a synchronous replication and walsender shutdown
+ * possibly are waiting for a later location. So, before sleeping, we
+ * send a ping containing the flush location. If the receiver is
+ * otherwise idle, this keepalive will trigger a reply. Processing the
+ * reply will update these MyWalSnd locations. If the sleep is shorter
+ * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to
+ * prevent it from getting too-frequent.
+ */
+ if (MyWalSnd->flush < sentPtr &&
+ MyWalSnd->write < sentPtr &&
+ !waiting_for_ping_response)
+ {
+ if (sleeptime > KEEPALIVE_TIMEOUT)
+ {
+ int r;
+
+ r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
+ WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+
+ if (r != 0)
+ continue;
+
+ sleeptime -= KEEPALIVE_TIMEOUT;
+ }
+
+ WalSndKeepalive(false);
+
+ if (pq_flush_if_writable() != 0)
+ WalSndShutdown();
+ }
+
WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_WAIT_WAL);
}
@@ -3136,15 +3159,18 @@ WalSndWakeup(void)
* composed of optional WL_SOCKET_WRITEABLE and WL_SOCKET_READABLE flags. Exit
* on postmaster death.
*/
-static void
+static int
WalSndWait(uint32 socket_events, long timeout, uint32 wait_event)
{
WaitEvent event;
+ int ret;
ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, socket_events, NULL);
- if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) == 1 &&
- (event.events & WL_POSTMASTER_DEATH))
+ ret = WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event);
+ if (ret == 1 && (event.events & WL_POSTMASTER_DEATH))
proc_exit(1);
+
+ return ret;
}
/*
pgsql-hackers by date: