Re: Logical replication keepalive flood - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: Logical replication keepalive flood |
| Date | |
| Msg-id | 20210607.162353.1202919828973013934.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Logical replication keepalive flood (Abbas Butt <abbas.butt@enterprisedb.com>) |
| Responses |
Re: Logical replication keepalive flood
|
| List | pgsql-hackers |
At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt <abbas.butt@enterprisedb.com> wrote in
> Hi,
> I have observed the following behavior with PostgreSQL 13.3.
>
> The WAL sender process sends approximately 500 keepalive messages per
> second to pg_recvlogical.
> These keepalive messages are totally un-necessary.
> Keepalives should be sent only if there is no network traffic and a certain
> time (half of wal_sender_timeout) passes.
> These keepalive messages not only choke the network but also impact the
> performance of the receiver,
> because the receiver has to process the received message and then decide
> whether to reply to it or not.
> The receiver remains busy doing this activity 500 times a second.
I can reproduce the problem.
> On investigation it is revealed that the following code fragment in
> function WalSndWaitForWal in file walsender.c is responsible for sending
> these frequent keepalives:
>
> if (MyWalSnd->flush < sentPtr &&
> MyWalSnd->write < sentPtr &&
> !waiting_for_ping_response)
> WalSndKeepalive(false);
The immediate cause is pg_recvlogical doesn't send a reply before
sleeping. Currently it sends replies every 10 seconds intervals.
So the attached first patch stops the flood.
That said, I don't think it is not intended that logical walsender
sends keep-alive packets with such a high frequency. It happens
because walsender actually doesn't wait at all because it waits on
WL_SOCKET_WRITEABLE because the keep-alive packet inserted just before
is always pending.
So as the attached second, we should try to flush out the keep-alive
packets if possible before checking pg_is_send_pending().
Any one can "fix" the issue but I think each of them is reasonable by
itself.
Any thoughts, suggestions and/or opinions?
regareds.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 5efec160e8..4497ff1071 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -362,6 +362,10 @@ StreamLogicalLog(void)
goto error;
}
+ /* sned reply for all writes so far */
+ if (!flushAndSendFeedback(conn, &now))
+ goto error;
+
FD_ZERO(&input_mask);
FD_SET(PQsocket(conn), &input_mask);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 109c723f4e..fcea56d1c1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1469,6 +1469,9 @@ WalSndWaitForWal(XLogRecPtr loc)
/* Send keepalive if the time has come */
WalSndKeepaliveIfNecessary();
+ /* We may have queued a keep alive packet. flush it before sleeping. */
+ pq_flush_if_writable();
+
/*
* Sleep until something happens or we time out. Also wait for the
* socket becoming writable, if there's still pending output.
pgsql-hackers by date: