Re: Logical replication keepalive flood - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Logical replication keepalive flood |
Date | |
Msg-id | 20210607.162353.1202919828973013934.horikyota.ntt@gmail.com |
In response to | Logical replication keepalive flood (Abbas Butt <abbas.butt@enterprisedb.com>) |
Responses | Re: Logical replication keepalive flood |
List | pgsql-hackers |
At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt <abbas.butt@enterprisedb.com> wrote in
> Hi,
> I have observed the following behavior with PostgreSQL 13.3.
>
> The WAL sender process sends approximately 500 keepalive messages per
> second to pg_recvlogical.
> These keepalive messages are totally unnecessary.
> Keepalives should be sent only if there is no network traffic and a certain
> time (half of wal_sender_timeout) passes.
> These keepalive messages not only choke the network but also impact the
> performance of the receiver,
> because the receiver has to process the received message and then decide
> whether to reply to it or not.
> The receiver remains busy doing this activity 500 times a second.

I can reproduce the problem.

> On investigation it is revealed that the following code fragment in
> function WalSndWaitForWal in file walsender.c is responsible for sending
> these frequent keepalives:
>
> if (MyWalSnd->flush < sentPtr &&
>     MyWalSnd->write < sentPtr &&
>     !waiting_for_ping_response)
>         WalSndKeepalive(false);

The immediate cause is that pg_recvlogical doesn't send a reply before
sleeping. Currently it sends replies only at 10-second intervals. The
first attached patch therefore stops the flood.

That said, I don't think the logical walsender is intended to send
keepalive packets at such a high frequency. It happens because the
walsender doesn't actually wait at all: it waits on WL_SOCKET_WRITEABLE,
and the keepalive packet queued just before is always pending. So, as in
the second attached patch, we should try to flush out the keepalive
packets if possible before checking pq_is_send_pending().

Either one alone "fixes" the issue, but I think each of them is
reasonable by itself.

Any thoughts, suggestions and/or opinions?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 5efec160e8..4497ff1071 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -362,6 +362,10 @@ StreamLogicalLog(void)
 				goto error;
 		}
 
+		/* send reply for all writes so far */
+		if (!flushAndSendFeedback(conn, &now))
+			goto error;
+
 		FD_ZERO(&input_mask);
 		FD_SET(PQsocket(conn), &input_mask);
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 109c723f4e..fcea56d1c1 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1469,6 +1469,9 @@ WalSndWaitForWal(XLogRecPtr loc)
 		/* Send keepalive if the time has come */
 		WalSndKeepaliveIfNecessary();
 
+		/* We may have queued a keep alive packet. Flush it before sleeping. */
+		pq_flush_if_writable();
+
 		/*
 		 * Sleep until something happens or we time out. Also wait for the
 		 * socket becoming writable, if there's still pending output.
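
To illustrate the walsender-side point, here is a toy model in C. It is not PostgreSQL code: every name in it (ToySender, toy_queue_keepalive, toy_flush, toy_wait, toy_run_loop) is hypothetical. It assumes that the keepalive condition holds on every loop iteration, that any leftover output is drained at the start of the next iteration, and that a wait which includes "socket writable" returns at once while output is pending. Under those assumptions it counts how many iterations actually sleep, with and without a flush before the wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are PostgreSQL APIs. */
typedef struct ToySender
{
    bool send_pending;  /* queued output that has not been flushed yet */
    int  keepalives;    /* keepalives queued so far */
    int  real_sleeps;   /* waits that actually blocked */
} ToySender;

/* Queue a keepalive; it stays in the output buffer until flushed. */
static void
toy_queue_keepalive(ToySender *s)
{
    s->send_pending = true;
    s->keepalives++;
}

/* Push buffered output to the toy socket, assumed to be always writable. */
static void
toy_flush(ToySender *s)
{
    s->send_pending = false;
}

/*
 * Wait for the next event.  With output pending, the wait also asks for
 * "socket writable", which is satisfied immediately, so nothing blocks.
 * With nothing pending, the wait blocks until some other event or timeout.
 */
static void
toy_wait(ToySender *s)
{
    if (!s->send_pending)
        s->real_sleeps++;
}

/* Run the simplified wait loop and return the resulting counters. */
ToySender
toy_run_loop(int iterations, bool flush_before_wait)
{
    ToySender s = {false, 0, 0};

    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++)
    {
        toy_flush(&s);            /* assumption: leftovers drain here */
        toy_queue_keepalive(&s);  /* assumption: condition always true */
        if (flush_before_wait)
            toy_flush(&s);        /* the idea behind the second patch */
        toy_wait(&s);
    }
    return s;
}
```

In this model, toy_run_loop(500, false) queues 500 keepalives without ever sleeping, which is the busy loop described above. toy_run_loop(500, true) sleeps on every iteration, so the keepalive rate is bounded by the sleep time instead of by how fast the loop can spin.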