Re: Logical replication keepalive flood - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Logical replication keepalive flood |
Date | |
Msg-id | CAA4eK1+6vYODWH4HHv+DODD=fACvnAosu0ivnP1w-wLWF=FqEw@mail.gmail.com |
In response to | Re: Logical replication keepalive flood (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses | Re: Logical replication keepalive flood |
List | pgsql-hackers |
On Mon, Jun 7, 2021 at 12:54 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Sat, 5 Jun 2021 16:08:00 +0500, Abbas Butt <abbas.butt@enterprisedb.com> wrote in
> > Hi,
> > I have observed the following behavior with PostgreSQL 13.3.
> >
> > The WAL sender process sends approximately 500 keepalive messages per
> > second to pg_recvlogical.
> > These keepalive messages are totally un-necessary.
> > Keepalives should be sent only if there is no network traffic and a certain
> > time (half of wal_sender_timeout) passes.
> > These keepalive messages not only choke the network but also impact the
> > performance of the receiver,
> > because the receiver has to process the received message and then decide
> > whether to reply to it or not.
> > The receiver remains busy doing this activity 500 times a second.
>
> I can reproduce the problem.
>
> > On investigation it is revealed that the following code fragment in
> > function WalSndWaitForWal in file walsender.c is responsible for sending
> > these frequent keepalives:
> >
> > if (MyWalSnd->flush < sentPtr &&
> >     MyWalSnd->write < sentPtr &&
> >     !waiting_for_ping_response)
> >         WalSndKeepalive(false);
>
> The immediate cause is pg_recvlogical doesn't send a reply before
> sleeping. Currently it sends replies every 10 seconds intervals.
>

Yeah, but one can use the -s option to send it at shorter intervals.

> So the attached first patch stops the flood.
>

I am not sure sending feedback every time before sleep is a good idea;
this might lead to unnecessarily sending more messages. Can we try a
one-second interval with the -s option to see how it behaves? For
comparison, the similar logic in workers.c uses wal_receiver_timeout to
send such an update message rather than sending it every time before
sleep.

> That said, I don't think it is not intended that logical walsender
> sends keep-alive packets with such a high frequency. It happens
> because walsender actually doesn't wait at all because it waits on
> WL_SOCKET_WRITEABLE because the keep-alive packet inserted just before
> is always pending.
>
> So as the attached second, we should try to flush out the keep-alive
> packets if possible before checking pq_is_send_pending().
>

  /* Send keepalive if the time has come */
  WalSndKeepaliveIfNecessary();

+ /* We may have queued a keep alive packet. flush it before sleeping. */
+ pq_flush_if_writable();

We already call pq_flush_if_writable() from WalSndKeepaliveIfNecessary
after sending the keep-alive message, so I am not sure how this helps.

--
With Regards,
Amit Kapila.
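
To make the difference between the two keepalive conditions discussed above concrete, here is a minimal simulation sketch. It is not walsender code: `SenderState`, `lag_rule_due`, `timeout_rule_due`, and `count_keepalives` are hypothetical names, the LSNs are plain counters, and the model assumes the sender wakes up at a fixed tick, decodes some WAL on every wakeup, and hears from the receiver only at its status interval (as with `-s`). It also assumes that a keepalive sent without a reply request leaves the "waiting for ping response" flag clear, which is a simplified reading of the fragment quoted in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which condition decides whether a keepalive goes out (hypothetical names). */
typedef enum { RULE_LAG, RULE_TIMEOUT } KeepaliveRule;

/* Simplified stand-in for sender state; not the real walsender structures. */
typedef struct
{
    uint64_t sent_lsn;          /* how far the sender has processed WAL */
    uint64_t flush_lsn;         /* last position confirmed by the receiver */
    int64_t  last_reply_ms;     /* time of the last receiver reply */
    bool     waiting_for_ping;  /* a reply-requesting keepalive is outstanding */
} SenderState;

/* Lag-driven rule, modeled on the fragment quoted in the thread. */
static bool
lag_rule_due(const SenderState *s)
{
    return s->flush_lsn < s->sent_lsn && !s->waiting_for_ping;
}

/* Timeout-driven rule: ping only after half the timeout without a reply. */
static bool
timeout_rule_due(const SenderState *s, int64_t now_ms, int64_t timeout_ms)
{
    if (timeout_ms <= 0 || s->waiting_for_ping)
        return false;
    return now_ms - s->last_reply_ms >= timeout_ms / 2;
}

/* Count keepalives sent during a simulated run of duration_ms. */
static int
count_keepalives(KeepaliveRule rule, int64_t duration_ms, int64_t tick_ms,
                 int64_t timeout_ms, int64_t reply_interval_ms)
{
    SenderState s = {0, 0, 0, false};
    int         sent = 0;

    assert(tick_ms > 0 && reply_interval_ms > 0);

    for (int64_t now = tick_ms; now <= duration_ms; now += tick_ms)
    {
        s.sent_lsn++;           /* assumption: WAL is decoded on every wakeup */

        /* Receiver feedback arrives only at its status interval. */
        if (now % reply_interval_ms == 0)
        {
            s.flush_lsn = s.sent_lsn;
            s.last_reply_ms = now;
            s.waiting_for_ping = false;
        }

        if (rule == RULE_LAG)
        {
            /* No reply is requested, so the waiting flag stays clear. */
            if (lag_rule_due(&s))
                sent++;
        }
        else if (timeout_rule_due(&s, now, timeout_ms))
        {
            s.waiting_for_ping = true;  /* reply requested; wait for it */
            sent++;
        }
    }
    return sent;
}
```

Under these assumptions, `count_keepalives(RULE_LAG, 1000, 2, 60000, 10000)` models one second of 2 ms wakeups with a 10-second status interval and yields 500 keepalives, while `count_keepalives(RULE_TIMEOUT, 60000, 2, 60000, 10000)` yields none over a full minute because every reply arrives well within half the timeout.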