Re: Logical replication keepalive flood - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Logical replication keepalive flood |
Date | |
Msg-id | 20210608.100536.1154387425930082759.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Logical replication keepalive flood (Abbas Butt <abbas.butt@enterprisedb.com>) |
Responses |
Re: Logical replication keepalive flood
|
List | pgsql-hackers |
At Mon, 7 Jun 2021 15:26:05 +0500, Abbas Butt <abbas.butt@enterprisedb.com> wrote in > On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > The immediate cause is pg_recvlogical doesn't send a reply before > > > sleeping. Currently it sends replies every 10 seconds intervals. > > > > > > > Yeah, but one can use -s option to send it at lesser intervals. > > > > That option can impact pg_recvlogical, it will not impact the server > sending keepalives too frequently. > By default the status interval is 10 secs, still we are getting 500 > keepalives a second from the server. > > > > So the attached first patch stops the flood. > > > > > > > I am not sure sending feedback every time before sleep is a good idea, > > this might lead to unnecessarily sending more messages. Can we try by > > using one-second interval with -s option to see how it behaves? As a > > matter of comparison the similar logic in workers.c uses > > wal_receiver_timeout to send such an update message rather than > > sending it every time before sleep. Logical walreceiver sends a feedback when walrcv_eceive() doesn't receive a byte. If its' not good that pg_recvlogical does the same thing, do we need to improve logical walsender's behavior as well? > > > That said, I don't think it is not intended that logical walsender > > > sends keep-alive packets with such a high frequency. It happens > > > because walsender actually doesn't wait at all because it waits on > > > WL_SOCKET_WRITEABLE because the keep-alive packet inserted just before > > > is always pending. > > > > > > So as the attached second, we should try to flush out the keep-alive > > > packets if possible before checking pg_is_send_pending(). > > > > > > > /* Send keepalive if the time has come */ > > WalSndKeepaliveIfNecessary(); > > > > + /* We may have queued a keep alive packet. flush it before sleeping. */ > > + pq_flush_if_writable(); > > > > We already call pq_flush_if_writable() from WalSndKeepaliveIfNecessary > > after sending the keep-alive message, so not sure how this helps? No. WalSndKeepaliveIfNecessary calls it only when walreceiver does not receive a reply message for a long time. So the keepalive sent by the direct call to WalSndKeepalive() from WalSndWaitForWal is not flushed out in most cases, which causes the flood. I rechecked all callers of WalSndKeepalive(). WalSndKeepalive() +- *WalSndWaltForWal +- ProcessStandbyReplyMessage |+- ProcessStandbyMessage | +- ProcessRepliesIfAny | +- $WalSndWriteData | +- *WalSndWaitForWal | +- WalSndLoop | (calls pq_flush_if_writable() after sending the packet, but the | keepalive packet prevents following stream data from being sent | since the pending keepalive-packet causes pq_is_sned_pending() | return (falsely) true.) +- WalSndDone +- *WalSndLoop +- WalSndKeepaliveIfNecessary (calls pq_flush_if_writable always only after calling WalSndKeepalive()) The callers prefixed by '*' above misunderstand that some of the data sent by them are still pending even when the only pending bytes is the keepalive packet. Of course the keepalive pakcets should be sent *before* sleep and the unsent keepalive packet prevents the callers from sleeping then they immediately retry sending another keepalive pakcet and repeat it until the condition changes. (The callers prevised by "$" also enters a sleep before flushing but doesn't repeat sending keepalives.) The caller is forgetting that a keepalive pakcet may be queued but not flushed after calling WalSndKeepalive. So more sensible fix would be calling pq_flush_if_writable in WalSndKeepalive? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: