Re: Logical replication keepalive flood - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Logical replication keepalive flood
Date
Msg-id 20210608.100536.1154387425930082759.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Logical replication keepalive flood  (Abbas Butt <abbas.butt@enterprisedb.com>)
Responses Re: Logical replication keepalive flood
List pgsql-hackers
At Mon, 7 Jun 2021 15:26:05 +0500, Abbas Butt <abbas.butt@enterprisedb.com> wrote in 
> On Mon, Jun 7, 2021 at 3:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > The immediate cause is pg_recvlogical doesn't send a reply before
> > > sleeping. Currently it sends replies every 10 seconds intervals.
> > >
> >
> > Yeah, but one can use -s option to send it at lesser intervals.
> >
> 
> That option can impact pg_recvlogical, it will not impact the server
> sending keepalives too frequently.
> By default the status interval is 10 secs, still we are getting 500
> keepalives a second from the server.
>
> > > So the attached first patch stops the flood.
> > >
> >
> > I am not sure sending feedback every time before sleep is a good idea,
> > this might lead to unnecessarily sending more messages. Can we try by
> > using one-second interval with -s option to see how it behaves? As a
> > matter of comparison the similar logic in workers.c uses
> > wal_receiver_timeout to send such an update message rather than
> > sending it every time before sleep.

Logical walreceiver sends a feedback when walrcv_eceive() doesn't
receive a byte.  If its' not good that pg_recvlogical does the same
thing, do we need to improve logical walsender's behavior as well?

> > > That said, I don't think it is not intended that logical walsender
> > > sends keep-alive packets with such a high frequency.  It happens
> > > because walsender actually doesn't wait at all because it waits on
> > > WL_SOCKET_WRITEABLE because the keep-alive packet inserted just before
> > > is always pending.
> > >
> > > So as the attached second, we should try to flush out the keep-alive
> > > packets if possible before checking pg_is_send_pending().
> > >
> >
> > /* Send keepalive if the time has come */
> >   WalSndKeepaliveIfNecessary();
> >
> > + /* We may have queued a keep alive packet. flush it before sleeping. */
> > + pq_flush_if_writable();
> >
> > We already call pq_flush_if_writable() from WalSndKeepaliveIfNecessary
> > after sending the keep-alive message, so not sure how this helps?

No. WalSndKeepaliveIfNecessary calls it only when walreceiver does not
receive a reply message for a long time. So the keepalive sent by the
direct call to WalSndKeepalive() from WalSndWaitForWal is not flushed
out in most cases, which causes the flood.

I rechecked all callers of WalSndKeepalive().

WalSndKeepalive()
+- *WalSndWaltForWal
+- ProcessStandbyReplyMessage
|+- ProcessStandbyMessage
| +- ProcessRepliesIfAny
|  +- $WalSndWriteData
|  +- *WalSndWaitForWal
|  +- WalSndLoop
|    (calls pq_flush_if_writable() after sending the packet, but the
|     keepalive packet prevents following stream data from being sent
|     since the pending keepalive-packet causes pq_is_sned_pending()
|     return (falsely) true.)
+- WalSndDone
 +- *WalSndLoop
+- WalSndKeepaliveIfNecessary
   (calls pq_flush_if_writable always only after calling WalSndKeepalive())

The callers prefixed by '*' above misunderstand that some of the data
sent by them are still pending even when the only pending bytes is the
keepalive packet. Of course the keepalive pakcets should be sent
*before* sleep and the unsent keepalive packet prevents the callers
from sleeping then they immediately retry sending another keepalive
pakcet and repeat it until the condition changes. (The callers
prevised by "$" also enters a sleep before flushing but doesn't repeat
sending keepalives.)

The caller is forgetting that a keepalive pakcet may be queued but not
flushed after calling WalSndKeepalive.  So more sensible fix would be
calling pq_flush_if_writable in WalSndKeepalive?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: pgsql: Support parallel btree index builds.
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Race condition in recovery?