On 2020-08-06 18:55:58 -0400, Alvaro Herrera wrote:
> Ashutosh Bapat noticed that WalSndWaitForWal() is setting
> waiting_for_ping_response after sending a keepalive that does *not*
> request a reply. The bad consequence is that other callers that do
> require a reply end up not sending a keepalive, because they think one
> was already sent previously. So the whole thing gets stuck.
>
> He found that commit 41d5f8ad734 failed to remove the setting of
> waiting_for_ping_response after changing the "request" parameter to
> WalSndKeepalive from true to false; that seems to have been an omission
> and it breaks the algorithm. Thread at [1].
>
> The simplest fix is just to remove the line that sets
> waiting_for_ping_response, but I think it is less error-prone to have
> WalSndKeepalive set the flag itself, instead of expecting its callers to
> do it (and know when to). Patch attached. Also rewords some related
> commentary.
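
For anyone following along, here is a minimal sketch of the idea as I read
it: the flag is set inside WalSndKeepalive, and only when a reply is
requested. This is not the attached patch. The globals are simplified
stand-ins, and keepalives_sent and RequestReplyIfNeeded are hypothetical
names used only for illustration.

```c
#include <stdbool.h>

/* Simplified stand-ins for walsender state; not the real PostgreSQL code. */
static bool waiting_for_ping_response = false;
static int  keepalives_sent = 0;    /* hypothetical counter, illustration only */

/*
 * Send a keepalive.  The flag is set here, and only when a reply is
 * actually requested, so callers cannot get it out of sync.
 */
static void
WalSndKeepalive(bool requestReply)
{
    /* The real function sends a message to the standby; here we just count. */
    keepalives_sent++;

    if (requestReply)
        waiting_for_ping_response = true;
}

/* Hypothetical caller that needs a reply from the standby. */
static void
RequestReplyIfNeeded(void)
{
    /* Skip only if a reply-requesting keepalive is already outstanding. */
    if (!waiting_for_ping_response)
        WalSndKeepalive(true);
}
```

With this shape, a WalSndKeepalive(false) call leaves the flag alone, so a
later caller that does need a reply still sends its own keepalive.
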
Thanks for the diagnosis and fix!
- Andres