Ashutosh Bapat noticed that WalSndWaitForWal() is setting
waiting_for_ping_response after sending a keepalive that does *not*
request a reply. The bad consequence is that other callers that do
require a reply end up in not sending a keepalive, because they think it
was already sent previously. So the whole thing gets stuck.
He found that commit 41d5f8ad734 failed to remove the setting of
waiting_for_ping_response after changing the "request" parameter
WalSndKeepalive from true to false; that seems to have been an omission
and it breaks the algorithm. Thread at [1].
The simplest fix is just to remove the line that sets
waiting_for_ping_response, but I think it is less error-prone to have
WalSndKeepalive set the flag itself, instead of expecting its callers to
do it (and know when to). Patch attached. Also rewords some related
commentary.
[1] https://postgr.es/m/flat/BLU436-SMTP25712B7EF9FC2ADEB87C522DC040@phx.gbl
--
Álvaro Herrera Valdivia, Chile