Hello,
While testing replication for 9.5, we found that the replication
master can ignore wal_sender_timeout and instead seems to wait for
the TCP retransmission timeout when a standby is suddenly powered
off.
My investigation suggests that the immediate cause could be that
secure_write() is called in *blocking mode* (that is,
port->noblock = false) via the *pq_putmessage_noblock* macro, which
is called from XLogSendPhysical().
libpq.h in 9.5 and newer defines the macro as follows:
> #define pq_putmessage(msgtype, s, len) \
> (PqCommMethods->putmessage(msgtype, s, len))
> #define pq_putmessage_noblock(msgtype, s, len) \
> (PqCommMethods->putmessage(msgtype, s, len))
which apparently should be the following:
> #define pq_putmessage_noblock(msgtype, s, len) \
> (PqCommMethods->putmessage_noblock(msgtype, s, len))
The attached patch fixes it.
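
To illustrate the mix-up outside the PostgreSQL tree, here is a
minimal standalone sketch. It is not PostgreSQL code: DemoCommMethods,
the demo_* functions and macros, and last_call_was_blocking are all
hypothetical stand-ins. It only shows that a "noblock" macro which
expands to the putmessage member ends up in the blocking path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a comm-method table; not the real one. */
typedef struct DemoCommMethods
{
    int  (*putmessage) (char msgtype, const char *s, size_t len);
    void (*putmessage_noblock) (char msgtype, const char *s, size_t len);
} DemoCommMethods;

/* Records which path the most recent call went through. */
static bool last_call_was_blocking = false;

/* Stand-in for the blocking send path. */
static int
demo_send_blocking(char msgtype, const char *s, size_t len)
{
    (void) msgtype; (void) s; (void) len;
    last_call_was_blocking = true;
    return 0;
}

/* Stand-in for the non-blocking send path. */
static void
demo_send_nonblocking(char msgtype, const char *s, size_t len)
{
    (void) msgtype; (void) s; (void) len;
    last_call_was_blocking = false;
}

static const DemoCommMethods demo_methods = {
    demo_send_blocking,
    demo_send_nonblocking
};
static const DemoCommMethods *DemoMethods = &demo_methods;

/* Same shape as the mistake: the "noblock" name expands to putmessage. */
#define demo_putmessage_noblock_buggy(msgtype, s, len) \
    (DemoMethods->putmessage(msgtype, s, len))

/* Same shape as the proposed fix: expands to putmessage_noblock. */
#define demo_putmessage_noblock_fixed(msgtype, s, len) \
    (DemoMethods->putmessage_noblock(msgtype, s, len))

/* Exercises both macros and checks which path each one takes. */
void
demo_check(void)
{
    demo_putmessage_noblock_buggy('d', "x", 1);
    assert(last_call_was_blocking);     /* took the blocking path */

    demo_putmessage_noblock_fixed('d', "x", 1);
    assert(!last_call_was_blocking);    /* took the non-blocking path */
}
```

Calling demo_check() from a small main() should pass both assertions.
In this sketch the buggy macro still compiles cleanly because both
members accept the same arguments, which may be why such a slip is
easy to miss.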
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center