It strikes me that the misuse of io_flag could be contributing to this: if the walreceiver process's latch were set, we'd end up calling PQconnectPoll before the socket had necessarily come ready, which would produce the described symptom. That's grasping at straws, admittedly, because I'm not sure why the walreceiver process's latch would be set at this point; but it seems like we ought to test a version of the patch that we believe to be correct before deciding that we still have a problem.
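To make the suspected failure mode concrete, here is a minimal standalone sketch of the check that matters. It does not use the real latch or libpq APIs: the EV_* bits, want_t, io_flag_for() and should_poll() are hypothetical names standing in for the wait-event flags and the connection-polling logic, and this is not the code in the patch. The point is only that a wakeup caused by the latch alone must not be treated as the socket having come ready.

```c
#include <stdbool.h>

/* Hypothetical wake-event bits, standing in for the real wait-event flags. */
enum
{
    EV_LATCH_SET        = 1 << 0,
    EV_SOCKET_READABLE  = 1 << 1,
    EV_SOCKET_WRITEABLE = 1 << 2
};

/* Hypothetical: what the connection state machine asked us to wait for. */
typedef enum { WANT_READ, WANT_WRITE } want_t;

/* Pick the socket event to wait on for the current polling state. */
int
io_flag_for(want_t want)
{
    return (want == WANT_READ) ? EV_SOCKET_READABLE : EV_SOCKET_WRITEABLE;
}

/*
 * Decide whether it is safe to advance the connection state machine.
 * Only the socket event we actually asked for counts; a wakeup that
 * reports just EV_LATCH_SET must not trigger a poll call.
 */
bool
should_poll(int wake_events, int io_flag)
{
    return (wake_events & io_flag) != 0;
}
```

With this shape, a wakeup reporting only EV_LATCH_SET leaves the connection state untouched, and the loop simply goes back to waiting on the same io_flag.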
To move things along, here's a corrected patch --- Jobin, please test.
regards, tom lane
--
Jobin Augustine Architect : Production Database Operations