I wrote:
>> Hmm, maybe. It sure *looks* like we need to do
>> - return conn->asyncStatus == PGASYNC_BUSY || conn->write_failed;
>> + return conn->asyncStatus == PGASYNC_BUSY && !conn->write_failed;
> Dunno anything about the Windows angle, but after hacking together
> a reproducer for this, I'm fairly convinced that this is indeed a bug.
After further study, I now think that checking write_failed here is
the wrong thing entirely, because it's irrelevant to what we want to
do, which is keep reading until we've collected a server message or
seen read EOF.
What we do need to do though is check to see if we've closed the
socket. That's because places like libpqwalreceiver.c assume
that if PQisBusy is true, then PQsocket() must be valid:
while (PQisBusy(streamConn))
{
int rc;
rc = WaitLatchOrSocket(MyLatch,
WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
WL_LATCH_SET,
PQsocket(streamConn),
0,
WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE);
It seems like the existing coding in PQisBusy could allow that to fail.
I scraped the buildfarm for instances of the "cannot wait on socket
event without a socket" error that latch.c would emit if that happened,
but found none going back three months. That's perhaps because of the
other behavior I noted that if the walsender crashes, we'll probably
report write failure immediately in PQsendQuery. So maybe this is
mostly hypothetical in practice, but I think what we actually want
here is as attached.
In any case, I still don't see a way for this error to result in
an infinite loop, so it doesn't seem like an explanation for the
Windows problems.
regards, tom lane
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 59121873d2..9afd4d88b4 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1957,10 +1957,14 @@ PQisBusy(PGconn *conn)
parseInput(conn);
/*
- * PQgetResult will return immediately in all states except BUSY, or if we
- * had a write failure.
+ * PQgetResult will return immediately in all states except BUSY. Also,
+ * if we've detected read EOF and dropped the connection, we can expect
+ * that PQgetResult will fail immediately. Note that we do *not* check
+ * conn->write_failed here --- once that's become set, we know we have
+ * trouble, but we need to keep trying to read until we have a complete
+ * server message or detect read EOF.
*/
- return conn->asyncStatus == PGASYNC_BUSY || conn->write_failed;
+ return conn->asyncStatus == PGASYNC_BUSY && conn->status != CONNECTION_BAD;
}
/*