On Wed, Jan 12, 2022 at 4:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> So here we get similar hanging on WaitLatchOrSocket().
> Just to make sure that it's indeed the same issue, I've removed socket
> shutdown&close and the test executed to the end (several times). Argh.
Ouch. I think our options at this point are:
1. Revert 6051857fc (and put it back when we have a working
long-lived WES as I showed). This is not very satisfying, now that we
understand the bug, because even without that change I guess you must
be able to reach the hanging condition by using Windows postgres_fdw
to talk to a non-Windows server (ie a normal TCP stack with graceful
shutdown/linger on process exit).
2. Put your poll() check into the READABLE side. There's some
precedent for that sort of kludge on the WRITEABLE side (and a
rejection of the fragile idea that clients of latch.c should only
perform "safe" sequences):
/*
* Windows does not guarantee to log an FD_WRITE network event
* indicating that more data can be sent unless the previous send()
* failed with WSAEWOULDBLOCK. While our caller might well have made
* such a call, we cannot assume that here. Therefore, if waiting for
* write-ready, force the issue by doing a dummy send(). If the dummy
* send() succeeds, assume that the socket is in fact write-ready, and
* return immediately. Also, if it fails with something other than
* WSAEWOULDBLOCK, return a write-ready indication to let our caller
* deal with the error condition.
*/