pgsql: Fix lost Windows socket EOF events. - Mailing list pgsql-committers

From Thomas Munro
Subject pgsql: Fix lost Windows socket EOF events.
Msg-id E1sSU9d-001SJl-Ke@gemulon.postgresql.org
List pgsql-committers
Fix lost Windows socket EOF events.

Winsock signals an FD_CLOSE event only once if the other end of the
socket shuts down gracefully.  Because each WaitLatchOrSocket() call
constructs and destroys a new event handle, with unlucky timing we can
lose the event and hang.  We get away with this only if the other end
disconnects non-gracefully, because FD_CLOSE is repeatedly signaled in
that case.

To fix this design flaw in our Windows socket support fundamentally,
we'd probably need to rearchitect it so that a single event handle
exists for the lifetime of a socket, or switch to completely different
multiplexing or async I/O APIs.  That's going to be a bigger job
and probably wouldn't be back-patchable.

This brute-force kludge closes the race by explicitly polling the socket
with MSG_PEEK before sleeping.
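
As a rough illustration of the "peek before sleeping" idea, here is a
minimal sketch.  It is not the code added to latch.c: it uses POSIX
sockets rather than Winsock, and the function names peek_for_eof() and
wait_for_readable() are hypothetical.  On POSIX systems poll() reports
EOF as a level-triggered condition, so the race described above does not
arise there; the sketch only shows the shape of the check.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: peek at the socket without consuming data.
 * Returns 1 if the peer has shut down gracefully (recv() reports EOF),
 * 0 if the connection is still open (data pending, or nothing to read
 * yet), and -1 on any other error.
 */
int
peek_for_eof(int fd)
{
    char        c;
    ssize_t     rc;

    do
    {
        /* MSG_PEEK leaves any pending byte in the receive queue. */
        rc = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0)
        return 1;               /* EOF is pending */
    if (rc > 0)
        return 0;               /* data is pending, not EOF */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* nothing to read yet */
    return -1;
}

/*
 * Hypothetical wait routine: check for a pending EOF first, and only
 * sleep in poll() if none was found.  Returns 1 if the socket is
 * readable or at EOF, 0 on timeout, -1 on error.
 */
int
wait_for_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    if (peek_for_eof(fd) == 1)
        return 1;               /* do not sleep: EOF already arrived */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    return rc;
}
```

The point of the explicit peek is that it does not depend on an event
notification having been delivered: a pending EOF is found by asking
the socket directly, even if the one-shot notification was missed.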

Back-patch to all supported releases.  This should hopefully clear up
some random build farm and CI hang failures reported over the years.  It
might also allow us to try using graceful shutdown in more places again
(reverted in commit 29992a6) to fix instability in the transmission of
FATAL error messages, but that isn't done by this commit.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/176008.1715492071%40sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a8458f508a7a441242e148f008293128676df003

Modified Files
--------------
src/backend/storage/ipc/latch.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

