Re: Expansion of our checks for connection-loss errors - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Expansion of our checks for connection-loss errors
Date
Msg-id 20201009.100538.833421946389396462.horikyota.ntt@gmail.com
Whole thread Raw
In response to Expansion of our checks for connection-loss errors  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Expansion of our checks for connection-loss errors
List pgsql-hackers
At Thu, 08 Oct 2020 15:15:54 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> Over in the thread at [1], we've tentatively determined that the
> reason buildfarm member lorikeet is currently failing is that its
> network stack returns ECONNABORTED for (some?) connection failures,
> whereas our code is only expecting ECONNRESET.  Fujii Masao therefore
> proposes that we treat ECONNABORTED the same as ECONNRESET.  I think
> this is a good idea, but after a bit of research I feel it does not
> go far enough.  I find these POSIX-standard errnos that also seem
> likely candidates to be returned for a hard loss of connection:
> 
>     ECONNABORTED
>     EHOSTUNREACH
>     ENETDOWN
>     ENETUNREACH
> 
> All of these have been in POSIX since SUSv2, so it seems unlikely
> that we need to #ifdef any of them.  (It is in any case pretty silly
> that we have #ifdefs around a very small minority of our references
> to ECONNRESET :-(.)
> 
> There are some other related errnos, such as ECONNREFUSED, that
> don't seem like they'd be returned for a failure of a pre-existing
> connection, so we don't need to include them in such tests.
> 
> Accordingly, I propose the attached patch (an expansion of
> Fujii-san's) that causes us to test for all five errnos anyplace
> we had been checking for ECONNRESET.  I felt that this was getting to
> the point where we'd better centralize the knowledge of what to check,
> so the patch does that, via an inline function and an admittedly hacky
> macro.  I also upgraded some places such as strerror.c to have full
> support for these symbols.
> 
> All of the machines I have (even as far back as HPUX 10.20) also
> define ENETRESET and EHOSTDOWN.  However, those symbols do not appear
> in SUSv2.  ENETRESET was added at some later point, but EHOSTDOWN is
> still not in POSIX.  For the moment I've left these second-tier
> symbols out of the patch, but there's a case for adding them.  I'm
> not sure whether there'd be any point in trying to #ifdef them.
> 
> BTW, I took out the conditional defines of some of these errnos in
> libpq's win32.h; AFAICS that's been dead code ever since we added
> #define's for them to win32_port.h.  Am I missing something?
> 
> This seems like a bug fix to me, so I'm inclined to back-patch.
> 
>             regards, tom lane
> 
> [1] https://www.postgresql.org/message-id/flat/E1kPc9v-0005L4-2l%40gemulon.postgresql.org

+1 for the direction.

In terms of connection errors, connect(2) and bind(2) can return
EADDRNOTAVAIL. bind(2) and listen(2) can return EADDRINUSE. FWIW I
recetnly saw pgbench getting EADDRNOTAVAIL. (They have mapping from
respective WSA errors in TranslateSocketError())

I'm not sure how we should treat EMFILE/ENFILE/ENOBUFS/ENOMEM from
accept(2). (select(2) can return ENOMEM.)

I'd make errno_is_connection_loss use ALL_CONNECTION_LOSS_ERRNOS to
avoid duplication definition of the errno list.

-    if (ret < 0 && WSAGetLastError() == WSAECONNRESET)
+    if (ret < 0 && errno_is_connection_loss(WSAGetLastError()))

Don't we need to use TranslateSocketError() before?

+        /* We might get ECONNRESET etc here if using TCP and backend died */
+        if (errno_is_connection_loss(SOCK_ERRNO))

Perhaps I'm confused but SOCK_ERROR doesn't seem portable between
Windows and Linux.

=====
/*
 * These macros are needed to let error-handling code be portable between
 * Unix and Windows.  (ugh)
 */
#ifdef WIN32
#define SOCK_ERRNO (WSAGetLastError())
#define SOCK_STRERROR winsock_strerror
#define SOCK_ERRNO_SET(e) WSASetLastError(e)
#else
#define SOCK_ERRNO errno
#define SOCK_STRERROR strerror_r
#define SOCK_ERRNO_SET(e) (errno = (e))
#endif
=====

AFAICS SOCK_ERRNO is intended to be used idiomatically as:

> SOCK_STRERROR(SOCK_ERRNO, ...)

The WSAE values from WSAGetLastError() and E values in errno are not
compatible and needs translation by TranslateSocketError()?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: Remove some unnecessary if-condition
Next
From: Tom Lane
Date:
Subject: Re: Expansion of our checks for connection-loss errors