Re: Expansion of our checks for connection-loss errors - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Expansion of our checks for connection-loss errors |
Date | |
Msg-id | 20201009.100538.833421946389396462.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Expansion of our checks for connection-loss errors (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Expansion of our checks for connection-loss errors
|
List | pgsql-hackers |
At Thu, 08 Oct 2020 15:15:54 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Over in the thread at [1], we've tentatively determined that the > reason buildfarm member lorikeet is currently failing is that its > network stack returns ECONNABORTED for (some?) connection failures, > whereas our code is only expecting ECONNRESET. Fujii Masao therefore > proposes that we treat ECONNABORTED the same as ECONNRESET. I think > this is a good idea, but after a bit of research I feel it does not > go far enough. I find these POSIX-standard errnos that also seem > likely candidates to be returned for a hard loss of connection: > > ECONNABORTED > EHOSTUNREACH > ENETDOWN > ENETUNREACH > > All of these have been in POSIX since SUSv2, so it seems unlikely > that we need to #ifdef any of them. (It is in any case pretty silly > that we have #ifdefs around a very small minority of our references > to ECONNRESET :-(.) > > There are some other related errnos, such as ECONNREFUSED, that > don't seem like they'd be returned for a failure of a pre-existing > connection, so we don't need to include them in such tests. > > Accordingly, I propose the attached patch (an expansion of > Fujii-san's) that causes us to test for all five errnos anyplace > we had been checking for ECONNRESET. I felt that this was getting to > the point where we'd better centralize the knowledge of what to check, > so the patch does that, via an inline function and an admittedly hacky > macro. I also upgraded some places such as strerror.c to have full > support for these symbols. > > All of the machines I have (even as far back as HPUX 10.20) also > define ENETRESET and EHOSTDOWN. However, those symbols do not appear > in SUSv2. ENETRESET was added at some later point, but EHOSTDOWN is > still not in POSIX. For the moment I've left these second-tier > symbols out of the patch, but there's a case for adding them. I'm > not sure whether there'd be any point in trying to #ifdef them. > > BTW, I took out the conditional defines of some of these errnos in > libpq's win32.h; AFAICS that's been dead code ever since we added > #define's for them to win32_port.h. Am I missing something? > > This seems like a bug fix to me, so I'm inclined to back-patch. > > regards, tom lane > > [1] https://www.postgresql.org/message-id/flat/E1kPc9v-0005L4-2l%40gemulon.postgresql.org +1 for the direction. In terms of connection errors, connect(2) and bind(2) can return EADDRNOTAVAIL. bind(2) and listen(2) can return EADDRINUSE. FWIW I recetnly saw pgbench getting EADDRNOTAVAIL. (They have mapping from respective WSA errors in TranslateSocketError()) I'm not sure how we should treat EMFILE/ENFILE/ENOBUFS/ENOMEM from accept(2). (select(2) can return ENOMEM.) I'd make errno_is_connection_loss use ALL_CONNECTION_LOSS_ERRNOS to avoid duplication definition of the errno list. - if (ret < 0 && WSAGetLastError() == WSAECONNRESET) + if (ret < 0 && errno_is_connection_loss(WSAGetLastError())) Don't we need to use TranslateSocketError() before? + /* We might get ECONNRESET etc here if using TCP and backend died */ + if (errno_is_connection_loss(SOCK_ERRNO)) Perhaps I'm confused but SOCK_ERROR doesn't seem portable between Windows and Linux. ===== /* * These macros are needed to let error-handling code be portable between * Unix and Windows. (ugh) */ #ifdef WIN32 #define SOCK_ERRNO (WSAGetLastError()) #define SOCK_STRERROR winsock_strerror #define SOCK_ERRNO_SET(e) WSASetLastError(e) #else #define SOCK_ERRNO errno #define SOCK_STRERROR strerror_r #define SOCK_ERRNO_SET(e) (errno = (e)) #endif ===== AFAICS SOCK_ERRNO is intended to be used idiomatically as: > SOCK_STRERROR(SOCK_ERRNO, ...) The WSAE values from WSAGetLastError() and E values in errno are not compatible and needs translation by TranslateSocketError()? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: