Thread: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS
Previously successfully opened TCP connections can still fail on reads with ETIMEDOUT. This should be considered a connection failure, so that the connection in libpq is marked as CONNECTION_BAD. The reason I got an ETIMEDOUT was, because I had set a low tcp_user_timeout in the connection string. However, it can probably also happen due to keepalive limits being reached.
Attachment
Jelte Fennema <Jelte.Fennema@microsoft.com> writes: > Previously successfully opened TCP connections can still fail on reads > with ETIMEDOUT. This should be considered a connection failure, so that > the connection in libpq is marked as CONNECTION_BAD. The reason I got an > ETIMEDOUT was, because I had set a low tcp_user_timeout in the > connection string. However, it can probably also happen due to > keepalive limits being reached. I'm dubious about the portability of this patch, because we don't use ETIMEDOUT elsewhere. strerror.c thinks it may not exist, which is probably overly conservative because POSIX has required it since SUSv2. The bigger problem is that it's not accounted for in the WSAxxx mapping done in port/win32_port.h and TranslateSocketError. That'd have to be fixed for this to behave reasonably on Windows, I think. regards, tom lane
Attached is a new patch that I think addresses your concerns.
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, September 30, 2021 16:04
To: Jelte Fennema <Jelte.Fennema@microsoft.com>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: [EXTERNAL] Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS
Sent: Thursday, September 30, 2021 16:04
To: Jelte Fennema <Jelte.Fennema@microsoft.com>
Cc: pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>
Subject: [EXTERNAL] Re: Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS
Jelte Fennema <Jelte.Fennema@microsoft.com> writes:
> Previously successfully opened TCP connections can still fail on reads
> with ETIMEDOUT. This should be considered a connection failure, so that
> the connection in libpq is marked as CONNECTION_BAD. The reason I got an
> ETIMEDOUT was, because I had set a low tcp_user_timeout in the
> connection string. However, it can probably also happen due to
> keepalive limits being reached.
I'm dubious about the portability of this patch, because we don't
use ETIMEDOUT elsewhere. strerror.c thinks it may not exist,
which is probably overly conservative because POSIX has required
it since SUSv2. The bigger problem is that it's not accounted for in
the WSAxxx mapping done in port/win32_port.h and TranslateSocketError.
That'd have to be fixed for this to behave reasonably on Windows,
I think.
regards, tom lane
> Previously successfully opened TCP connections can still fail on reads
> with ETIMEDOUT. This should be considered a connection failure, so that
> the connection in libpq is marked as CONNECTION_BAD. The reason I got an
> ETIMEDOUT was, because I had set a low tcp_user_timeout in the
> connection string. However, it can probably also happen due to
> keepalive limits being reached.
I'm dubious about the portability of this patch, because we don't
use ETIMEDOUT elsewhere. strerror.c thinks it may not exist,
which is probably overly conservative because POSIX has required
it since SUSv2. The bigger problem is that it's not accounted for in
the WSAxxx mapping done in port/win32_port.h and TranslateSocketError.
That'd have to be fixed for this to behave reasonably on Windows,
I think.
regards, tom lane
Attachment
Jelte Fennema <Jelte.Fennema@microsoft.com> writes: > Attached is a new patch that I think addresses your concerns. You missed TranslateSocketError ... Pushed to HEAD only with that fix. regards, tom lane
Oops sorry, I did make that change locally but apparently didn't update my .patch file after committing, so I uploaded anintermediary one... Thanks for fixing that. I saw you added this section to the commit message: > Perhaps this should be back-patched, but I'm hesitant to do so given > the lack of previous complaints, and the hazard that there's a small > ABI break on Windows from redefining the symbol. Even if we decide > to do that, it'd be prudent to let this bake awhile in HEAD first. Personally, I would love to see this backpatched. Since together with a second bug I reported[1] it's causing high querytimeouts in Citus even if tcp_user_timeout is set to a low value. I do understand your worry though. Would a patch likethe one I attached now be a better fit for a backport? Jelte [1]: https://www.postgresql.org/message-id/flat/AM5PR83MB017870DE81FC84D5E21E9D1EF7AA9%40AM5PR83MB0178.EURPRD83.prod.outlook.com
Attachment
Jelte Fennema <Jelte.Fennema@microsoft.com> writes: > Personally, I would love to see this backpatched. Since together with a second bug I reported[1] it's causing high querytimeouts in Citus even if tcp_user_timeout is set to a low value. I do understand your worry though. Would a patch likethe one I attached now be a better fit for a backport? The only way that could work as-intended is if c.h includes whatever header provides TCP_USER_TIMEOUT, which does not seem like a great idea. regards, tom lane
I would still love to get a version of this patch backported. And I just thought of an idea to do so without breaking theWindows ABI, by slightly modifying my previous idea. See the attached patch.