Re: Timeout parameters - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Timeout parameters
Date
Msg-id 20190328.134311.91428951.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to RE: Timeout parameters  ("Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com>)
Responses RE: Timeout parameters
RE: Timeout parameters
List pgsql-hackers
Hello.

At Thu, 28 Mar 2019 01:11:23 +0000, "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com> wrote in
<EDA4195584F5064680D8130B1CA91C4540EEAD@G01JPEXMBYT04>
> Hello, Fabien-san.
> 
> > From: Fabien COELHO <coelho@cri.ensmp.fr>
> > About the backend v11 patch.
> > No space or newline before ";". Same comment about the libpq_ timeout.
> Fixed.
> 
> > There is an error in the code, I think it should be < 0 to detect errors.
> Yes, thanks!
> 
> > If the parameter has no effect on Windows, I do not see why its value should be
> > constrained to zero, it should just have no effect. Idem libpq_ timeout.
> I had a misunderstanding.
> Indeed, it doesn't need to be zero. Removed.
> 
> > Documentation:
> > ISTM this is not about libpq connections but about TCP connections. There can be
> > non libpq implementations client side.
> Then, where do you think the correct place?
> I thought that this parameter should be explained here because the communication
> will be made through the library libpq same as keepalive features.

In TCP_backend patch:

+       <para>
+        Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+        socket option of libpq connection.
+       </para>
+       <para>
+        Specifies the number of milliseconds after which a TCP connection can be
+        aborted by the operation system due to network problems when sending or
+        receiving the data through this connection. A value of zero uses the
+        system default. In sessions connected via a Unix-domain socket, this
+        parameter is ignored and always reads as zero. This parameter is
+        supported only on systems that support
+        <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+       </para>


I think this is not mentioning backend. Why don't you copy'n
paste then modify the description of tcp_keepalives_idle? Perhaps
it needs a similar caveat related to Windows.


+static const char*
+show_tcp_user_timeout(void)
+{
+    /* See comments in assign_tcp_keepalives_idle */
+    static char nbuf[16];
+
+    snprintf(nbuf, sizeof(nbuf), "%d", tcp_user_timeout);
+    return nbuf;
+}

The reason for, say, tcp_keepalive_idle uses the hook is that it
doesn't show the GUC variable, but the actual value read by
getsockopt. This is just showing the GUC value. I think this
should behave the same way with tcp_keepalive* options. If not,
we should have an explanation of the reason there.


In TCP_interface patch:

+       <para>
+        Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+        socket option of libpq connection.

What does the "wrapper" mean?

+       </para>
+       <para>
+        Specifies the number of milliseconds after which a TCP connection can be
+        aborted by the operation system due to network problems when sending or
+        receiving the data through this connection. A value of zero uses the
+        system default. In sessions connected via a Unix-domain socket, this
+        parameter is ignored and always reads as zero. This parameter is
+        supported only on systems that support
+        <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+       </para>

I would suggest the same thing as the backend
part. Copy'n-paste-modify keepalives_idle would be better.


+    if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+             (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
+    {
+        char        sebuf[256];
+
+        appendPQExpBuffer(&conn->errorMessage,
+                    libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),

I suppose that the reason ENOPROTOOPT is excluded from error
condition is that the system call may fail with that errno on
older kernels, but I don't think that that justifies ignoring the
failure.


+                    if (!IS_AF_UNIX(addr_cur->ai_family))
+                    {
+                        if (!setTCPUserTimeout(conn))
+                        {
+                            closesocket(conn->sock);
+                            conn->sock = -1;
+                            conn->addr_cur = addr_cur->ai_next;
+                            goto keep_going;
+                        }
+                    }

I don't see a point in the added part having own "if
(!IS_AF_UNIX" block separately from tcp_keepalive options.


+    /* TCP USER TIMEOUT */
+    {"tcp_user_timeout", NULL, NULL, NULL,
+        "TCP_user_timeout", "", 10,    /* strlen(INT32_MAX) == 10 */
+        offsetof(struct pg_conn, pgtcp_user_timeout)},
+

Similary, why isn't this placed just after tcp_keepalives
options?

+    char       *pgtcp_user_timeout;    /* tcp user timeout (numeric string) */

+    char       *tcp_user_timeout;    /* tcp user timeout */

The latter doesn't seem to be used.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: partitioned tables referenced by FKs
Next
From: David Rowley
Date:
Subject: Re: Ordered Partitioned Table Scans