Thread: Timeout parameters

Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, all.

I'd like to suggest introducing two parameters to handle client-server communication timeouts.
That is "tcp_user_timeout" and "socket_timeout" parameter.

I implemented "tcp_user_timeout" parameter 
in both backend and frontend side.
This parameter enables us to 
use TCP_USER_TIMEOUT option on linux.
If the parameter is specified, the process sets the value to
TCP_USER_TIMEOUT option.
In my opinion, this option is needed for the following situation:
If the server can't return an ack packet to the request from the client, 
the client performs retransmission processing.
In this case TCP keepalive option can't work.
Therefore we need TCP USER TIMEOUT option.
Andrei Yahorau also refer to the necessity of this option in [1].

"socket_timeout" is the application layer timeout parameter 
from when frontend issues SQL query 
to when frontend receives the execution result from backend.
When this parameter is active and timeout occurs, 
frontend close the socket.
It is a merit for client to set the maximum time 
to wait for SQL.

I'm waiting for your opinions or reviews.

[1] https://www.postgresql.org/message-id/flat/OF4C8A68CE.A350F319-ON432582D0.0028A5FF-432582D0.002FEE28%40iba.by

Bes regards,
---------------------
Ryohei Nagaura


Attachment

Re: Timeout parameters

From
AYahorau@ibagroup.eu
Date:
Hello Ryohei,


I took a look at your changes and I have some notes.
I faced the same issue as you faced. In my opinion hanging of a client is quite critical case and it needs to be overcame.
TCP_USER_TIMEOUT option helps to overcome this problem and I agree with you that it needs to be supported within PostgreSQL.

Nevertheless, it is necessary to take into account that the option TCP_USER_TIMEOUT is supported by Linux kernel starting since 2.6.37. In a lower kernel version these changes will not take affect.

I am not sure that suggested by you “socket_timeout” option should be implemented.
I see that you have changed pqWait() function. In my opinion it contradicts a bit with the comment to this function:
“We also stop waiting and return if the kernel flags an exception condition on the socket.” It means that this function should wait for some condition (ready to read/write) forever. On the other side, there is a function pqWaitTimed() which does the same action but within a timeout. So, in my opinion such changes of this function can lead to the problem with backward compatibility: the caller process expects that it will wait forever but terminates unexpectedly by timeout.

As far as I understand PostgreSQL versioning policy, the implementation of new parameter requires modification of internal PostgreSQL structure. As you know it is not posssible without stopping a service which can be done during migration to the new major version of PostgreSQL which is expected to be released in September 2019.

As a workaround I suggest using asynchronous command processing
https://www.postgresql.org/docs/10/static/libpq-async.html.

Best regards,
Andrei Yahorau



From:        "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com>
To:        "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>,
Cc:        "AYahorau@ibagroup.eu" <AYahorau@ibagroup.eu>
Date:        23/10/2018 07:37
Subject:        Timeout parameters




Hi, all.

I'd like to suggest introducing two parameters to handle client-server communication timeouts.
That is "tcp_user_timeout" and "socket_timeout" parameter.

I implemented "tcp_user_timeout" parameter
in both backend and frontend side.
This parameter enables us to
use TCP_USER_TIMEOUT option on linux.
If the parameter is specified, the process sets the value to
TCP_USER_TIMEOUT option.
In my opinion, this option is needed for the following situation:
If the server can't return an ack packet to the request from the client,
the client performs retransmission processing.
In this case TCP keepalive option can't work.
Therefore we need TCP USER TIMEOUT option.
Andrei Yahorau also refer to the necessity of this option in [1].

"socket_timeout" is the application layer timeout parameter
from when frontend issues SQL query
to when frontend receives the execution result from backend.
When this parameter is active and timeout occurs,
frontend close the socket.
It is a merit for client to set the maximum time
to wait for SQL.

I'm waiting for your opinions or reviews.

[1]
https://www.postgresql.org/message-id/flat/OF4C8A68CE.A350F319-ON432582D0.0028A5FF-432582D0.002FEE28%40iba.by

Bes regards,
---------------------
Ryohei Nagaura

[attachment "socket_timeout.patch" deleted by Andrei Yahorau/IBA] [attachment "TCP_USER_TIMEOUT_in_backend.patch" deleted by Andrei Yahorau/IBA] [attachment "TCP_USER_TIMEOUT_in_interface.patch" deleted by Andrei Yahorau/IBA]

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi Andrei,

Thank you for response.

> TCP_USER_TIMEOUT option helps to overcome this problem and I agree with
> you that it needs to be supported within PostgreSQL.
I'm glad to your agreement.

> Nevertheless, it is necessary to take into account that the option
> TCP_USER_TIMEOUT is supported by Linux kernel starting since 2.6.37. In
> a lower kernel version these changes will not take affect.
Does it mean how do we support Linux OS whose kernel version is less than 2.6.37?

> I am not sure that suggested by you “socket_timeout” option should be
> implemented.
> As a workaround I suggest using asynchronous command processing
> https://www.postgresql.org/docs/10/static/libpq-async.html
There are many applications implemented with synchronous API
(e.g. PQexec()), so "socket_timeout" is useful I think.

Best regards,
---------------------
Ryohei Nagaura


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi Andrei,

First, I inform you that I may not contact for the following period:
From November 1st to November 19th

Second, I noticed my misunderstanding in previous mail.
> > Nevertheless, it is necessary to take into account that the option
> > TCP_USER_TIMEOUT is supported by Linux kernel starting since 2.6.37.
> > In a lower kernel version these changes will not take affect.
> Does it mean how do we support Linux OS whose kernel version is less than
> 2.6.37?
I understand that you pointed out my implementation.
I'll remake patch files when I return.

Finally, I write test method for each parameters here roughly.
You may use iptables command on linux when testing TCP_USER_TIMEOUT.
You may use pg_sleep(seconds) command in postgres.
I'll write the details after my returning.

Continue to discuss the socket_timeout, please.

Best regards,
---------------------
Ryohei Nagaura


Re: Timeout parameters

From
Fabien COELHO
Date:
Hello Ryohei,

> I'd like to suggest introducing two parameters to handle client-server 
> communication timeouts.

I'm generally fine with giving more access to low-level parameters to 
users. However, I'm not sure I understand the use case you have that needs 
these new extensions.

> "socket_timeout" parameter.

About the "socket_timout" patch:

Patch does not apply cleanly because of a "trailing whitespace" in a 
comment. Please remove spaces at the end of lines.

I'd like clarifications about the use case that needs this specific 
feature, especially to understand why the server-side "statement_timeout" 
setting is not right enough.

> "socket_timeout" is the application layer timeout parameter from when 
> frontend issues SQL query to when frontend receives the execution result 
> from backend. When this parameter is active and timeout occurs, frontend 
> close the socket. It is a merit for client to set the maximum time to 
> wait for SQL.

I think that there is some kind of a misnomer: this is not a socket-level 
timeout, but a client-side query timeout, so it should be named 
differently? I'm not sure how to name it, though.

I checked that the feature works at the psql level.

   sh> psql "socket_timeout=2"

   psql> SELECT 1;
   1

   psql> SELECT pg_sleep(3);
   timeout expired
   The connection to the server was lost. Attempting reset: Succeeded.

The timeout is per statement, if there are several statements, each get 
its own timeout, just like server-side "statement_timeout".

I think that the way it works is a little extreme, basically the 
connection is aborted from within pqWait, and then restarted from scratch. 
I would not expect that from such a feature, but I'm not sure how to 
cancel a query from libpq, but it is possible, eg:


  psql> SELECT pg_sleep(10);
  ^C Cancel request sent
  ERROR:  canceling statement due to user request

  psql>

Would that be better? It probably assumes that the connection is okay.

The implementation looks awkward, because part of the logic of pqWaitTimed 
is reimplemented in pqWait. Also, I do not understand the computation 
of finish_time, which seems to assume that pqWait is going to be called 
immediately after sending a query, which may or may not be the case, and 
if it is not the time() call there is not the start of the statement.

C style: all commas should be followed by a space (or newline).

There is no clear way to know about the value of the setting (SHOW, \set, 
\pset...). Ok, this is already the case of other connection parameters.

Using "atoi" is a bad idea because it accepts trailing garbage and does 
not detect overflows. Use the "parse_int_param" function instead.

There are no tests.

There is no documentation.

-- 
Fabien.


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Fabien.

Thank you for your review.
And I'm very sorry to have kept you waiting so long.


About "socket_timeout"

> I'm generally fine with giving more access to low-level parameters to users.
> However, I'm not sure I understand the use case you have that needs these
> new extensions.
If you face the following situation, this parameter will be needed.
1. The connection between the server and the client has been established normally.
2. A server process has been received SQL statement.
3. The server OS can return an ack packet, but it takes time to execute the SQL statement 
   Or return the result because the server process is very busy.
4. The client wants to close the connection while leaving the job to the server.
In this case, "statement_timeout" can't satisfy at line 4.

> I think that there is some kind of a misnomer: this is not a socket-level
> timeout, but a client-side query timeout, so it should be named differently?
Yes, I think so.

> I'm not sure how to name it, though.
Me too.

> I think that the way it works is a little extreme, basically the connection
> is aborted from within pqWait, and then restarted from scratch.
>
> There is no clear way to know about the value of the setting (SHOW, \set,
> \pset...). Ok, this is already the case of other connection parameters.
If this parameter can be needed, I would like to discuss design and optional functions.
How do you think?
I'll correct patch of "socket_timeout" after that.


About "TCP_USER_TIMEOUT"
I fixed on the previous feedback.
Would you review, please?

> There are no tests.
I introduce the test methods of TCP_USER_TIMEOUT.

Test of client-side TCP_USER_TIMEOUT:
[client operation]
1. Connect DB server.
    postgres=# psql postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=15000
2. Get the port number by the following command:
    postgres=# select inet_client_port();
3. Close the client port from the other console of the client machine.
   Please rewrite "56750" to the number confirmed on line 2.
    $ iptables -I INPUT -p tcp --dport 56750 -j DROP
4. Query the following SQL:
    postgres=# select pg_sleep(10);
5. TCP USER TIMEOUT works correctly if an error message is output to the console.

Test of server-side TCP_USER_TIMEOUT:
[client operation]
1. Connect DB server.
2. Get the port number by the following command:
    postgres=# select inet_client_port();
3. Set the TCP_USER_TIMEOUT by the following command:
    postgres=# set tcp_user_timeout=15000;
4. Query the following SQL:
    postgres=# select pg_sleep(10);
5. Close the client port from the other console.
   Please rewrite "56750" to the number confirmed on line 2.
    $ iptables -I INPUT -p tcp --dport 56750 -j DROP
[server operation]
6. Verify the logfile.

> There is no documentation.
I made a patch of documentation of TCP USER TIMEOUT.

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi,

There was an invisible space, so I removed it.
I registered with 2019-01 commitfest.

Best regards,
---------------------
Ryohei Nagaura

> -----Original Message-----
> From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> Sent: Thursday, December 6, 2018 2:20 PM
> To: 'Fabien COELHO' <coelho@cri.ensmp.fr>;
> 'pgsql-hackers@postgresql.org' <pgsql-hackers@postgresql.org>
> Cc: Yahorau, A. (IBA) <AYahorau@ibagroup.eu>
> Subject: RE: Timeout parameters
> 
> Hi, Fabien.
> 
> Thank you for your review.
> And I'm very sorry to have kept you waiting so long.
> 
> 
> About "socket_timeout"
> 
> > I'm generally fine with giving more access to low-level parameters to
> users.
> > However, I'm not sure I understand the use case you have that needs
> > these new extensions.
> If you face the following situation, this parameter will be needed.
> 1. The connection between the server and the client has been established
> normally.
> 2. A server process has been received SQL statement.
> 3. The server OS can return an ack packet, but it takes time to execute
> the SQL statement
>    Or return the result because the server process is very busy.
> 4. The client wants to close the connection while leaving the job to the
> server.
> In this case, "statement_timeout" can't satisfy at line 4.
> 
> > I think that there is some kind of a misnomer: this is not a
> > socket-level timeout, but a client-side query timeout, so it should be
> named differently?
> Yes, I think so.
> 
> > I'm not sure how to name it, though.
> Me too.
> 
> > I think that the way it works is a little extreme, basically the
> > connection is aborted from within pqWait, and then restarted from scratch.
> >
> > There is no clear way to know about the value of the setting (SHOW,
> > \set, \pset...). Ok, this is already the case of other connection
> parameters.
> If this parameter can be needed, I would like to discuss design and optional
> functions.
> How do you think?
> I'll correct patch of "socket_timeout" after that.
> 
> 
> About "TCP_USER_TIMEOUT"
> I fixed on the previous feedback.
> Would you review, please?
> 
> > There are no tests.
> I introduce the test methods of TCP_USER_TIMEOUT.
> 
> Test of client-side TCP_USER_TIMEOUT:
> [client operation]
> 1. Connect DB server.
>     postgres=# psql
> postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=1
> 5000
> 2. Get the port number by the following command:
>     postgres=# select inet_client_port();
> 3. Close the client port from the other console of the client machine.
>    Please rewrite "56750" to the number confirmed on line 2.
>     $ iptables -I INPUT -p tcp --dport 56750 -j DROP 4. Query the
> following SQL:
>     postgres=# select pg_sleep(10);
> 5. TCP USER TIMEOUT works correctly if an error message is output to the
> console.
> 
> Test of server-side TCP_USER_TIMEOUT:
> [client operation]
> 1. Connect DB server.
> 2. Get the port number by the following command:
>     postgres=# select inet_client_port();
> 3. Set the TCP_USER_TIMEOUT by the following command:
>     postgres=# set tcp_user_timeout=15000;
> 4. Query the following SQL:
>     postgres=# select pg_sleep(10);
> 5. Close the client port from the other console.
>    Please rewrite "56750" to the number confirmed on line 2.
>     $ iptables -I INPUT -p tcp --dport 56750 -j DROP [server operation]
> 6. Verify the logfile.
> 
> > There is no documentation.
> I made a patch of documentation of TCP USER TIMEOUT.
> 
> Best regards,
> ---------------------
> Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Fabien.

The next CF will start so I want to restart the discussion.

> About "socket_timeout"
> If you face the following situation, this parameter will be needed.
If you feel that this situation can't happen or the use case is too limited, please point out so.

> > I think that there is some kind of a misnomer: this is not a
> > socket-level timeout, but a client-side query timeout, so it should be
> named differently?
> Yes, I think so.
> 
> > I'm not sure how to name it, though.
> Me too.
Since I want to use the monitoring target as the parameter name, let's decide the parameter name while designing.

> > I think that the way it works is a little extreme, basically the
> > connection is aborted from within pqWait, and then restarted from scratch.
Which motion seems to be uncomfortable?
Or both?

> > There is no clear way to know about the value of the setting (SHOW,
> > \set, \pset...).
That is a nice idea!
If this parameter implementation is decide, I'll also add these features.

> About "TCP_USER_TIMEOUT"
> I introduce the test methods of TCP_USER_TIMEOUT.
I only came up with this test methods with "iptables".
Since this command can be used only by root, I didn't create a script.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
Fabien COELHO
Date:
>>> I'm not sure I understand the use case you have that needs these new 
>>> extensions.

>> If you face the following situation, this parameter will be needed.
>> 1. The connection between the server and the client has been established
>> normally.
>> 2. A server process has been received SQL statement.
>> 3. The server OS can return an ack packet, but it takes time to execute
>> the SQL statement
>>    Or return the result because the server process is very busy.
>> 4. The client wants to close the connection while leaving the job to the
>> server.
>> In this case, "statement_timeout" can't satisfy at line 4.

Why?

ISTM that "leaving the job" to the server with a client-side connection 
closed is basically an abort, no different from what server-side 
"statement_timeout" already provides?

Also, from a client perspective, if you use statement_timeout, it 
would timeout, then the client would process the error and the connection 
would be ready for the next query without needing to be re-created, which 
is quite costly anyway? Also, if the server is busy, recreating an 
connection is expensive so it won't help much, really?

So from your explanation above I must admit that I do not clearly 
understand the use case for a client-side libpq-level SQL statement 
timeout. I still need some convincing.

About the implementation, I'm wondering whether something simpler could be 
done. Check how psql implements "ctrl-c" to abort a running query: it 
seems that it sends a cancel message, no need to actually abort the 
connection?

>>> I think that there is some kind of a misnomer: this is not a
>>> socket-level timeout, but a client-side query timeout, so it should be
>> named differently?
>> Yes, I think so.

Hmmm.... "client_statement_timeout" maybe?

-- 
Fabien.


RE: Timeout parameters

From
Fabien COELHO
Date:
こんにちは Royhei,

About the patches: you are expected to send consistent patches, i.e. one 
feature with its associated documentation, not two separate features and 
another patch for documenting them.

-- 
Fabien.

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi,

On Tue, Dec 25, 2018 at 2:59 AM, Fabien COELHO wrote:
> >> 4. The client wants to close the connection while leaving the job to
> >> the server.
> >> In this case, "statement_timeout" can't satisfy at line 4.
>
> Why?
> ISTM that "leaving the job" to the server with a client-side connection
> closed is basically an abort, no different from what server-side
> "statement_timeout" already provides?
"while leaving the job to the server" means that "while the server continue the job".
# Sorry for the inappropriate explanation.
I understand that "statement_timeout" won't.

> Also, from a client perspective, if you use statement_timeout, it would
> timeout, then the client would process the error and the connection would
> be ready for the next query without needing to be re-created, which is quite
> costly anyway? Also, if the server is busy, recreating an connection is
> expensive so it won't help much, really?
When the recreating the connection the server may be not busy.
In this case, it isn't so costly to reconnect.
Also, if a client do not have to execute the remaining query immediately after timeout, 
the client will have the choice of waiting until the server is not busy.

> About the implementation, I'm wondering whether something simpler could
> be done. Check how psql implements "ctrl-c" to abort a running query: it
> seems that it sends a cancel message, no need to actually abort the
> connection?
This is my homework.

> Hmmm.... "client_statement_timeout" maybe?
I agree.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi Fabien.

On Wed, Dec 26, 2018 at 3:02 AM, Fabien COELHO wrote:
> About the patches: you are expected to send consistent patches, i.e. one
> feature with its associated documentation, not two separate features and
> another patch for documenting them.
Thank you for teaching me.
I rewrote patches and attached in this mail.

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
Fabien COELHO
Date:
Hello Ryohei,

>>>> 4. The client wants to close the connection while leaving the job to
>>>> the server.
>>>> In this case, "statement_timeout" can't satisfy at line 4.
>>
>> Why?
>> ISTM that "leaving the job" to the server with a client-side connection
>> closed is basically an abort, no different from what server-side
>> "statement_timeout" already provides?
> "while leaving the job to the server" means that "while the server continue the job".
> # Sorry for the inappropriate explanation.
> I understand that "statement_timeout" won't.

I still do not understand the use-case specifics: for me, aborting the 
connection, or a softer cancelling the statement, will result in the 
server stopping the statement, so the server does NOT "continue the job", 
so I still do not see how it really differs from the server-side 
statement_timeout setting.

-- 
Fabien.


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the
> server stopping the statement, so the server does NOT "continue the job",
> so I still do not see how it really differs from the server-side
> statement_timeout setting.

How about when the server is so saturated that statement_timeout cannot work?  See SQLNET.SEND_TIMEOUT and
SQLNET.RECV_TIMEOUThere:
 

https://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm#NETRF228

As these parameter names suggest, maybe we could use SEND_TIMEO and RECV_TIMEO socket options for setsockopt() instead
ofusing pqWaitTimed().
 

To wrap up, the relevant parameters work like this:

* TCP keepalive and TCP user (retransmission) timeout: for network problems
* statement_timeout: for long-running queries
* socket_timeout (or send/recv_timeout): for saturated servers


FYI, PgJDBC has a parameter named socketTimeout:

https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters


Regards
Takayuki Tsunakawa




RE: Timeout parameters

From
AYahorau@ibagroup.eu
Date:
Hello,

>  
To wrap up, the relevant parameters work like this:

> * TCP keepalive and TCP user (retransmission) timeout: for network problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers


Takayuki Tsunakawa, could you provide wider explanation of socket_timeout? I'm little bit misunderstanding in which cases this parameter is/can be used.


I would like to add some more information about TCP keepalive and TCP_USER_TIMEOUT mechanisms:
1)  Both these mechanisms are used for termination socket connection in case of network problems.

2) This termination of tcp connection is done on operation system level (kernel) and not by application.
3) TCP keepalive and TCP_USER_TIMEOUT work differently and complement each other :
    *  TCP keepalive mechanism works when a socket is in idle state (there is no any transaction in this case)

     * TCP_USER_TIMEOUT mechanism works when a socket is in active state (sending/receiving data).

So, TCP keepalive and TCP_USER_TIMEOUT provide full control under network state directly after creating a TCP socket and applying these parameters to it. Moreover, this control is delegate to the operation system (kernel) in case it supports such mechanisms.

If TCP_USER_TIMEOUT is not supported by PostgreSQL, it means that TCP connection are partly controlled by the operation system (kernel). In this case pqWaitTimed() should be used on the application layer for connection control in data transmission phase.

In my opinion, there is no any difference between server and client connection sides. To avoid mess in the configuration it seems reasonable to give the same name of this option for the client and server sides.

To my mind, this description  of tcp_user_timeout option is not correct (See my comment about TCP_USER_TIMEOUT mechanism above).
> + <listitem>
> + <para>
> + Specify in milliseconds the time to disconnect to the client
> + when there is no ack packet from the client to the server's data transmission.
> + This parameter is supported on linux version 2.6.37 or later.
> + </para>
> + <note>
> + <para>
> + This parameter is not supported on Windows.
> + </para>
> + </note>
> + </listitem>


As for me It better to specify the description as follows:

<listitem>
<para>
Define a wrapper for TCP_USER_TIMEOUT socket option of libpq connection.
</para>
<para>
Specifies the number of milliseconds after which a TCP connection can be aborted by the operation system due to network problems when the data is transmitting through this connection (sending/receiving). A value of 0 uses the system default. This parameter is supported only on systems that support TCP_USER_TIMEOUT or an equivalent socket option, and on Windows; on other systems, it must be zero. In sessions connected via a Unix-domain socket, this parameter is ignored and always reads as zero.
</para>
<note>
<para>
This parameter is not supported on Windows, and must be zero.
</para>
<para>
To enable full control under TCP connection use this option together with keepalive.
</para>
</note>
</listitem>


Best regards,
Andrei Yahorau



From:        "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>
To:        'Fabien COELHO' <coelho@cri.ensmp.fr>, "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com>,
Cc:        "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>, "AYahorau@ibagroup.eu" <AYahorau@ibagroup.eu>
Date:        27/12/2018 11:26
Subject:        RE: Timeout parameters




From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the
> server stopping the statement, so the server does NOT "continue the job",
> so I still do not see how it really differs from the server-side
> statement_timeout setting.

How about when the server is so saturated that statement_timeout cannot work?  See SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT here:

https://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm#NETRF228

As these parameter names suggest, maybe we could use SEND_TIMEO and RECV_TIMEO socket options for setsockopt() instead of using pqWaitTimed().

To wrap up, the relevant parameters work like this:

* TCP keepalive and TCP user (retransmission) timeout: for network problems
* statement_timeout: for long-running queries
* socket_timeout (or send/recv_timeout): for saturated servers


FYI, PgJDBC has a parameter named socketTimeout:

https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters


Regards
Takayuki Tsunakawa



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi,

Sorry for my late.

On Tue, Dec 25, 2018 at 7:40 PM, Fabien COELHO wrote:
> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the server
> stopping the statement, so the server does NOT "continue the job", 
The server continue the job when the connection is aborted.
You can confirm by the following procedure.
0. connect to the server normally.
1. query the following statement
 =# update tbl set clmn = (select to_number(pg_sleep(10)||'10','20'));
2. kill client-side psql process before the result returns.
3. reconnect the server and "select" query on tbl.

On Thu, Jan 10, 2019 at 3:14 AM, AYahorau@ibagroup.eu wrote:
> Takayuki Tsunakawa, could you provide wider explanation of socket_timeout?
> I'm little bit misunderstanding in which cases this parameter is/can be
> used.
The communication between a client and the server is normal.
The server is so busy that the server can return ack packet and can't work statement_timeout.
In this case, the client may wait for very long time.
This parameter is effective for such clients.

> If TCP_USER_TIMEOUT is not supported by PostgreSQL, it means that TCP
> connection are partly controlled by the operation system (kernel). In this
> case pqWaitTimed() should be used on the application layer for connection
> control in data transmission phase.
In the current postgres, PQgetResult() called by sync command "PQexec()" uses pqWait().
If the user wishes to sync communication, how do you specify the waiting time limit?
It makes sense to implement in pqWait() that can wait clients indefinitely, I think.

> As for me It better to specify the description as follows:
Thank you for your comment.
I adopted your documentation in the current patch.

On Wed, Dec 26, 2018 at 8:25 PM, Tsunakawa, Takayuki wrote:
> To wrap up, the relevant parameters work like this:
> 
> * TCP keepalive and TCP user (retransmission) timeout: for network problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers
Thank you for your summary.


Best regards,
---------------------
Ryohei Nagaura


Attachment

Re: Timeout parameters

From
Michael Paquier
Date:
On Mon, Jan 28, 2019 at 04:51:11AM +0000, Nagaura, Ryohei wrote:
> Sorry for my late.

Moved to next CF per the latest updates: there is a patch with no
reviews for it.
--
Michael

Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi Fabien,

Would you review TCP_USER_TIMEOUT patches first please?
I want to avoid the situation that 
the discussion of socket_timeout has been lengthened 
and tcp_user_timeout patch is also not commit in the next CF.

On Mon, Feb 4, 2019 at 2:24 AM, Michael Paquier wrote:
> Moved to next CF per the latest updates: there is a patch with no reviews for it.
Thank you.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi,

I tried to re-read the whole thread.
Based from what I read, there are two proposed timeout parameters,
which I think can be discussed and commited separately:
(1) tcp_user_timeout 
(2) tcp_socket_timeout (or suggested client_statement_timeout,
                        send_timeout/recv_timeout)

Regarding the use-case of each parameter, Tsunakawa-san briefly
explained them above. Quoting again:
> * TCP keepalive and TCP user (retransmission) timeout: for network problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers

The already existing statement_timeout mainly limits how long
each statement should run. [1]
However, even if statement_timeout was configured, it does not
handle the timeout for instances that a network failure occurs,
so the application would not recover from error. 
Therefore, there's a need for these features, to meet the cases
that statement_timeout currently does not handle.

1) tcp_user_timeout parameter
As for user_timeout param, there seems to be a common agreement
with regards to its need.

Just minor nitpick:
+    char       *tcp_user_timeout;    /* TCP USER TIMEOUT */
I think that's unnecessary capitalization in the user timeout part.

The latest tcp_user_timeout patch seems to be almost in good shape,
feedback about doc (clearer description from Andrei)
and code (whitespace, C-style, parse_int_param, handling old kernel
version) were addressed.

I think this can be "committed" separately when it's finalized.


2) tcp_socket_timeout parameter
On the other hand, there needs to be a further discussion and design
improvement with regards to the implementation of socket_timeout:
- whether (a) it should abort the connection from pqWait() or
  other means, or (b) cancel the statement similar to how psql
  does it as suggested by Fabien
- proper parameter name
 
Based from your description below, I agree with Fabien that it's somehow
the application/client side query timeout
> "socket_timeout" is the application layer timeout parameter from when 
> frontend issues SQL query to when frontend receives the execution result 
> from backend. When this parameter is active and timeout occurs, frontend 
> close the socket. It is a merit for client to set the maximum time to 
> wait for SQL.  

In PgJDBC, it serves two purpose though: query timeout and network problem
detection. The use of socketTimeout aborts the connection. [2]
> The timeout value used for socket read operations. If reading from 
> the server takes longer than this value, the connection is closed. 
> This can be used as both a brute force global query timeout and a
> method of detecting network problems. The timeout is specified in 
> seconds and a value of zero means that it is disabled.
 
Perhaps you could also clarify a bit more through documentation on how
socket_timeout handles the timeout differently from statement_timeout
and tcp_user_timeout.
Then we can decide on the which parameter name is better once the
implementation becomes clearer.

[1] https://www.postgresql.org/docs/devel/runtime-config-client.html
[2] https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters


Regards,
Kirk Jamison


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]
> 1) tcp_user_timeout parameter
> I think this can be "committed" separately when it's finalized.

Do you mean you've reviewed and tested the patch by simulating a communication failure in the way Nagaura-san
suggested?


> 2) tcp_socket_timeout parameter
> - whether (a) it should abort the connection from pqWait() or
>   other means, or 
> (b) cancel the statement similar to how psql
>   does it as suggested by Fabien

We have no choice but to terminate the connection, because we can't tell whether we can recover from the problem and
continueto use the connection (e.g. long-running query) or not (permanent server or network failure).
 

Regarding the place, pqWait() is the best (and possibly only) place.  The purpose of this feature is to avoid waiting
forresponse from the server forever (or too long) in any case, as a last resort.
 

Oracle has similar parameters called SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT.  From those names, I guess they use
SO_RCVTIMEOand SO_SNDTIMEO socket options.  However, we can't use them because use non-blocking sockets and poll(),
whileSO_RCV/SND_TIMEO do ont have an effect for poll():
 

[excerpt from "man 7 socket"]
--------------------------------------------------
       SO_RCVTIMEO and SO_SNDTIMEO
          Specify the receiving or sending  timeouts  until  reporting  an
          error.  The argument is a struct timeval.  If an input or output
          function blocks for this period of time, and data has been  sent
          or  received,  the  return  value  of  that function will be the
          amount of data transferred; if no data has been transferred  and
          the  timeout has been reached then -1 is returned with errno set
          to EAGAIN or EWOULDBLOCK just as if the socket was specified  to
          be  non-blocking.   If  the timeout is set to zero (the default)
          then the operation  will  never  timeout.   Timeouts  only  have
          effect  for system calls that perform socket I/O (e.g., read(2),
          recvmsg(2), send(2), sendmsg(2)); timeouts have  no  effect  for
          select(2), poll(2), epoll_wait(2), etc.
--------------------------------------------------



> - proper parameter name
> 
> Based from your description below, I agree with Fabien that it's somehow
> the application/client side query timeout

I think the name is good because it indicates the socket-level timeout.  That's just like PgJDBC and Oracle, and I
didn'tfeel strange when I read their manuals.
 


> Perhaps you could also clarify a bit more through documentation on how
> socket_timeout handles the timeout differently from statement_timeout
> and tcp_user_timeout.

Maybe.  Could you suggest good description?


Regards
Takayuki Tsunakawa



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Kirk-san.

Thank you for summarizing this thread.

On Tue, Feb 19, 2019 at 6:05 PM, Jamison, Kirk wrote:
> 1) tcp_user_timeout parameter
> As for user_timeout param, there seems to be a common agreement with regards
> to its need.
> I think this can be "committed" separately when it's finalized.
I also recognize so.
BTW, tcp_user_timeout parameter of servers and clients have same name in my current implementation.
I think it would be better different name rather than same name.
I'll name them as the following a) or b):
    a) server_tcp_user_timeout and client_tcp_user_timeout
    b) tcp_user_timeout and user_timeout
b) is the same as the naming convention of keepalive, but it is not user-friendly. 
Do you come up with better name?
Or opinion?

> (2) tcp_socket_timeout (or suggested client_statement_timeout,
>                         send_timeout/recv_timeout)
> Perhaps you could also clarify a bit more through documentation on how
> socket_timeout handles the timeout differently from statement_timeout and
> tcp_user_timeout.
> Then we can decide on the which parameter name is better once the
> implementation becomes clearer.
* The following use case is somewhat different from those listed first. Sorry.
Use case:
1) A client query to the server and the statement is delivered correctly.
2) The server become saturated.
3) But the network layor is alive.
Because of 1), tcp_user_timeout doesn't work.
Because of 2), statement_timeout doesn't work.
Because of 3), keepalive doesn't work.
In the result, clients can't release their resource despite that they want.

My suggestion is this solution.

To limit user waiting time in pqWait(), I implemented some same logic of pqWaitTimed().

Best regards,
---------------------
Ryohei Nagaura



RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> BTW, tcp_user_timeout parameter of servers and clients have same name in
> my current implementation.
> I think it would be better different name rather than same name.
> I'll name them as the following a) or b):
>     a) server_tcp_user_timeout and client_tcp_user_timeout
>     b) tcp_user_timeout and user_timeout
> b) is the same as the naming convention of keepalive, but it is not
> user-friendly.
> Do you come up with better name?
> Or opinion?

a) is not always accurate, because libpq is also used in the server.  For example, postgres_fdw and WAL receiver in
streamingreplication.
 

I'm OK with either the current naming or b).  Frankly, I felt a bit strange when I first saw the keepalive parameters,
wonderingwhy the same names were not chosen.
 


Regards
Takayuki Tsunakawa






RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Tsunakawa-san

Thank you for your comments.

On Wed, Feb 20, 2019 at 5:56 PM, Tsunakawa, Takayuki wrote:
> > Perhaps you could also clarify a bit more through documentation on how
> > socket_timeout handles the timeout differently from statement_timeout
> > and tcp_user_timeout.
> Maybe.  Could you suggest good description?
Clients wait until the socket become readable when they try to get results of their query.
If the socket state get readable, clients read results.
(See src/interfaces/libpq/fe-exec.c, fe-misc.c)
The current pg uses poll() to monitor socket states.
"socket_timeout" is used as timeout argument of poll().
(See [1] about poll() and its arguments)

Because "tcp_user_timeout" handles the timeout before and after the above procedure,
there are different monitoring target between "socket_timeout" and "tcp_user_timeout".
When the server is saturated, "statement_timeout" doesn't work while "socket_timeout" does work.
In addition to this, the purpose of "statement_timeout" is to reduce server's burden, I think.
My proposal is to enable clients to release their resource which is used communication w/ the saturated server.

On Wed, Feb 20, 2019 at 7:10 PM, Tsunakawa, Takayuki wrote:
> a) is not always accurate, because libpq is also used in the server.  For example,
> postgres_fdw and WAL receiver in streaming replication.
I didn't take that possibility into consideration.
Certainly, a) is bad.

> I'm OK with either the current naming or b).  Frankly, I felt a bit strange when I
> first saw the keepalive parameters, wondering why the same names were not
> chosen.
I will refer to it and wait for opinion of the reviewer.

[1] http://man7.org/linux/man-pages/man2/poll.2.html
Best regards,
---------------------
Ryohei Nagaura



RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> > Maybe.  Could you suggest good description?
> Clients wait until the socket become readable when they try to get results
> of their query.
> If the socket state get readable, clients read results.
> (See src/interfaces/libpq/fe-exec.c, fe-misc.c)
> The current pg uses poll() to monitor socket states.
> "socket_timeout" is used as timeout argument of poll().
> (See [1] about poll() and its arguments)
> 
> Because "tcp_user_timeout" handles the timeout before and after the above
> procedure,
> there are different monitoring target between "socket_timeout" and
> "tcp_user_timeout".
> When the server is saturated, "statement_timeout" doesn't work while
> "socket_timeout" does work.
> In addition to this, the purpose of "statement_timeout" is to reduce
> server's burden, I think.

OK, I understand your intent.  What I asked Kirk-san is to suggest a description for socket_timeout parameter that the
userwould see in PostgreSQL documentation.
 


> My proposal is to enable clients to release their resource which is used
> communication w/ the saturated server.

I thought the primary purpose is to prevent applications from hanging too long, so that the user does not have to wait
infinitely. Releasing resources is a secondary purpose.
 


Regards
Takayuki Tsunakawa



RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Thursday, February 21, 2019 2:56 PM (GMT+9), Tsunakawa, Takayuki wrote:

>> 1) tcp_user_timeout parameter
>> I think this can be "committed" separately when it's finalized.

> Do you mean you've reviewed and tested the patch by simulating a 
> communication failure in the way Nagaura-san suggested?

Although I did review and followed the suggested way in previous email
way back (which uses root user) and it worked as intended, I'd also like
to hear feedback also from Fabien whether it's alright without the test
script, or if there's another way we can test it (maybe none?).
Then I think it's in good shape.
However, what I meant by my statement above was to have a separate
decision in committing the two parameters, because based from the thread
tcp_user_timeout has got more progress when it comes to reviews.
So once it's ready, it can be committed separately without waiting for
the socket_timeout param to be ready. (I dont think that requires a
separate thread)

>> 2) tcp_socket_timeout parameter
>> - whether (a) it should abort the connection from pqWait() or
>>   other means, or
>> (b) cancel the statement similar to how psql
>>   does it as suggested by Fabien

> We have no choice but to terminate the connection, because we can't tell
> whether we can recover from the problem and continue to use the connection
> (e.g. long-running query) or not (permanent server or network failure).
>
> Regarding the place, pqWait() is the best (and possibly only) place. 
> The purpose of this feature is to avoid waiting for response from the
> server forever (or too long) in any case, as a last resort.

Thank you for explaining further. I think that clarifies the implementation
argument on why it needs to terminate the connection over cancelling the query.
In short, I think it's intended purpose is to prevent dead connection.
It addresses the case for when network or server failure occurs and
when DBMS is terminated abruptly. So far, no existing parameter covers that yet.

As for the place, my knowledge is limited so I won't be able to provide
substantial help on it. Fabien pointed out some comments about it that needs
clarification though.
Quoting what Fabien wrote:
    "The implementation looks awkward, because part of the logic of pqWaitTimed 
    is reimplemented in pqWait. Also, I do not understand the computation 
    of finish_time, which seems to assume that pqWait is going to be called 
    immediately after sending a query, which may or may not be the case, and 
    if it is not the time() call there is not the start of the statement."

> Oracle has similar parameters called SQLNET.RECV_TIMEOUT and SQLNET.SEND_TIMEOUT.
> From those names, I guess they use SO_RCVTIMEO and SO_SNDTIMEO socket options. 
> However, we can't use them because use non-blocking sockets and poll(), while
> SO_RCV/SND_TIMEO do ont have an effect for poll():
> [..snipped]
>
>> - proper parameter name
>
> I think the name is good because it indicates the socket-level timeout.
> That's just like PgJDBC and Oracle, and I didn't feel strange when I read their manuals.

Agree. I'm actually +1 with tcp_socket_timeout


>> Perhaps you could also clarify a bit more through documentation on how 
>> socket_timeout handles the timeout differently from statement_timeout 
>> and tcp_user_timeout.

> Maybe.  Could you suggest good description?

Ok. I just realized that requires careful explanation.
How about the one below? 
I got it from combined explanations above. I did not include the 
differences though. This can be improved as the discussion
continues along the way.

tcp_socket_timeout (integer)

Terminate and restart any session that has been idle for more than
the specified number of milliseconds to prevent client from infinite
waiting for server due to dead connection. This can be used both as
a brute force global query timeout and detecting network problems.
A value of zero (the default) turns this off.


Regards,
Kirk Jamison


RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello, all.

> tcp_socket_timeout (integer)
>
> Terminate and restart any session that has been idle for more than
> the specified number of milliseconds to prevent client from infinite
> waiting for server due to dead connection. This can be used both as
> a brute force global query timeout and detecting network problems.
> A value of zero (the default) turns this off.


I am not very  familiar with the PostgreSQL source code. Nevertheless,  the main idea of this parameter is clear for me - closing a connection when the PostgreSQL server does not response due to any reason. However, I have not found  in the discussion a reference that this parameter can be applied to the TCP as well as to the UNIX-domain sockets. Moreover, this parameter works out of communication layer. When we consider TCP communication, the failover is covered by keep_alive and tpc_user_timeout parameters.

According to it, we should not use 'tcp' prefix in this parameter name, 'socket' sub string is not suitable too.

'timeout' is OK.

This parameter works on the client side. So the word 'client' is a good candidate for using in this parameter name.
This parameter affects only when we send a 'query' to the pg server.

Based on it, we can build a name for this parameter 'client_query_timeout'.

The suggested explanation of this parameter does not follow the aim of integrating this parameter:

client_query_timeout

Specifies the number of seconds to prevent client from infinite waiting for server acknowledge to the sent query due to dead connection. This can be used both as a force global query timeout and network problems detector. A value of zero (the default) turns this off.

Best regards,
Mikalai Keida

RE: Timeout parameters

From
"Jamison, Kirk"
Date:

Hi,

 

> > tcp_socket_timeout (integer)
> >
> > Terminate and restart any session that has been idle for more than
> > the specified number of milliseconds to prevent client from infinite
> > waiting for server due to dead connection. This can be used both as
> > a brute force global query timeout and detecting network problems.
> > A value of zero (the default) turns this off.


On Friday, February 22, 2019 12:01 AM (GMT+9), MikalaiKeida@ibagroup.eu

wrote:
> I am not very  familiar with the PostgreSQL source code. Nevertheless,

> the main idea of this parameter is clear for me - closing a connection

> when the PostgreSQL server does not response due to any reason. However,

> I have not found  in the discussion a reference that this parameter can

> be applied to the TCP as well as to the UNIX-domain sockets. Moreover,

> this parameter works out of communication layer. When we consider TCP

> communication, the failover is covered by keep_alive and tpc_user_timeout

> parameters.
>

> According to it, we should not use 'tcp' prefix in this parameter name,

> 'socket' sub string is not suitable too.
>
> 'timeout' is OK.
>
> This parameter works on the client side. So the word 'client' is a good

> candidate for using in this parameter name.
> This parameter affects only when we send a 'query' to the pg server.

#Shookedt. Yeah, there should be no “tcp”in that parameter, so it was

my mistake.

Regarding whether we use the “socket_timeout” or “client_query_timeout”,

why is socket not suitable?

Although I’m not arguing against client_query_timeout as it’s also

a good candidate parameter name.


> Based on it, we can build a name for this parameter 'client_query_timeout'.
>
> The suggested explanation of this parameter does not follow the aim of

> integrating this parameter:
>
> client_query_timeout
>
> Specifies the number of seconds to prevent client from infinite waiting

> for server acknowledge to the sent query due to dead connection. This

> can be used both as a force global query timeout and network problems

> detector. A value of zero (the default) turns this off.

 

Thanks for the fix. In the client side though, other parameters are specified

in milliseconds, so I used the same.

 

Regards,

Kirk Jamison

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Jamison, Kirk/ジャミソン カーク
> Although I did review and followed the suggested way in previous email
> way back (which uses root user) and it worked as intended, I'd also like
> to hear feedback also from Fabien whether it's alright without the test
> script, or if there's another way we can test it (maybe none?).
> Then I think it's in good shape.

I'm afraid there's not a way to incorporate the test into the regression test.  If there is, Fabien should have
respondedearlier.
 



> However, what I meant by my statement above was to have a separate
> decision in committing the two parameters, because based from the thread
> tcp_user_timeout has got more progress when it comes to reviews.
> So once it's ready, it can be committed separately without waiting for
> the socket_timeout param to be ready. (I dont think that requires a
> separate thread)

I see, thanks.


> As for the place, my knowledge is limited so I won't be able to provide
> substantial help on it. Fabien pointed out some comments about it that needs
> clarification though.
> Quoting what Fabien wrote:
>     "The implementation looks awkward, because part of the logic of
> pqWaitTimed
>     is reimplemented in pqWait. Also, I do not understand the
> computation
>     of finish_time, which seems to assume that pqWait is going to be
> called
>     immediately after sending a query, which may or may not be the case,
> and
>     if it is not the time() call there is not the start of the statement."

The patch looks good in that respect.  This timeout is for individual socket operations like PgJDBC and Oracle, not the
querytimeout.  That's what the parameter name suggests.
 


> How about the one below?
> I got it from combined explanations above. I did not include the
> differences though. This can be improved as the discussion
> continues along the way.
> 
> tcp_socket_timeout (integer)
> 
> Terminate and restart any session that has been idle for more than
> the specified number of milliseconds to prevent client from infinite
> waiting for server due to dead connection. This can be used both as
> a brute force global query timeout and detecting network problems.
> A value of zero (the default) turns this off.

Looks better than the original one.  However,

* The unit is not millisecond, but second.
* Doesn't restart the session.
* Choose words that better align with the existing libpq parameters.  e.g. "connection" should be used instead of
"session"as in keepalive_xxx parameters, because there's not a concept of database session at socket level.
 
* Indicate that the timeout is for individual socket operations.


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> I am not very  familiar with the PostgreSQL source code. Nevertheless,  the
> main idea of this parameter is clear for me - closing a connection when
> the PostgreSQL server does not response due to any reason. However, I have
> not found  in the discussion a reference that this parameter can be applied
> to the TCP as well as to the UNIX-domain sockets. Moreover, this parameter
> works out of communication layer. When we consider TCP communication, the
> failover is covered by keep_alive and tpc_user_timeout parameters.

Right.  This timeout is for individual socket operations, and the last resort to forcibly terminate the connection when
othertimeouts don't work (TCP keepalive, tcp user timeout, statement_timeout).  Also, there's no reason to restrict
thistimeout to TCP.
 


> According to it, we should not use 'tcp' prefix in this parameter name,
> 'socket' sub string is not suitable too.
> 
> This parameter works on the client side. So the word 'client' is a good
> candidate for using in this parameter name.
> This parameter affects only when we send a 'query' to the pg server.

No, this timeout works not only for queries but for any socket operations to communicate with the server.  For example,
libpqsends query cancellation message to the server, receives server-side parameter value change notifications, etc.
So,like PgJDBC and Oracle, I think the name suggesting socket or communication is good.  That is, socket_timeout or
communication_timeoutis OK.
 


Regards
Takayuki Tsunakawa






RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Friday, February 22, 2019 9:46 AM (GMT+9), Tsunakawa, Takayuki wrote:

> > Terminate any session that has been idle for more than the 
> > specified number of seconds to prevent client from infinite 
> > waiting for server due to dead connection. This can be used both as a 
> > brute force global query timeout and detecting network problems.
> > A value of zero (the default) turns this off.
>
> Looks better than the original one.  However,
> 
> * The unit is not millisecond, but second.
> * Doesn't restart the session.
> * Choose words that better align with the existing libpq parameters.
>     e.g. "connection" should be used instead of "session" as in 
>     keepalive_xxx parameters, because there's not a concept of database
>     session at socket level.
> * Indicate that the timeout is for individual socket operations.

My bad. I was looking at the wrong documentations, so seconds should be used.
I guess we should just choose socket_timeout as parameter name.


socket_timeout (integer)

Terminate any connection that has been inactive for more than the specified number of seconds to prevent client from
infinitewaiting for individual socket read operations due to dead connection. This can be used both as a force global
querytimeout and network problems detector. A value of zero (the default) turns this off.
 

or

Controls the number of seconds of connection inactivity to prevent client from infinite waiting for individual socket
readoperations due to dead connection. This can be used both as a force global query timeout and network problems
detector.A value of zero (the default) turns this off. 
 
    

Regards,
Kirk Jamison


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]
> socket_timeout (integer)

libpq documentation does not write the data type on the parameter name line.


> Terminate any connection that has been inactive for more than the specified
> number of seconds to prevent client from infinite waiting for individual
> socket read operations due to dead connection. This can be used both as
> a force global query timeout and network problems detector. A value of zero
> (the default) turns this off.
> 
> or
> 
> Controls the number of seconds of connection inactivity to prevent client
> from infinite waiting for individual socket read operations due to dead
> connection. This can be used both as a force global query timeout and network
> problems detector. A value of zero (the default) turns this off.

The second one is better, but:

* Just "connection inactivity" (i.e. the application hasn't submitted any request to the server for a long time) does
notterminate the connection.  
 
* "due to dead connction" is not the cause for the timeout.  If the timeout occurs, consider the connection dead (see
keepalives_count).
* Not restricted to "read" operation?


Regards
Takayuki Tsunakawa



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, all.

Thank you for discussion.
I made documentation about socket_timeout and reflected Tsunakawa-san's comment in the new patch.
# Mainly only fixing documentation...
The current documentation of socket_timeout is as follows:
socket_timeout
Controls the number of second of client's waiting time for individual socket read/write operations. This can be used
bothas a force 
 
global query timeout and network problems detector. A value of zero (the default) turns this off.

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Monday, February 25, 2019 1:49 PM (GMT+9), Nagaura, Ryohei wrote:

> Thank you for discussion.
> I made documentation about socket_timeout and reflected Tsunakawa-san's
> comment in the new patch.
> # Mainly only fixing documentation...
> The current documentation of socket_timeout is as follows:
> socket_timeout
> Controls the number of second of client's waiting time for individual socket
> read/write operations. This can be used both as a force global query timeout
> and network problems detector. A value of zero (the default) turns this off.

Thanks for the update.
However, your socket_timeout patch currently does not apply.
$ git apply ../socket_timeout_v5.patch
fatal: corrupt patch at line 24
--> l24: diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c

>socket_timeout
>
> Controls the number of second of client's waiting time for individual socket read/write
> operations. This can be used both as a force global query timeout and network problems
> detector. A value of zero (the default) turns this off.

Got the doc fix. I wonder if we need to document what effect the parameter does:
terminating the connection. How about:

Controls the number of seconds of client-server communication inactivity
before forcibly closing the connection in order to prevent client from
infinite waiting for individual socket read/write operations. This can
be used both as a force global query timeout and network problems 
detector, i.e. hardware failure and dead connection. A value of zero 
(the default) turns this off.

Well, you may remove the "i.e. hardware failure and dead connection" if
that's not necessary.

I know you've only added the doc recommendation.
But regarding the code, you may also fix other parts:
a. Use parse_int_param instead of atoi
>+        conn->socket_timeout = atoi(conn->pgsocket_timeout);

b. proper spacing after comma
>+    {"socket_timeout", NULL, NULL, NULL,
>+        "Socket-timeout","",10,        /* strlen(INT32_MAX) == 10 */
>+    offsetof(struct pg_conn, pgsocket_timeout)},
... 
>+        result = pqSocketCheck(conn,forRead,forWrite,finish_time);

There are probably other parts I missed.

Regards,
Kirk Jamison


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, kirk-san.

> From: Jamison, Kirk <k.jamison@jp.fujitsu.com>
> $ git apply ../socket_timeout_v5.patch
> fatal: corrupt patch at line 24
> --> l24: diff --git a/src/interfaces/libpq/fe-connect.c
> --> b/src/interfaces/libpq/fe-connect.c
How about applying by "patch -p1"?
I have confirmed that my patch could be applied in that way while not confirmed using "git apply" command.

> a. Use parse_int_param instead of atoi
> >+        conn->socket_timeout = atoi(conn->pgsocket_timeout);
This is my bad. I'll remake it.
Very sorry for the same mistake.


Best regards,
---------------------
Ryohei Nagaura 



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
This mail is the same as https://www.postgresql.org/message-id/EDA4195584F5064680D8130B1CA91C453EBE79%40G01JPEXMBYT04
I resend because the mail didn't be reflected.

Hi, kirk-san.

> From: Nagaura, Ryohei <nagaura.ryohei@jp.fujitsu.com> This is my bad. 
> I'll remake it.
> Very sorry for the same mistake.
I remade the patches and attached in this mail.
In socket_timeout patch, I replaced "atoi" to "parse_int_param" and inserted spaces just after some comma.
There are a few changes about documentation for the following reason:

> From: Jamison, Kirk <k.jamison@jp.fujitsu.com> Got the doc fix. I 
> wonder if we need to document what effect the parameter does:
> terminating the connection. How about:
I also don't know, but...

> Controls the number of seconds of client-server communication 
> inactivity before forcibly closing the connection in order to prevent 
> client from infinite waiting for individual socket read/write 
> operations. This can be used both as a force global query timeout and 
> network problems detector, i.e. hardware failure and dead connection. A value of zero (the default) turns this off.
"communication inactivity" seems to be a little extreme.
If the communication layer is truly dead you will use keepalive.
This use case is when socket option is not available for some reason.
So it would be better "terminating the connection" in my thought.
And...

> Well, you may remove the "i.e. hardware failure and dead connection" 
> if that's not necessary.
I don't think it is necessary because you can use this parameter other than that situations.
Not "i.e." but "e.g." have a few chance to be documented.

About TCP_USER_TIMEOUT patches, there are only miscellaneous changes: removing trailing spaces and making comments of
parameterslower case as you pointed out.
 

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi Nagaura-san,

Your socket_timeout patch still does not apply either with
git or patch command. It says it's still corrupted.
I'm not sure about the workaround, because the --ignore-space-change
and --ignore-whitespace did not work for me.
Maybe it might have something to do with your editor when creating
the patch. Could you confirm?
The CFbot is also another way to check if your latest patches work.
http://commitfest.cputube.org/

On Tuesday, February 26, 2019 5:16PM (GMT+9), Nagaura, Ryohei wrote:
> "communication inactivity" seems to be a little extreme.
> If the communication layer is truly dead you will use keepalive.
> This use case is when socket option is not available for some reason.
> So it would be better "terminating the connection" in my thought.

About the doc. Alright. 
- There's a typo in the doc: connectoin --> connection
- In the code, there's a part where between 0-2 seconds
is interpreted as 2 because it's the minimum time. I think
the comment should be improved, and use insist (force) instead
of "need". This should also be added in the documentation.

From:
/*
 * Rounding could cause communication to fail; need at least 2 secs
 */

To:
/*
 * Rounding could cause communication to fail;
 * insist on at least two seconds.
 */

Docs
- added how it should be written as decimal integer, similar
to connect_timeout
- used closing instead of terminating, to be consistent with
other params
- added the minimum allowed timeout

socket_timeout

Controls the number of second (write as a decimal integer, e.g. 10) of
client's waiting time for individual socket read/write operations
before closing the connection. This can be used both as a force global
query timeout and network problems detector. A value of zero (the
default) turns this off, which means wait indefinitely. The minimum
allowed timeout is 2 seconds, so a value of 1 is interpreted as 2.

> About TCP_USER_TIMEOUT patches, there are only miscellaneous changes:
> removing trailing spaces and making comments of parameters lower case
> as you pointed out.
Thanks for the update.
Confirmed the change.


Regards,
Kirk Jamison


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, kirk-san.

Thank you for review.

> From: Jamison, Kirk <k.jamison@jp.fujitsu.com>
> Your socket_timeout patch still does not apply either with git or patch command. It
> says it's still corrupted.
> I'm not sure about the workaround, because the --ignore-space-change and
> --ignore-whitespace did not work for me.
> Maybe it might have something to do with your editor when creating the patch.
> Could you confirm?
> The CFbot is also another way to check if your latest patches work.
> http://commitfest.cputube.org/
It may be caused by git diff command with no option.
The new patches are created by git diff -w.

About socket_timeout:
Your feedback looks good so that I adopted it. Please confirm it.

About TCP_USER_TIMEOUT:
There is no change in sources.
Just change the command option "git diff" -> "git diff -w".

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi,

I rewrote two TCP_USER_TIMEOUT patches.
I changed the third argument of setsockopt() from 18 to TCP_USER_TIMEOUT.

This revision has the following two merits.
* Improve readability of source
* Even if the definition of TCP_USER_TIMEOUT is changed, it is not affected.
  e.g., in the current linux, TCP_USER_TIMEOUT is defined, but even if it is changed to 19, it 'll be available.

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
Fabien COELHO
Date:
Hello Ryohei-san,

There are three independent patches in this thread.

About the socket timeout patch v7:

Patch applies & compiles cleanly. "make check" is ok, although there are 
no specific tests, which is probably ok. Doc build is ok.

I'd simplify the doc first sentence to:

""" Number of seconds to wait for socket read/write operations before 
closing the connection, as a decimal integer. This can be used both to 
force global client-side query timeout and to detect network problems. A 
value of zero (the default) turns this off, which means wait indefinitely. 
The minimum allowed timeout is 2 seconds, so a value of 1 is interpreted 
as 2."""

The code looks mostly ok.

The comment on finish_time computation does not look very useful to me, it 
just states in English the simple formula below. I'd suggest to remove it.

I've tested setting "socket_timeout=2" and doing "SELECT pg_sleep(10);". 
It works somehow, however after a few attempts I have:

  psql> SELECT COUNT(*) FROM pg_stat_activity WHERE application_name='psql';
  3

I.e. I have just created lingering postmasters, which only stop when the 
queries are actually finished. I cannot say that I'm thrilled by that 
behavior. ISTM that libpq should at least attempt to cancel the query by 
sending a cancel request before closing the connection, as I have already 
suggested in a previous review. The "client side query timeout" use case 
does not seem well addressed by the current implementation.

If the user reconnects, eg "\c db", the setting is lost. The re-connection 
handling should probably take care of this parameter, and maybe others.

The implementation does not check that the parameter is indeed an integer, 
eg "socket_timeout=2bad" is silently ignored. I'd suggest to check like 
already done for other parameters, eg "port".

-- 
Fabien.


RE: Timeout parameters

From
Fabien COELHO
Date:
Hello again Ryohei-san,

About the tcp_user_timeout libpq parameter v8.

Patch applies & compiles cleanly. Global check is ok, although there are 
no specific tests.

Documentation English can be improved. Could a native speaker help, 
please?

ISTM that the documentation both states that it works and does not work on 
Windows. One assertion must be false.

Syntax error, eg "tcp_user_timeout=2bad", are detected, good.

I could not really test the feature, i.e. I could not trigger a timeout. 
Do you have a suggestion on how to test it?

-- 
Fabien.


RE: Timeout parameters

From
Fabien COELHO
Date:
> About the tcp_user_timeout libpq parameter v8.

Basically same thing about the tcp_user_timeout guc v8, especially: do you 
have any advice about how I can test the feature, i.e. trigger a timeout?

> Patch applies & compiles cleanly. Global check is ok, although there are no 
> specific tests.
>
> Documentation English can be improved. Could a native speaker help, please?
>
> ISTM that the documentation both states that it works and does not work on 
> Windows. One assertion must be false.
>
> Syntax error, eg "tcp_user_timeout=2bad", are detected, good.
>
> I could not really test the feature, i.e. I could not trigger a timeout. Do 
> you have a suggestion on how to test it?


-- 
Fabien.


RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Sunday, March 3, 2019 4:09PM (GMT+9), Fabien COELHO wrote:
>Basically same thing about the tcp_user_timeout guc v8, especially:
> do you have any advice about how I can test the feature, i.e. 
> trigger a timeout?
>
>> Patch applies & compiles cleanly. Global check is ok, although there 
>> are no specific tests.
>
>> Documentation English can be improved. Could a native speaker help, please?
+1 This needs a help from a native English speaker, cause I am not,
although my name sounds like one. ;)

>> ISTM that the documentation both states that it works and does not 
>> work on Windows. One assertion must be false.
>>
>> Syntax error, eg "tcp_user_timeout=2bad", are detected, good.
>
>> I could not really test the feature, i.e. I could not trigger a 
>> timeout. Do you have a suggestion on how to test it?

I have also tested the previous patches and most recent one.
I just followed the test instructions above. And maybe you can also do it too.
Here's how I did it.

Setting:
- Before the test via server root user, add the port and client source
using the firewall-cmd to allow client to connect to DB server.
> systemctl start firewalld 
> firewall-cmd --add-source={clientipaddress}/32 --add-port={server_port}/tcp --zone=public --permanent
> firewall-cmd --reload

Testing (v8 of user timeout parameters):

[Client-Side]
1. $ psql postgresql://USERNAME:PASSWORD@server_host:server_port/dbname?tcp_user_timeout=15000
2. postgres=# select inet_client_port();
 inet_client_port
------------------
            34819
3. (Via root user of client, other console window)
root# iptables -I INPUT -p tcp --dport 34819 -j DROP
4. postgres=# select pg_sleep(10);
5. Error output is displayed.
could not receive data from server: Connection timed out
--
Tested again but switching #3 & #4.
There should be a new client port number by then.
Below are the logs in the server.

[Server-Side Logs]
Test#1
[4736] LOG:  statement: select inet_client_port();
[4736] LOG:  statement: select pg_sleep(10);
[4736] LOG:  could not receive data from client: Connection timed out
Test#2
[5594] LOG:  statement: select inet_client_port();
[5594] LOG:  statement: set tcp_user_timeout=15000;
[5594] LOG:  statement: select pg_sleep(10);
[5594] LOG:  could not receive data from client: Connection timed out

Regards,
Kirk Jamison



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Fabien-san.

About TCP_USER_TIMEOUT:
> From: Fabien COELHO <coelho@cri.ensmp.fr>
> I could not really test the feature, i.e. I could not trigger a timeout.
> Do you have a suggestion on how to test it?
Please see the previous mail[1] or current kirk-san's mail.

About socket_timeout:
> From: Fabien COELHO <coelho@cri.ensmp.fr>
> are actually finished. I cannot say that I'm thrilled by that behavior. ISTM that libpq
> should at least attempt to cancel the query 
Would you please tell me why you think so? 
If it was implemented so, timeout couldn't work when the server is so busy that can't process a cancel request.
If the client encounters such a case, how does the client stop to wait the server?
How does the client release its resources?
Socket_timeout is helpful for such clients.

I'll send patches after next week.

[1] https://www.postgresql.org/message-id/EDA4195584F5064680D8130B1CA91C453A32DB@G01JPEXMBYT04
Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
Fabien COELHO
Date:
Hello Ryohei-san,

> About socket_timeout:
>> From: Fabien COELHO <coelho@cri.ensmp.fr>
>> are actually finished. I cannot say that I'm thrilled by that behavior. ISTM that libpq
>> should at least attempt to cancel the query
> Would you please tell me why you think so?

As I understand the "client-side timeout" use-case, as distinct from 
network-issues-related timeouts proposed in the other patches, the point 
is more or less to deal with an overloaded server.

As it is currently implemented, the connection is broken abruptly when the 
timeout is reached, but that the initial query goes on nevertheless 
server-side, and a new connection is created which would worsens the load.

ISTM that it would be more appropriate to cancel the query and possibly 
keep the connection, if possible.

> If it was implemented so, timeout couldn't work when the server is so 
> busy that can't process a cancel request.

Hmmm. I do not understand how you would know that.

The timeout is raised because getting the result takes time, which may be 
simply because the query really takes time, and it does not imply that the 
server would not be able to process a cancel request. You could at least 
send it before trying the extreme approach.

If the server is really overloaded, the current implementation does not 
help in any way but contributes to worsen the situation, hence my lack of 
enthousiasm.

> If the client encounters such a case, how does the client stop to wait the server?
> How does the client release its resources?
> Socket_timeout is helpful for such clients.

-- 
Fabien.


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Fabien-san.

> From: Fabien COELHO <coelho@cri.ensmp.fr>
> As I understand the "client-side timeout" use-case, as distinct from
> network-issues-related timeouts proposed in the other patches, the point is more
> or less to deal with an overloaded server.
The main purpose of this parameter is to avoid client's waiting for DB server infinitely, not reducing the server's
burden.
This results in not waiting end-user, which is most important.
As Tsunakawa-san presented, there are similar parameters in JDBC and Oracle, so I think that this idea is valid.

> The timeout is raised because getting the result takes time, which may be simply
> because the query really takes time, and it does not imply that the server would
> not be able to process a cancel request.
You mentioned about when a SQL query is heavy, but I want to talk about when OS hang.
If OS hang occurs, the cost of cancel request processing is so high.
This is the reason why the implementation is just only disconnection without canceling.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello Ryohei-san,

I understand the main aim of your suggestion that a client application has to do a lot of work except making quires to the database. I agree with you that "client-side timeout" has to be integrated into the PostgreSQL server and libpq.
I'm with Fabien that "client-side timeout" seems unsafe. Also I agree with Fabien that quire can take much time to be processed by the PosgtreSQL server and it is a normal behavior. There is possible that performance of the PostgreSQL server machine can be low temporary or continuously, especially during upgrading procedure.
I think it is important to add some more information into the description of this parameter which informs end-users that this parameter has to be used very carefully because it can impact as on the client application as on the server.

> You mentioned about when a SQL query is heavy, but I want to talk about when OS hang.
> If OS hang occurs, the cost of cancel request processing is so high.


Such a situation looks to be covered by TCP_USER_TIMEOUT and keep_alive mechanisms. May be it is better to warn in documentation or prohibit in the source code to set
"client-side timeout" less than TCP_USER_TIMEOUT to avoid handling "possible" logical problems ahead to the network problems. Keep in mind that   "client-side timeout" can abort a connection which uses UNIX-domain sockets too.

What do you think about it?

Best regards,
Mikalai Keida

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Mikalai-san.

Thank you for your mail.

> From: MikalaiKeida@ibagroup.eu <MikalaiKeida@ibagroup.eu>
> I'm with Fabien that "client-side timeout" seems unsafe. Also I agree with Fabien
> that quire can take much time to be processed by the PosgtreSQL server and it is
> a normal behavior. There is possible that performance of the PostgreSQL server
> machine can be low temporary or continuously, especially during upgrading
> procedure.
I'm very poor at English, so I couldn't read your intension.
In conclusion, you think this feature should process something e.g., sending canceling request. Don't you?
If yes, is it hard for you to accept my thought as follows?
1. The "end-user" I mentioned is application/system user.
2. Such end-users don't want to wait responses from system or application for whatever reason.
3. End-users don't care where the failure of the system is when it occurs.
4. They just don't want to wait long.
5. If I made the server processing something when this parameter is triggered, they would wait for the time it takes to
processit.
 
6. Therefore it is not preferable to make servers processing something in this feature.

> Such a situation looks to be covered by TCP_USER_TIMEOUT and keep_alive
> mechanisms. 
i.e., you mean that it can't be happened to occur OS hang with network still alive.
Don't you?

These comments are helpful for documenting. Thank you.
> I think it is important to add some more information into the description of this
> parameter which informs end-users that this parameter has to be used very
> carefully because it can impact as on the client application as on the server.
>...
> May be it is better to warn in documentation or prohibit in the source
> code to set "client-side timeout" less than TCP_USER_TIMEOUT to avoid handling
> "possible" logical problems ahead to the network problems.
> Keep in mind that "client-side timeout" can abort a connection which uses 
> UNIX-domain sockets too.
I didn't take into account it. I'll investigate it. Thanks.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello Nagaura-san.

Thank you for your response.

The main idea of my comment was to avoid handling logical errors ( "client-side timeout") in advance to the detection of network problems!!!!
Therefore, I suggested setting "client-side timeout" greater of equal to the TCP_USER_TIMEOUT or note about it in the documentation.

> In conclusion, you think this feature should process something e.g., sending canceling request. Don't you?
> If yes, is it hard for you to accept my thought as follows?
> 1. The "end-user" I mentioned is application/system user.
> 2. Such end-users don't want to wait responses from system or application for whatever reason.
> 3. End-users don't care where the failure of the system is when it occurs.
> 4. They just don't want to wait long.
> 5. If I made the server processing something when this parameter is triggered, they would wait for the time it takes to process it.
> 6. Therefore it is not preferable to make servers processing something in this feature.


I see what you mean and I agree with you.

> > Such a situation looks to be covered by TCP_USER_TIMEOUT and keep_alive
> > mechanisms.
> i.e., you mean that it can't be happened to occur OS hang with network still alive.
> Don't you?


I'm afraid to be mistaken but I think that when network is alive it means that OS does not hang.
I think that it is a responsibility of the OS kernel to manage network connections. Keep_alive mechanism checks a network state by exchanging data between client's and server's OS kernels. I think the same procedure is used in TCP_USER_TIMEOUT mechanism. So, when keep_alive and TCP_USER_TIMEOUT say that network works correct, it means that kernels of client and server do not hang. On the other hand it does not guarantee that  PostgreSQL server is not hang. "client-side timeout" should handle this problem when network problem was not detected.

Best regards,
Mikalai Keida

Re: Timeout parameters

From
Robert Haas
Date:
On Sun, Mar 10, 2019 at 10:25 PM Nagaura, Ryohei
<nagaura.ryohei@jp.fujitsu.com> wrote:
> The main purpose of this parameter is to avoid client's waiting for DB server infinitely, not reducing the server's
burden.
> This results in not waiting end-user, which is most important.

+1.  If the server fails to detect that the client has gone away,
that's the server's problem.  The client is entirely entitled to be
selfish if it so wishes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout parameters

From
Robert Haas
Date:
On Wed, Feb 27, 2019 at 10:07 PM Nagaura, Ryohei
<nagaura.ryohei@jp.fujitsu.com> wrote:
> I rewrote two TCP_USER_TIMEOUT patches.
> I changed the third argument of setsockopt() from 18 to TCP_USER_TIMEOUT.
>
> This revision has the following two merits.
> * Improve readability of source
> * Even if the definition of TCP_USER_TIMEOUT is changed, it is not affected.
>   e.g., in the current linux, TCP_USER_TIMEOUT is defined, but even if it is changed to 19, it 'll be available.

The documentation in TCP_backend_v8.patch needs some work.

+       the system default. This parameter is supported only on
systems that support
+       TCP_USER_TIMEOUT or an equivalent socket option, and on
Windows; on other
+       systems, it must be zero. In sessions connected via a
Unix-domain socket,
+       this parameter is ignored and always reads as zero.

So this says that it works on systems that have TCP_USER_TIMEOUT or an
equivalent socket option and that it also works on Windows, and then a
few lines later....

+       This parameter is not supported on Windows, and must be zero.

This says it actually doesn't work on Windows.

I think the language about an equivalent socket option isn't helpful.
We should document any equivalents we actually support, and not say
anything about anything else.

+       To enable full control under TCP connection use this option
together with
+       keepalive.

That doesn't tell me anything useful.

+        Specify in milliseconds the time to disconnect to the client
+        when there is no ack packet from the client to the server's
data transmission.
+        This parameter is supported on linux version 2.6.37 or later.

Hmm.  This looks like a second, and broadly better, definition of the
parameter.  Now we have a different definition of where it's
supported.  This is the THIRD attempt to tell me which platforms are
supported -- just Linux, 2.6.37 or greater.

+         This parameter is not supported on Windows.

And then, in case I missed the last three attempts to tell me about
platform support, there's this.

Have you checked whether this can be easily supported on FreeBSD and/or NetBSD?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout parameters

From
Fabien COELHO
Date:
Hello Robert,

> <nagaura.ryohei@jp.fujitsu.com> wrote:
>> The main purpose of this parameter is to avoid client's waiting for DB server infinitely, not reducing the server's
burden.
>> This results in not waiting end-user, which is most important.
>
> +1.  If the server fails to detect that the client has gone away,
> that's the server's problem.  The client is entirely entitled to be
> selfish if it so wishes.

With the current libpq implementation the server does not see that the 
client has gone away on a trivial "pg_sleep", where the load is definitely 
not an issue, so the server problem seems to be there already.

Also, I do not see the downside of sending a cancel query before severing 
the connection. If it is not processed, too bad, but if it is then it is 
for the better.

ISTM that the client can be selfish but nice about its timeout.

-- 
Fabien.


RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello Mikalai-san.

> From: MikalaiKeida@ibagroup.eu <MikalaiKeida@ibagroup.eu>
> The main idea of my comment was to avoid handling logical errors ( "client-side
> timeout") in advance to the detection of network problems!!!!
> Therefore, I suggested setting "client-side timeout" greater of equal to the
> TCP_USER_TIMEOUT or note about it in the documentation.
I have caught up with your intension. You have a point! Thank you.

> I'm afraid to be mistaken but I think that when network is alive it means that OS
> does not hang.
> I think that it is a responsibility of the OS kernel to manage network connections.
> Keep_alive mechanism checks a network state by exchanging data between
> client's and server's OS kernels. I think the same procedure is used in
> TCP_USER_TIMEOUT mechanism. So, when keep_alive and
> TCP_USER_TIMEOUT say that network works correct, it means that kernels of
> client and server do not hang. On the other hand it does not guarantee that
> PostgreSQL server is not hang. "client-side timeout" should handle this problem
> when network problem was not detected.
Thank you for your kind explain.

Best regards,
---------------------
Ryohei Nagaura




RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello Robert-san.

> From: Robert Haas <robertmhaas@gmail.com>
> So this says that it works on systems that have TCP_USER_TIMEOUT or an
> equivalent socket option and that it also works on Windows, and then a few lines
> later....
> 
> +       This parameter is not supported on Windows, and must be zero.
> 
> This says it actually doesn't work on Windows.
It has been pointed out by Fabien-san before and it has not been corrected yet.
I am very sorry for making "needs review" even though I have not reposted the patch.

> I think the language about an equivalent socket option isn't helpful.
> We should document any equivalents we actually support, and not say anything
> about anything else.
Does it mean that doc explain with words "setsockopt()" and/or its option "TCP_USER_TIMEOUT" ?
BTW, This sentence is to be consistent with the keepalive description e.g., [1].
In my opinion, if this doc need to be changed, we should change docs about keepalive too.
How do you think?

> +       To enable full control under TCP connection use this option
> together with
> +       keepalive.
> That doesn't tell me anything useful.
Oh, I see. Indeed, this is not about postgres but general network.

> 
> 
> +        Specify in milliseconds the time to disconnect to the client
> +        when there is no ack packet from the client to the server's
> data transmission.
> +        This parameter is supported on linux version 2.6.37 or later.
> 
> Hmm.  This looks like a second, and broadly better, definition of the parameter.
> Now we have a different definition of where it's supported.  This is the THIRD
> attempt to tell me which platforms are supported -- just Linux, 2.6.37 or greater.
> 
> +         This parameter is not supported on Windows.
> 
> And then, in case I missed the last three attempts to tell me about platform
> support, there's this.
I intended to delete these sentences. I'll delete it.

> Have you checked whether this can be easily supported on FreeBSD and/or
> NetBSD?
No, not yet. Checked red hat enterprise linux only...
Indeed, I should have checked on some UNIX OS.

[1] https://www.postgresql.org/docs/current/runtime-config-connection.html
Best regards,
---------------------
Ryohei Nagaura


Re: Timeout parameters

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 2:05 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Also, I do not see the downside of sending a cancel query before severing
> the connection. If it is not processed, too bad, but if it is then it is
> for the better.

If the network connection is dead, which is the situation the patch
intends to detect, then PQcancel() isn't going to work, but it might
still hang for a period of time or forever.  That seems like a pretty
major downside.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout parameters

From
Fabien COELHO
Date:
Hello Robert,

>> Also, I do not see the downside of sending a cancel query before severing
>> the connection. If it is not processed, too bad, but if it is then it is
>> for the better.
>
> If the network connection is dead, which is the situation the patch
> intends to detect,

Hmmm... ISTM that we are not talking about the same patch...

My point is about the "socket_timeout" patch which timeout on not 
receiving an answer, but is not related to the network connection.

The other two patches, however, deal with tcp timeout both client & server 
side, and are indeed more related to the network connection. Sending 
request on a tcp timeout would not make much sense, but this is not the 
proposal here.

> then PQcancel() isn't going to work, but it might still hang for a 
> period of time or forever.  That seems like a pretty major downside.

The fact that no answer data is received may mean that it takes time to 
compute the result, so cancelling seems appropriate to me, rather than 
severing the connection and starting a new one immediately, leaving the 
server loaded with its query.

-- 
Fabien.


RE: Timeout parameters

From
Fabien COELHO
Date:
> Hello Fabien-san.

The 2nd patch is 700 KB, I think that there is a unvoluntary file copy.

>> If the user reconnects, eg "\c db", the setting is lost. The
>> re-connection handling should probably take care of this parameter, and maybe others.
> I think your opinion is reasonable, but it seems not in this thread.

HI think that your patch is responsible for the added option at least.

I agree that the underlying issue that other parameters should probably 
also be reused, which would be a bug fix, does not belong to this thread.

> * settings Inheritance by "\c" command.
>     suggested by Fabien-san.
>     Not implemented yet.
>     This should be in the other thread.

See above.

-- 
Fabien.


Re: Timeout parameters

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 1:25 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Hmmm... ISTM that we are not talking about the same patch...

You are correct!  I was talking about the patches that allow user
control of TCP_USER_TIMEOUT, which is apparently totally different
from the socket_timeout patch that you're talking about.

The first thing I notice about the socket_timeout patch is that the
documentation is definitely wrong:

+        Number of seconds to wait for socket read/write operation before
+        closing the connection, as a decimal integer. This can be
used both as a
+        force global query timeout and network problems detector. A value of

This can't be used to force a global query timeout, because any kind
of protocol message - e.g. a NOTICE or WARNING - would reset the
timeout, allowing the continue past the supposedly-global query
timeout.  There may be some other ways that can happen, too, either
currently or in the future.

> My point is about the "socket_timeout" patch which timeout on not
> receiving an answer, but is not related to the network connection.

The above-quoted documentation disagrees with you about that, claiming
that this can be used as a network problem detector. However, I think
that the documentation is wrong about that, too.  The mere fact that
the connection is idle does not show that there is a network problem;
there could, for example, be TCP keepalives being sent and received
behind the scenes, and if so, this patch would cause a connection that
is known to be valid to be disconnected for no reason.

Actually, I think that socket_timeout_v8.patch is a bad idea and
should be rejected, not because it doesn't send a cancel to the server
but just because it's not the right way of solving either of the
problems that it purports to solve.  If you want a timeout for
queries, you need that to be administered by the server, which knows
when it starts and finishes executing a query; you cannot substitute a
client-side judgement based on the last time we received a byte of
data.  Maybe a client-side judgement would work if the timer were
armed when we send a Query or Execute message and disarmed when we
receive ReadyForQuery.  And, if you want a network problem detector,
then you should use keepalives and perhaps the TCP_USER_TIMEOUT stuff,
so that you don't kill perfectly good connections that just happen to
be idle.

The only argument I can really see for socket_timeout_v8.patch is that
there might be platforms where setting keepalives and/or
TCP_USER_TIMEOUT doesn't work.  However, if that is the issue, then I
think we should look for equivalent facilities that do work on those
platforms and consider supporting those facilities, or else just
decide that if certain features are not supported on obscure platforms
then users of those platforms are going to have to live without those
features.  If we add something like this, people are likely to try to
use it and then get frustrated when it doesn't work right.

It's also worth noting that it's generally a bad idea to try to make
one facilities do two different things with conflicting requirements.
It seems not unlikely that if we add this, somebody will come back and
ask us to add keepalives to the protocol itself (rather than relying
on TCP) to keep this from killing the connection in vain -- and then
somebody else will object to that because they're using it to kill off
queries that run too long (and have raised client_min_messages to
ERROR in an attempt to avoid getting any other messages meanwhile).
By munging together two different requirements into a single facility,
we make it impossible to decide which side of that argument is
correct.  It will just come down to who yells the loudest.

Note that none of this is an argument against the TCP_USER_TIMEOUT
portion of this patch set.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> >> If the user reconnects, eg "\c db", the setting is lost. The
> >> re-connection handling should probably take care of this parameter, and
> maybe others.
> > I think your opinion is reasonable, but it seems not in this thread.
> 
> HI think that your patch is responsible for the added option at least.
> 
> I agree that the underlying issue that other parameters should probably
> also be reused, which would be a bug fix, does not belong to this thread.

This doesn't seem to be a bug.  \connect just establishes a new connection, not reusing the previous settings for most
connectionparameters.  As the examples in the following manual page show, the user needs to specify necessary
connectionparameters.
 

https://www.postgresql.org/docs/devel/app-psql.html

=> \c service=foo
=> \c "host=localhost port=5432 dbname=mydb connect_timeout=10 sslmode=disable"


But I'm afraid the description of \connect may not be clear enough about connection parameters, and that may cause
usersto expect the reuse of all connection parameter settings.  Anyway, I think this would be an improvement for psql's
documentationor new feature for psql.  What do you think?
 


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> The first thing I notice about the socket_timeout patch is that the
> documentation is definitely wrong:

Agreed.  I suppose the description should be clearer about:

* the purpose and what situation this timeout will help: not for canceling a long-running query or recovering from a
networkfailure.
 
* relationship with other timeout parameters: e.g., socket_timeout > connect_timeout, socket_timeout >
statement_timeout,socket_timeout > TCP keepalive/user timeout
 


> Actually, I think that socket_timeout_v8.patch is a bad idea and
> should be rejected, not because it doesn't send a cancel to the server
> but just because it's not the right way of solving either of the
> problems that it purports to solve.  If you want a timeout for
> queries, you need that to be administered by the server, which knows
> when it starts and finishes executing a query; you cannot substitute a
> client-side judgement based on the last time we received a byte of
> data.  Maybe a client-side judgement would work if the timer were
> armed when we send a Query or Execute message and disarmed when we
> receive ReadyForQuery.  And, if you want a network problem detector,
> then you should use keepalives and perhaps the TCP_USER_TIMEOUT stuff,
> so that you don't kill perfectly good connections that just happen to
> be idle.

I think the purpose of socket_timeout is to avoid infinite or unduely long wait and return response to users, where
otherexisting timeout parameters wouldn't help.  For example, OS's process scheduling or paging/swapping problems might
causelong waits before postgres gets control (e.g. due to Transparent HugePage (THP)).  Within postgres, the unfair
lwlockcan unexpected long waits (I experienced dozens of seconds per wait on ProcArrayLock, which was unbelievable.)
Someonemay say that those should be fixed or improved instead of introducing this parameter, but I think it's good to
havethis as a stop-gap measure.  In other words, we can suggest setting this parameter when the user asks "How can I
returncontrol to the end user in any situation?"
 


Regards
Takayuki Tsunakawa




Re: Timeout parameters

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 10:20 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I think the purpose of socket_timeout is to avoid infinite or unduely long wait and return response to users, where
otherexisting timeout parameters wouldn't help.  For example, OS's process scheduling or paging/swapping problems might
causelong waits before postgres gets control (e.g. due to Transparent HugePage (THP)).  Within postgres, the unfair
lwlockcan unexpected long waits (I experienced dozens of seconds per wait on ProcArrayLock, which was unbelievable.)
Someonemay say that those should be fixed or improved instead of introducing this parameter, but I think it's good to
havethis as a stop-gap measure.  In other words, we can suggest setting this parameter when the user asks "How can I
returncontrol to the end user in any situation?" 

But that's not what it will do.  As long as the server continues to
dribble out protocol messages from time to time, the timeout will
never fire no matter how much time passes.  I saw a system once where
every 8kB read took many seconds to complete; such a system could
dribble out sequential scan results over an arbitrarily long period of
time without ever tripping the timeout.  If you really want to return
control to the user in any situation, what you can do is use the libpq
APIs in asynchronous mode which, barring certain limitations of the
current implementation, will actually allow you to maintain control
over the connection at all times.

I think the use case for a timeout that has both false positives (i.e.
it will fire even when there's no problem, as when the connection is
legitimately idle) and false negatives (i.e. it will fail to trigger
when there is a problem, as when there are periodic notices or
notifies from the server connection) is extremely limited if not
nonexistent, and I think the potential for users to be confused is
really high.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> But that's not what it will do.  As long as the server continues to
> dribble out protocol messages from time to time, the timeout will
> never fire no matter how much time passes.  I saw a system once where
> every 8kB read took many seconds to complete; such a system could
> dribble out sequential scan results over an arbitrarily long period of
> time without ever tripping the timeout.

I understood hat the example is about an SELECT that returns multiple rows.  If so, statement_timeout should handle it,
shouldn'tit?
 

>  If you really want to return
> control to the user in any situation, what you can do is use the libpq
> APIs in asynchronous mode which, barring certain limitations of the
> current implementation, will actually allow you to maintain control
> over the connection at all times.

Maybe.  But the users aren't often in a situation to modify the application to use libpq asynchronous APIs.


> I think the use case for a timeout that has both false positives (i.e.
> it will fire even when there's no problem, as when the connection is
> legitimately idle) and false negatives (i.e. it will fail to trigger
> when there is a problem, as when there are periodic notices or
> notifies from the server connection) is extremely limited if not
> nonexistent, and I think the potential for users to be confused is
> really high.

My understanding is that the false positive case doesn't occur, because libpq doesn't wait on the socket while the
clientis idle and not communicating SQL request/response.  As for the false negative case, resetting the timer upon
noticesor notifies receipt is good, because they show that the database server is functioning.  socket_timeout is not a
mechanismto precisely limit the duration of query request/response.  It is kind of a stop-gap, last resort to assure
returncontrol within reasonable amount of time, rather than minutes or hours.
 


Regards
Takayuki Tsunakawa





Re: Timeout parameters

From
Kyotaro HORIGUCHI
Date:
At Thu, 14 Mar 2019 03:33:20 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FBC4191@G01JPEXMBYT05>
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> > But that's not what it will do.  As long as the server continues to
> > dribble out protocol messages from time to time, the timeout will
> > never fire no matter how much time passes.  I saw a system once where
> > every 8kB read took many seconds to complete; such a system could
> > dribble out sequential scan results over an arbitrarily long period of
> > time without ever tripping the timeout.
> 
> I understood hat the example is about an SELECT that returns multiple rows.  If so, statement_timeout should handle
it,shouldn't it?
 

If so, in turn the socket_timeout doesn't work as expected? I
understand that what is proposed here is to disconnect after that
time of waiting for *the first tuple* of a query, regardless of
it is a long query or network failure. On of the problems raised
here is the socket_timetout patch doesn't work that way?

In order to at least to make it work as expected, it should arm a
timeout at PQsendQuery and disarm at the first response.

> >  If you really want to return
> > control to the user in any situation, what you can do is use the libpq
> > APIs in asynchronous mode which, barring certain limitations of the
> > current implementation, will actually allow you to maintain control
> > over the connection at all times.
> 
> Maybe.  But the users aren't often in a situation to modify the application to use libpq asynchronous APIs.

I can't imagine that in the cases where other than applications
cannot be rebuild for some reasons. (E.g. the source code has
been lost or the source code no longer be built on modern
environment..)

If an application is designed to have such a requirement, mustn't
it have implemented that by itself by any means? For example, an
application on JDBC can be designed to kill a querying thread
that takes longer than threshold.

> > I think the use case for a timeout that has both false positives (i.e.
> > it will fire even when there's no problem, as when the connection is
> > legitimately idle) and false negatives (i.e. it will fail to trigger
> > when there is a problem, as when there are periodic notices or
> > notifies from the server connection) is extremely limited if not
> > nonexistent, and I think the potential for users to be confused is
> > really high.
> 
> My understanding is that the false positive case doesn't occur,
> because libpq doesn't wait on the socket while the client is
> idle and not communicating SQL request/response.  As for the
> false negative case, resetting the timer upon notices or
> notifies receipt is good, because they show that the database
> server is functioning.  socket_timeout is not a mechanism to
> precisely limit the duration of query request/response.  It is
> kind of a stop-gap, last resort to assure return control within
> reasonable amount of time, rather than minutes or hours.

Doesn't TCP_KEEPALIVE work that way?

statement_timeout works on a live connection, TCP_USER_TIMEOUT
works for an already failed network but if the network fails
after a backend already sent the ack for a query request, it
doesn't work. TCP_KEEPALIVE would work for the case where any
packet doesn't reach each other for a certain time. Is there any
other situations to save?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Timeout parameters

From
Fabien COELHO
Date:
>> HI think that your patch is responsible for the added option at least.
>>
>> I agree that the underlying issue that other parameters should probably
>> also be reused, which would be a bug fix, does not belong to this thread.
>
> This doesn't seem to be a bug.  \connect just establishes a new connection, not reusing the previous settings for
mostconnection parameters.  As the examples in the following manual page show, the user needs to specify necessary
connectionparameters.
 
>
> https://www.postgresql.org/docs/devel/app-psql.html
>
> => \c service=foo
> => \c "host=localhost port=5432 dbname=mydb connect_timeout=10 sslmode=disable"
>
>
> But I'm afraid the description of \connect may not be clear enough about 
> connection parameters, and that may cause users to expect the reuse of 
> all connection parameter settings.

I think that the typical use-case of \c is to connect to another database 
on the same host, at least that what I do pretty often. The natural 
expectation is that the same "other" connection parameters are used, 
otherwise it does not make much sense, and there is already a whole logic 
of reusing the previous settings in psql, at least wrt describing the 
target (host, port, user…).

> Anyway, I think this would be an improvement for psql's documentation or 
> new feature for psql.  What do you think?

I think that we should fix the behavior rather than document the current 
weirdness. I do not think that we should introduce more weirdness.

-- 
Fabien.

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> If so, in turn the socket_timeout doesn't work as expected? I
> understand that what is proposed here is to disconnect after that
> time of waiting for *the first tuple* of a query, regardless of
> it is a long query or network failure. On of the problems raised
> here is the socket_timetout patch doesn't work that way?

No, the proposal is to set the timeout for every socket read/write like PgJDBC.  It is not restricted to an SQL command
execution;it applies to any communication with the server that could block the client.
 


> I can't imagine that in the cases where other than applications
> cannot be rebuild for some reasons. (E.g. the source code has
> been lost or the source code no longer be built on modern
> environment..)
> 
> If an application is designed to have such a requirement, mustn't
> it have implemented that by itself by any means? For example, an
> application on JDBC can be designed to kill a querying thread
> that takes longer than threshold.

Any paid or free applications whose source code is closed.  Also, ordinary users don't want to modify psql and pgAdmin
sourcecode by themselves, do they?
 


> Doesn't TCP_KEEPALIVE work that way?
> 
> statement_timeout works on a live connection, TCP_USER_TIMEOUT
> works for an already failed network but if the network fails
> after a backend already sent the ack for a query request, it
> doesn't work. TCP_KEEPALIVE would work for the case where any
> packet doesn't reach each other for a certain time. Is there any
> other situations to save?

For example, OS issues such as abnormally (buggy) slow process scheduling or paging/swapping that prevent control from
beingpassed to postgres.  Or, abnormally long waits on lwlocks in postgres.  statement_timeout doesn't take effect
whilewaiting on a lwlock.  I have experienced both.  And, any other issues in the server that we haven't experienced
andnot predictable.
 


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> I think that the typical use-case of \c is to connect to another database
> on the same host, at least that what I do pretty often. The natural
> expectation is that the same "other" connection parameters are used,
> otherwise it does not make much sense, and there is already a whole logic
> of reusing the previous settings in psql, at least wrt describing the
> target (host, port, user…).

I agree about the natural expectation.


> > Anyway, I think this would be an improvement for psql's documentation
> or
> > new feature for psql.  What do you think?
> 
> I think that we should fix the behavior rather than document the current
> weirdness. I do not think that we should introduce more weirdness.

Do you think all connection parameter settings should be reused by default, or when -reuse-previous=on is specified?

Do you think this is a bug to be backported to previous versions, or a new feature?  I think I'll make a patch
separatelyfrom this thread, if time permits.
 


Regards
Takayuki Tsunakawa




RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello, all.

The main subject of discussion in this thread relates to the 'socket_timeout'. As I understand there is no any hesitation about applying TCP_USER_TIMEOUT into the PostgreSQL.
We have been waiting for applying TCP_USER_TIMEOUT in PostgreSQL for about 6 moths.
Fabien, I was wondering whether you can apply TCP_USER_TIMEOUT patch and continue discussion about 'socket_timeout'?

> I can't imagine that in the cases where other than applications
> cannot be rebuild for some reasons. (E.g. the source code has
> been lost or the source code no longer be built on modern
> environment..)

> If an application is designed to have such a requirement, mustn't
> it have implemented that by itself by any means? For example, an
> application on JDBC can be designed to kill a querying thread
> that takes longer than threshold.


Kyotaro-san, I am afraid you are mistaken in your statement about JDBC. JDBC is an another level of abstraction which provides only an universal Java interface to interact with different databases.
There is the same ODBC driver which provides an universal C interface to interact with different databases. So, the 'socket_timeout' seems to be a part of functionality of ODBC driver.

libpq provides two interfaces: pqWait() which waits for a socket state forever and psTimedWait() which waits for a socket state for an appropriate timeout. This functionality seems to be enough.

I agree with Robert Haas that this parameter can make a mess in the head of PostgreSQL user because it is very difficult to understand the case when each timeout parameter, which is provided by PostgreSQL,  works.

> For example, OS issues such as abnormally (buggy) slow process scheduling or paging/swapping that prevent control from being passed to postgres.  Or, abnormally long waits on lwlocks in postgres.  statement_timeout doesn't take effect while waiting on a lwlock.  I have experienced both.  And, any other issues in the server that we haven't experienced and not predictable.

For me all mentioned by
Takayuki Tsunakawa problems looks like a lack of design of end-user application or configuration of DB server. It is not a problem of PostgreSQL.

Best regards,
Mikalai Keida

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> > For example, OS issues such as abnormally (buggy) slow process scheduling
> or paging/swapping that prevent control from being passed to postgres.  Or,
> abnormally long waits on lwlocks in postgres.  statement_timeout doesn't
> take effect while waiting on a lwlock.  I have experienced both.  And, any
> other issues in the server that we haven't experienced and not predictable.
> 
> For me all mentioned by Takayuki Tsunakawa problems looks like a lack of
> design of end-user application or configuration of DB server. It is not
> a problem of PostgreSQL.

Certainly, my cases could be avoided by the OS and database configuration.  But we cannot say all such problems can be
avoidedby configuration in advance.  The point is to provide a method to get ready for any situation.
 


Regards
Takayuki Tsunakawa






Re: Timeout parameters

From
Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 14 Mar 2019 12:33:38 +0300, MikalaiKeida@ibagroup.eu wrote in
<OF3AC3E296.5DE954A0-ON432583BD.002E7144-432583BD.00348536@iba.by>
> Hello, all.
> 
> The main subject of discussion in this thread relates to the 
> 'socket_timeout'. As I understand there is no any hesitation about 
> applying TCP_USER_TIMEOUT into the PostgreSQL.
> We have been waiting for applying TCP_USER_TIMEOUT in PostgreSQL for about 
> 6 moths.
> Fabien, I was wondering whether you can apply TCP_USER_TIMEOUT patch and 
> continue discussion about 'socket_timeout'? 
> 
> > I can't imagine that in the cases where other than applications
> > cannot be rebuild for some reasons. (E.g. the source code has
> > been lost or the source code no longer be built on modern
> > environment..)
> 
> > If an application is designed to have such a requirement, mustn't
> > it have implemented that by itself by any means? For example, an
> > application on JDBC can be designed to kill a querying thread
> > that takes longer than threshold.
> 
> Kyotaro-san, I am afraid you are mistaken in your statement about JDBC. 
> JDBC is an another level of abstraction which provides only an universal 
> Java interface to interact with different databases.
> There is the same ODBC driver which provides an universal C interface to 
> interact with different databases. So, the 'socket_timeout' seems to be a 
> part of functionality of ODBC driver.

I think I'm not mistaking that. Java is just an example. What I
said is such kind of timeout can be implemented everwhere. If the
lower layer has the feature, the upper-layer is implmeneted more
easily, but in contrast the the feature gets generic and could be
obscure for upper-layers.  The socket_timeout seems that kind of
feature. (Yes, implementing such timeouts in C is harder than in
Java.)

So the socket_timeout seems to me as if it is provided as a
plaster to mend a badly-designed systems. Maybe as mentioned
below.

> libpq provides two interfaces: pqWait() which waits for a socket state 
> forever and psTimedWait() which waits for a socket state for an 
> appropriate timeout. This functionality seems to be enough.
> 
> I agree with Robert Haas that this parameter can make a mess in the head 
> of PostgreSQL user because it is very difficult to understand the case 
> when each timeout parameter, which is provided by PostgreSQL,  works.
> 
> > For example, OS issues such as abnormally (buggy) slow process 
> scheduling or paging/swapping that prevent control from being passed to 
> postgres.  Or, abnormally long waits on lwlocks in postgres. 
> statement_timeout doesn't take effect while waiting on a lwlock.  I have 
> experienced both.  And, any other issues in the server that we haven't 
> experienced and not predictable.
> 
> For me all mentioned by Takayuki Tsunakawa problems looks like a lack of 
> design of end-user application or configuration of DB server. It is not a 
> problem of PostgreSQL.

It's my concern.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Timeout parameters

From
Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 14 Mar 2019 09:42:44 +0000, "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> wrote in
<0A3221C70F24FB45833433255569204D1FBC7626@G01JPEXMBYT05>
> From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> > > For example, OS issues such as abnormally (buggy) slow process scheduling
> > or paging/swapping that prevent control from being passed to postgres.  Or,
> > abnormally long waits on lwlocks in postgres.  statement_timeout doesn't
> > take effect while waiting on a lwlock.  I have experienced both.  And, any
> > other issues in the server that we haven't experienced and not predictable.
> > 
> > For me all mentioned by Takayuki Tsunakawa problems looks like a lack of
> > design of end-user application or configuration of DB server. It is not
> > a problem of PostgreSQL.
> 
> Certainly, my cases could be avoided by the OS and database
> configuration.  But we cannot say all such problems can be
> avoided by configuration in advance.  The point is to provide a
> method to get ready for any situation.

Right. We can't perfectly predict in advance what happens on a
server and server cannot perfectly surmise in advance what every
client wants.  Clients can avoid something by using something
that is not fully understood, that leads to another unpredictable
trouble.  This is the reason how a feature works should be
understandable for users. I saw many cases where an HA cluster
fails over for mysterious reasons for users, who are considered
to understood them. (That happens more frequently on virtual
machines.) If the application should return a response in, say,
10 seconds, the application should do that by itself.

# I hope it is comprehensible..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello, Takayuki.

> > > For example, OS issues such as abnormally (buggy) slow process scheduling
> > or paging/swapping that prevent control from being passed to postgres.  Or,
> > abnormally long waits on lwlocks in postgres.  statement_timeout doesn't
> > take effect while waiting on a lwlock.  I have experienced both.  And, any
> > other issues in the server that we haven't experienced and not predictable.
> >
> > For me all mentioned by Takayuki Tsunakawa problems looks like a lack of
> > design of end-user application or configuration of DB server. It is not
> > a problem of PostgreSQL.

> Certainly, my cases could be avoided by the OS and database configuration.  But we cannot say all such problems can be avoided by configuration in advance.  The point is to provide a method to get ready for any situation.


Do you mind me asking you whether you have thought that solving your problem can lead to the problem in the other user applications?
Let's imagine a possible problem:
1. end-user sets 'socket_timeout' only for current session

2. 'socket_timeout' is much shorter than 'keep_alive', 'tcp_user_timeout' and 'statement_timeout'
3. end-user invokes a 'heavy' query which is not be possible to process by PostgreSQL server within 'socket_timeout'
4. if the query fails due to timeout, terminate the session and repeat it

As the query is not cancelled, PostgreSQL server will process it till completion or 'statement_timeout' expiration. In such a case PostgreSQL server can be overloaded by an 'unreliable' user/users and other end-users will not be able to work with PostgreSQL server as they expected.

There are a number of timeout parameters which can lead to the terminating a query:
1. keep_alive and tcp_user_timeout terminate a query because of network problems

2. statement_timeout terminates a query because the time for processing this query by PostgreSQL server expired.

These timeouts are absolutely different and do not interact with each other. Suggested 'socket_timeout' can replace all mentioned by me timeout parameters because of willing of 'selfish' end-user.

'SocketTimeout' in JDBC was added to cope with network problems when keep_alive and tcp_user_timeout were not implemented. Implementation of the 'socket_timeout' looks like adding a bug into the libpq. I think that the problems you faced, relate to the other issues in PostgreSQL.

Best regards,
Mikalai Keida

Re: Timeout parameters

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 11:33 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I understood hat the example is about an SELECT that returns multiple rows.  If so, statement_timeout should handle
it,shouldn't it? 

Yes.  If you want a query timeout, you should use statement_timeout, not this.

> > I think the use case for a timeout that has both false positives (i.e.
> > it will fire even when there's no problem, as when the connection is
> > legitimately idle) and false negatives (i.e. it will fail to trigger
> > when there is a problem, as when there are periodic notices or
> > notifies from the server connection) is extremely limited if not
> > nonexistent, and I think the potential for users to be confused is
> > really high.
>
> My understanding is that the false positive case doesn't occur, because libpq doesn't wait on the socket while the
clientis idle and not communicating SQL request/response.  As for the false negative case, resetting the timer upon
noticesor notifies receipt is good, because they show that the database server is functioning.  socket_timeout is not a
mechanismto precisely limit the duration of query request/response.  It is kind of a stop-gap, last resort to assure
returncontrol within reasonable amount of time, rather than minutes or hours. 

Hmm.  You might be right about the false positive case, although there
might be some exceptions.

As for the false negative case, the only sorta-legitimate scenario I
can think of is: suppose the database server process is running, but
someone has stopped the session connected to your backend by sending
it SIGSTOP or attaching a debugger.  Now, the TCP connection is alive,
but the server will not be able to send any data, so statement_timeout
will not fire, nor will any of the network-level stuff.
socket_timeout would.  But that's a pretty remote scenario, I think.

Now you might say - what if the server is stopped not because of
SIGSTOP but because of some other reason, like it's waiting for a
lock?  Well, in that case, the database server is still functioning,
and you will not want the connection to be terminated if no notifies
are sent during the lock wait but not terminated if they are sent.  At
least, I can't see why anyone would want that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Timeout parameters

From
Robert Haas
Date:
One other thing -- I looked a bit into the pgsql-jdbc implementation
of a similarly-named option, and it does seem to match what you are
proposing here.  I wonder what user experiences with that option have
been like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> One other thing -- I looked a bit into the pgsql-jdbc implementation
> of a similarly-named option, and it does seem to match what you are
> proposing here.  I wonder what user experiences with that option have
> been like.

One case I faintly recall is that some SQL execution got stuck in NFS access in the server, and statement_timeout
didn'twork (probably due to inability to cancel NFS read/write operations).  The user chose to use PgJDBC's
socketTimeout. I'm not sure whether this use case is fundamental, because I wonder if NFS mount options could have been
thesolution.
 


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> Now you might say - what if the server is stopped not because of
> SIGSTOP but because of some other reason, like it's waiting for a
> lock?  Well, in that case, the database server is still functioning,
> and you will not want the connection to be terminated if no notifies
> are sent during the lock wait but not terminated if they are sent.  At
> least, I can't see why anyone would want that.

Yes, so I think it would be kind to describe how to set socket_timeout with relation to other timeout parameters --
socket_timeout> statement_timeout > lock_timeout, for example.
 


Regards
Takayuki Tsunakawa


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> Do you mind me asking you whether you have thought that solving your problem
> can lead to the problem in the other user applications?
> Let's imagine a possible problem:
> 1. end-user sets 'socket_timeout' only for current session
> 2. 'socket_timeout' is much shorter than 'keep_alive', 'tcp_user_timeout'
> and 'statement_timeout'
> 3. end-user invokes a 'heavy' query which is not be possible to process
> by PostgreSQL server within 'socket_timeout'
> 4. if the query fails due to timeout, terminate the session and repeat it
> 
> As the query is not cancelled, PostgreSQL server will process it till
> completion or 'statement_timeout' expiration. In such a case PostgreSQL
> server can be overloaded by an 'unreliable' user/users and other end-users
> will not be able to work with PostgreSQL server as they expected.

Yes, so I think it would be necessary to describe how to set socket_timeout with relation to other timeout parameters
--socket_timeout > statement_timeout, emphasizing that socket_timeout is not for canceling long-running queries but for
returningcontrol to the client.
 


Regards
Takayuki Tsunakawa




RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
Hello Takayuki-san,

> Yes, so I think it would be necessary to describe how to set socket_timeout with relation to other timeout parameters -- socket_timeout > statement_timeout, emphasizing that socket_timeout is not for canceling long-running queries but for returning control to the client.

For me it does not look enough.
If you would like to implement the 'socket_timeout' parameter, could you pay attention on the Fabien Coelho comment?

> The fact that no answer data is received may mean that it takes time to
> compute the result, so cancelling seems appropriate to me, rather than
> severing the connection and starting a new one immediately, leaving the
> server loaded with its query.


This comment is very important to pay attention on.
Let's imagine once more:
1. end-user makes a query

2. the query has not completed within this timeout due to any reason: the query is too 'heavy', some problem with OS happened, PostgreSQL server terminated
3. if we simple close connection, the first case is negative by mistake. To avoid such a situation I think it would be better to call a PQcancel(). In case of failure PQcancel() terminates in 'socket_timeout'. So, control to the end-user in such a failure situation will be returned in 2 * 'socket_timeout' interval. It is much better than hanging forever in some specific cases. Moreover, such solution will not lead to the overloading of PostgreSQL server by abnormally ignored 'heavy' queries results by end-users.

What do you think about it?


Best regards,
Mikalai Keida

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> In case of failure PQcancel() terminates in 'socket_timeout'. So, control
> to the end-user in such a failure situation will be returned in 2 *
> 'socket_timeout' interval. It is much better than hanging forever in some
> specific cases. Moreover, such solution will not lead to the overloading
> of PostgreSQL server by abnormally ignored 'heavy' queries results by
> end-users.

Oops, unfortunately, PQcancel() does not follow any timeout parameters...  It uses a blocking socket.

Also, I still don't think it's a good idea to request cancellation.  socket_timeout should be sufficiently longer than
theusually expected query execution duration.  And long-running queries should be handled by statement_timeout which
indicatesthe maximum tolerable query execution duration.
 
For example, if the usually expected query execution time is 100 ms, statement_timeout can be set to 3 seconds and
socket_timeoutto 5 seconds.
 


Regards
Takayuki Tsunakawa






RE: Timeout parameters

From
MikalaiKeida@ibagroup.eu
Date:
> Oops, unfortunately, PQcancel() does not follow any timeout parameters...  It uses a blocking socket.

> Also, I still don't think it's a good idea to request cancellation.  socket_timeout should be sufficiently longer than the usually expected query execution duration.  And long-running queries should be handled bystatement_timeout which indicates the maximum tolerable query execution duration.
> For example, if the usually expected query execution time is 100 ms, statement_timeout can be set to 3 seconds and socket_timeout to 5 seconds.


Based on your comment it seems to me that 'socket_timeout' should be connected with statement_timeout. I mean that end-user should wait statement_timeout + 'socket_timeout' for returning control. It looks much more safer for me.

Best regards,
Mikalai Keida

RE: Timeout parameters

From
Fabien COELHO
Date:
Hello,

> Fabien, I was wondering whether you can apply TCP_USER_TIMEOUT patch and
> continue discussion about 'socket_timeout'?

I can apply nothing, I'm just a small-time reviewer.

Committers on the thread are Michaël-san and Robert, however I'm not sure 
that they are very sensitive to "please apply this patch" requests: they 
are the lone judges of their own priorities.

-- 
Fabien.

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: MikalaiKeida@ibagroup.eu [mailto:MikalaiKeida@ibagroup.eu]
> Based on your comment it seems to me that 'socket_timeout' should be
> connected with statement_timeout. I mean that end-user should wait
> statement_timeout + 'socket_timeout' for returning control. It looks much
> more safer for me.

I'm afraid we cannot enforce that relationship programmatically, so I think the documentation should warn that
socket_timeoutbe longer than statement_timeout.
 


Regards
Takayuki Tsunakawa






RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Saturday, March 16, 2019 5:40 PM (GMT+9), Fabien COELHO wrote:

> > Fabien, I was wondering whether you can apply TCP_USER_TIMEOUT patch 
> > and continue discussion about 'socket_timeout'?

> I can apply nothing, I'm just a small-time reviewer.

> Committers on the thread are Michaël-san and Robert, however I'm not sure 
> that they are very sensitive to "please apply this patch" requests: they 
> are the lone judges of their own priorities.


Regarding user timeout parameters:

Based from previous reviews of the code (it seems good) and reviewers'
comments, everyone seems to agree that user timeout parameters are
needed, so we can just waitfor the updated patch.
The author, Nagaura-san, has gotten feedback from Robert for the doc part.
So if an updated patch is posted with addressed comments, then I think we
can review it again for the final round.

---
As for socket_timeout parameters:

The use case for socket timeout parameter is that it's a stop-gap approach
to prevent applications from infinite waiting for the DB server when other
timeout parameters such as keepalives and tcp_user_timeout fail to detect
the connection error. (That's why I thought it's a network problem detector?)

The main argument here is about the security risk of allowing socket timeout
to cancel valid connections, right? Then to address that, I agree with
Tsunakawa-san to document that the value should at least be (equal? or) higher
than the other timeout parameters.
If documenting is not enough, then we can limit that within the code by making
sure that socket_timeout value must be greater than the other timeout parameters
(keepalives, tcp user timeout, statement timeout, etc.). Otherwise, socket_timeout
parameter should not work even if was switched on. Or is that too much enforcing?

Regards,
Kirk Jamison

Re: Timeout parameters

From
Robert Haas
Date:
On Sun, Mar 17, 2019 at 9:08 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
> The main argument here is about the security risk of allowing socket timeout
> to cancel valid connections, right?

I don't think so.  I think it's just a weirdly-design parameter
without a really compelling use case.  Enforcing limits on the value
of the parameter doesn't fix that.  Most of the reviewers who have
opined so far have been somewhere between cautious and negative about
the value of that parameter, so I think we should just not add it.  At
least for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Robert Haas [mailto:robertmhaas@gmail.com]
> I don't think so.  I think it's just a weirdly-design parameter
> without a really compelling use case.  Enforcing limits on the value
> of the parameter doesn't fix that.  Most of the reviewers who have
> opined so far have been somewhere between cautious and negative about
> the value of that parameter, so I think we should just not add it.  At
> least for now.

I don't think socket_timeout is so bad.  I think Nagaura-san and I presented the use case, giving an answer to every
questionand concern.  OTOH, it may be better to commit the tcp_user_timeout patch when Nagaura-san has refined the
documentation,and then continue socket_timeout.
 


Regards
Takayuki Tsunakawa




RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi,

First, thank you for your insightful discussion.
I remade patches and attached in this mail.

> From: Tsunakawa, Takayuki
> OTOH, it may be better to commit the tcp_user_timeout patch when
> Nagaura-san has refined the documentation, and then continue
> socket_timeout.
Yes, I want to commit TCP_USER_TIMEOUT patches in PG12.
Also I'd like to continue to discuss about socket_timeout after this CF.
# Thank you for your answering.


About TCP_USER_TIMEOUT:
1) Documentation and checking on UNIX system
As far as I surveyed (solaris and BSD family), these UNIX OS don't have "TCP_USER_TIMEOUT" parameter.
Accordingly I have not checked on real machine.
Also, I modified documentations to remove "equivalent to socket option"


As for socket_timeout:
1) documentation
> From: Robert Haas <robertmhaas@gmail.com>
> This can't be used to force a global query timeout, because any kind of protocol
> message - e.g. a NOTICE or WARNING - would reset the timeout, allowing the
> continue past the supposedly-global query timeout.  There may be some other
> ways that can happen, too, either currently or in the future.
Understood. Indeed, doc was not correct.

Mikalai-san,
You used to advise me about via UNIX domain sockets, but it turned out to be able to work.
Therefore, the documentation doesn't have the sentence about it.

Mikalai-san and Tsunakawa-san,
I modified documentation based on your comments about socket_timeout > other parameters.

2) checking whether int type
I Implemented in the current patch. Please see line 73 and 85 in the patch.

3) setting inheritance by "\c" command.
Inheritance of optional parameters such as this parameter,
(but not basic connection parameters such as dbname,)
should be implemented as one feature of "\c".
It is because, seen from users,
it is strange that some parameters can be taken over while the others can't.


Sorry for my late reply again and again...

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi Nagaura-san,

On Monday, March 25, 2019 2:26 PM (GMT+9), Ryohei Nagaura wrote:

>Yes, I want to commit TCP_USER_TIMEOUT patches in PG12.
>Also I'd like to continue to discuss about socket_timeout after this CF.
Ok. So I'd only take a look at TCP_USER_TIMEOUT parameter for now (this CommitFest),
and maybe we could resume the discussion on socket_timeout in the future.

> About TCP_USER_TIMEOUT:
> 1) Documentation and checking on UNIX system As far as I surveyed 
> (solaris and BSD family), these UNIX OS don't have "TCP_USER_TIMEOUT" parameter.
> Accordingly I have not checked on real machine.
> Also, I modified documentations to remove "equivalent to socket option"

Your patch applies, however in TCP_backend_v10 patch,
your documentation is missing a closing tag </varlistentry>
so it could not be tested.
When that's fixed, it passes the make check.

Also regarding docs,
what's the reason to emphasize Windows not supported in a separate paragraph?
How about simplifying and combining the 2 paragraphs such as below?

Specifies the number of milliseconds after which a TCP connection can be
aborted by the operating system due to network problems when sending or
receiving data through this connection. A value of zero uses the system default.
For connections made via a Unix-domain socket, this parameter is ignored. This
parameter is supported only on systems that support <literal>TCP_USER_TIMEOUT</literal>;
on other systems such as Windows, it has no effect and must be zero.

Or if you really want to emphasize which systems are supported and which are not,
you may remove the last sentence above, then indicate that in the note tag.
Example:
This parameter is supported only on systems that support 
<literal>TCP_USER_TIMEOUT</literal> such as Linux version 2.6.37 or later;
on other systems such as Windows, it has no effect and must be zero.

(Note: I haven't checked which Linux versions are supported,
I got it from your previous patch version.)

Regards,
Kirk Jamison

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, kirk-san.

> From: Jamison, Kirk
> Ok. So I'd only take a look at TCP_USER_TIMEOUT parameter for now (this
> CommitFest), and maybe we could resume the discussion on socket_timeout
> in the future.
Yes, please.

> Your patch applies, however in TCP_backend_v10 patch, your documentation
> is missing a closing tag </varlistentry> so it could not be tested.
> When that's fixed, it passes the make check.
Oops! Fixed.

> what's the reason to emphasize Windows not supported in a separate
> paragraph?
I thought that it needs to be warned because there are some windows users.
But I came upon the idea that there are no need even it.
Thus the former doc you suggested seems better and I minor fixed doc to yours.


> (Note: I haven't checked which Linux versions are supported, I got it
> from your previous patch version.)
FYI, [1]

[1] http://man7.org/linux/man-pages/man7/tcp.7.html
Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Tuesday, March 26, 2019 2:35 PM (GMT+9), Ryohei Nagaura wrote:

>> Your patch applies, however in TCP_backend_v10 patch, your 
>> documentation is missing a closing tag </varlistentry> so it could not be tested.
>> When that's fixed, it passes the make check.
>Oops! Fixed.

Ok. Confirmed the fix.
Minor nitpick, the semicolon is separated to another line when it shouldn't be.
+        This parameter is supported only on systems that support <literal>TCP_USER_TIMEOUT</literal>
+        ; on other systems such as Windows, it has no effect and must be zero.

Anyway, for now maybe wait for other reviewers to confirm the latest doc
before you update the patches again.

Documentations aside, I tested the patch and below are the results.

1.) apply OK, build OK, make check ok, install ok.

2.) Testing
A. Configuration Setting
[pg_hba.conf]   Allow remote connection from client address
[postgresql.conf]
  listen_addresses = '*'
  # Optional just for checking logs in server 
  logging_collector = on

B. Tests
[Client-Side]
1. $ psql postgresql://USERNAME:PASSWORD(at)server_host:server_port/dbname?tcp_user_timeout=15000

2. postgres=# select inet_client_port();
 inet_client_port
------------------
            59396
3. (Via root user of client, other console window)
root# iptables -I INPUT -p tcp --dport 59396 -j DROP

4. postgres=# select pg_sleep(10);

5. Error output is displayed. Query is cancelled.
could not receive data from server: Connection timed out
postgres#=

#Inputting a query would attempt to reconnect to server

postgres=# select inet_client_port();
no connection to the server
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# 

6. Next test; tested again but switching #3 & #4
(Note: There should be a new client port number by then.)
postgres=# select inet_client_port();
postgres=# set tcp_user_timeout=15000;
postgres=# select pg_sleep(10);
root# iptables -I INPUT -p tcp --dport 59397 -j DROP

# Client hangs. Server could not receive data from client, so connection timed out.
# Need to reconnect again from client.
^Z
[3]+  Stopped


Below are the logs in the server.

[Server-Side Logs]
#Test#1
[12949] LOG:  statement: select inet_client_port();
[12949] LOG:  statement: select pg_sleep(10);
[12949] LOG:  could not receive data from client: Connection timed out
#Test#2
[13163] LOG:  statement: select inet_client_port();
[13163] LOG:  statement: set tcp_user_timeout=15000;
[13163] LOG:  statement: select pg_sleep(10);
[13163] LOG:  could not receive data from client: Connection timed out

---
I also tested invalid keywords for tcp_user_timeout and errors were detected.
ERROR:  invalid value for parameter "tcp_user_timeout": "kkk"
ERROR:  -999 ms is outside the valid range for parameter "tcp_user_timeout" (0 .. 2147483647)


Regards,
Kirk Jamison

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Fabien-san.

> From: Fabien COELHO <coelho@cri.ensmp.fr>
> About the backend v11 patch.
> No space or newline before ";". Same comment about the libpq_ timeout.
Fixed.

> There is an error in the code, I think it should be < 0 to detect errors.
Yes, thanks!

> If the parameter has no effect on Windows, I do not see why its value should be
> constrained to zero, it should just have no effect. Idem libpq_ timeout.
I had a misunderstanding.
Indeed, it doesn't need to be zero. Removed.

> Documentation:
> ISTM this is not about libpq connections but about TCP connections. There can be
> non libpq implementations client side.
Then, where do you think the correct place?
I thought that this parameter should be explained here because the communication
will be made through the library libpq same as keepalive features.


Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Fabien-san.

> From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> The default postgres configuration file should be updated to reflect the
> parameter and its default value.
I'm very sorry for not addressing your review in the last patch.
I modified TCP_backend patch (adding postgresql.conf.sample) and attached in this mail.

Best regards,
---------------------
Ryohei Nagaura


Attachment

Re: Timeout parameters

From
Kyotaro HORIGUCHI
Date:
Hello.

At Thu, 28 Mar 2019 01:11:23 +0000, "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com> wrote in
<EDA4195584F5064680D8130B1CA91C4540EEAD@G01JPEXMBYT04>
> Hello, Fabien-san.
> 
> > From: Fabien COELHO <coelho@cri.ensmp.fr>
> > About the backend v11 patch.
> > No space or newline before ";". Same comment about the libpq_ timeout.
> Fixed.
> 
> > There is an error in the code, I think it should be < 0 to detect errors.
> Yes, thanks!
> 
> > If the parameter has no effect on Windows, I do not see why its value should be
> > constrained to zero, it should just have no effect. Idem libpq_ timeout.
> I had a misunderstanding.
> Indeed, it doesn't need to be zero. Removed.
> 
> > Documentation:
> > ISTM this is not about libpq connections but about TCP connections. There can be
> > non libpq implementations client side.
> Then, where do you think the correct place?
> I thought that this parameter should be explained here because the communication
> will be made through the library libpq same as keepalive features.

In TCP_backend patch:

+       <para>
+        Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+        socket option of libpq connection.
+       </para>
+       <para>
+        Specifies the number of milliseconds after which a TCP connection can be
+        aborted by the operation system due to network problems when sending or
+        receiving the data through this connection. A value of zero uses the
+        system default. In sessions connected via a Unix-domain socket, this
+        parameter is ignored and always reads as zero. This parameter is
+        supported only on systems that support
+        <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+       </para>


I think this is not mentioning backend. Why don't you copy'n
paste then modify the description of tcp_keepalives_idle? Perhaps
it needs a similar caveat related to Windows.


+static const char*
+show_tcp_user_timeout(void)
+{
+    /* See comments in assign_tcp_keepalives_idle */
+    static char nbuf[16];
+
+    snprintf(nbuf, sizeof(nbuf), "%d", tcp_user_timeout);
+    return nbuf;
+}

The reason for, say, tcp_keepalive_idle uses the hook is that it
doesn't show the GUC variable, but the actual value read by
getsockopt. This is just showing the GUC value. I think this
should behave the same way with tcp_keepalive* options. If not,
we should have an explanation of the reason there.


In TCP_interface patch:

+       <para>
+        Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+        socket option of libpq connection.

What does the "wrapper" mean?

+       </para>
+       <para>
+        Specifies the number of milliseconds after which a TCP connection can be
+        aborted by the operation system due to network problems when sending or
+        receiving the data through this connection. A value of zero uses the
+        system default. In sessions connected via a Unix-domain socket, this
+        parameter is ignored and always reads as zero. This parameter is
+        supported only on systems that support
+        <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+       </para>

I would suggest the same thing as the backend
part. Copy'n-paste-modify keepalives_idle would be better.


+    if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+             (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
+    {
+        char        sebuf[256];
+
+        appendPQExpBuffer(&conn->errorMessage,
+                    libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),

I suppose that the reason ENOPROTOOPT is excluded from error
condition is that the system call may fail with that errno on
older kernels, but I don't think that that justifies ignoring the
failure.


+                    if (!IS_AF_UNIX(addr_cur->ai_family))
+                    {
+                        if (!setTCPUserTimeout(conn))
+                        {
+                            closesocket(conn->sock);
+                            conn->sock = -1;
+                            conn->addr_cur = addr_cur->ai_next;
+                            goto keep_going;
+                        }
+                    }

I don't see a point in the added part having own "if
(!IS_AF_UNIX" block separately from tcp_keepalive options.


+    /* TCP USER TIMEOUT */
+    {"tcp_user_timeout", NULL, NULL, NULL,
+        "TCP_user_timeout", "", 10,    /* strlen(INT32_MAX) == 10 */
+        offsetof(struct pg_conn, pgtcp_user_timeout)},
+

Similary, why isn't this placed just after tcp_keepalives
options?

+    char       *pgtcp_user_timeout;    /* tcp user timeout (numeric string) */

+    char       *tcp_user_timeout;    /* tcp user timeout */

The latter doesn't seem to be used.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> +    if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> +             (char *) &timeout, sizeof(timeout)) < 0 && errno !=
> ENOPROTOOPT)
> +    {
> +        char        sebuf[256];
> +
> +        appendPQExpBuffer(&conn->errorMessage,
> +                    libpq_gettext("setsockopt(TCP_USER_TIMEOUT)
> failed: %s\n"),
> 
> I suppose that the reason ENOPROTOOPT is excluded from error
> condition is that the system call may fail with that errno on
> older kernels, but I don't think that that justifies ignoring the
> failure.

I think that's for the case where the modules is built on an OS that supports TCP_USER_TIMEOUT (#ifdef TCP_USER_TIMEOUT
istrue), and the module is used on an older OS that doesn't support TCP_USER_TIMEOUT.  I remember I was sometimes able
todo such a thing on Linux and Solaris.  If we don't have to handle such usage, I agree about removing the special
handlingof ENOTPROTO.
 


Regards
Takayuki Tsunakawa







RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> > +    if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> > +             (char *) &timeout, sizeof(timeout)) < 0 && errno !=
> > ENOPROTOOPT)
> > +    {
> > +        char        sebuf[256];
> > +
> > +        appendPQExpBuffer(&conn->errorMessage,
> > +                    libpq_gettext("setsockopt(TCP_USER_TIMEOUT)
> > failed: %s\n"),
> >
> > I suppose that the reason ENOPROTOOPT is excluded from error
> > condition is that the system call may fail with that errno on
> > older kernels, but I don't think that that justifies ignoring the
> > failure.
> 
> I think that's for the case where the modules is built on an OS that supports
> TCP_USER_TIMEOUT (#ifdef TCP_USER_TIMEOUT is true), and the module is used
> on an older OS that doesn't support TCP_USER_TIMEOUT.  I remember I was
> sometimes able to do such a thing on Linux and Solaris.  If we don't have
> to handle such usage, I agree about removing the special handling of
> ENOTPROTO.

Oops, I agree that we return an error even in the ENOPROTOOPT case, because setsockopt() is called only when the user
specifiestcp_user_timeout.
 


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Horiguchi-san.

Thank you for review.

> In TCP_backend patch:
> I think this is not mentioning backend. Why don't you copy'n paste then
> modify the description of tcp_keepalives_idle? Perhaps it needs a similar
> caveat related to Windows.

> +static const char*
> +show_tcp_user_timeout(void)
> The reason for, say, tcp_keepalive_idle uses the hook is that it doesn't
> show the GUC variable, but the actual value read by getsockopt. This is
> just showing the GUC value. I think this should behave the same way with
> tcp_keepalive* options. If not, we should have an explanation of the
> reason there.
Oh, Thank you for teaching.
I add a function pq_gettcpusertimeout() same as keepalives*.

> In TCP_interface patch:
> I would suggest the same thing as the backend part. Copy'n-paste-modify
> keepalives_idle would be better.
Same as backend-side, I made keepalives documentation reference.
I refered keepalives* documentation and modified my doc.

> I suppose that the reason ENOPROTOOPT is excluded from error condition
> is that the system call may fail with that errno on older kernels, but
> I don't think that that justifies ignoring the failure.
Thank you for your point!
Removed in both patches.

> I don't see a point in the added part having own "if (!IS_AF_UNIX" block
> separately from tcp_keepalive options.
Sorry, my coding was bad...
I integrated it with coding about keepalives.

> +    /* TCP USER TIMEOUT */
> +    {"tcp_user_timeout", NULL, NULL, NULL,
> +        "TCP_user_timeout", "", 10,    /* strlen(INT32_MAX) ==
> 10 */
> +        offsetof(struct pg_conn, pgtcp_user_timeout)},
> +
> 
> Similary, why isn't this placed just after tcp_keepalives options?
Moved!

> +    char       *tcp_user_timeout;    /* tcp user timeout */
> The latter doesn't seem to be used.
This is also my bad coding...
Removed!

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Tsunakawa-san.

> From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> I think that's for the case where the modules is built on an OS that
> supports TCP_USER_TIMEOUT (#ifdef TCP_USER_TIMEOUT is true), and the
> module is used on an older OS that doesn't support TCP_USER_TIMEOUT.  I
> remember I was sometimes able to do such a thing on Linux and Solaris.
> If we don't have to handle such usage, I agree about removing the special
> handling of ENOTPROTO.
I don't know how much there are such systems, but your story feels remote for me.
Therefore I adopted Horiguchi-san's idea.

Best regards,
---------------------
Ryohei Nagaura






RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, all.

# This is a supplement of my current patches.

> From: Nagaura, Ryohei [mailto:nagaura.ryohei@jp.fujitsu.com]
> > In TCP_backend patch:
> > I think this is not mentioning backend. Why don't you copy'n paste
> > then modify the description of tcp_keepalives_idle? Perhaps it needs
> a
> > similar caveat related to Windows.
I modified documentation with referring ones of tcp_keepalives_*.

Also, Note that:
1) I added <note> about windows OS and some linux kernels.
2) Documentations about TCP_USER_TIMEOUT is made citing description of [1].
    # The client-side patch does as well.
3) Same as keepalives*, I used both "0" and zero.
I can't recognize why to use both of them.
Is there anyone who know it?

[1] https://tools.ietf.org/html/rfc5482
Best regards,
---------------------
Ryohei Nagaura






RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello,

I updated my patches.

In TCP_USER_TIMEOUT backend patch:
1) linux ver 2.6.26 -> 2.6.36

2) error for systems where doesn't support this parameter

In TCP_USER_TIMEOUT interface patch:
error for systems where doesn't support this parameter

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi Nagaura-san,

>I updated my patches.
Thanks.

>In TCP_USER_TIMEOUT backend patch:
> 1) linux ver 2.6.26 -> 2.6.36
"Linux" should be capitalized.


I confirmed that you followed Horiguchi-san's advice
to base the doc from keepalives*. 
About your question:
> 3) Same as keepalives*, I used both "0" and zero.
> I can't recognize why to use both of them.
> Is there anyone who know it?

In config.sgml it uses both "zero" and "0",
while in libpq.sgml it only uses "zero".
Since you patterned it from keepalives* docs for both files,
I think it's alright doing it that way.

Regards,
Kirk Jamison




RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Kirk-san.

> From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]
> >In TCP_USER_TIMEOUT backend patch:
> > 1) linux ver 2.6.26 -> 2.6.36
> "Linux" should be capitalized.
Oh, yes. I see.

> In config.sgml it uses both "zero" and "0", while in libpq.sgml it only
> uses "zero".
> Since you patterned it from keepalives* docs for both files, I think it's
> alright doing it that way.
Thanks.

In addition, I modified tcp client side patch.
The last patch can't be compiled...
# declaration of sebuf in setTCPUserTimeout()
I'm very sorry for ridiculous mistake.

Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello,

In the last client-side tcp user timeout patch:
+        appendPQExpBuffer(&conn->errorMessage,
+                    libpq_gettext("setsockopt(TCP_USER_TIMEOUT) not supported: %s\n"),
+                        SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
Keepalives* parameters in client-side libpq don't output error message like above even if the system don't support.
So, I removed this sentence in the current patch (TCP_interface).

Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
Nagaura-san,

The client-side tcp_user_timeout patch looks good.

The server-side tcp_user_timeout patch needs fixing the following:


(1)
+            GUC_UNIT_MS | GUC_NOT_IN_SAMPLE
+        12000, 0, INT_MAX,

GUC_NOT_IN_SAMPLE should be removed because the parameter appears in postgresql.conf.sample.
The default value should be 0, not 12000.


(2)
Now that you've changed show_tcp_usertimeout() to use pq_gettcpusertimeout(), you need to modify
pq_settcpusertimeout();just immitate pq_setkeepalivesidle().  Otherwise, SHOW would display a wrong value.
 


Regards
Takayuki Tsunakawa







RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
Nagaura-san,

The socket_timeout patch needs the following fixes.  Now that others have already tested these patches successfully,
theyappear committable to me.
 


(1)
+        else
+            goto iiv_error;
...
+
+iiv_error:
+    conn->status = CONNECTION_BAD;
+    printfPQExpBuffer(&conn->errorMessage,
+                      libpq_gettext("invalid integer value for socket_timeout\n"));
+    return false;

This goto and its corresponding iiv_error label are redundant.  You can just set the error message and return at the
callsite of parse_int_param().  i.e.:
 

if (!parse_int_param(...))
{
    error processing
    return false;
}
if(conn->socket_timeout > 0 && conn->socket_timeout < 2)
    conn->socket_timeout = 2;

The reason why oom_error label is present is that it is used at multiple places to avoid repeating the same error
processingcode.
 


(2)
+            conn->sock = -1;

Use PGINVALID_SOCKET instead of -1.


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi, 

>The socket_timeout patch needs the following fixes.  Now that others have already tested these patches >successfully,
theyappear committable to me.
 

In addition, regarding socket_timeout parameter.
I referred to the doc in libpq.sgml, corrected misspellings,
and rephrased the doc a little bit as below:


Maximum wait in seconds (write as a decimal integer, e.g. 10) for socket read/write operation before closing the
connection.A value of zero (the default) turns this off, which means wait indefinitely. The minimum allowed timeout is
2seconds, so a value of 1 is interpreted as 2.
 

Although this can be used as a stopgap timeout measure, it is recommended
to set a value higher than the other timeout parameters
(<literal>connect_timeout</literal>, <literal>statement_timeout</literal>,
<literal>TCP_KEEP_ALIVES</literal>, <literal>TCP_USER_TIMEOUT</literal>)
because setting a smaller value will make the other configured timeout
parameters meaningless and can cause undesirable disconnection.

Regards,
Kirk Jamison





RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Tsunakawa-san, Kirk-san.

Thank you for your review.


Tsunakawa-san,
> From: Tsunakawa, Takayuki [mailto:tsunakawa.takay@jp.fujitsu.com]
> The client-side tcp_user_timeout patch looks good.
Thanks, but I minorly changed that patch.
It is the declaration place of sebuf[], moving to just before where it'll become needed.


> The server-side tcp_user_timeout patch needs fixing the following:
> (1)
> +            GUC_UNIT_MS | GUC_NOT_IN_SAMPLE
> +        12000, 0, INT_MAX,
> 
> GUC_NOT_IN_SAMPLE should be removed because the parameter appears in 
> postgresql.conf.sample.
> The default value should be 0, not 12000.
Oops, fixed!

> (2)
> Now that you've changed show_tcp_usertimeout() to use 
> pq_gettcpusertimeout(), you need to modify pq_settcpusertimeout(); just immitate pq_setkeepalivesidle().
> Otherwise, SHOW would display a wrong value.
Indeed, pq_settcpusertimeout() should be modified as well.


> The socket_timeout patch needs the following fixes.  Now that others 
> have already tested these patches successfully, they appear committable to me.
Thank you so much for reviewing even socket_timeout patch.

> The reason why oom_error label is present is that it is used at 
> multiple places to avoid repeating the same error processing code.
Understood.

> (2)
> +            conn->sock = -1;
> 
> Use PGINVALID_SOCKET instead of -1.
Oh, I see.


Kirk-san,
> I referred to the doc in libpq.sgml, corrected misspellings, and rephrased the
> doc a little bit as below:
Changed "stop-gap" and "mean" -> "stopgap" and "measure" respectively.
But I can't find the difference between mine and yours.
Would you please tell me if there is a difference like this depending on your rephrasing?

Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Kirk-san,

> From: Jamison, Kirk [mailto:k.jamison@jp.fujitsu.com]
> In addition, regarding socket_timeout parameter.
> I referred to the doc in libpq.sgml, corrected misspellings, and rephrased the
> doc a little bit as below:
You aimed consistency with connect_timeout. Didn't you?
If yes, That is a nice idea!

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi all.

I found that connect_timeout uses pqWaitTimed().
Socket_timeout is related to pqWait() not pqWaitTimed().
Thus, I removed connect_timeout in my socket_Timeout patch.

FYI, I summarized a use case of this parameter.
The connection is built successfully.
Suppose that the server is hit by some kind of accident(e,g,. I or Tsunakawa-san suggested).
At this time, there is no guarantee that the server OS can pass control to the postmaster.
Therefore, it is considered natural way to disconnect the connection from the client side.
The place of implement of disconnection is pqWait() because it is where users' infinite wait occurs.

Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi all.

I found my mistake in backend patch.

I modified from
+    port->keepalives_count = count;
to
+    port->tcp_user_timeout = timeout;
in line 113.

Sorry for mistake like this again and again...
Best regards,
---------------------
Ryohei Nagaura



Attachment

Re: Timeout parameters

From
Kyotaro HORIGUCHI
Date:
Hi, thank you for the new version.

This compiles on Windows. (I didn't comfirmed that it works
correctly..)

# ȁ  (I inserted this garbage to force my mailer to send in utf-8).

At Fri, 29 Mar 2019 06:52:41 +0000, "Nagaura, Ryohei" <nagaura.ryohei@jp.fujitsu.com> wrote in
<EDA4195584F5064680D8130B1CA91C4540F81D@G01JPEXMBYT04>
> Hi all.
> 
> I found my mistake in backend patch.
> 
> I modified from
> +    port->keepalives_count = count;
> to
> +    port->tcp_user_timeout = timeout;
> in line 113.

I hadn't noticed^^;

It look good, but still I have some comments.

+        Specifies the number of milliseconds that transmitted data may
+        remain unacknowledged before a connection is forcefully closed.

Hmm. "forcefully" means powerful or assertive, in Japanese "力強く
" or "強硬に".  "forcibly" means acoomplished through force, in
Japanesee "無理やり" or "強引に". Actually the latter is used in
man 7 tcp.

http://man7.org/linux/man-pages/man7/tcp.7.html
>  time in milliseconds that transmitted data may remain
>  unacknowledged before TCP will forcibly close the
>  corresponding connection and return ETIMEDOUT to the

+#tcp_user_timeout = 0        # TCP_USER_TIMEOUT, in milliseconds;

The comment protrudes left.

+# - TCP USER TIMEOUT -

Just above in the sample file, there are lines like this:

> # - TCP Keepalives -
> # see "man 7 tcp" for details
> 
> #tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;

Couldn't we have the new variable in the section, like this?

-# - TCP Keepalives -
+# - TCP timeout settings -
 # see "man 7 tcp" for details
..
 #tcp_keepalives_count = 0        # TCP_KEEPCNT;
                     # 0 selects the system default
+#tcp_user_timeout = 0            # TCP_USER_TIMEOUT, in milliseconds;
                    # 0 selects the system default


regars.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi Horiguchi-san,
Thank you so much for your review!

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
> Hmm. "forcefully" means powerful or assertive, in Japanese "力強く
> " or "強硬に".  "forcibly" means acoomplished through force, in Japanesee "
> 無理やり" or "強引に". Actually the latter is used in man 7 tcp.
Oops, I adopted the latter in both patches about tcp user timeout.

> The comment protrudes left.
Fixed.

> Couldn't we have the new variable in the section, like this?
ok!

Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi Nagaura-san,

Thank you for the updated patches.
It became a long thread now, but it's okay, you've done a good amount of work.
There are 3 patches in total: 2 for tcp_user_timeout parameter, 1 for socket_timeout.


A. TCP_USER_TIMEOUT
Since I've been following the updates, I compiled a summary of addressed changes
you made from all the reviewers' comments for TCP_USER_TIMEOUT:    

For both client and server: apply ok, build ok, compile ok.

[DOCS]
I confirmed that docs were patterned from keepalives in both
config.sgml (backend) and libpq.sgml (interface). 
- changed "forcefully" to "forcibly"

It seems OK now to me.

[CODE]
1. SERVER  TCP_backend patch (v19)

- as per advice of Fabien, default postgres configuration file
has been modified (postgresql.conf.sample);
- as per advice of Horiguchi-san, I confirmed that the variable
is now grouped along with keepalives* under the TCP timeout settings.

- confirmed the removal of "errno != ENOPROTOOPT" in backend's pq_settcpusertimeout()

- confirmed the hook from show_tcp_user_timeout to pq_gettcpusertimeout(), 
based from the comment of Horiguchi-san to follow the tcp_keepalive* options' behavior.

- as per advice of Tsunakawa-san, GUC_NOT_IN_SAMPLE has been removed
and the default value is changed to 0.

- confirmed the modification of pq_settcpusertimeout():
a. to follow keepalives' pq_set* behavior as well
b. correct the error of sizeof(timeout)) << 0 to just a single '<'; 
'port->keepalives_count = count' to 'port->tcp_user_timeout = timeout'


2. CLIENT    TCP_interface patch (v18)

- confirmed that tcp_user_timeout is placed after keepalives options
in src/interfaces/libpq/fe-connect.c, 

- confirmed the change / integration for the comment below in fe-connect.c:PQconnectPoll():
>I don't see a point in the added part having own "if
>(!IS_AF_UNIX" block separately from tcp_keepalive options.

- confirmed the removal of the following:
a. "errno != ENOPROTOOPT" in setTCPUserTimeout()
b. unused parameter: char *tcp_user_timeout;
c. error for non-supported systems


Overall, I tested the patch using the same procedure I did in the above email,
and it worked well. Given that you addressed the comments from all
the reviewers, I've judged these 2 patches as committable now.

----------------------------

B. socket_timeout parameter

> FYI, I summarized a use case of this parameter.
> The connection is built successfully.
> Suppose that the server is hit by some kind of accident(e,g,. I or Tsunakawa-san suggested).
> At this time, there is no guarantee that the server OS can pass control to the postmaster.
> Therefore, it is considered natural way to disconnect the connection from the client side.
> The place of implement of disconnection is pqWait() because it is where users' infinite wait occurs.

apply, build, compile --> OK
Doc seems OK now to me.
The patch looks good to me as you addressed the code comments from Tsunakawa-san.
Although it's still the committer who will have the final say.

----------------------------

That said, I am now marking this one "Ready for Committer".
If others have new comments by weekend, feel free to change the status.


Regards,
Kirk Jamison

RE: Timeout parameters

From
Fabien COELHO
Date:
Hello Ryohei-san,

I have further remarks after Kirk-san extensive review on these patches.

* About TCP interface v18.

For homogeneity with the surrounding cases, ISTM that "TCP_user_timeout" 
should be ""TCP-user-timeout".


* About TCP backend v19 patch

I still disagree with "on other systems, it must be zero.": I do not see 
why "it must be zero", I think that the parameter should simply be ignored 
if it does not apply or is not implemented on a platform?

If there are consistency constraint with other timeout parameters, 
probably the documentation should mention it?


* About socket_timeout v12 patch, I'm not sure there is a consensus.

I still think that there should be an attempt at cancelling before 
severing.

Robert pointed out that it is not a timeout wrt the query, but this is not 
clearly explained in the documentation nor the comments. The doc says that
it is the time for socket read/write operations, but it is somehow the 
time between messages, some of which may not be linked to read/write 
operations. I feel that the documentation is not very precise about what 
it really does.

ISTM that the implementation could make the cancelling as low as 1 second 
because of rounding. This could be said somewhere, maybe in the doc, 
surely in a comment.

I still think that this parameter should be preservered on psql's 
reconnections when explicitely set to non zero.

-- 
Fabien.



RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello, Fabien-san.

> From: Fabien COELHO [mailto:coelho@cri.ensmp.fr]
> I have further remarks after Kirk-san extensive review on these patches.
Yes, I'm welcome.
Thank you very much for your review.


> * About TCP interface v18.
> For homogeneity with the surrounding cases, ISTM that "TCP_user_timeout"
> should be ""TCP-user-timeout".
Oops, it would be better. Thanks.
Note that I changed to "TCP-User-Timeout" not "TCP-user-timeout."
Also, I deleted this
+    /* TCP USER TIMEOUT */


> * About TCP backend v19 patch
> I still disagree with "on other systems, it must be zero.": I do not see why
> "it must be zero", I think that the parameter should simply be ignored if it
> does not apply or is not implemented on a platform?
No, please see the below with "src/backend/libpq/pqcomm.c".
I made consistent with Keepalives documentation and implementation.
i,e,. if the setting value is non-zero on a system that does not support 
these parameters, an "not supported" error is output.
Therefore, "on other systems, it must be zero" in doc.
Also, the error message "not supported" is needed because if an error is not output when you set that parameter on such
asystem, it seems to be available in spite of not available.
 
In my patch, this situation corresponds to line 114.

> If there are consistency constraint with other timeout parameters, probably the
> documentation should mention it?
As I said above.


> * About socket_timeout v12 patch, I'm not sure there is a consensus.
> I still think that there should be an attempt at cancelling before
> severing.
When some accidents hits the server, infinite wait of the user may occur.
This feature is a way to get around it.
It is not intended to reduce server load.
Since sending a cancellation request will cause the user to wait[1], I do not think that it follows the intention.
Do you think that you still need to send a cancellation request?

> Robert pointed out that it is not a timeout wrt the query, but this is not
> clearly explained in the documentation nor the comments.
Sorry, I couldn't recognize your intension.
Is it that you think the description "this is not a query timeout" is needed?

> The doc says that
> it is the time for socket read/write operations, but it is somehow the
> time between messages, some of which may not be linked to read/write
> operations. I feel that the documentation is not very precise about what
> it really does.
What socket_timeout really does is a timeout by poll().
poll() monitors a socket file descriptor,
so the document "the time for socket read/write operations" is correct.

> ISTM that the implementation could make the cancelling as low as 1 second
> because of rounding. This could be said somewhere, maybe in the doc,
> surely in a comment.
It is said in doc. Right?
+        The minimum allowed timeout is 2 seconds, so a value of 1 is
+        interpreted as 2.
Or "because of rounding" is needed?

> I still think that this parameter should be preservered on psql's
> reconnections when explicitely set to non zero.
Would you please tell me your vision ?
Do you want to inherit all parameters with one option?
Or one to one?

[1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1FBC7FD1%40G01JPEXMBYT05
Best regards,
---------------------
Ryohei Nagaura



Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
Hi again,

Since Nagaura-san seems to have addressed the additional comments on
tcp user timeout patches, I still retain the status for the patch set as
ready for committer.

As for socket_timeout, I agree that this can be discussed further.
>Fabien Coelho wrote:
>> I still think that this parameter should be preservered on psql's 
>> reconnections when explicitely set to non zero.

The default value of socket_timeout parameter is 0.
Shouldn't it be back to default value on psql reconnection?
May I ask why and how should the previous non-zero setting be preserved
on psql reconnection?

Regards,
Kirk Jamison




RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hello.

> From: Jamison, Kirk <k.jamison@jp.fujitsu.com>
> As for socket_timeout, I agree that this can be discussed further.
Yes, probably.

> From: Fabien COELHO <coelho@cri.ensmp.fr>
> * About socket_timeout v12 patch, I'm not sure there is a consensus.
I think so too. I just made the patch being able to be committed anytime.

Finally, I rebased all the patches because they have come not to be applied to the updated PG.
Unlike before, note that the current socket_timeout patch premiss to be TCP_interface patch.

Best regards,
---------------------
Ryohei Nagaura


Attachment

RE: Timeout parameters

From
"Jamison, Kirk"
Date:
On Thursday, April 4, 2019 5:20PM (GMT+9), Ryohei Nagaura wrote:

> > From: Fabien COELHO <coelho@cri.ensmp.fr>
> > * About socket_timeout v12 patch, I'm not sure there is a consensus.
> I think so too. I just made the patch being able to be committed anytime.
> 
> Finally, I rebased all the patches because they have come not to be applied to the updated PG.

I just checked and confirmed that the TCP USER TIMEOUT patch set v20 works.
Although you should capitalize "linux" to "Linux" as already mentioned before.
The committer can also just fix that very minor part, if patch is deemed committable.

Note to committer: The "Ready for Committer" status is mainly intended for
tcp user timeout parameter.

OTOH, unless there is consensus with the socket_timeout,
for now the socket_timeout patch status still remains as "Needs Review".

Regards,
Kirk Jamison




Re: Timeout parameters

From
Michael Paquier
Date:
On Fri, Apr 05, 2019 at 04:39:36AM +0000, Jamison, Kirk wrote:
> I just checked and confirmed that the TCP USER TIMEOUT patch set v20
> works.  Although you should capitalize "linux" to "Linux" as already
> mentioned before.  The committer can also just fix that very minor
> part, if patch is deemed committable.

The first letter should be upper-case.

> Note to committer: The "Ready for Committer" status is mainly intended for
> tcp user timeout parameter.
>
> OTOH, unless there is consensus with the socket_timeout,
> for now the socket_timeout patch status still remains as "Needs Review".

I was looking at the patch set a couple of days ago.  The proposed
TCP_backend_v20.patch and TCP_interface_v20.patch make sense, but it
seems to me that socket_timeout_v14.patch should be rejected as it
could cause a connection to go down with no actual reason and that
the server should be in charge of handling timeouts.  Is my impression
right?
--
Michael

Attachment

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> The first letter should be upper-case.

Thank you for taking care of this patch, and sorry to cause you trouble to fix that...


> to me that socket_timeout_v14.patch should be rejected as it could cause
> a connection to go down with no actual reason and that the server should
> be in charge of handling timeouts.  Is my impression right?

No, the connection goes down for a good reason that the client could not get the response within a tolerable amount of
time.


Regards
Takayuki Tsunakawa






Re: Timeout parameters

From
Michael Paquier
Date:
On Fri, Apr 05, 2019 at 07:34:48AM +0000, Tsunakawa, Takayuki wrote:
> From: Michael Paquier [mailto:michael@paquier.xyz]
>> The first letter should be upper-case.
>
> Thank you for taking care of this patch, and sorry to cause you
> trouble to fix that...

I have just committed the GUC and libpq portion for TCP_USER_TIMEOUT
after a last lookup, and I have cleaned up a couple of places.  It is
a bit disappointing to see the option supported on Linux, but not on
Windows, Solaris and BSD as far as I can see.  Still that's useful in
itself for more control of the TCP connection.  Hopefully I forgot
nobody in the review credits.

For the socket_timeout stuff, its way of solving the problem it thinks
is solves does not seem right to me, and this thread has not reached a
consensus anyway, so I have discarded the issue.

I am marking the CF entry as committed.  In the future, it would be
better to not propose multiple concepts on the same thread, and if the
socket_timeout business is resubmitted, I would suggest a completely
new CF entry, and a new thread.
--
Michael

Attachment

RE: Timeout parameters

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael@paquier.xyz]
> I have just committed the GUC and libpq portion for TCP_USER_TIMEOUT after
> a last lookup, and I have cleaned up a couple of places. 

Thank you for further cleanup and committing.


> For the socket_timeout stuff, its way of solving the problem it thinks is
> solves does not seem right to me, and this thread has not reached a consensus
> anyway, so I have discarded the issue.
> 
> I am marking the CF entry as committed.  In the future, it would be better
> to not propose multiple concepts on the same thread, and if the
> socket_timeout business is resubmitted, I would suggest a completely new
> CF entry, and a new thread.

Understood.  Looking back the review process, it seems that tcp_user_timeout and socket_timeout should have been
handledin separate threads.
 


Regards
Takayuki Tsunakawa





RE: Timeout parameters

From
"Nagaura, Ryohei"
Date:
Hi, Michael-san.

> From: Michael Paquier [mailto:michael@paquier.xyz]
> I have just committed the GUC and libpq portion for TCP_USER_TIMEOUT after a
> last lookup, and I have cleaned up a couple of places.  It is a bit disappointing
> to see the option supported on Linux, but not on Windows, Solaris and BSD as
> far as I can see.  Still that's useful in itself for more control of the TCP
> connection.  Hopefully I forgot nobody in the review credits.
I confirmed your commitment. Thank you.

> I am marking the CF entry as committed.  In the future, it would be better to
> not propose multiple concepts on the same thread, and if the socket_timeout
> business is resubmitted, I would suggest a completely new CF entry, and a new
> thread.
Thank you for your suggestion.
I think it would be better too.

Best regards,
---------------------
Ryohei Nagaura