Thread: Keepalive
Hi All,

I have a very long running query that is not being terminated after a keep alive timeout event. The situation is that the client drops from the network, the server’s tcp/ip stack drops the connection, and the Postgres query continues to run without a network connection.

The given system is running on Linux and I’m being told this is expected behavior; however, that has not been my experience. My preferred platform to run Postgres on is FreeBSD, and in cases like this the Postgres session is also terminated once the tcp/ip connection is dropped by the kernel.

Does anyone know if there is a difference in how Linux handles interrupted connections vs. FreeBSD? I’ve actually used tcpdrop on FreeBSD to terminate stubborn sessions that were not responding to pg_terminate_backend(). Is this really expected behavior on Linux?

-Rui.
Rui DeSousa <rui.desousa@icloud.com> writes:
> I have a very long running query that is not being terminated after a keep alive timeout event. The situation is that the client drops from the network, the server’s tcp/ip stack drops the connection, and the Postgres query continues to run without a network connection.
> The given system is running on Linux and I’m being told this is expected behavior; however, that has not been my experience. My preferred platform to run Postgres on is FreeBSD and in cases like this the Postgres session is also terminated once the tcp/ip connection is dropped by the kernel.

Really? I would expect the query to keep running until the backend tries to perform some I/O to the client. How quickly that happens would depend a great deal on the details of the query, but not on which OS you're running on.

regards, tom lane
On Fri, 2024-06-14 at 11:22 -0400, Rui DeSousa wrote:
> I have a very long running query that is not being terminated after a keep alive timeout event.
> The situation is that the client drops from the network, the server’s tcp/ip stack drops the
> connection, and the Postgres query continues to run without a network connection.
>
> The given system is running on Linux and I’m being told this is expected behavior; however,
> that has not been my experience. My preferred platform to run Postgres on is FreeBSD
> and in cases like this the Postgres session is also terminated once the tcp/ip connection is
> dropped by the kernel.

That would surprise me.

There is the parameter "client_connection_check_interval" exactly for that.

Yours,
Laurenz Albe
On Jun 14, 2024, at 11:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rui DeSousa <rui.desousa@icloud.com> writes:
>> I have a very long running query that is not being terminated after a keep alive timeout event. The situation is that the client drops from the network, the server’s tcp/ip stack drops the connection, and the Postgres query continues to run without a network connection.
>> The given system is running on Linux and I’m being told this is expected behavior; however, that has not been my experience. My preferred platform to run Postgres on is FreeBSD and in cases like this the Postgres session is also terminated once the tcp/ip connection is dropped by the kernel.
>
> Really?
> I would expect the query to keep running until the backend tries to
> perform some I/O to the client. How quickly that happens would depend
> a great deal on the details of the query, but not on which OS you're
> running on.
>
> regards, tom lane

I just tried the following spinner() function on FreeBSD, and the keep alive timeout event caused both the network connection and the Postgres session to be torn down -- as I would expect. I will try this exact function on the Linux system to see if I get different results and report back; however, I might not be able to test it until next week.
-- Busy loop that never returns and never sends anything to the client.
create or replace function spinner()
returns void
as $$
declare
    _x bigint := 0;
begin
    loop
        _x := _x + 1;
    end loop;
end;
$$ language plpgsql;
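
Tom's point that the backend only notices a lost client when it performs I/O could be probed with a variant of the function above. The following is a minimal sketch and was not posted in the thread: the name spinner_notice() and the 100-million-iteration reporting interval are assumptions chosen for illustration. Because RAISE NOTICE sends a message to the client, a backend running this loop would be expected to hit a send error some time after the connection is gone and then exit at its next interrupt check, even with client_connection_check_interval left at 0.

```sql
-- Hypothetical variant of spinner(): the same busy loop, but it periodically
-- sends a NOTICE so that the backend performs I/O to the client.
create or replace function spinner_notice()
returns void
as $$
declare
    _x bigint := 0;
begin
    loop
        _x := _x + 1;
        -- Every 100 million iterations (an arbitrary interval), write to the client.
        if _x % 100000000 = 0 then
            raise notice 'still spinning: % iterations', _x;
        end if;
    end loop;
end;
$$ language plpgsql;
```

Whether and how soon the session ends depends on client_min_messages being at notice or lower and on how the kernel reports the failed send, so this is an experiment to try rather than a guaranteed outcome.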
On Jun 14, 2024, at 3:54 PM, Rui DeSousa <rui.desousa@icloud.com> wrote:
> I just tried the following spinner() function on FreeBSD and the keep alive timeout event caused both the network connection to be torn down along with the Postgres session -- as I would expect it to do. I will try this exact function on the Linux system and see if I get different results and report back; however, I might not be able to test it out until next week.

Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state of the connection went to CLOSE_WAIT, and then it was later completely removed from the kernel state; however, my spinner() function is still running with no network connection in the kernel table.
So, keep alive does behave differently between FreeBSD and Linux. I really do prefer FreeBSD for many reasons.
-Rui.
> On Jun 14, 2024, at 1:47 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Fri, 2024-06-14 at 11:22 -0400, Rui DeSousa wrote:
>> I have a very long running query that is not being terminated after a keep alive timeout event.
>> The situation is that the client drops from the network, the server’s tcp/ip stack drops the
>> connection, and the Postgres query continues to run without a network connection.
>>
>> The given system is running on Linux and I’m being told this is expected behavior; however,
>> that has not been my experience. My preferred platform to run Postgres on is FreeBSD
>> and in cases like this the Postgres session is also terminated once the tcp/ip connection is
>> dropped by the kernel.
>
> That would surprise me.
>
> There is the parameter "client_connection_check_interval" exactly for that.
>
> Yours,
> Laurenz Albe

I retested the spinner() function on Linux with client_connection_check_interval set, and it now terminates the spinner() function.

Thanks!
Rui.
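
For reference, the setting that resolved this can be applied roughly as follows. This is a sketch under assumptions: the 10s value is arbitrary and not taken from the thread, and the parameter is only available in PostgreSQL 14 and later. The default of 0 disables the check.

```sql
-- Per session: check every 10 seconds (arbitrary value) whether the
-- client is still connected while a query is running.
set client_connection_check_interval = '10s';

-- Or server-wide, followed by a configuration reload.
alter system set client_connection_check_interval = '10s';
select pg_reload_conf();

-- Confirm the value in effect for the current session.
show client_connection_check_interval;
```

With a nonzero interval, a long-running query such as spinner() should be cancelled at one of the periodic checks after the kernel has marked the connection closed, rather than only when the backend next tries to write to the client.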
Rui DeSousa <rui.desousa@icloud.com> writes:
> Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state of the connection went to CLOSE_WAIT and then it was later completely removed from the kernel state; however, my spinner() function is still running with no network connection in the kernel table.
> So, keep alive does behave differently between FreeBSD and Linux. I really do prefer FreeBSD for many reasons.

The behavior you report for Linux is what I'd expect anywhere.

I tried to replicate your results on a freshly-updated FreeBSD 14.1 installation, and could not. With a purely stock Postgres configuration, I see the "spinner" query running indefinitely after the client is killed --- although the kernel does show the server process's client connection being in CLOSE_WAIT state. But if I set client_connection_check_interval to a positive value then the query kills itself at the next multiple of that time, again as expected.

So I think there is something non-default about your FreeBSD system. Maybe you'd previously configured it with nonzero client_connection_check_interval, and then forgot about that?

The alternative is to suppose that that kernel will kill processes as soon as they have a connection in CLOSE_WAIT state, which would be quite evil for many purposes and is certainly not a "preferable" behavior.

regards, tom lane
On Jun 15, 2024, at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rui DeSousa <rui.desousa@icloud.com> writes:
>> Actually, I just tested it on the Linux system. The keep alive event occurred, the kernel state of the connection went to CLOSE_WAIT and then it was later completely removed from the kernel state; however, my spinner() function is still running with no network connection in the kernel table.
>> So, keep alive does behave differently between FreeBSD and Linux. I really do prefer FreeBSD for many reasons.
>
> The behavior you report for Linux is what I'd expect anywhere.
>
> I tried to replicate your results on a freshly-updated FreeBSD 14.1
> installation, and could not. With a purely stock Postgres
> configuration, I see the "spinner" query running indefinitely after
> the client is killed --- although the kernel does show the server
> process's client connection being in CLOSE_WAIT state. But if I set
> client_connection_check_interval to a positive value then the query
> kills itself at the next multiple of that time, again as expected.
>
> So I think there is something non-default about your FreeBSD system.
> Maybe you'd previously configured it with nonzero
> client_connection_check_interval, and then forgot about that?
>
> The alternative is to suppose that that kernel will kill processes
> as soon as they have a connection in CLOSE_WAIT state, which would be
> quite evil for many purposes and is certainly not a "preferable"
> behavior.
>
> regards, tom lane

Yes, I see the same behavior. So I tried to figure out why my first test was flawed, and I have determined that Murphy's law was in play. I had an appointment, so I kicked off the query, disconnected the client, went to my appointment, came back, and the query was gone. What I didn’t expect was to lose power for a few minutes while I was out. I just looked at the output of the "last" command and it reported that the server had crashed and rebooted. Hmm... not knowing why, I also checked the switch it’s connected to, and it too rebooted at the same time; so it’s safe to say the system crashed due to a power outage. Neither of those is plugged into a UPS, although my firewall is.
I did set up a quick cron job to output the netstat for the connection every minute, and didn’t see the four-minute gap when I looked at…
.
.
.
Fri Jun 14 14:02:00 EDT 2024
tcp4 0 0 10.6.3.10.5432 10.6.3.44.51478 ESTABLISHED
Fri Jun 14 14:03:00 EDT 2024
tcp4 0 0 10.6.3.10.5432 10.6.3.44.51478 ESTABLISHED
Fri Jun 14 14:04:00 EDT 2024
tcp4 0 0 10.6.3.10.5432 10.6.3.44.51478 ESTABLISHED
Fri Jun 14 14:08:00 EDT 2024
Fri Jun 14 14:09:00 EDT 2024
Fri Jun 14 14:10:00 EDT 2024
Fri Jun 14 14:11:00 EDT 2024
.
.
.