Thread: zombie connections

zombie connections

From
Robert Haas
Date:
Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
try to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continue to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zombie connections

From
Pavel Stehule
Date:


pá 3. 4. 2020 v 14:30 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
Hi,

Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
try to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continue to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

+ 1

Pavel


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: zombie connections

From
Julien Rouhaud
Date:
On Fri, Apr 03, 2020 at 02:40:30PM +0200, Pavel Stehule wrote:
> pá 3. 4. 2020 v 14:30 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
> >
> > Suppose that the server is executing a lengthy query, and the client
> > breaks the connection. The operating system will be aware that the
> > connection is no more, but PostgreSQL doesn't notice, because it's not
> > try to read from or write to the socket. It's not paying attention to
> > the socket at all. In theory, the query could be one that runs for a
> > million years and continue to chew up CPU and I/O, or at the very
> > least a connection slot, essentially forever. That's sad.
> >
> > I don't have a terribly specific idea about how to improve this, but
> > is there some way that we could, at least periodically, check the
> > socket to see whether it's dead? Noticing the demise of the client
> > after a configurable interval (maybe 60s by default?) would be
> > infinitely better than never.
> >
> 
> + 1


+1 too, I already saw such behavior.


Maybe the postmaster could send some new PROCSIG SIGUSR1 signal to backends at
a configurable interval and let ProcessInterrupts handle it?



Re: zombie connections

From
Konstantin Knizhnik
Date:

On 03.04.2020 15:29, Robert Haas wrote:
> Hi,
>
> Suppose that the server is executing a lengthy query, and the client
> breaks the connection. The operating system will be aware that the
> connection is no more, but PostgreSQL doesn't notice, because it's not
> try to read from or write to the socket. It's not paying attention to
> the socket at all. In theory, the query could be one that runs for a
> million years and continue to chew up CPU and I/O, or at the very
> least a connection slot, essentially forever. That's sad.
>
> I don't have a terribly specific idea about how to improve this, but
> is there some way that we could, at least periodically, check the
> socket to see whether it's dead? Noticing the demise of the client
> after a configurable interval (maybe 60s by default?) would be
> infinitely better than never.
>

There was a patch on commitfest addressing this problem:
https://commitfest.postgresql.org/21/1882/
It it currently included in PostgrePro EE, but the author of the patch 
is not working in our company any more.
Should we resurrects this patch or there is something wrong with the 
proposed approach?




Re: zombie connections

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 9:34 AM Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> There was a patch on commitfest addressing this problem:
> https://commitfest.postgresql.org/21/1882/
> It it currently included in PostgrePro EE, but the author of the patch
> is not working in our company any more.
> Should we resurrects this patch or there is something wrong with the
> proposed approach?

Thanks for the link.

Tom seems to have offered some fairly specific criticism in
https://www.postgresql.org/message-id/31564.1563426253%40sss.pgh.pa.us

I haven't studied that thread in detail, but I would suggest that if
you want to resurrect the patch, that might be a good place to start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zombie connections

From
Isaac Morland
Date:
On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

Does it make any difference if the query is making changes? If the query is just computing a result and returning it to the client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes, then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operation one wants, at the client end, to see the message from the server that the changes have been completed, not just fire off a change and hope it completes.

Re: zombie connections

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 9:52 AM Isaac Morland <isaac.morland@gmail.com> wrote:
> Does it make any difference if the query is making changes? If the query is just computing a result and returning it
tothe client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL
changes,then at least some of the time it would be preferable for the changes to be made. Having said that, in normal
operationone wants, at the client end, to see the message from the server that the changes have been completed, not
justfire off a change and hope it completes. 

The system can't know whether the query is going to change anything,
because even if the query is a SELECT, it doesn't know whether any of
the functions or operators called from that SELECT might write data.

I don't think it would be smart to make behavior like this depend on
whether the statement is a SELECT vs. INSERT/UPDATE/DELETE, or on
things like whether there is an explicit transaction open. I think we
should just have a feature that kills the server process if the
connection goes away. If some people don't want that, it can be
optional.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zombie connections

From
Pavel Stehule
Date:


pá 3. 4. 2020 v 15:52 odesílatel Isaac Morland <isaac.morland@gmail.com> napsal:
On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:

I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.

Does it make any difference if the query is making changes? If the query is just computing a result and returning it to the client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes, then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operation one wants, at the client end, to see the message from the server that the changes have been completed, not just fire off a change and hope it completes.

I prefer simple solution without any "intelligence". It is much safer to close connect and rollback. Then it is clean protocol - when server didn't reported successful end of operation, then operation was reverted - always.

Re: zombie connections

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I prefer simple solution without any "intelligence". It is much safer to
> close connect and rollback. Then it is clean protocol - when server didn't
> reported successful end of operation, then operation was reverted - always.

It would be a fatal mistake to imagine that this feature would offer any
greater guarantees in that line than we have today (which is to say,
none really).  It can be no better than the OS network stack's error
detection/reporting, which is necessarily pretty weak.  The fact that
the kernel accepted a "command complete" message from us doesn't mean
that the client was still alive at that instant, much less that the
message will be deliverable.

In general I think the threshold problem for a patch like this will be
"how do you keep the added overhead down".  As Robert noted upthread,
timeout.c is quite a bit shy of being able to handle timeouts that
persist across statements.  I don't think that there's any fundamental
reason it can't be improved, but it will need improvements.

            regards, tom lane



Re: zombie connections

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In general I think the threshold problem for a patch like this will be
> "how do you keep the added overhead down".  As Robert noted upthread,
> timeout.c is quite a bit shy of being able to handle timeouts that
> persist across statements.  I don't think that there's any fundamental
> reason it can't be improved, but it will need improvements.

Why do we need that? If we're not executing a statement, we're
probably trying to read() from the socket, and we'll notice if that
returns 0 or -1. So it seems like we only need periodic checks while
there's a statement  in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: zombie connections

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In general I think the threshold problem for a patch like this will be
>> "how do you keep the added overhead down".  As Robert noted upthread,
>> timeout.c is quite a bit shy of being able to handle timeouts that
>> persist across statements.  I don't think that there's any fundamental
>> reason it can't be improved, but it will need improvements.

> Why do we need that? If we're not executing a statement, we're
> probably trying to read() from the socket, and we'll notice if that
> returns 0 or -1. So it seems like we only need periodic checks while
> there's a statement  in progress.

Maybe you could build it that way, but I'm not sure it's a better way.

(1) You'll need to build a concept of a timeout that's not a statement
timeout, but nonetheless should be canceled exactly when the statement
timeout is (not before or after, unless you'd like to incur additional
setitimer() calls).   That's going to involve either timeout.c surgery
or fragile requirements on the callers.

(2) It only wins if a statement timeout is active, otherwise it makes
things worse, because then you need setitimer() at statement start
and end just to enable/disable the socket check timeout.  Whereas
if you just let a once-a-minute timeout continue to run, you don't
incur those kernel calls.

It's possible that we should run this timeout differently depending
on whether or not a statement timeout is active, though I'd prefer to
avoid such complexity if possible.  On the whole, if we have to
optimize just one of those cases, it should be the no-statement-timeout
case; with that timeout active, you're paying two setitimers per
statement anyway.

Anyway, the core problem with the originally-submitted patch was that
it was totally ignorant that timeout.c had restrictions it was breaking.
You can either fix the restrictions, or you can try to design around them,
but you've got to be aware of what that code can and can't do today.

            regards, tom lane



Re: zombie connections

From
Robert Haas
Date:
On Fri, Apr 3, 2020 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (2) It only wins if a statement timeout is active, otherwise it makes
> things worse, because then you need setitimer() at statement start
> and end just to enable/disable the socket check timeout.  Whereas
> if you just let a once-a-minute timeout continue to run, you don't
> incur those kernel calls.

Oh, that's a really good point. I should have thought of that.

> Anyway, the core problem with the originally-submitted patch was that
> it was totally ignorant that timeout.c had restrictions it was breaking.
> You can either fix the restrictions, or you can try to design around them,
> but you've got to be aware of what that code can and can't do today.

No disagreement there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company