Thread: zombie connections
Hi, Suppose that the server is executing a lengthy query, and the client breaks the connection. The operating system will be aware that the connection is no more, but PostgreSQL doesn't notice, because it's not try to read from or write to the socket. It's not paying attention to the socket at all. In theory, the query could be one that runs for a million years and continue to chew up CPU and I/O, or at the very least a connection slot, essentially forever. That's sad. I don't have a terribly specific idea about how to improve this, but is there some way that we could, at least periodically, check the socket to see whether it's dead? Noticing the demise of the client after a configurable interval (maybe 60s by default?) would be infinitely better than never. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pá 3. 4. 2020 v 14:30 odesílatel Robert Haas <robertmhaas@gmail.com> napsal:
Hi,
Suppose that the server is executing a lengthy query, and the client
breaks the connection. The operating system will be aware that the
connection is no more, but PostgreSQL doesn't notice, because it's not
try to read from or write to the socket. It's not paying attention to
the socket at all. In theory, the query could be one that runs for a
million years and continue to chew up CPU and I/O, or at the very
least a connection slot, essentially forever. That's sad.
I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.
+ 1
Pavel
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 03, 2020 at 02:40:30PM +0200, Pavel Stehule wrote: > pá 3. 4. 2020 v 14:30 odesílatel Robert Haas <robertmhaas@gmail.com> napsal: > > > > Suppose that the server is executing a lengthy query, and the client > > breaks the connection. The operating system will be aware that the > > connection is no more, but PostgreSQL doesn't notice, because it's not > > try to read from or write to the socket. It's not paying attention to > > the socket at all. In theory, the query could be one that runs for a > > million years and continue to chew up CPU and I/O, or at the very > > least a connection slot, essentially forever. That's sad. > > > > I don't have a terribly specific idea about how to improve this, but > > is there some way that we could, at least periodically, check the > > socket to see whether it's dead? Noticing the demise of the client > > after a configurable interval (maybe 60s by default?) would be > > infinitely better than never. > > > > + 1 +1 too, I already saw such behavior. Maybe the postmaster could send some new PROCSIG SIGUSR1 signal to backends at a configurable interval and let ProcessInterrupts handle it?
On 03.04.2020 15:29, Robert Haas wrote: > Hi, > > Suppose that the server is executing a lengthy query, and the client > breaks the connection. The operating system will be aware that the > connection is no more, but PostgreSQL doesn't notice, because it's not > try to read from or write to the socket. It's not paying attention to > the socket at all. In theory, the query could be one that runs for a > million years and continue to chew up CPU and I/O, or at the very > least a connection slot, essentially forever. That's sad. > > I don't have a terribly specific idea about how to improve this, but > is there some way that we could, at least periodically, check the > socket to see whether it's dead? Noticing the demise of the client > after a configurable interval (maybe 60s by default?) would be > infinitely better than never. > There was a patch on commitfest addressing this problem: https://commitfest.postgresql.org/21/1882/ It it currently included in PostgrePro EE, but the author of the patch is not working in our company any more. Should we resurrects this patch or there is something wrong with the proposed approach?
On Fri, Apr 3, 2020 at 9:34 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote: > There was a patch on commitfest addressing this problem: > https://commitfest.postgresql.org/21/1882/ > It it currently included in PostgrePro EE, but the author of the patch > is not working in our company any more. > Should we resurrects this patch or there is something wrong with the > proposed approach? Thanks for the link. Tom seems to have offered some fairly specific criticism in https://www.postgresql.org/message-id/31564.1563426253%40sss.pgh.pa.us I haven't studied that thread in detail, but I would suggest that if you want to resurrect the patch, that might be a good place to start. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:
I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.
Does it make any difference if the query is making changes? If the query is just computing a result and returning it to the client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes, then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operation one wants, at the client end, to see the message from the server that the changes have been completed, not just fire off a change and hope it completes.
On Fri, Apr 3, 2020 at 9:52 AM Isaac Morland <isaac.morland@gmail.com> wrote: > Does it make any difference if the query is making changes? If the query is just computing a result and returning it tothe client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes,then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operationone wants, at the client end, to see the message from the server that the changes have been completed, not justfire off a change and hope it completes. The system can't know whether the query is going to change anything, because even if the query is a SELECT, it doesn't know whether any of the functions or operators called from that SELECT might write data. I don't think it would be smart to make behavior like this depend on whether the statement is a SELECT vs. INSERT/UPDATE/DELETE, or on things like whether there is an explicit transaction open. I think we should just have a feature that kills the server process if the connection goes away. If some people don't want that, it can be optional. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pá 3. 4. 2020 v 15:52 odesílatel Isaac Morland <isaac.morland@gmail.com> napsal:
On Fri, 3 Apr 2020 at 08:30, Robert Haas <robertmhaas@gmail.com> wrote:
I don't have a terribly specific idea about how to improve this, but
is there some way that we could, at least periodically, check the
socket to see whether it's dead? Noticing the demise of the client
after a configurable interval (maybe 60s by default?) would be
infinitely better than never.Does it make any difference if the query is making changes? If the query is just computing a result and returning it to the client, there is no point in continuing once the socket is closed. But if it is updating data or making DDL changes, then at least some of the time it would be preferable for the changes to be made. Having said that, in normal operation one wants, at the client end, to see the message from the server that the changes have been completed, not just fire off a change and hope it completes.
I prefer simple solution without any "intelligence". It is much safer to close connect and rollback. Then it is clean protocol - when server didn't reported successful end of operation, then operation was reverted - always.
Pavel Stehule <pavel.stehule@gmail.com> writes: > I prefer simple solution without any "intelligence". It is much safer to > close connect and rollback. Then it is clean protocol - when server didn't > reported successful end of operation, then operation was reverted - always. It would be a fatal mistake to imagine that this feature would offer any greater guarantees in that line than we have today (which is to say, none really). It can be no better than the OS network stack's error detection/reporting, which is necessarily pretty weak. The fact that the kernel accepted a "command complete" message from us doesn't mean that the client was still alive at that instant, much less that the message will be deliverable. In general I think the threshold problem for a patch like this will be "how do you keep the added overhead down". As Robert noted upthread, timeout.c is quite a bit shy of being able to handle timeouts that persist across statements. I don't think that there's any fundamental reason it can't be improved, but it will need improvements. regards, tom lane
On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > In general I think the threshold problem for a patch like this will be > "how do you keep the added overhead down". As Robert noted upthread, > timeout.c is quite a bit shy of being able to handle timeouts that > persist across statements. I don't think that there's any fundamental > reason it can't be improved, but it will need improvements. Why do we need that? If we're not executing a statement, we're probably trying to read() from the socket, and we'll notice if that returns 0 or -1. So it seems like we only need periodic checks while there's a statement in progress. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Apr 3, 2020 at 10:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In general I think the threshold problem for a patch like this will be >> "how do you keep the added overhead down". As Robert noted upthread, >> timeout.c is quite a bit shy of being able to handle timeouts that >> persist across statements. I don't think that there's any fundamental >> reason it can't be improved, but it will need improvements. > Why do we need that? If we're not executing a statement, we're > probably trying to read() from the socket, and we'll notice if that > returns 0 or -1. So it seems like we only need periodic checks while > there's a statement in progress. Maybe you could build it that way, but I'm not sure it's a better way. (1) You'll need to build a concept of a timeout that's not a statement timeout, but nonetheless should be canceled exactly when the statement timeout is (not before or after, unless you'd like to incur additional setitimer() calls). That's going to involve either timeout.c surgery or fragile requirements on the callers. (2) It only wins if a statement timeout is active, otherwise it makes things worse, because then you need setitimer() at statement start and end just to enable/disable the socket check timeout. Whereas if you just let a once-a-minute timeout continue to run, you don't incur those kernel calls. It's possible that we should run this timeout differently depending on whether or not a statement timeout is active, though I'd prefer to avoid such complexity if possible. On the whole, if we have to optimize just one of those cases, it should be the no-statement-timeout case; with that timeout active, you're paying two setitimers per statement anyway. Anyway, the core problem with the originally-submitted patch was that it was totally ignorant that timeout.c had restrictions it was breaking. You can either fix the restrictions, or you can try to design around them, but you've got to be aware of what that code can and can't do today. regards, tom lane
On Fri, Apr 3, 2020 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > (2) It only wins if a statement timeout is active, otherwise it makes > things worse, because then you need setitimer() at statement start > and end just to enable/disable the socket check timeout. Whereas > if you just let a once-a-minute timeout continue to run, you don't > incur those kernel calls. Oh, that's a really good point. I should have thought of that. > Anyway, the core problem with the originally-submitted patch was that > it was totally ignorant that timeout.c had restrictions it was breaking. > You can either fix the restrictions, or you can try to design around them, > but you've got to be aware of what that code can and can't do today. No disagreement there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company