Thread: pg_terminate_backend
Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do have console access to that machine and it's not win32 but a decent OS. For other cases I'd really really really appreciate if that function would make it into 8.2. utils/adt/misc.c says: #ifdef NOT_USED /* Disabled in 8.0 due to reliability concerns; FIXME someday */ Datum pg_terminate_backend(PG_FUNCTION_ARGS) Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please remove that comment and make the function live finally? Regards, Andreas
Andreas Pflug wrote: >Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do >have console access to that machine and it's not win32 but a decent OS. > > You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM to a backend, don't you? cheers andrew
Andreas Pflug <pgadmin@pse-consulting.de> writes: > utils/adt/misc.c says: > /* Disabled in 8.0 due to reliability concerns; FIXME someday */ > Datum > pg_terminate_backend(PG_FUNCTION_ARGS) > Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please > remove that comment and make the function live finally? No, you have that backwards. The burden of proof is on those who want it to show that it's now safe. The situation is not different than it was before, except that we can now actually point to a specific bug that did exist, whereas the original concern was just an unfocused one that the code path hadn't been adequately exercised. That concern is now even more pressing than it was. regards, tom lane
Andrew Dunstan wrote: > Andreas Pflug wrote: >> Since I have a stuck backend without client again, I'll have to kill >> -SIGTERM a backend. Fortunately, I do have console access to that >> machine and it's not win32 but a decent OS. > You do know that on Windows you can use pg_ctl to send a pseudo > SIGTERM to a backend, don't you? The main issue still is that console access is required, on any OS. Regards, Andreas
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: >> utils/adt/misc.c says: >> /* Disabled in 8.0 due to reliability concerns; FIXME someday */ >> Datum >> pg_terminate_backend(PG_FUNCTION_ARGS) >> Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please >> remove that comment and make the function live finally? > No, you have that backwards. The burden of proof is on those who want > it to show that it's now safe. The situation is not different than it > was before, except that we can now actually point to a specific bug that > did exist, whereas the original concern was just an unfocused one that > the code path hadn't been adequately exercised. That concern is now > even more pressing than it was. If the backend's stuck, I'll have to SIGTERM it, whether there's pg_terminate_backend or not. Ultimately, if resources should remain locked, there's no chance except restarting the whole server anyway. SIGTERM gives me a fair chance (>90%) that it will work without restart. The persistent refusal to support the function makes it more painful to execute, but not less necessary. Regards, Andreas
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > > utils/adt/misc.c says: > > /* Disabled in 8.0 due to reliability concerns; FIXME someday */ > > Datum > > pg_terminate_backend(PG_FUNCTION_ARGS) > > Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please > > remove that comment and make the function live finally? > > No, you have that backwards. The burden of proof is on those who want > it to show that it's now safe. The situation is not different than it > was before, except that we can now actually point to a specific bug that > did exist, whereas the original concern was just an unfocused one that > the code path hadn't been adequately exercised. That concern is now > even more pressing than it was. I am not sure how you prove the non-existence of a bug. Ideas? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> No, you have that backwards. The burden of proof is on those who want >> it to show that it's now safe. The situation is not different than it >> was before, except that we can now actually point to a specific bug that >> did exist, whereas the original concern was just an unfocused one that >> the code path hadn't been adequately exercised. That concern is now >> even more pressing than it was. > I am not sure how you prove the non-existence of a bug. Ideas? What I'm looking for is some concentrated testing. The fact that some people once in a while SIGTERM a backend doesn't give me any confidence in it. regards, tom lane
> What I'm looking for is some concentrated testing. The fact that some > people once in a while SIGTERM a backend doesn't give me any confidence > in it. Now wait a minute, is there some risk of lockup if I kill a backend ? Cause I do that relatively often (say 20 times a day, when some web users time out but their query keeps running). Should I rather not do it ? Thanks, Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes: > Now wait a minute, is there some risk of lockup if I kill a backend ? > Cause I do that relatively often (say 20 times a day, when some web > users time out but their query keeps running). Should I rather not do it > ? statement_timeout is your friend. regards, tom lane
You didn't answer the original question: is killing SIGTERM a backend known/suspected to be dangerous ? And if yes, what's the risk (pointers to discussions would be nice too). > statement_timeout is your friend. I know, but unfortunately I can't use it. I did try to use statement_timeout and it worked out quite badly (due to our usage scenario). Some of the web requests which time out on the web should still go through... and we have activities which should not observe statement timeout at all, i.e. they must finish however long that takes. I know it would be possible to use a different user with its own statement timeout for those requests, but that means we have to rewrite a lot of code which is not possible immediately, and our admins would resist adding even more configuration (additional users=additional connection pool+caches and all to be configured). We also can fix the queries so no timeout happens in the first place, but that will take us even more time. Cheers, Csaba.
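Editorial sketch of the "different user with its own statement timeout" idea mentioned above. It only generates the SQL; the role names and timeout values are hypothetical, and nothing here comes from the poster's actual setup. `ALTER ROLE ... SET statement_timeout` takes a value in milliseconds, and 0 disables the timeout.

```python
import re


def role_timeout_statements(timeouts_ms):
    """Build one ALTER ROLE statement per role.

    timeouts_ms maps a role name to a timeout in milliseconds;
    0 means "no statement timeout" in PostgreSQL.
    """
    statements = []
    for role, timeout in sorted(timeouts_ms.items()):
        # Accept only plain lowercase identifiers so the name can be embedded as-is.
        if not re.fullmatch(r"[a-z_][a-z0-9_]*", role):
            raise ValueError(f"unexpected role name: {role!r}")
        if not isinstance(timeout, int) or timeout < 0:
            raise ValueError(f"timeout must be a non-negative integer: {timeout!r}")
        statements.append(f"ALTER ROLE {role} SET statement_timeout = {timeout};")
    return statements


if __name__ == "__main__":
    # Hypothetical roles: interactive web requests vs. jobs that must always finish.
    for stmt in role_timeout_statements({"web_interactive": 30000, "batch_jobs": 0}):
        print(stmt)
```

As the message notes, the cost of this approach is operational rather than technical: each extra role typically means another connection pool to configure.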
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> No, you have that backwards. The burden of proof is on those who want >> it to show that it's now safe. > If the backend's stuck, I'll have to SIGTERM it, whether there's > pg_terminate_backend or not. "Stuck?" You have not shown us a case where SIGTERM rather than SIGINT is necessary or appropriate. It seems to me the above is assuming the existence of unknown backend bugs, exactly the same thing you think I shouldn't be assuming ... regards, tom lane
On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: > You didn't answer the original question: is killing SIGTERM a backend ^^^^^^^^^^^^^^^ Nevermind, I don't do that. I do 'kill backend_pid' without specifying the signal, and I'm sufficiently unfamiliar with the unix signal names to have confused them. Is a plain "kill" still dangerous ? Thanks, Csaba.
> "Stuck?" You have not shown us a case where SIGTERM rather than SIGINT > is necessary or appropriate. It seems to me the above is assuming the > existence of unknown backend bugs, exactly the same thing you think > I shouldn't be assuming ... I do know a case where a plain kill will seem to be stuck: on vacuum of a big table. I guess when it starts an index's cleanup scan it will insist on finishing it before stopping. I'm not sure if that's the cause, but I have seen delays of 30 minutes for killing a vacuum... it's true that finally it always did die... but it's also true that I have 'kill -9'-ed it before because I thought it was stuck. Cheers, Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes: > I do know a case where a plain kill will seem to be stuck: on vacuum > of a big table. I guess when it starts an index's cleanup scan it will > insist on finishing it before stopping. We've fixed a few cases of missing CHECK_FOR_INTERRUPTS lately, and will fix more if you can point them out. Note though that SIGTERM is just as vulnerable to that as SIGINT. regards, tom lane
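Editorial sketch of why a missing interrupt check delays both SIGINT and SIGTERM. This is a toy model in Python, not PostgreSQL code: it assumes only the general pattern that a signal sets a pending flag and that long-running loops act on the flag when they reach an explicit check. The class and function names are made up for illustration.

```python
class QueryCancelled(Exception):
    """Raised at a check point when an interrupt is pending."""


class Interrupts:
    def __init__(self):
        self.pending = False

    def request(self):
        # What a signal handler would do: record the request and return.
        self.pending = True

    def check(self):
        # Toy stand-in for an explicit interrupt check inside a loop.
        if self.pending:
            self.pending = False
            raise QueryCancelled()


def scan(n_items, interrupts, check_every, cancel_at):
    """Process n_items; the interrupt "arrives" at item cancel_at.

    check_every=0 models a loop with no check at all.
    Returns how many items were processed before the loop stopped.
    """
    done = 0
    try:
        for i in range(n_items):
            if i == cancel_at:
                interrupts.request()
            if check_every and i % check_every == 0:
                interrupts.check()
            done += 1
    except QueryCancelled:
        pass
    return done


if __name__ == "__main__":
    print(scan(1000, Interrupts(), check_every=1, cancel_at=10))    # stops right away
    print(scan(1000, Interrupts(), check_every=100, cancel_at=10))  # stops at next check
    print(scan(1000, Interrupts(), check_every=0, cancel_at=10))    # never notices
```

In this model the kind of signal makes no difference: whichever one sets the flag, the loop reacts only at the next check, which matches the remark that SIGTERM is just as vulnerable as SIGINT.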
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > >> Tom Lane wrote: >> >>> No, you have that backwards. The burden of proof is on those who want >>> it to show that it's now safe. >>> > > >> If the backend's stuck, I'll have to SIGTERM it, whether there's >> pg_terminate_backend or not. >> > > "Stuck?" You have not shown us a case where SIGTERM rather than SIGINT > is necessary or appropriate. Last night, I had a long-running query I launched from pgAdmin. It was happily running and completing on the server (took about 2 hours), and the backend went back to <IDLE>. pgAdmin didn't get back a response, assuming the query was still running. Apparently, the VPN router had interrupted the connection silently without notifying either side of the tcp connection. Since the backend is <IDLE>, there's no query to cancel and SIGINT won't help. So "Stuck" for me means a backend *not* responding to SIGINT. BTW, there's another scenario where SIGINT won't help. Imagine an app running wild hammering the server with queries regardless of query cancels (maybe some retry mechanism). You'd like to interrupt that connection, i.e. get rid of the backend. Regards, Andreas
Csaba Nagy wrote: > On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: > >> You didn't answer the original question: is killing SIGTERM a backend >> > ^^^^^^^^^^^^^^^ > Nevermind, I don't do that. I do 'kill backend_pid' without specifying > the signal, and I'm sufficiently unfamiliar with the unix signal names > to have confused them. Is a plain "kill" still dangerous ? > SIGTERM is the default kill parameter, so you do exactly what I'm talking about. Regards, Andreas
Csaba Nagy <nagy@ecircle-ag.com> writes: > On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote: >> You didn't answer the original question: is killing SIGTERM a backend > ^^^^^^^^^^^^^^^ > Nevermind, I don't do that. I do 'kill backend_pid' without specifying > the signal, "man kill" says the default is SIGTERM. regards, tom lane
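Editorial sketch for checking the default signal of a plain `kill` on a POSIX system. It starts a throwaway child process (not a PostgreSQL backend), runs `kill <pid>` through `sh` without naming a signal, and reports which signal ended the child. The function name is made up for this example.

```python
import signal
import subprocess
import sys


def signal_sent_by_plain_kill():
    """Start a sleeping child, run a plain `kill <pid>`, return the signal that ended it."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        # No signal name or number is given, so kill uses its default.
        subprocess.run(["sh", "-c", f"kill {child.pid}"], check=True)
        child.wait(timeout=10)
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()
    # subprocess reports death by signal N as return code -N.
    return signal.Signals(-child.returncode)


if __name__ == "__main__":
    print(signal_sent_by_plain_kill().name)
```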
Bruce Momjian wrote: > I am not sure how you prove the non-existence of a bug. Ideas? Would be worth at least the Nobel prize :-) Regards, Andreas
> "man kill" says the default is SIGTERM. OK, so that means I do use it... is it known to be dangerous ? I thought till now that it is safe to use. What about "select pg_cancel_backend()" ? Thanks, Csaba.
Csaba Nagy wrote: >> "man kill" says the default is SIGTERM. >> > > OK, so that means I do use it... is it known to be dangerous ? I thought > till now that it is safe to use. Apparently you never suffered any problems from that; neither did I. > What about "select pg_cancel_backend()" > That's the function wrapper around kill -SIGINT, which is probably the way you could safely stop your queries most of the time. Regards, Andreas
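Editorial sketch of the cancel-versus-terminate distinction discussed above, using a toy child process rather than a real backend. The toy treats SIGINT as "abort the current piece of work and carry on" and leaves SIGTERM at its default action, which ends the process. All names are hypothetical, and the real backend's signal handling is considerably more involved.

```python
import signal
import subprocess
import sys

# Toy "backend": SIGINT cancels the current work item, SIGTERM ends the process.
TOY_BACKEND = r"""
import signal, time

class Cancelled(Exception):
    pass

def on_sigint(signum, frame):
    raise Cancelled()

signal.signal(signal.SIGINT, on_sigint)
while True:
    try:
        print("running", flush=True)
        time.sleep(60)          # stands in for a long query
    except Cancelled:
        print("cancelled", flush=True)
"""


def cancel_then_terminate():
    """Send SIGINT, then SIGTERM, to the toy backend and report what happened."""
    child = subprocess.Popen([sys.executable, "-c", TOY_BACKEND],
                             stdout=subprocess.PIPE, text=True)
    try:
        # Wait until the child is inside its work loop before signalling it.
        assert child.stdout.readline().strip() == "running"
        child.send_signal(signal.SIGINT)
        after_sigint = child.stdout.readline().strip()
        survived_sigint = child.poll() is None
        child.send_signal(signal.SIGTERM)
        child.wait(timeout=10)
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()
        child.stdout.close()
    return after_sigint, survived_sigint, child.returncode


if __name__ == "__main__":
    print(cancel_then_terminate())
```

The toy process survives SIGINT and keeps serving, which is the behaviour wanted for a runaway query. It also shows why the idle-connection case raised earlier in the thread needs something stronger: with no work in progress there is nothing for a cancel to abort.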
> I am not sure how you prove the non-existence of a bug. Ideas? I do that by deleting all of my code (usually by accident :-) No code, no bugs! --Korry
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> No, you have that backwards. The burden of proof is on those who want > >> it to show that it's now safe. The situation is not different than it > >> was before, except that we can now actually point to a specific bug that > >> did exist, whereas the original concern was just an unfocused one that > >> the code path hadn't been adequately exercised. That concern is now > >> even more pressing than it was. > > > I am not sure how you prove the non-existence of a bug. Ideas? > > What I'm looking for is some concentrated testing. The fact that some > people once in a while SIGTERM a backend doesn't give me any confidence > in it. OK, here is an opportunity for someone to run tests to get this into 8.2. The code already exists in CVS, but we need testing to enable it. I would think running a huge workload and killing it over and over again would be a good test. -- Bruce Momjian bruce@momjian.us
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> What I'm looking for is some concentrated testing. The fact that some >> people once in a while SIGTERM a backend doesn't give me any confidence >> in it. > OK, here is an opportunity for someone to run tests to get this into > 8.2. The code already exists in CVS, but we need testing to enable it. > I would think running a huge workload and killing it over and over again > would be a good test. Big multiprocess workload and you kill individual processes at random while letting the rest run. It probably needs to be something that stresses more of the code than pgbench would, too. (For instance, it'd be a good idea if some of the workload involved having a few 2PC transactions getting prepared and then either committed or rolled back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of corner case that's probably never been exercised.) regards, tom lane
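Editorial sketch of the "kill individual processes at random while letting the rest run" loop described above. It is only a skeleton: the workers are placeholder sleeping processes, not PostgreSQL backends, and the function name and parameters are invented for illustration. In a real run the victims would be server backend processes (for example, PIDs taken from pg_stat_activity), the workload would include things like prepared transactions, and the check afterwards would be whether the remaining sessions and the server stay healthy.

```python
import random
import signal
import subprocess
import sys


def random_kill_round(n_workers=6, n_victims=2, seed=0):
    """Start placeholder workers, SIGTERM a random subset, report (killed, survivors)."""
    rng = random.Random(seed)
    # Placeholder workload; a real test would drive database sessions instead.
    workers = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(n_workers)
    ]
    try:
        victims = rng.sample(workers, n_victims)
        for proc in victims:
            proc.send_signal(signal.SIGTERM)
        for proc in victims:
            proc.wait(timeout=10)
        # Count victims that really died from SIGTERM and workers still running.
        killed = sum(1 for p in victims if p.returncode == -signal.SIGTERM)
        survivors = sum(1 for p in workers if p.poll() is None)
        return killed, survivors
    finally:
        # Clean up whatever is still running.
        for proc in workers:
            if proc.poll() is None:
                proc.kill()
            proc.wait()


if __name__ == "__main__":
    print(random_kill_round())
```

Fixing the random seed makes a failing round repeatable, which matters when the goal is to track down a cleanup bug rather than just observe one.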
Thanks. Good plan. Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> What I'm looking for is some concentrated testing. The fact that some > >> people once in a while SIGTERM a backend doesn't give me any confidence > >> in it. > > > OK, here is an opportunity for someone to run tests to get this into > > 8.2. The code already exists in CVS, but we need testing to enable it. > > I would think running a huge workload and killing it over and over again > > would be a good test. > > Big multiprocess workload and you kill individual processes at random > while letting the rest run. It probably needs to be something that > stresses more of the code than pgbench would, too. (For instance, > it'd be a good idea if some of the workload involved having a few 2PC > transactions getting prepared and then either committed or rolled > back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of > corner case that's probably never been exercised.) > > regards, tom lane -- Bruce Momjian bruce@momjian.us
> >> Since I have a stuck backend without client again, I'll have to kill > >> -SIGTERM a backend. Fortunately, I do have console access to that > >> machine and it's not win32 but a decent OS. > > > > You do know that on Windows you can use pg_ctl to send a pseudo > > SIGTERM to a backend, don't you? > The main issue still is that console access is required, on any OS. Yeah. Though for the Windows case only, we could easily enough make it possible to run pg_ctl kill remotely, since we use a named pipe. Does this seem like a good or bad idea? //Magnus
Magnus Hagander wrote: >>>> Since I have a stuck backend without client again, I'll have to kill >>>> -SIGTERM a backend. Fortunately, I do have console access to that >>>> machine and it's not win32 but a decent OS. >>> You do know that on Windows you can use pg_ctl to send a pseudo >>> SIGTERM to a backend, don't you? >> The main issue still is that console access is required, on any OS. > Yeah. > Though for the Windows case only, we could easily enough make it > possible to run pg_ctl kill remotely, since we use a named pipe. Does > this seem like a good or bad idea? Not too helpful. How to kill a win32 backend from a linux workstation? Additionally, NP requires an authenticated RPC connection. If you're not allowed to access the console, you probably haven't got sufficient access permissions to NP as well, or you'd need extra policy tweaking or so. Nightmarish, just to avoid the easy and intuitive way. Regards, Andreas
"Magnus Hagander" <mha@sollentuna.net> writes: > Though for the Windows case only, we could easily enough make it > possible to run pg_ctl kill remotely, since we use a named pipe. Does > this seem like a good or bad idea? Seems like we'd be opening a can of security worms :-( regards, tom lane
> > Though for the Windows case only, we could easily enough make it > > possible to run pg_ctl kill remotely, since we use a named pipe. Does > > this seem like a good or bad idea? > > Seems like we'd be opening a can of security worms :-( Not really, standard Windows ACLs already apply to everything, so you need to be an admin on the machine to make it work. Anyhoo, I don't really see the gain in it, which also seems to be what others think, so let's just drop that idea. //Magnus