Thread: pg_terminate_backend

pg_terminate_backend

From: Andreas Pflug
Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do 
have console access to that machine and it's not win32 but a decent OS. For other cases I'd really, really, really
appreciate it if that function made it into 8.2.

utils/adt/misc.c says:

#ifdef NOT_USED

/* Disabled in 8.0 due to reliability concerns; FIXME someday */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)

Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
remove that comment and make the function live finally? 

Regards,
Andreas




Re: pg_terminate_backend

From: Andrew Dunstan

Andreas Pflug wrote:

>Since I have a stuck backend without client again, I'll have to kill -SIGTERM a backend. Fortunately, I do 
>have console access to that machine and it's not win32 but a decent OS.
>  
>

You do know that on Windows you can use pg_ctl to send a pseudo SIGTERM 
to a backend, don't you?

cheers

andrew


Re: pg_terminate_backend

From: Tom Lane
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> utils/adt/misc.c says:
> /* Disabled in 8.0 due to reliability concerns; FIXME someday */
> Datum
> pg_terminate_backend(PG_FUNCTION_ARGS)

> Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
> remove that comment and make the function live finally? 

No, you have that backwards.  The burden of proof is on those who want
it to show that it's now safe.  The situation is not different than it
was before, except that we can now actually point to a specific bug that
did exist, whereas the original concern was just an unfocused one that
the code path hadn't been adequately exercised.  That concern is now
even more pressing than it was.
        regards, tom lane


Re: pg_terminate_backend

From: Andreas Pflug
Andrew Dunstan wrote:
>
>
> Andreas Pflug wrote:
>
>> Since I have a stuck backend without client again, I'll have to kill
>> -SIGTERM a backend. Fortunately, I do have console access to that
>> machine and it's not win32 but a decent OS.
>>  
>>
>
> You do know that on Windows you can use pg_ctl to send a pseudo
> SIGTERM to a backend, don't you?
The main issue still is that console access is required, on any OS.

Regards,
Andreas



Re: pg_terminate_backend

From: Andreas Pflug
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>   
>> utils/adt/misc.c says:
>> /* Disabled in 8.0 due to reliability concerns; FIXME someday */
>> Datum
>> pg_terminate_backend(PG_FUNCTION_ARGS)
>>     
>
>   
>> Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
>> remove that comment and make the function live finally? 
>>     
>
> No, you have that backwards.  The burden of proof is on those who want
> it to show that it's now safe.  The situation is not different than it
> was before, except that we can now actually point to a specific bug that
> did exist, whereas the original concern was just an unfocused one that
> the code path hadn't been adequately exercised.  That concern is now
> even more pressing than it was.
>   

If the backend's stuck, I'll have to SIGTERM it, whether there's
pg_terminate_backend or not. Ultimately, if resources remain
locked, there's no choice but to restart the whole server anyway.
SIGTERM gives me a fair chance (>90%) that it will work without restart.

The persistent refusal to support the function makes the termination
more painful to carry out, but no less necessary.

Regards,
Andreas



Re: pg_terminate_backend

From: Bruce Momjian
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> > utils/adt/misc.c says:
> > /* Disabled in 8.0 due to reliability concerns; FIXME someday */
> > Datum
> > pg_terminate_backend(PG_FUNCTION_ARGS)
> 
> > Well, AFAIR there were no more issues raised about code paths that don't clean up correctly, so can we please
> > remove that comment and make the function live finally? 
> 
> No, you have that backwards.  The burden of proof is on those who want
> it to show that it's now safe.  The situation is not different than it
> was before, except that we can now actually point to a specific bug that
> did exist, whereas the original concern was just an unfocused one that
> the code path hadn't been adequately exercised.  That concern is now
> even more pressing than it was.

I am not sure how you prove the non-existence of a bug.  Ideas?

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: pg_terminate_backend

From: Tom Lane
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> No, you have that backwards.  The burden of proof is on those who want
>> it to show that it's now safe.  The situation is not different than it
>> was before, except that we can now actually point to a specific bug that
>> did exist, whereas the original concern was just an unfocused one that
>> the code path hadn't been adequately exercised.  That concern is now
>> even more pressing than it was.

> I am not sure how you prove the non-existence of a bug.  Ideas?

What I'm looking for is some concentrated testing.  The fact that some
people once in a while SIGTERM a backend doesn't give me any confidence
in it.
        regards, tom lane


Re: pg_terminate_backend

From: Csaba Nagy
> What I'm looking for is some concentrated testing.  The fact that some
> people once in a while SIGTERM a backend doesn't give me any confidence
> in it.

Now wait a minute, is there some risk of lockup if I kill a backend?
Because I do that relatively often (say 20 times a day, when some web
users time out but their query keeps running). Should I rather not do it?

Thanks,
Csaba.




Re: pg_terminate_backend

From: Tom Lane
Csaba Nagy <nagy@ecircle-ag.com> writes:
> Now wait a minute, is there some risk of lockup if I kill a backend?
> Because I do that relatively often (say 20 times a day, when some web
> users time out but their query keeps running). Should I rather not do it?

statement_timeout is your friend.
        regards, tom lane


Re: pg_terminate_backend

From: Csaba Nagy
You didn't answer the original question: is killing SIGTERM a backend
known/suspected to be dangerous? And if so, what's the risk? (Pointers
to discussions would be nice too.)

> statement_timeout is your friend.

I know, but unfortunately I can't use it. I did try to use
statement_timeout and it worked out quite badly (due to our usage
scenario).

Some of the web requests which time out on the web should still go
through... and we have activities which should not observe statement
timeout at all, i.e. they must finish however long that takes.

I know it would be possible to use a different user with its own
statement timeout for those requests, but that means we have to rewrite
a lot of code, which is not possible immediately, and our admins would
resist adding even more configuration (additional users mean additional
connection pools and caches, all of which must be configured). We could
also fix the queries so no timeout happens in the first place, but that
will take us even more time.

Cheers,
Csaba.




Re: pg_terminate_backend

From: Tom Lane
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Tom Lane wrote:
>> No, you have that backwards.  The burden of proof is on those who want
>> it to show that it's now safe.

> If the backend's stuck, I'll have to SIGTERM it, whether there's
> pg_terminate_backend or not.

"Stuck?"  You have not shown us a case where SIGTERM rather than SIGINT
is necessary or appropriate.  It seems to me the above is assuming the
existence of unknown backend bugs, exactly the same thing you think
I shouldn't be assuming ...
        regards, tom lane


Re: pg_terminate_backend

From: Csaba Nagy
On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:
> You didn't answer the original question: is killing SIGTERM a backend
                                              ^^^^^^^^^^^^^^^
Never mind, I don't do that. I do 'kill backend_pid' without specifying
the signal, and I'm sufficiently unfamiliar with the Unix signal names
to have confused them. Is a plain "kill" still dangerous?

Thanks,
Csaba.





Re: pg_terminate_backend

From: Csaba Nagy
> "Stuck?"  You have not shown us a case where SIGTERM rather than SIGINT
> is necessary or appropriate.  It seems to me the above is assuming the
> existence of unknown backend bugs, exactly the same thing you think
> I shouldn't be assuming ...

I do know a case where a plain kill will seem to be stuck: a vacuum
of a big table. I guess once it starts an index cleanup scan it will
insist on finishing it before stopping. I'm not sure if that's the cause,
but I have seen delays of 30 minutes for killing a vacuum... it's true
that it always did die eventually... but it's also true that I have
'kill -9'-ed it before because I thought it was stuck.

Cheers,
Csaba.




Re: pg_terminate_backend

From: Tom Lane
Csaba Nagy <nagy@ecircle-ag.com> writes:
> I do know a case where a plain kill will seem to be stuck: a vacuum
> of a big table. I guess once it starts an index cleanup scan it will
> insist on finishing it before stopping.

We've fixed a few cases of missing CHECK_FOR_INTERRUPTS lately, and will
fix more if you can point them out.  Note though that SIGTERM is just as
vulnerable to that as SIGINT.
        regards, tom lane
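A simplified model of the point about CHECK_FOR_INTERRUPTS, offered as an illustrative sketch rather than PostgreSQL's actual code: the signal handlers only set flags, and a long-running loop acts on them at explicit check points. The names `cancel_pending`, `die_pending`, `check_for_interrupts()` and `scan_pages()` are hypothetical. A loop with no check point notices a pending SIGINT or SIGTERM only after it finishes, which would look much like the "stuck" vacuum described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flags, set asynchronously by the signal handlers. */
static volatile sig_atomic_t cancel_pending = 0;  /* SIGINT: cancel query */
static volatile sig_atomic_t die_pending = 0;     /* SIGTERM: end session */

static void on_sigint(int signo)  { (void)signo; cancel_pending = 1; }
static void on_sigterm(int signo) { (void)signo; die_pending = 1; }

/* Install both handlers; returns 0 on success, -1 on failure. */
int install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigint;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = on_sigterm;
    return sigaction(SIGTERM, &sa, NULL) == 0 ? 0 : -1;
}

/* Clear both flags between demo runs. */
void reset_flags(void)
{
    cancel_pending = 0;
    die_pending = 0;
}

/* Hypothetical analogue of CHECK_FOR_INTERRUPTS(): returns 0 to keep
 * going, 1 to abort the current query, 2 to clean up and exit. */
static int check_for_interrupts(void)
{
    if (die_pending)
        return 2;
    if (cancel_pending)
        return 1;
    return 0;
}

/* Simulated long scan over npages "pages".  For a deterministic demo the
 * signal `signo` is raised just before page `raise_at`; raise() returns
 * only after the handler has run.  With with_checks == 0 the loop never
 * polls, so the pending request is noticed only after the whole scan.
 * Returns the pages processed; *reason receives the interrupt result. */
long scan_pages(long npages, int with_checks, long raise_at, int signo,
                int *reason)
{
    long done = 0;

    *reason = 0;
    for (long i = 0; i < npages; i++) {
        if (i == raise_at)
            raise(signo);
        if (with_checks && (*reason = check_for_interrupts()) != 0)
            break;                          /* honour the request promptly */
        done++;                             /* "process" one page */
    }
    if (!with_checks)
        *reason = check_for_interrupts();   /* noticed only at the end */
    return done;
}
```

In this model a signal raised before page 10 stops a polling scan after 10 pages, while a non-polling scan processes every page first, whichever of the two signals was sent.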


Re: pg_terminate_backend

From: Andreas Pflug
Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
>   
>> Tom Lane wrote:
>>     
>>> No, you have that backwards.  The burden of proof is on those who want
>>> it to show that it's now safe.
>>>       
>
>   
>> If the backend's stuck, I'll have to SIGTERM it, whether there's
>> pg_terminate_backend or not.
>>     
>
> "Stuck?"  You have not shown us a case where SIGTERM rather than SIGINT
> is necessary or appropriate. 
Last night, I had a long-running query I launched from pgAdmin. It ran
happily to completion on the server (took about 2 hours), and
the backend went back to <IDLE>. pgAdmin never got a response and
assumed the query was still running. Apparently, the VPN router had
silently dropped the connection without notifying either side of the
TCP connection. Since the backend is <IDLE>, there's no query to cancel
and SIGINT won't help. So "Stuck" for me means a backend *not*
responding to SIGINT.
BTW, there's another scenario where SIGINT won't help. Imagine an app
running wild hammering the server with queries regardless of query
cancels (maybe some retry mechanism). You'd like to interrupt that
connection, i.e. get rid of the backend.

Regards,
Andreas
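The idle-backend case can be modelled with plain POSIX calls. In this hypothetical sketch the forked child stands in for an idle backend: it catches SIGINT (the signal behind pg_cancel_backend) and, having no query to cancel, just keeps waiting, while SIGTERM keeps its default action and ends the process. `spawn_idle_backend()` and its pipe handshake are invented for the demo; a real backend also catches SIGTERM and exits through its own cleanup code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;   /* child-side write end of the event pipe */

/* Idle "backend" SIGINT handler: there is no query to cancel, so it only
 * reports that it ran.  write() is async-signal-safe. */
static void on_sigint(int signo)
{
    char c = 'c';
    (void)signo;
    if (write(notify_fd, &c, 1) < 0) {
        /* nothing useful to do inside a signal handler */
    }
}

/* Fork a hypothetical idle backend and return its pid (-1 on error).
 * *events_fd receives a pipe read end that yields one byte per SIGINT the
 * child handled.  Returns only after the child's handler is installed. */
pid_t spawn_idle_backend(int *events_fd)
{
    int ready[2], events[2];
    char c;

    if (pipe(ready) != 0)
        return -1;
    if (pipe(events) != 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                       /* child: the idle backend */
        struct sigaction sa;
        close(ready[0]);
        close(events[0]);
        notify_fd = events[1];
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_handler = on_sigint;
        sigaction(SIGINT, &sa, NULL);     /* SIGINT: "cancel", keep running */
        signal(SIGTERM, SIG_DFL);         /* SIGTERM: default action ends it */
        if (write(ready[1], "r", 1) != 1)
            _exit(1);
        for (;;)
            pause();                      /* sit <IDLE> with no query */
    }

    close(ready[1]);
    close(events[1]);
    if (pid < 0 || read(ready[0], &c, 1) != 1) {
        close(ready[0]);
        close(events[0]);
        if (pid > 0) {
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
        }
        return -1;
    }
    close(ready[0]);
    *events_fd = events[0];
    return pid;
}
```

Calling `kill(pid, SIGINT)` on the returned pid leaves the child alive; `kill(pid, SIGTERM)`, which is also what a plain shell `kill pid` sends, removes it.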



Re: pg_terminate_backend

From: Andreas Pflug
Csaba Nagy wrote:
> On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:
>   
>> You didn't answer the original question: is killing SIGTERM a backend
>>     
>                                               ^^^^^^^^^^^^^^^
> Never mind, I don't do that. I do 'kill backend_pid' without specifying
> the signal, and I'm sufficiently unfamiliar with the Unix signal names
> to have confused them. Is a plain "kill" still dangerous?
>   
SIGTERM is kill's default signal, so you're doing exactly what I'm
talking about.

Regards,
Andreas



Re: pg_terminate_backend

From: Tom Lane
Csaba Nagy <nagy@ecircle-ag.com> writes:
> On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:
>> You didn't answer the original question: is killing SIGTERM a backend
>                                               ^^^^^^^^^^^^^^^
> Never mind, I don't do that. I do 'kill backend_pid' without specifying
> the signal,

"man kill" says the default is SIGTERM.
        regards, tom lane


Re: pg_terminate_backend

From: Andreas Pflug
Bruce Momjian wrote:
>
>
> I am not sure how you prove the non-existence of a bug.  Ideas?
>   
Would be worth at least the Nobel prize :-)

Regards,
Andreas




Re: pg_terminate_backend

From: Csaba Nagy
> "man kill" says the default is SIGTERM.

OK, so that means I do use it... is it known to be dangerous? Until now
I thought it was safe to use. What about "select pg_cancel_backend()"?

Thanks,
Csaba.



Re: pg_terminate_backend

From: Andreas Pflug
Csaba Nagy wrote:
>> "man kill" says the default is SIGTERM.
>>     
>
> OK, so that means I do use it... is it known to be dangerous? Until now
> I thought it was safe to use.
Apparently you never suffered any problems from that; neither did I.

> What about "select pg_cancel_backend()"
>   

That's the function wrapper around kill -SIGINT, which is probably the
way you could safely stop your queries most of the time.


Regards,
Andreas



Re: pg_terminate_backend

From: "korryd@enterprisedb.com"
> I am not sure how you prove the non-existence of a bug.  Ideas?

I do that by deleting all of my code (usually by accident :-)

No code, no bugs!

--Korry

Re: pg_terminate_backend

From: Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> No, you have that backwards.  The burden of proof is on those who want
> >> it to show that it's now safe.  The situation is not different than it
> >> was before, except that we can now actually point to a specific bug that
> >> did exist, whereas the original concern was just an unfocused one that
> >> the code path hadn't been adequately exercised.  That concern is now
> >> even more pressing than it was.
> 
> > I am not sure how you prove the non-existence of a bug.  Ideas?
> 
> What I'm looking for is some concentrated testing.  The fact that some
> people once in a while SIGTERM a backend doesn't give me any confidence
> in it.

OK, here is an opportunity for someone to run tests to get this into
8.2.  The code already exists in CVS, but we need testing to enable it.
I would think running a huge workload and killing it over and over again
would be a good test.



Re: pg_terminate_backend

From: Tom Lane
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> What I'm looking for is some concentrated testing.  The fact that some
>> people once in a while SIGTERM a backend doesn't give me any confidence
>> in it.

> OK, here is an opportunity for someone to run tests to get this into
> 8.2.  The code already exists in CVS, but we need testing to enable it.
> I would think running a huge workload and killing it over and over again
> would be a good test.

Big multiprocess workload and you kill individual processes at random
while letting the rest run.  It probably needs to be something that
stresses more of the code than pgbench would, too.  (For instance,
it'd be a good idea if some of the workload involved having a few 2PC
transactions getting prepared and then either committed or rolled
back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of
corner case that's probably never been exercised.)
        regards, tom lane
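A skeleton of that test might look like the following. It is a hypothetical harness, not an existing PostgreSQL tool: `run_workload()` is a placeholder that only sleeps, where a real harness would run a mixed workload per session, including PREPARE TRANSACTION followed by COMMIT PREPARED or ROLLBACK PREPARED. Workers are SIGTERMed one at a time in random order while the rest keep running.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* Placeholder for one client session of the stress workload (hypothetical).
 * A real harness would run transactions here, including 2PC ones. */
static void run_workload(void)
{
    for (;;)
        sleep(1);
}

/* Terminate and reap every worker whose pid is still recorded. */
static void stop_all(pid_t *pids, int n)
{
    for (int i = 0; i < n; i++) {
        if (pids[i] > 0) {
            kill(pids[i], SIGTERM);
            waitpid(pids[i], NULL, 0);
            pids[i] = 0;
        }
    }
}

/* Start nworkers children, then SIGTERM `rounds` randomly chosen live
 * workers one at a time while the rest keep running.  Returns how many
 * victims were confirmed to have died from SIGTERM, or -1 on error. */
int random_kill_test(int nworkers, int rounds, unsigned seed)
{
    pid_t pids[MAX_WORKERS] = {0};

    if (nworkers < 1 || nworkers > MAX_WORKERS || rounds < 0 || rounds > nworkers)
        return -1;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0) {                 /* fork failed: clean up and bail */
            stop_all(pids, i);
            return -1;
        }
        if (pid == 0) {                /* child: become a worker */
            signal(SIGTERM, SIG_DFL);  /* make sure SIGTERM ends it */
            run_workload();
            _exit(0);
        }
        pids[i] = pid;
    }

    srand(seed);
    int confirmed = 0;
    for (int r = 0; r < rounds; r++) {
        int victim;
        do {
            victim = rand() % nworkers;    /* pick a worker still alive */
        } while (pids[victim] == 0);

        int status;
        kill(pids[victim], SIGTERM);
        if (waitpid(pids[victim], &status, 0) == pids[victim] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            confirmed++;
        pids[victim] = 0;                  /* mark as gone */
        /* A real harness would verify here that the surviving sessions
         * still make progress. */
    }

    stop_all(pids, nworkers);              /* shut down the survivors */
    return confirmed;
}
```

Because the stand-in workers keep the default SIGTERM action, success is checked with `WTERMSIG`. Against real backends, which handle SIGTERM themselves, the interesting checks would be that the surviving sessions keep working and that no locks or prepared transactions are left in an inconsistent state.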


Re: pg_terminate_backend

From: Bruce Momjian
Thanks.  Good plan.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> What I'm looking for is some concentrated testing.  The fact that some
> >> people once in a while SIGTERM a backend doesn't give me any confidence
> >> in it.
> 
> > OK, here is an opportunity for someone to run tests to get this into
> > 8.2.  The code already exists in CVS, but we need testing to enable it.
> > I would think running a huge workload and killing it over and over again
> > would be a good test.
> 
> Big multiprocess workload and you kill individual processes at random
> while letting the rest run.  It probably needs to be something that
> stresses more of the code than pgbench would, too.  (For instance,
> it'd be a good idea if some of the workload involved having a few 2PC
> transactions getting prepared and then either committed or rolled
> back ... SIGTERM during a COMMIT PREPARED strikes me as the sort of
> corner case that's probably never been exercised.)
> 
>             regards, tom lane



Re: pg_terminate_backend

From: "Magnus Hagander"
> >> Since I have a stuck backend without client again, I'll have to kill
> >> -SIGTERM a backend. Fortunately, I do have console access to that
> >> machine and it's not win32 but a decent OS.
> >
> > You do know that on Windows you can use pg_ctl to send a pseudo
> > SIGTERM to a backend, don't you?
> The main issue still is that console access is required, on any OS.

Yeah.
Though for the Windows case only, we could easily enough make it
possible to run pg_ctl kill remotely, since we use a named pipe. Does
this seem like a good or bad idea?

//Magnus



Re: pg_terminate_backend

From: Andreas Pflug
Magnus Hagander wrote:
>>>> Since I have a stuck backend without client again, I'll have to kill
>>>> -SIGTERM a backend. Fortunately, I do have console access to that
>>>> machine and it's not win32 but a decent OS.
>>>
>>> You do know that on Windows you can use pg_ctl to send a pseudo
>>> SIGTERM to a backend, don't you?
>> The main issue still is that console access is required, on any OS.
>
> Yeah.
> Though for the Windows case only, we could easily enough make it
> possible to run pg_ctl kill remotely, since we use a named pipe. Does
> this seem like a good or bad idea?
>   

Not too helpful. How would you kill a win32 backend from a Linux workstation?
Additionally, the named pipe (NP) requires an authenticated RPC connection.
If you're not allowed to access the console, you probably don't have
sufficient access permissions to the NP either, or you'd need extra
policy tweaking or so. Nightmarish, just to avoid the easy and intuitive way.

Regards,
Andreas


Re: pg_terminate_backend

From: Tom Lane
"Magnus Hagander" <mha@sollentuna.net> writes:
> Though for the Windows case only, we could easily enough make it
> possible to run pg_ctl kill remotely, since we use a named pipe. Does
> this seem like a good or bad idea?

Seems like we'd be opening a can of security worms :-(
        regards, tom lane


Re: pg_terminate_backend

From: "Magnus Hagander"
> > Though for the Windows case only, we could easily enough make it
> > possible to run pg_ctl kill remotely, since we use a named pipe. Does
> > this seem like a good or bad idea?
>
> Seems like we'd be opening a can of security worms :-(

Not really; standard Windows ACLs already apply to everything, so you'd
need to be an admin on the machine to make it work.

Anyhoo, I don't really see the gain in it, which also seems to be what
others think, so let's just drop that idea.

//Magnus