Thread: Kill a session

Kill a session

From
"Craig A. James"
Date:
There have been dozens, perhaps hundreds, of entries in the pg-admin, pg-general, and pg-performance lists regarding
killinga session, but as far as I can tell, there is no Postgres solution.  Did I miss something? 

This raises the question: Why doesn't Postgres have a "kill session" command that works?  Oracle has it, and it's
invaluable;there is no substitute.  Various writers to these PG lists have raised the question repeatedly.  Is it just
amatter that nobody has had the time to do it (which I respect!), or is there a reason why the Postgres team decided a
"killsession" is a bad idea? 

The rest of this email is just to illustrate the convoluted solution I've had to adopt, and even with this, I can't get
itto work quite right. 

Background: In our web app, we give our users a fair amount of power to formulate difficult queries.  These
long-runningqueries are fork/exec'd from the Apache CGI, and we give the user a "job status" page, with the option to
killthe job. 

I can kill off the CGI, since Apache owns the process.  But the "stock answer" of

    kill -2 backend-pid

won't work, because I don't want my Apache jobs running as super-user (!) or as setuid processes.

So here's my solution:  Install a couple of C extensions like this:

    Datum get_session_id(PG_FUNCTION_ARGS)
    {
      PG_RETURN_INT32(getpid());
    }

    Datum kill_session(PG_FUNCTION_ARGS)
    {
      int4 session_id, status;
      session_id = PG_GETARG_INT32(0);
      fprintf(stderr, "KILLING SESSION: %d, 15\n", session_id);
      status = kill(session_id, 15);
      PG_RETURN_BOOL((status == 0) ? true : false);
    }

These are installed with the appropriate "CREATE OR REPLACE ..." sql.  Although this is dangerous (anyone who can log
into Postgres can kill any Postgres job!), its safe enough in a controlled enviroment.  It allows an Apache CGI to
issuethe kill(2) command through the Postgres backend, which is running as the Postgres user, and thus has permission
todo the deed.  When I start a job, I record the backend's PID, which allows another process to connect and kill the
firstone.  Alright, it's a hack, but it's the best I could think of. 

But in spite earlier posting in these forums that say the killing the backend was the way to go, this doesn't really
work. First, even though the "postgres" backend job is properly killed, a "postmaster" job keeps running at 99% CPU,
whichis pretty useless.  Killing the client's backend didn't kill the process actually doing the work! 

Second, the "KILLING SESSION" message to stderr is only printed in the PG log file sporadically.  This confuses me,
sincethe "KILLING SESSION" is printed by a *different* process than the one being killed, so it shouldn't be affected.
Sowhat happens to fprintf()'s output?  Most of the time, I just get "unexpected EOF on client connection" in the log
whichis presumably the postmaster complaining that the postgres child process died. 

I know the kill_session() is working because it returns "true", and the job is in fact killed.  But the query keeps
runningin postmaster (or is it something else, like a rollback?), and the stderr output disappears. 

Thanks,
Craig

Re: Kill a session

From
Tino Wildenhain
Date:
Craig A. James wrote:
> There have been dozens, perhaps hundreds, of entries in the pg-admin,
> pg-general, and pg-performance lists regarding killing a session, but as
> far as I can tell, there is no Postgres solution.  Did I miss something?
>
> This raises the question: Why doesn't Postgres have a "kill session"
> command that works?  Oracle has it, and it's invaluable; there is no
> substitute.  Various writers to these PG lists have raised the question
> repeatedly.  Is it just a matter that nobody has had the time to do it
> (which I respect!), or is there a reason why the Postgres team decided a
> "kill session" is a bad idea?

You are sure you read:


http://www.postgresql.org/docs/8.1/interactive/protocol-flow.html#AEN60635

?


Regards
Tino Wildenhain

Re: Kill a session

From
"Magnus Hagander"
Date:
> There have been dozens, perhaps hundreds, of entries in the
> pg-admin, pg-general, and pg-performance lists regarding
> killing a session, but as far as I can tell, there is no
> Postgres solution.  Did I miss something?
>
> This raises the question: Why doesn't Postgres have a "kill
> session" command that works?  Oracle has it, and it's
> invaluable; there is no substitute.  Various writers to these
> PG lists have raised the question repeatedly.  Is it just a
> matter that nobody has had the time to do it (which I
> respect!), or is there a reason why the Postgres team decided
> a "kill session" is a bad idea?

[snip]

I beleive the function to kill a backend is actually in the codebase,
it's just commented out because it's considered dangerous. There are
some possible issues (see -hackers archives) about sending SIGTERM
without actually shutting down the whole cluster.

Doing the client-side function to call is the easy part.

In many cases you just need to cancel a query, in which case you can use
pg_cancel_backend() for exmaple. If you need to actually kill it, your
only supported way is to restart postgresql.

> But in spite earlier posting in these forums that say the
> killing the backend was the way to go, this doesn't really
> work.  First, even though the "postgres" backend job is
> properly killed, a "postmaster" job keeps running at 99% CPU,
> which is pretty useless.  Killing the client's backend didn't
> kill the process actually doing the work!

Then you killed the wrong backend...


> Second, the "KILLING SESSION" message to stderr is only
> printed in the PG log file sporadically.  This confuses me,
> since the "KILLING SESSION" is printed by a *different*
> process than the one being killed, so it shouldn't be
> affected.  So what happens to fprintf()'s output?  Most of
> the time, I just get "unexpected EOF on client connection" in
> the log which is presumably the postmaster complaining that
> the postgres child process died.

No, that's the postgres chlid process complaining that your client
(CGI?) died without sending a close message.


> I know the kill_session() is working because it returns
> "true", and the job is in fact killed.  But the query keeps
> running in postmaster (or is it something else, like a
> rollback?), and the stderr output disappears.

No queries run in postmaster. They all run in postgres backends. The
postmaster does very little actual work, other than keeping track of
everybody else.

//Magnus

Re: Kill a session

From
"Craig A. James"
Date:
Magnus Hagander wrote:
>> This raises the question: Why doesn't Postgres have a "kill
>> session" command that works?  Oracle has it, and it's
>> invaluable; there is no substitute.  Various writers to these
>> PG lists have raised the question repeatedly.  Is it just a
>> matter that nobody has had the time to do it (which I
>> respect!), or is there a reason why the Postgres team decided
>> a "kill session" is a bad idea?
>
> I beleive the function to kill a backend is actually in the codebase,
> it's just commented out because it's considered dangerous. There are
> some possible issues (see -hackers archives) about sending SIGTERM
> without actually shutting down the whole cluster.
>
> Doing the client-side function to call is the easy part.
>
> In many cases you just need to cancel a query, in which case you can use
> pg_cancel_backend() for exmaple. If you need to actually kill it, your
> only supported way is to restart postgresql.

In other words, are you confirming that there is no way to kill a query from another process, other than shutting down
thedatabase?  My understanding of the documentation tells me I can't use cancel, because the process doing the killing
isn'tthe original process. 

>> But in spite earlier posting in these forums that say the
>> killing the backend was the way to go, this doesn't really
>> work.  First, even though the "postgres" backend job is
>> properly killed, a "postmaster" job keeps running at 99% CPU,
>> which is pretty useless.  Killing the client's backend didn't
>> kill the process actually doing the work!
>
> Then you killed the wrong backend...
> No queries run in postmaster. They all run in postgres backends. The
> postmaster does very little actual work, other than keeping track of
> everybody else.

It turns out I was confused by this: ps(1) reports a process called "postgres", but top(1) reports a process called
"postmaster",but they both have the same pid.  I guess postmaster replaces its own name in the process table when it's
executinga query, and it's not really the postmaster even though top(1) calls it postmaster. 

So "kill -15 <pid>" is NOT killing the process -- to kill the process, I have to use signal 9.  But if I do that, ALL
queriesin progress are aborted.  I might as well shut down and restart the database, which is an unacceptable solution
fora web site. 

I'm back to my original question: How do you kill a runaway query without bringing down the whole database?  Is there
reallyno answer to this? 

Thanks,
Craig

Re: Kill a session

From
"Steinar H. Gunderson"
Date:
On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote:
>> Then you killed the wrong backend...
>> No queries run in postmaster. They all run in postgres backends. The
>> postmaster does very little actual work, other than keeping track of
>> everybody else.
>
> It turns out I was confused by this: ps(1) reports a process called
> "postgres", but top(1) reports a process called "postmaster", but they both
> have the same pid.  I guess postmaster replaces its own name in the process
> table when it's executing a query, and it's not really the postmaster even
> though top(1) calls it postmaster.
>
> So "kill -15 <pid>" is NOT killing the process -- to kill the process, I
> have to use signal 9.  But if I do that, ALL queries in progress are
> aborted.  I might as well shut down and restart the database, which is an
> unacceptable solution for a web site.

I don't follow your logic here. If you do "kill -15 <pid>" of the postmaster
doing the work, the query should be aborted without taking down the entire
cluster. I don't see why you'd need -9 (which is a really bad idea anyhow)...

> I'm back to my original question: How do you kill a runaway query without
> bringing down the whole database?  Is there really no answer to this?

Kill it with -15. If you're worried about your CGI scripts, use sudo or some
sort of client/server solution.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Kill a session

From
"Magnus Hagander"
Date:
> > I beleive the function to kill a backend is actually in the
> codebase,
> > it's just commented out because it's considered dangerous.
> There are
> > some possible issues (see -hackers archives) about sending SIGTERM
> > without actually shutting down the whole cluster.
> >
> > Doing the client-side function to call is the easy part.
> >
> > In many cases you just need to cancel a query, in which
> case you can
> > use
> > pg_cancel_backend() for exmaple. If you need to actually
> kill it, your
> > only supported way is to restart postgresql.
>
> In other words, are you confirming that there is no way to
> kill a query from another process, other than shutting down
> the database?  My understanding of the documentation tells me
> I can't use cancel, because the process doing the killing
> isn't the original process.

You can't kill another backend, no.
You can *cancel* a query on it and return it to idle state. See
http://www.postgresql.org/docs/8.1/interactive/functions-admin.html,
pg_cancel_backend().


> So "kill -15 <pid>" is NOT killing the process -- to kill the
> process, I have to use signal 9.  But if I do that, ALL
> queries in progress are aborted.  I might as well shut down
> and restart the database, which is an unacceptable solution
> for a web site.
>
> I'm back to my original question: How do you kill a runaway
> query without bringing down the whole database?  Is there
> really no answer to this?

Runaway queries can be killed with pg_cancel_backend(), or from the
commandline using kill -INT <pid>. The backend will still be around, but
it will have cancelled the query.

//Magnus

Re: Kill a session

From
Stefan Kaltenbrunner
Date:
Craig A. James wrote:
> Magnus Hagander wrote:
>>> This raises the question: Why doesn't Postgres have a "kill session"
>>> command that works?  Oracle has it, and it's invaluable; there is no
>>> substitute.  Various writers to these PG lists have raised the
>>> question repeatedly.  Is it just a matter that nobody has had the
>>> time to do it (which I respect!), or is there a reason why the
>>> Postgres team decided a "kill session" is a bad idea?
>>
>> I beleive the function to kill a backend is actually in the codebase,
>> it's just commented out because it's considered dangerous. There are
>> some possible issues (see -hackers archives) about sending SIGTERM
>> without actually shutting down the whole cluster.
>>
>> Doing the client-side function to call is the easy part.
>>
>> In many cases you just need to cancel a query, in which case you can use
>> pg_cancel_backend() for exmaple. If you need to actually kill it, your
>> only supported way is to restart postgresql.
>
> In other words, are you confirming that there is no way to kill a query
> from another process, other than shutting down the database?  My
> understanding of the documentation tells me I can't use cancel, because
> the process doing the killing isn't the original process.
>
>>> But in spite earlier posting in these forums that say the killing the
>>> backend was the way to go, this doesn't really work.  First, even
>>> though the "postgres" backend job is properly killed, a "postmaster"
>>> job keeps running at 99% CPU, which is pretty useless.  Killing the
>>> client's backend didn't kill the process actually doing the work!
>>
>> Then you killed the wrong backend...
>> No queries run in postmaster. They all run in postgres backends. The
>> postmaster does very little actual work, other than keeping track of
>> everybody else.
>
> It turns out I was confused by this: ps(1) reports a process called
> "postgres", but top(1) reports a process called "postmaster", but they
> both have the same pid.  I guess postmaster replaces its own name in the
> process table when it's executing a query, and it's not really the
> postmaster even though top(1) calls it postmaster.
>
> So "kill -15 <pid>" is NOT killing the process -- to kill the process, I
> have to use signal 9.  But if I do that, ALL queries in progress are
> aborted.  I might as well shut down and restart the database, which is
> an unacceptable solution for a web site.
>
> I'm back to my original question: How do you kill a runaway query
> without bringing down the whole database?  Is there really no answer to
> this?

are you maybe looking for pg_cancel_backend() ?

http://www.postgresql.org/docs/current/interactive/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL-TABLE

Stefan

Re: Kill a session

From
Mark Kirkwood
Date:
Craig A. James wrote:
>
>
> I'm back to my original question: How do you kill a runaway query
> without bringing down the whole database?  Is there really no answer to
> this?
>

As others have mentioned, pg_cancel_backend(pid) will stop query
execution by backend process id 'pid'.

While this is often enough, if you actually want to disconnect a backend
process then there is nothing to let you do this remotely. I recently
did a patch for Bizgres that just implements the
pg_terminate_backend(pid) function (currently #ifdef'ed out of the
codebase) as a contrib so it can be easily installed. See
http://pgfoundry.org/pipermail/bizgres-general/2006-May/000484.html

If you want to try it out, please read the README (it discusses possible
dangers associated with sending SIGTERM to backends). And I would
certainly be interested in hearing what level of success (or otherwise)
you have with it!

Best wishes

Mark



Re: Kill a session

From
"Craig A. James"
Date:
Thanks for your reply, Mark:
>> I'm back to my original question: How do you kill a runaway query
>> without bringing down the whole database?  Is there really no answer
>> to this?
>
> ... if you actually want to disconnect a backend
> process then there is nothing to let you do this remotely. I recently
> did a patch for Bizgres that just implements the
> pg_terminate_backend(pid) function (currently #ifdef'ed out of the
> codebase) as a contrib so it can be easily installed. See
> http://pgfoundry.org/pipermail/bizgres-general/2006-May/000484.html

This answers my question.  I've finally got a statement in concrete terms, Postgres has no way to kill a backend
processvia an SQL statement.  "If Mark had to resort to this, then there is no other way." 

> If you want to try it out, please read the README (it discusses possible
> dangers associated with sending SIGTERM to backends). And I would
> certainly be interested in hearing what level of success (or otherwise)
> you have with it!

Thanks, but I've already implemented my own, which is essentially identical in concept to yours, but simpler in the
senseof being even less safe than yours -- I just let anyone send the signal, since I have no users other than my own
app. I'll keep my version since it's embedded in my own plug-in.  That way I won't have to keep remembering to modify
thePostgres code when I upgrade.  I like to keep Postgres "stock". 

Craig

Re: Kill a session

From
"Craig A. James"
Date:
Steinar H. Gunderson wrote:
> On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote:
>>> Then you killed the wrong backend...
>>> No queries run in postmaster. They all run in postgres backends. The
>>> postmaster does very little actual work, other than keeping track of
>>> everybody else.
>> It turns out I was confused by this: ps(1) reports a process called
>> "postgres", but top(1) reports a process called "postmaster", but they both
>> have the same pid.  I guess postmaster replaces its own name in the process
>> table when it's executing a query, and it's not really the postmaster even
>> though top(1) calls it postmaster.
>>
>> So "kill -15 <pid>" is NOT killing the process -- to kill the process, I
>> have to use signal 9.  But if I do that, ALL queries in progress are
>> aborted.  I might as well shut down and restart the database, which is an
>> unacceptable solution for a web site.
>
> I don't follow your logic here. If you do "kill -15 <pid>" of the postmaster
> doing the work, the query should be aborted without taking down the entire
> cluster. I don't see why you'd need -9 (which is a really bad idea anyhow)...

I've solved this mystery.  "kill -15" doesn't immediately kill the job -- it aborts the query, but it might take 15-30
secondsto clean up. 

This confused me, because the query I was using to test took about 30 seconds, so the SIGTERM didn't seem to make a
difference. But when I used a harder query, one that would run for 5-10 minutes, SIGTERM still stopped it after 15
seconds,which isn't great but it's acceptable.   

Bottom line is that I was expecting "instant death" with SIGTERM, but instead got an agonizing, drawn out -- but safe
--death of the query.  At least that's my deduction based on experiments.  I haven't dug into the source to confirm. 

Thanks everyone for your answers.  My "kill this query" feature is now acceptable.

Craig

Re: Kill a session

From
Tom Lane
Date:
"Craig A. James" <cjames@modgraph-usa.com> writes:
> Bottom line is that I was expecting "instant death" with SIGTERM, but
> instead got an agonizing, drawn out -- but safe -- death of the query.

What was the query exactly?

Our expectation is that all or at least most queries should respond to
SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second
timescale.  However there are various loops in the backend that fail to
execute CHECK_FOR_INTERRUPTS sufficiently often :-(.  We've been
gradually finding and fixing these, and will be glad to fix your case
if you provide enough details to pin it down.  You might be interested
in this current thread about a similar problem:

http://archives.postgresql.org/pgsql-patches/2006-07/msg00039.php

            regards, tom lane

Re: Kill a session

From
Markus Schaber
Date:
Hi, Tom,

Tom Lane wrote:
> Our expectation is that all or at least most queries should respond to
> SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second
> timescale.  However there are various loops in the backend that fail to
> execute CHECK_FOR_INTERRUPTS sufficiently often :-(.

The same is true for user-defined C funtions.

The PostGIS GEOS geometry functions come to mind, for complex
geometries, they can need hours to complete. And as GEOS is a 3rd-Party
library, I don't see an easy way to make them CHECK_FOR_INTERRUPTS.

Does anybody know how this is for plpgsql, pljava and plpython?


HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Kill a session

From
"Craig A. James"
Date:
Tom Lane wrote:
> "Craig A. James" <cjames@modgraph-usa.com> writes:
>> Bottom line is that I was expecting "instant death" with SIGTERM, but
>> instead got an agonizing, drawn out -- but safe -- death of the query.
>
> What was the query exactly?
>
> Our expectation is that all or at least most queries should respond to
> SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second
> timescale.  However there are various loops in the backend that fail to
> execute CHECK_FOR_INTERRUPTS sufficiently often :-(.  We've been
> gradually finding and fixing these, and will be glad to fix your case
> if you provide enough details to pin it down.  You might be interested
> in this current thread about a similar problem:
>
> http://archives.postgresql.org/pgsql-patches/2006-07/msg00039.php

Thanks, this is good information.  The qsort is a distinct possibility.  The query is a big

   insert into some_hitlist (select id from another_hitlist join data_table on (...))

where the hitlists are unindexed.  So it may be using a merge-join with qsort.  When I have a few minutes, I'll turn on
loggingin the app and find the exact SQL, and run an EXPLAIN ANALYZE and see what's really happening. 

It's also possible that the INSERT itself is the problem, or adds to the problem.  The SIGINT may come after a few
millionrows have been inserted, so it would have to clean that up, right? 

Thanks,
Craig


Re: Kill a session

From
Tom Lane
Date:
"Craig A. James" <cjames@modgraph-usa.com> writes:
> It's also possible that the INSERT itself is the problem, or adds to the problem.  The SIGINT may come after a few
millionrows have been inserted, so it would have to clean that up, right? 

No, because we don't use UNDO.  The next VACUUM would have a bit of a
mess to clean up, but you wouldn't pay for it at the time of the abort.

            regards, tom lane