Thread: Re: [HACKERS] Function to kill backend

Re: [HACKERS] Function to kill backend

From
"Dave Page"
Date:

-----Original Message-----
From: pgsql-patches-owner@postgresql.org on behalf of Magnus Hagander
Sent: Sun 7/25/2004 12:07 PM
To: Tom Lane; Bruce Momjian
Cc: Josh Berkus; PostgreSQL-patches
Subject: Re: [PATCHES] [HACKERS] Function to kill backend

> >much further.  I recall being voted down though ...

> That's not quite the argument I think I had :-) But without being able
> to kill the backends, there's just no way for me to handle the situation
> when I have a hundred clients eating up all available connections and/or
> memory, just sitting idle, because of some freak bug in a client.

The first time I used it was for precisely this reason - some buggy PHP code opened hundreds of connections to a dev
server which then remained open doing nothing except wasting resources. It was particularly useful in that case as I
didn't have access to the web server at the time.

Shortly afterwards I added support to pgAdmin's server status tool which has proven quite handy (although I will admit,
mainly for canceling rather than terminating).

I don't know the details of how it works, but is it any worse/better than 'kill -9' (which iirc is no longer considered
an absolute no-no)?

Regards, Dave

Re: [HACKERS] Function to kill backend

From
Christopher Kings-Lynne
Date:
> The first time I used it was for precisely this reason - some buggy PHP code opened hundreds of connections to a dev
> server which then remained open doing nothing except wasting resources. It was particularly useful in that case as I
> didn't have access to the web server at the time.
>
> Shortly afterwards I added support to pgAdmin's server status tool which has proven quite handy (although I will
> admit, mainly for canceling rather than terminating).

Yeah, I've added the kill and cancel commands to phppgadmin.  I'm happy
if kill is removed though, I don't want my newbie users panicking their
machines.

phpmyadmin has both kill and cancel since they're sql commands in mysql.

Chris


Re: [HACKERS] Function to kill backend

From
Tom Lane
Date:
"Dave Page" <dpage@vale-housing.co.uk> writes:
> I don't know the details of how it works, but is it any worse/better
> than 'kill -9' (which iirc is no longer considered an absolute no-no)?

What I've been trying to remind people of is that killing just a single
backend with SIGTERM is not the normal code path and can't be considered
well-tested.  We know it works to shut down an entire cluster with
simultaneous SIGTERMs.  However, in that situation the only correctness
requirement is that the final database state on disk be consistent.
We don't really *know* what state is being left behind in the shared
memory segment, because shmem just gets thrown away.  It could be that
sometimes some locks don't get released, or in other ways a SIGTERM'd
backend fails to clean up after itself fully.

In comparison, the query-cancel code path is nearly indistinguishable
from any ordinary elog(ERROR).  We can also have confidence that kill -9
on an individual backend is not going to screw things terribly, because
that simulates a backend crash, and the recovery path for that has been
(ahem) tested pretty frequently over the years.  Note also that in the
kill -9 case, again only the final database state on disk matters, not
the condition of shared memory.

Another way to look at this is that elog(FATAL) in general is not a well
tested code path, because it just hardly ever happens in the field.
The only elog(FATAL)s that get exercised with any regularity are the
ones that reject a connection request during authentication, and those
all occur *before* the backend has become a full-fledged backend and
acquired any resources it might need to release.  The only elog(FATAL)
calls in an up-and-running backend are for "can't happen" conditions,
and by and large indeed those don't happen.

So what it comes down to is that we can put this feature out there
if we choose, but we'd be fooling ourselves to think we can consider it
reliable.  Moreover, since the kinds of cases where you'd use a session
kill don't arise every day, I don't think we could say we'd acquire any
confidence in it over time either.  It'd always remain a little-used
corner of the code, and little-used corners tend to gather bit rot.

If you don't mind plastering a "use at your own risk" sign on it, then
go for it.

            regards, tom lane
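
A toy sketch of the distinction drawn above, under loud assumptions: the
SharedMemory class, its lock table, and the three functions are hypothetical
stand-ins, not PostgreSQL's actual structures or code paths. It only
illustrates why an exit path that keeps the shared memory segment depends on
complete cleanup, while a crash-and-reinitialize path does not.

```python
class SharedMemory:
    """Toy stand-in for a shared lock table (hypothetical, not PostgreSQL's)."""

    def __init__(self):
        self.locks = {}  # resource name -> id of the backend holding it

    def acquire(self, backend_id, resource):
        holder = self.locks.get(resource)
        if holder is not None and holder != backend_id:
            return False  # a real backend would block here
        self.locks[resource] = backend_id
        return True

    def release_all(self, backend_id):
        self.locks = {r: b for r, b in self.locks.items() if b != backend_id}


def cancel_query(shmem, backend_id):
    # Query cancel is modeled as the ordinary error path: locks are
    # released and the same segment stays in use.
    shmem.release_all(backend_id)
    return shmem


def crash_backend(shmem, backend_id):
    # kill -9 is modeled as a crash: the old segment is thrown away and a
    # fresh one replaces it, so leftover state cannot survive.
    return SharedMemory()


def terminate_backend(shmem, backend_id, cleanup_complete):
    # Single-backend SIGTERM: the segment is kept, so everything depends on
    # the exiting backend cleaning up. cleanup_complete=False models the
    # possible (not demonstrated) failure described in the message above.
    if cleanup_complete:
        shmem.release_all(backend_id)
    return shmem
```

In this model, a terminate with cleanup_complete=False leaves the lock held
by a backend that no longer exists, so every later acquire of that resource
fails until the segment is replaced.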

Re: [HACKERS] Function to kill backend

From
Andreas Pflug
Date:
Tom Lane wrote:

>
> If you don't mind plastering a "use at your own risk" sign on it, then
> go for it.

Killing a backend is obviously much more "at your own risk" than a
decent function.

Taken from your mail, I understand that a killed backend might leave
some loose ends, e.g. open locks, which would degrade the cluster's
performance. Still, it should not corrupt the shared mem, just leave it
as if the backend's still alive and sleeping, right?

You'd kill a backend only if your complete cluster is suffering from it,
and you hope to keep it running by just shooting that process. If the
cluster still has uncleaned locks or the like afterwards, you're unlucky
and need to shut down the cluster.

Maybe we should supply a restricted version of pg_terminate_backend
that's callable from admin interfaces only, so we can make sure that the
user was warned about what he's doing before the termination is executed,
something like this:

ticket := select pg_admin_ticket();
/* calculate well-known stuff on ticket
    and issue before it times out */
select pg_terminate_backend(ticket_hash);

Regards,
Andreas
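
A minimal sketch of how the ticket idea above might work, with everything in
it an assumption: TicketIssuer, compute_ticket_hash, WELL_KNOWN_SALT and the
30-second lifetime are hypothetical names and values, and no such ticket
mechanism exists in PostgreSQL. The "well-known" calculation is taken to be a
plain SHA-256 over a fixed salt plus the ticket, which is only a speed bump
for casual callers, not a security boundary.

```python
import hashlib
import secrets
import time

# Hypothetical values; the proposal names neither a salt nor a lifetime.
WELL_KNOWN_SALT = b"admin-interface"
TICKET_LIFETIME_SECONDS = 30.0


def compute_ticket_hash(ticket: str) -> str:
    """Client side: the calculation an admin tool would perform on a ticket."""
    return hashlib.sha256(WELL_KNOWN_SALT + ticket.encode("ascii")).hexdigest()


class TicketIssuer:
    """Server side: hands out short-lived tickets and checks the reply."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._expiry_by_hash = {}  # expected hash -> expiry time

    def issue_ticket(self) -> str:
        ticket = secrets.token_hex(16)
        expiry = self._now() + TICKET_LIFETIME_SECONDS
        self._expiry_by_hash[compute_ticket_hash(ticket)] = expiry
        return ticket

    def redeem(self, ticket_hash: str) -> bool:
        # Single use: the entry is removed whether or not it is still valid.
        expiry = self._expiry_by_hash.pop(ticket_hash, None)
        return expiry is not None and self._now() <= expiry
```

An admin tool would call issue_ticket(), show its warning to the user, and
pass compute_ticket_hash(ticket) back; redeem() returning True stands in for
the point at which the termination would actually be carried out.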

Re: [HACKERS] Function to kill backend

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Taken from your mail, I understand that a killed backend might leave
> some loose ends, e.g. open locks, which would degrade the cluster's
> performance. Still, it should not corrupt the shared mem, just leave it
> as if the backend's still alive and sleeping, right?

Well, I was citing that as an example of the sort of trouble that is
foreseeable; I don't say either that it would happen, or that it's the
only bad thing that could happen.  But having backends block on locks
that will never be released sure seems like something that would look
like database corruption to the average DBA.

If you want to put in the function and document that it may cause
problems, I won't object; it's not like we don't have other features
that are poorly implemented :-(.  But my vote would be to remove it.

            regards, tom lane

Re: [HACKERS] Function to kill backend

From
Christopher Kings-Lynne
Date:
> If you want to put in the function and document that it may cause
> problems, I won't object; it's not like we don't have other features
> that are poorly implemented :-(.  But my vote would be to remove it.

I'm down with removing it - people don't read documentation :/

Chris

Re: [HACKERS] Function to kill backend

From
Oliver Jowett
Date:
Andreas Pflug wrote:
> Tom Lane wrote:
>
>>
>> If you don't mind plastering a "use at your own risk" sign on it, then
>> go for it.
>
>
> Killing a backend is obviously much more "at your own risk" than a
> decent function.
>

[...]

What about implementing "kill" as "cancel then exit"? Does that
guarantee a safe exit in all cases?

It wouldn't catch *all* the cases where you want to kill a backend, just
the ones where the backend is in a cancellable state, but it seems to me
that the main use case is killing an otherwise idle backend that the
client doesn't want to let go of. And if the backend isn't cancellable
for an extended period, you probably have other problems anyway.

-O

Re: [HACKERS] Function to kill backend

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> What about implementing "kill" as "cancel then exit"? Does that
> guarantee a safe exit in all cases?

That was exactly what Bruce's patch turned it into.  That would be
workable if we separated this case from the existing elog(FATAL)
behavior, but doing it that way is quite unsafe for real FATAL errors,
and I do not think we want SIGTERM response to behave that way either.
(When init SIGTERMs us, we do *not* want to lollygag around, we want
to get the heck out of there so we can write a shutdown checkpoint
before we get SIGKILL'd.)

So what you'd basically need is a separate signal to trigger that sort
of exit, which would be easy ... if we had any spare signal numbers.

            regards, tom lane

Re: [HACKERS] Function to kill backend

From
Oliver Jowett
Date:
Tom Lane wrote:

> So what you'd basically need is a separate signal to trigger that sort
> of exit, which would be easy ... if we had any spare signal numbers.

What about multiplexing it onto an existing signal? e.g. set a
shared-mem flag saying "exit after cancel" then send SIGINT?

I guess this is getting away from the original patch though.

-O

Re: [HACKERS] Function to kill backend

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> So what you'd basically need is a separate signal to trigger that sort
>> of exit, which would be easy ... if we had any spare signal numbers.

> What about multiplexing it onto an existing signal? e.g. set a
> shared-mem flag saying "exit after cancel" then send SIGINT?

Possible, but then the *only* way to get the behavior is by using the
backend function --- you couldn't use dear old kill(1) to do it
manually.  It'd be better if it mapped to a signal.

            regards, tom lane
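
A toy model of the multiplexing idea and of the objection to it, assuming a
hypothetical per-backend flag named exit_after_cancel; the Backend class and
terminate_via_function are illustrative names, and no real signals are
delivered. Only the dispatch logic is shown.

```python
import signal


class Backend:
    """Toy model of one backend's signal dispatch (not PostgreSQL code)."""

    def __init__(self):
        self.exit_after_cancel = False  # hypothetical shared-memory flag
        self.events = []
        self.running = True

    def handle(self, signum):
        if signum == signal.SIGTERM:
            # Shutdown request: leave at once, with no cancel step first.
            self.events.append("exit immediately")
            self.running = False
        elif signum == signal.SIGINT:
            self.events.append("cancel current query")
            if self.exit_after_cancel:
                # The flag turns an ordinary cancel into cancel-then-exit.
                self.exit_after_cancel = False
                self.events.append("exit cleanly")
                self.running = False


def terminate_via_function(backend):
    # What a hypothetical backend function would do: set the flag, then
    # deliver the same SIGINT that an ordinary cancel uses.
    backend.exit_after_cancel = True
    backend.handle(signal.SIGINT)
```

The limitation raised above shows up directly: a bare SIGINT, which is all
kill(1) can send, reaches handle() with the flag unset and can only cancel.
Reaching the cancel-then-exit branch requires going through the function
that sets the flag first.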

Re: [HACKERS] Function to kill backend

From
Bruce Momjian
Date:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> > Tom Lane wrote:
> >> So what you'd basically need is a separate signal to trigger that sort
> >> of exit, which would be easy ... if we had any spare signal numbers.
>
> > What about multiplexing it onto an existing signal? e.g. set a
> > shared-mem flag saying "exit after cancel" then send SIGINT?
>
> Possible, but then the *only* way to get the behavior is by using the
> backend function --- you couldn't use dear old kill(1) to do it
> manually.  It'd be better if it mapped to a signal.

And what happens if a FATAL comes while it is processing a signal meant
for termination?  It wouldn't exit fast enough --- bad.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073