Thread: Re: [HACKERS] Function to kill backend

Re: [HACKERS] Function to kill backend

From
"Magnus Hagander"
Date:
>> That's not quite the argument I think I had :-) But without being able
>> to kill the backends, there is just no way for me to handle the situation
>> when I have a hundred clients eating up all available connections and/or
>> memory, just sitting idle, because of some freak bug in a client. Yes,
>> if they keep reconnecting it will not save me. But if it's just a client
>> that does a new connect say every hour and then forgets to close the old
>> one, I can easily manage the situation (until I can fix the client. Or,
>> which probably takes a lot more time, convince a vendor that this is
>> actually a problem in the client).
>
>How often does that situation really come up?  This sounds more like
>a theoretical concern than something that actually happens on a regular
>basis.

Regularly, no. Has it happened to me more than once, yes. Enough to care
about it, definitely.


>Would you use a kill operation in the way you describe above if you knew
>that it had, say, a 1% chance of causing a database-wide PANIC each time
>you used it?

No, I don't think so.
(I've been using it so far without having it PANIC, but that's no
guarantee of course. But it's probably <<1%, from my unplanned testing.)


>The odds of a problem are probably a great deal less than 1%, especially
>if the backend is sitting idle.  But they're not nil, and I don't think
>we have the resources to make them nil in this release cycle.
>Therefore I'm uneager to provide this feature simply because of "it
>might be nice to have" arguments.  There's a lot of other stuff that is
>higher on the priority list, IMHO anyway.

Right. I agree there are definitely higher priority things. The query
cancel function is definitely one I'd use a lot more often.

But I certainly think a *lot* of people have been doing this, using the
kill command locally on the server. If nothing else, we need to add a
tip along the line of "don't kill -9 the postmaster" reading "don't kill
-anything a backend". I certainly wasn't aware it was dangerous, and I
know at least a couple of others who don't know it either. If we can't
fix it, we need to let people know.

This leaves shutting down and restarting the entire database as the only
supported way to get rid of a client connection, right? Or am I missing
some other way?


>> Don't know exactly what Bruce's patch did, but perhaps if this
>> restriction can be put on it the dangerous parts of the patch can be
>> reverted without removing the capability to terminate a backend that is
>> idle?
>
>You have that backwards.  The dangerous part is killing a single backend.
>Bruce's patch was an unsuccessful attempt to make it less dangerous.

Right, I got part of it. What I was going for was if we could make it
safe to kill it when it was sitting idle. But from what you write above,
I assume this is not easy either.


>Note also that what's at stake is not whether you can do this at all.
>You can issue manual SIGTERMs all you like.  What's at stake is whether
>we promote the feature into an easy-to-use, presumably supported
>operation.  My real complaint here is that I don't think we are prepared
>to support it.

I agree that if we have a function for it, it definitely has to be
supported. If it's unsafe, the function should be removed until it can
be made safe. But we also need to inform the users that the manual way
of sending the signal is *also* dangerous in that case.

I assume we can't block it, because we can't distinguish between a
manual one and one caused by system shutdown?

//Magnus

Re: [HACKERS] Function to kill backend

From
Andreas Pflug
Date:
>>Would you use a kill operation in the way you describe above if you knew
>>that it had, say, a 1% chance of causing a database-wide PANIC each time
>>you used it?

Seems there's the need for some connection killing functionality. If
it's not present, the whole cluster needs to be shut down, which makes
it unavailable with 100 % chance.

If there's a .00001 % chance it *corrupts* the cluster, the function is
not acceptable. But if it's a good chance to keep the cluster running,
it's worth having it (and should be used sensibly).

Regards,
Andreas


Re: [HACKERS] Function to kill backend

From
Tom Lane
Date:
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> If there's a .00001 % chance it *corrupts* the cluster, the function is
> not acceptable.

See my response to Dave Page just now.  Not only wouldn't I give you
those odds today, but I don't think we could ever get to the point of
saying that session kill is that reliable, at least not from our
ordinary methods of field testing.  It'd require significant focused
code review and testing to acquire such confidence, and continuing
effort to make sure we didn't break it again in the future.

If we had infinite manpower I'd be happy to delegate a developer or
three to stay on top of this particular issue.  But we don't :-(

            regards, tom lane