Thread: Re: Safely Killing Backends (Was: Applications that leak connections)

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Jim Wilson
Date:
> On Fri, Feb 04, 2005 at 01:44:10PM -0600, Thomas F.O\'Connell wrote:
> > Is there any stronger medicine that\'s available (for instance, when
a
> > backend won\'t respond to SIGTERM) and has no unfortunate side
effects?
> > I just ran into this situation the other day (and made the
unfortunate
> > discovery that SIGABRT is as bad as SIGKILL as far as a postmaster
is
> > concerned).
>
> As soon as a backend dies a unnatural death, postmaster will rightly
> consider that it may have corrupted the shared state.  In turn
> postmaster will kill all its children mercilessly so they don\'t
spread
> the disease.
>
> Even SIGTERM can have bad consequences if it arrives at the wrong
time.
> (That\'s why a function to close a remote connection was rejected.)
>
> So, short answer: no.

This could be better than what is however.  Management would be easier
if
there was a way to trigger a series of behaviors on a given signal to a
child:

The child (1) cancels and rollbacks any transactions it has open,
(2) enters a mode where it attempts to communicate with the client
and failing so does an orderly connection close.

I would never go back to them, but I can say that the Sybase SQL Studio
servers where much
easier to manage in this regard.  If you are not very careful about how
you
handle orphaned connections in Postgres you will likely lose data....not
"maybe" like
a long shot...but "likely".  I suppose if your data is fairly static
(e.g. website cms) then this would not happen often,  but anything with
a lot of tansactions it will.

The best protection is to do extensive testing with any application you
use or
develop, but that\'s not possible for everyone to do a sufficient amount
of testing to
avoid some of these issues.

If I was submitting patches for Postgres I\'d push a little harder,  and
if I were, this
problem would be at the top of my list as things to fix in Postgres.

Best regards,

Jim Wilson



Re: Safely Killing Backends (Was: Applications that leak connections)

From
Tom Lane
Date:
Jim Wilson <jimw@kelcomaine.com> writes:
> If you are not very careful about how you handle orphaned connections
> in Postgres you will likely lose data....not "maybe" like a long
> shot...but "likely".

[ raised eyebrow ... ]  Say again?  I don't know of any reason why a
lost connection would cause loss of (successfully committed) transactions.
Not even if a DBA with an itchy "kill -9" trigger finger is in charge of
cleaning up the lost connections.  Please describe the scenarios you've
had problems with.

            regards, tom lane

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Jim Wilson
Date:
> Jim Wilson <jimw@kelcomaine.com> writes:
> > If you are not very careful about how you handle orphaned
connections
> > in Postgres you will likely lose data....not "maybe" like a long
> > shot...but "likely".
>
> [ raised eyebrow ... ]  Say again?  I don\'t know of any reason why a
> lost connection would cause loss of (successfully committed)
transactions.
> Not even if a DBA with an itchy "kill -9" trigger finger is in charge
of
> cleaning up the lost connections.  Please describe the scenarios
you\'ve
> had problems with.
>
>             regards, tom lane
>

Well a couple things... One is I\\\'m talking about 7.3.x. We\\\'ll be
moving our servers up to 7.4.x before the spring, but that\\\'s where
these observations have been and maybe there are certain issues at that
level. It is sometimes difficult to track transaction related issues
down anyway, but I can say that in testing and earlier deployment we
saw some things that did not look good and in practice we are very
carefull dealing with lost connection issues. No kill -9 trigger
fingers.

Rather than getting into the raised eyebrow thing ;-), I\\\'d suggest
checking your "qualifiers". Consider that with Postgres, if killing a
single connection brings the whole server down, you will loose _all_
uncommitted data. If you did not, then I would call that a bug. The
weakness is not in the data integrity (directly), it is in the
integrity
of the server processes and their managability.

Best regards,

Jim



Re: Safely Killing Backends (Was: Applications that leak connections)

From
Alvaro Herrera
Date:
On Fri, Feb 04, 2005 at 05:01:43PM -0500, Jim Wilson wrote:

> Rather than getting into the raised eyebrow thing ;-), I\\\'d suggest
> checking your "qualifiers". Consider that with Postgres, if killing a
> single connection brings the whole server down, you will loose _all_
> uncommitted data. If you did not, then I would call that a bug. The
> weakness is not in the data integrity (directly), it is in the
> integrity of the server processes and their managability.

Are you saying that your applications regularly leave uncommitted
transactions for long periods of time?  That sounds like bugs in your
applications to me.

Maybe I didn't get the part about lost connections.  Do you mean that
you applications lose conectivity to the server, and thus the
transaction they were working with are lost?  If that's the case, then
it certainly sounds dangerous to commit whatever was there; what if the
transaction was incomplete?  Of course, if you can't commit it, the only
way to proceed is to roll it back.


What's with the backslashes anyway?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
Y dijo Dios: "Que sea Satanás, para que la gente no me culpe de todo a mí."
"Y que hayan abogados, para que la gente no culpe de todo a Satanás"

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Tom Lane
Date:
Jim Wilson <jimw@kelcomaine.com> writes:
> Rather than getting into the raised eyebrow thing ;-), I\\\'d suggest
> checking your "qualifiers". Consider that with Postgres, if killing a
> single connection brings the whole server down, you will loose _all_
> uncommitted data. If you did not, then I would call that a bug.

I would too.  So what's your complaint exactly?  That kill -9'ing one
backend takes out all your current uncommitted transactions and not just
the one?  I don't regard that as data loss: uncommitted is uncommitted.

            regards, tom lane

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Jim Wilson
Date:
> Jim Wilson <jimw@kelcomaine.com> writes:
> > Rather than getting into the raised eyebrow thing , I\\\\\\\'d
suggest
> > checking your "qualifiers". Consider that with Postgres, if killing
a
> > single connection brings the whole server down, you will loose
_all_
> > uncommitted data. If you did not, then I would call that a bug.
>
> I would too.  So what\'s your complaint exactly?  That kill -9\'ing
one
> backend takes out all your current uncommitted transactions and not
just
> the one?  I don\'t regard that as data loss: uncommitted is
uncommitted.
>
>             regards, tom lane

We\'d like to be able to take out a connection without risking
postmaster going
down and thus losing all uncommitted data.

Unfortunately the whole world isn\'t encapsulated in Postgres
transactions or we
would never have to worry about uncommitted data.  Just because your
tables are
in sync, and your applications are designed in a way that coordinates
with real
world events and/or multiple entity transactions (e.g. electronic
payments) as well
as possible, doesn\'t mean that data integrity isn\'t put at risk by the
loss of
uncommitted information.

This "problem", which is always there for the application designer, is
exasperated
when the whole server goes down because of an admin who, as delicately
as possible,
is just trying to remove an orphaned connection that\'s blocking
transactions.  Would
I stop using Postgres if this was never changed?  No way.  Do I think it
could be
better than it is?  Absolutely.  For our purposes,  it would be more
important than
some things like further query optimization, etc., that would be at the
top of other
user\'s lists.

Best regards,

Jim



Re: Safely Killing Backends (Was: Applications that leak connections)

From
Jim Wilson
Date:
> On Fri, Feb 04, 2005 at 05:01:43PM -0500, Jim Wilson wrote:
>
> > Rather than getting into the raised eyebrow thing , I\\\\\\\'d
suggest
> > checking your "qualifiers". Consider that with Postgres, if killing
a
> > single connection brings the whole server down, you will loose
_all_
> > uncommitted data. If you did not, then I would call that a bug. The
> > weakness is not in the data integrity (directly), it is in the
> > integrity of the server processes and their managability.
>
> Are you saying that your applications regularly leave uncommitted
> transactions for long periods of time?  That sounds like bugs in your
> applications to me.
>

I never said that.

<snip>
>
> What\'s with the backslashes anyway?
>

Well, I\'m beta testing an email client.  Good question! :-)

Best,

Jim Wilson



Re: Safely Killing Backends (Was: Applications that leak connections)

From
Marco Colombo
Date:
On Fri, 4 Feb 2005, Jim Wilson wrote:

>>             regards, tom lane
>
> We\'d like to be able to take out a connection without risking
> postmaster going
> down and thus losing all uncommitted data.
>
> Unfortunately the whole world isn\'t encapsulated in Postgres
> transactions or we
> would never have to worry about uncommitted data.  Just because your
> tables are
> in sync, and your applications are designed in a way that coordinates
> with real
> world events and/or multiple entity transactions (e.g. electronic
> payments) as well
> as possible, doesn\'t mean that data integrity isn\'t put at risk by the
> loss of
> uncommitted information.

I'm sorry, but I cannot make any sense out of the last sentence.
By definition, data integrity refers only to committed data. Uncommitted
data is not data at all, for both the parties involved. The server may
receive a rollback anytime (which is not a delete!) and the client
may get an error from the server, even at commit time (the last step).
The network may go down in the middle of it. _Anything_ may happen.
If your application is not able to handle a failure _before_ commit time,
you should fix it.

> This "problem", which is always there for the application designer, is
> exasperated
> when the whole server goes down because of an admin who, as delicately
> as possible,
> is just trying to remove an orphaned connection that\'s blocking
> transactions.  Would
> I stop using Postgres if this was never changed?  No way.  Do I think it
> could be
> better than it is?  Absolutely.  For our purposes,  it would be more
> important than
> some things like further query optimization, etc., that would be at the
> top of other
> user\'s lists.

I think that the most important thing, for your purpose, as an "application
designer", is to fix your application. There's no such a thing as
"uncommitted data", it's not an intermediate state, it's not some
kind of "less important" data. Let T be the commit time, before T
there's _nothing_ and after T there's _everything_ (all of it).
That's the A in ACID.

Your application should handle failures in the middle of a transaction,
connection failures included, in a graceful but correct way.

I see your point (being able to safely shut a connection down on the
server side), but it's at the _bottom_ of any list.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Jim Wilson
Date:
>
> Your application should handle failures in the middle of a
transaction,
> connection failures included, in a graceful but correct way.

It does very well, until the next bug is discovered.

>
> I see your point (being able to safely shut a connection down on the
> server side), but it\'s at the _bottom_ of any list.
>
> .TM.
> --
>        /  /   /
>       /      /       /            Marco Colombo

That\'s unfortunate.  I\'ve tried to explain my position off list to
Marco,
but it really isn\'t worth debating.  FWIW I think this thread was
started
by someone with application issues.  The fact is, such things happen.

Unfortunately Marco choses speaks for "any list" and I\'ll just
repeat that I find this instability issue the most significant drawback

for Postgres installations.  This doesn\'t mean that there aren\'t other
areas
of priority for other users.  And no, I do not want to debate the
meaning
of the word "instability". :-)

Best regards,

Jim Wilson



Re: Safely Killing Backends (Was: Applications that leak connections)

From
Marco Colombo
Date:
On Tue, 8 Feb 2005, Jim Wilson wrote:

>>
>> Your application should handle failures in the middle of a
> transaction,
>> connection failures included, in a graceful but correct way.
>
> It does very well, until the next bug is discovered.
>
>>
>> I see your point (being able to safely shut a connection down on the
>> server side), but it\'s at the _bottom_ of any list.
>>
>> .TM.
>> --
>>        /  /   /
>>       /      /       /            Marco Colombo
>
> That\'s unfortunate.  I\'ve tried to explain my position off list to
> Marco,
> but it really isn\'t worth debating.  FWIW I think this thread was
> started
> by someone with application issues.  The fact is, such things happen.
>
> Unfortunately Marco choses speaks for "any list" and I\'ll just
> repeat that I find this instability issue the most significant drawback
>
> for Postgres installations.  This doesn\'t mean that there aren\'t other
> areas
> of priority for other users.  And no, I do not want to debate the
> meaning
> of the word "instability". :-)
>
> Best regards,
>
> Jim Wilson

As I wrote in private mail, authenticated clients have many means to
perform a DoS attack (whether intentionally or not). Most of cases
can be handled only with a server restart. To put simply, PostgreSQL
is not designed to handle hostile clients well.

IMHO, a friendly enviroment (client behaviour) is a safe assumption
for a RDBMS. It's not its job to paperbag over application bugs.

Anyway, I agree in ending this thread.
I recognize we have different meanings for "instability" and "data loss".

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

Re: Safely Killing Backends (Was: Applications that leak connections)

From
Martijn van Oosterhout
Date:
On Tue, Feb 08, 2005 at 07:31:13AM -0500, Jim Wilson wrote:
> That\'s unfortunate.  I\'ve tried to explain my position off list to
> Marco, but it really isn\'t worth debating.  FWIW I think this thread
> was started by someone with application issues.  The fact is, such
> things happen.

Well, I read the thread on pg-hackers [1] about this being a bad idea
currently and the issue seems to be:

1. The SIGTERM is the same as a FATAL error and this code path has not
been very well tested. Are locks, etc all correctly removed? The only
cases that *are* well tested are cases where these things don't matter.

In other words, it will probably work fine, but it's not so well tested
that the pg hackers are willing to bless a backend function
implementing it.

2. If the backend is so stuck that SIGTERM isn't working, then I guess
that's a bug but not enough examples have been collected to work out
the problem.  In this case you probably can't exit without considering
the shared memory corrupt.

3. In theory it would be nice to have a "cancel then exit" signal, but
we're clean out of signal numbers.

4. It appears the original person had a problem with not tracking used
resources properly in a language that neither garbage-collects nor
reference-counts. If you know you only ever want to open one connection
you can solve this problem by creating an open_connection function
which checks a global variable to see if a connection has already been
opened and returns the same one if it has.

> Unfortunately Marco choses speaks for "any list" and I\'ll just
> repeat that I find this instability issue the most significant
> drawback for Postgres installations.  This doesn\'t mean that there
> aren\'t other areas of priority for other users.  And no, I do not
> want to debate the meaning of the word "instability". :-)

I guess it appears on the list of anybody who regularly deals with this
problem. That list appears to be mutally exclusive with anyone who can
fix it...

I wonder how one would test the SIGTERM path anyway... To quote Tom
Lane on chances of corruption [2]:

> Not only wouldn't I give you those odds today, but I don't think we
> could ever get to the point of saying that session kill is that
> reliable, at least not from our ordinary methods of field testing.
> It'd require significant focused code review and testing to acquire
> such confidence, and continuing effort to make sure we didn't break
> it again in the future.
>
> If we had infinite manpower I'd be happy to delegate a developer or
> three to stay on top of this particular issue.  But we don't :-(

I don't know if PostgreSQL has ever had the concept of bounties for
stuff. It's an interesting idea...

[1] http://archives.postgresql.org/pgsql-patches/2004-07/msg00457.php
[2] http://archives.postgresql.org/pgsql-patches/2004-07/msg00480.php

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment