Re: idle_in_transaction_timeout - Mailing list pgsql-hackers

From Robert Haas
Subject Re: idle_in_transaction_timeout
Date
Msg-id CA+TgmoZMJVwPs2nBj2vZsGbQhc5JLfyzbWtTSRU8e1o7FuvUrg@mail.gmail.com
Whole thread Raw
In response to Re: idle_in_transaction_timeout  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: idle_in_transaction_timeout
List pgsql-hackers
On Tue, Jun 3, 2014 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Out of curiosity, how much harder would it have been just to abort the
>> transaction?  I think breaking the connection is probably the right
>> behavior, but before folks start arguing it out, I wanted to know if
>> aborting the transaction is even a reasonable thing to do.
>
> FWIW, I think aborting the transaction is probably better, especially
> if the patch is designed to do nothing to already-aborted transactions.
> If the client is still there, it will see the abort as a failure in its
> next query, which is less likely to confuse it completely than a
> connection loss.  (I think, anyway.)

I thought the reason why this hasn't been implemented before now is
that sending an ErrorResponse to the client will result in a loss of
protocol sync.  Sure, when the client sends the next query, they'll
then read the ErrorResponse.  So far so good.  The problem is that
they *won't* read whatever we send back as a response to their query,
because they think they already have, when in reality they've only
read the ErrorResponse sent much earlier.  This, at least as I've
understood it, is the reason why recovery conflicts kill off idle
sessions entirely instead of just aborting the transaction.  Andres
tried to fix that problem a few years ago without much luck; the most
relevant post to this particular issue seems to be:

http://www.postgresql.org/message-id/23631.1292521603@sss.pgh.pa.us

Assuming that the state of play hasn't changed in some way I'm unaware
of since 2010, I think the best argument for FATAL here is that it's
what we can implement.  I happen to think it's better anyway, because
the cases I've seen where this would actually be useful involve
badly-written applications that are not under the same administrative
control as the database and supposedly have built-in connection
poolers, except sometimes they seem to forget about some of the
connections they themselves opened.  The DBAs can't make the app
developers fix the app; they just have to try to survive.  Aborting
the transaction is a step in the right direction but since the guy at
the other end of the connection is actually or in effect dead, that
just leaves you with an eternally idle connection. Now we can say "use
TCP keepalives for that" but that only helps if the connection has
actually been severed; if the guy at the other end is still
technically there but is too brain-damaged to actually try to *use*
the connection for anything, it doesn't help.  Also, as I recently
discovered, there are still a few machines out there that don't
actually support TCP keepalives on a per-connection basis; you can
only configure it system-wide, and that's not always granular enough.

Anyway, if somebody really wants to go to the trouble of finding a way
to make this work without killing off the connection, I think that
would be cool and useful and whatever technology we develop there
could doubtless could be applied to other situations.  But I have a
nervous feeling that might be a hard enough problem to sink the whole
patch, which would be a shame, since the cases I've actually
encountered would be better off with FATAL anyway.

Just my $0.019999999999997 to go with Josh's $0.019999999999998.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: idle_in_transaction_timeout
Next
From: Peter Eisentraut
Date:
Subject: Perl coding error in msvc build system?