Thread: idle_in_transaction_timeout
This patch implements a timeout for broken clients that idle in transaction. This is a TODO item.

When the timeout occurs, the backend commits suicide with a FATAL ereport. I thought about just aborting the transaction to free the locks, but decided that the client is clearly broken, so we might as well free up the connection as well.

The same behavior can be achieved by an external script that monitors pg_stat_activity, but by having it in core we get much finer tuning (it is session-settable) and also traces of it directly in the PostgreSQL logs, so it can be graphed by tools like pgBadger.

Unfortunately, no notification is sent to the client because there's no real way to do that right now.

--
Vik
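For reference, the external-script approach mentioned above is roughly the following (a sketch only; it assumes the 9.2+ pg_stat_activity columns `state` and `state_change`, and the 10-minute threshold is chosen purely for illustration):

```sql
-- Terminate every backend that has been idle in transaction for more
-- than 10 minutes.  Run periodically from cron or similar.
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND state_change < now() - interval '10 minutes';
```

The in-core timeout replaces this polling loop and, being session-settable, can be tuned per connection.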
Attachment
At 2014-06-03 15:06:11 +0200, vik.fearing@dalibo.com wrote:
> This patch implements a timeout for broken clients that idle in
> transaction.

I think this is a nice feature, but I suggest that (at the very least) the GUC should be named "idle_transaction_timeout".

> +        <para>
> +         Terminate any session that is idle in transaction for longer
> +         than the specified number of seconds. This not only allows any
> +         locks to be released, but also allows the connection slot to be
> +         reused. However, aborted idle in transaction sessions are not
> +         affected. A value of zero (the default) turns this off.
> +        </para>

I suggest:

    Terminate any session with an open transaction that has been idle
    for longer than the specified duration in seconds. This allows any
    locks to be released and the connection slot to be reused.

It's not clear to me what "However, aborted idle in transaction sessions are not affected" means.

    The default value of 0 means that such sessions will not be
    terminated.

--
Abhijit
On 06/03/2014 03:30 PM, Abhijit Menon-Sen wrote:
> At 2014-06-03 15:06:11 +0200, vik.fearing@dalibo.com wrote:
>> This patch implements a timeout for broken clients that idle in
>> transaction.
>
> I think this is a nice feature, but I suggest that (at the very least)
> the GUC should be named "idle_transaction_timeout".

I prefer the way I have it, but not enough to put up a fight if other people like your version better.

>> +        <para>
>> +         Terminate any session that is idle in transaction for longer
>> +         than the specified number of seconds. This not only allows any
>> +         locks to be released, but also allows the connection slot to be
>> +         reused. However, aborted idle in transaction sessions are not
>> +         affected. A value of zero (the default) turns this off.
>> +        </para>
>
> I suggest:
>
>     Terminate any session with an open transaction that has been idle
>     for longer than the specified duration in seconds. This allows any
>     locks to be released and the connection slot to be reused.
>
> It's not clear to me what "However, aborted idle in transaction sessions
> are not affected" means.
>
>     The default value of 0 means that such sessions will not be
>     terminated.

How about this?

    Terminate any session with an open transaction that has been idle
    for longer than the specified duration in seconds. This allows any
    locks to be released and the connection slot to be reused.

    Sessions in the state "idle in transaction (aborted)" occupy a
    connection slot but because they hold no locks, they are not
    considered by this parameter.

    The default value of 0 means that such sessions will not be
    terminated.

--
Vik
Vik Fearing wrote
> On 06/03/2014 03:30 PM, Abhijit Menon-Sen wrote:
>> At 2014-06-03 15:06:11 +0200, vik.fearing@ wrote:
>>> This patch implements a timeout for broken clients that idle in
>>> transaction.
>> I think this is a nice feature, but I suggest that (at the very least)
>> the GUC should be named "idle_transaction_timeout".
>
> I prefer the way I have it, but not enough to put up a fight if other
> people like your version better.
>
>>> +        <para>
>>> +         Terminate any session that is idle in transaction for longer
>>> +         than the specified number of seconds. This not only allows
>>> +         any locks to be released, but also allows the connection slot
>>> +         to be reused. However, aborted idle in transaction sessions
>>> +         are not affected. A value of zero (the default) turns this
>>> +         off.
>>> +        </para>
>> I suggest:
>>
>>     Terminate any session with an open transaction that has been idle
>>     for longer than the specified duration in seconds. This allows any
>>     locks to be released and the connection slot to be reused.
>>
>> It's not clear to me what "However, aborted idle in transaction sessions
>> are not affected" means.
>>
>>     The default value of 0 means that such sessions will not be
>>     terminated.
>
> How about this?
>
>     Terminate any session with an open transaction that has been idle
>     for longer than the specified duration in seconds. This allows any
>     locks to be released and the connection slot to be reused.
>
>     Sessions in the state "idle in transaction (aborted)" occupy a
>     connection slot but because they hold no locks, they are not
>     considered by this parameter.
>
>     The default value of 0 means that such sessions will not be
>     terminated.
> --
> Vik

I do not see any reason for an aborted idle in transaction session to be treated differently. Given the similarity to "statement_timeout", using similar wording for both would be advised.
*Statement Timeout:*

    Abort any statement that takes more than the specified number of
    milliseconds, starting from the time the command arrives at the
    server from the client. If log_min_error_statement is set to ERROR
    or lower, the statement that timed out will also be logged. A value
    of zero (the default) turns this off. Setting statement_timeout in
    postgresql.conf is not recommended because it would affect all
    sessions.

*Idle Transaction Timeout:*

    Disconnect any session that has been "idle in transaction"
    (including aborted) for more than the specified number of
    milliseconds, starting from {however such is determined}.

    A value of zero (the default) turns this off.

    Typical usage would be to set this to a small positive number in
    postgresql.conf and require sessions that expect long-running, idle
    transactions to set it back to zero or some reasonably high value.

While seconds is the better unit of measure, I don't think that justifies making this different from statement_timeout. In either case users can specify their own units.

Since we are killing the entire session, not just the transaction, a better label would be: idle_transaction_session_timeout

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/idle-in-transaction-timeout-tp5805859p5805904.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
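Concretely, the usage pattern sketched above would look something like this (hypothetical, since both the GUC name and its unit were still under debate in this thread):

```sql
-- postgresql.conf: a small default for everyone (milliseconds, as with
-- statement_timeout):
--   idle_in_transaction_timeout = 60000
--
-- A session that legitimately expects long idle-in-transaction periods
-- then opts back out:
SET idle_in_transaction_timeout = 0;
```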
Vik,

> When the timeout occurs, the backend commits suicide with a FATAL
> ereport. I thought about just aborting the transaction to free the
> locks but decided that the client is clearly broken so might as well
> free up the connection as well.

Out of curiosity, how much harder would it have been just to abort the transaction? I think breaking the connection is probably the right behavior, but before folks start arguing it out, I wanted to know if aborting the transaction is even a reasonable thing to do.

Argument in favor of breaking the connection: most of the time, IITs are caused by poor error-handling or garbage-collection code on the client side, which has already abandoned the application session but hadn't let go of the database handle. Since the application is never going to reuse the handle, terminating it is the right thing to do.

Argument in favor of just aborting the transaction: client connection management software may not be able to deal cleanly with the broken connection and may halt operation completely, especially if the client finds out the connection is gone when they try to start their next transaction on that connection.

My $0.019999999999998: terminating the connection is the preferred behavior.

> The same behavior can be achieved by an external script that monitors
> pg_stat_activity but by having it in core we get much finer tuning (it
> is session settable) and also traces of it directly in the PostgreSQL
> logs so it can be graphed by things like pgbadger.
>
> Unfortunately, no notification is sent to the client because there's no
> real way to do that right now.

Well, if you abort the connection, presumably the client finds out when they try to send a query ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Out of curiosity, how much harder would it have been just to abort the > transaction? I think breaking the connection is probably the right > behavior, but before folks start arguing it out, I wanted to know if > aborting the transaction is even a reasonable thing to do. FWIW, I think aborting the transaction is probably better, especially if the patch is designed to do nothing to already-aborted transactions. If the client is still there, it will see the abort as a failure in its next query, which is less likely to confuse it completely than a connection loss. (I think, anyway.) The argument that we might want to close the connection to free up connection slots doesn't seem to me to hold water as long as we allow a client that *isn't* inside a transaction to sit on an idle connection forever. Perhaps there is room for a second timeout that limits how long you can sit idle independently of being in a transaction, but that isn't this patch. regards, tom lane
Josh Berkus wrote: > Argument in favor of breaking the connection: most of the time, IITs are > caused by poor error-handling or garbage-collection code on the client > side, which has already abandoned the application session but hadn't let > go of the database handle. Since the application is never going to > reuse the handle, terminating it is the right thing to do. In some frameworks where garbage-collection is not involved with things like this, what happens is that the connection object is reused later. If you just drop the connection, the right error message will never reach the application, but if you abort the xact, the next BEGIN issued by the framework will receive the "connection aborted due to idle-in-transaction session" message, which I assume is what this patch would have sent. Therefore I think if we want this at all it should abort the xact, not drop the connection. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 06/03/2014 05:53 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) > > The argument that we might want to close the connection to free up > connection slots doesn't seem to me to hold water as long as we allow > a client that *isn't* inside a transaction to sit on an idle connection > forever. Perhaps there is room for a second timeout that limits how > long you can sit idle independently of being in a transaction, but that > isn't this patch. > > Yes, I had the same thought. cheers andrew
On 06/03/2014 02:53 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) Personally, I think we'll get about equal amounts of confusion either way. > The argument that we might want to close the connection to free up > connection slots doesn't seem to me to hold water as long as we allow > a client that *isn't* inside a transaction to sit on an idle connection > forever. Perhaps there is room for a second timeout that limits how > long you can sit idle independently of being in a transaction, but that > isn't this patch. Like I said, I'm marginally in favor of terminating the connection, but I'd be completely satisfied with a timeout which aborted the transaction instead. Assuming that change doesn't derail this patch and keep it from getting into 9.5, of course. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/04/2014 01:38 AM, Josh Berkus wrote: > On 06/03/2014 02:53 PM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> Out of curiosity, how much harder would it have been just to abort the >>> transaction? I think breaking the connection is probably the right >>> behavior, but before folks start arguing it out, I wanted to know if >>> aborting the transaction is even a reasonable thing to do. >> FWIW, I think aborting the transaction is probably better, especially >> if the patch is designed to do nothing to already-aborted transactions. >> If the client is still there, it will see the abort as a failure in its >> next query, which is less likely to confuse it completely than a >> connection loss. (I think, anyway.) > Personally, I think we'll get about equal amounts of confusion either way. > >> The argument that we might want to close the connection to free up >> connection slots doesn't seem to me to hold water as long as we allow >> a client that *isn't* inside a transaction to sit on an idle connection >> forever. Perhaps there is room for a second timeout that limits how >> long you can sit idle independently of being in a transaction, but that >> isn't this patch. > Like I said, I'm marginally in favor of terminating the connection, but > I'd be completely satisfied with a timeout which aborted the transaction > instead. Assuming that change doesn't derail this patch and keep it > from getting into 9.5, of course. The change is as simple as changing the ereport from FATAL to ERROR. Attached is a new patch doing it that way. -- Vik
Attachment
> On 06/03/2014 02:53 PM, Tom Lane wrote:
>> Josh Berkus <[hidden email]> writes:
>>> Out of curiosity, how much harder would it have been just to abort the
>>> transaction?  I think breaking the connection is probably the right
>>> behavior, but before folks start arguing it out, I wanted to know if
>>> aborting the transaction is even a reasonable thing to do.
>>
>> FWIW, I think aborting the transaction is probably better, especially
>> if the patch is designed to do nothing to already-aborted transactions.
>> If the client is still there, it will see the abort as a failure in its
>> next query, which is less likely to confuse it completely than a
>> connection loss.  (I think, anyway.)
>
> Personally, I think we'll get about equal amounts of confusion either way.
>
>> The argument that we might want to close the connection to free up
>> connection slots doesn't seem to me to hold water as long as we allow
>> a client that *isn't* inside a transaction to sit on an idle connection
>> forever.  Perhaps there is room for a second timeout that limits how
>> long you can sit idle independently of being in a transaction, but that
>> isn't this patch.
>
> Like I said, I'm marginally in favor of terminating the connection, but
> I'd be completely satisfied with a timeout which aborted the transaction
> instead.  Assuming that change doesn't derail this patch and keep it
> from getting into 9.5, of course.

Default to dropping the connection but give administrators the capability to decide for themselves?

I still haven't heard an argument for why this wouldn't apply to aborted idle-in-transactions. I get the focus is on releasing locks, but if the transaction fails and still hangs around forever it is just as broken as one that doesn't fail and hangs around forever. Even if you limit the end result to only aborting the transaction, the end-user will likely want to distinguish between their transaction failing and their failed transaction remaining idle too long - if only to avoid the situation where they make the transaction no longer fail but still hit the timeout.

Whether a connection is a resource the DBA wants to restore (at the expense of better server-client communication) is a parental decision we shouldn't force on them (feelings about GUCs aside). The decision to force the release of locks - the primary purpose of the patch - is made by simply using the setting in the first place.

David J.
On 06/04/2014 02:01 AM, David G Johnston wrote:
> Default to dropping the connection but give administrators the
> capability to decide for themselves?

Meh.

> I still haven't heard an argument for why this wouldn't apply to aborted
> idle-in-transactions. I get the focus is on releasing locks but if the
> transaction fails but still hangs around forever it is just as broken as
> one that doesn't fail and hangs around forever.

My main concern was with locks and blocking VACUUM. Aborted transactions don't do either of those things. The correct solution is to terminate aborted transactions, too, or not terminate anything and abort the idle ones.

> Even if you limit the end result to only aborting the transaction the
> end-user will likely want to distinguish between their transaction
> failing and their failed transaction remaining idle too long - if only
> to avoid the situation where they make the transaction no longer fail
> but still hit the timeout.

But hitting the timeout *is* failing.

With the new patch, the first query will say that the transaction was aborted due to timeout. Subsequent queries will do as they've always done.

--
Vik
On Tue, Jun 3, 2014 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) I thought the reason why this hasn't been implemented before now is that sending an ErrorResponse to the client will result in a loss of protocol sync. Sure, when the client sends the next query, they'll then read the ErrorResponse. So far so good. The problem is that they *won't* read whatever we send back as a response to their query, because they think they already have, when in reality they've only read the ErrorResponse sent much earlier. This, at least as I've understood it, is the reason why recovery conflicts kill off idle sessions entirely instead of just aborting the transaction. Andres tried to fix that problem a few years ago without much luck; the most relevant post to this particular issue seems to be: http://www.postgresql.org/message-id/23631.1292521603@sss.pgh.pa.us Assuming that the state of play hasn't changed in some way I'm unaware of since 2010, I think the best argument for FATAL here is that it's what we can implement. 
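The desync Robert describes can be modeled with a toy message queue (an illustration only, not the real frontend/backend protocol): the server pushes an unsolicited ErrorResponse while the client is idle, and the client's next request/response cycle consumes that stale error instead of the real reply.

```python
from collections import deque

server_to_client = deque()  # messages the server has sent, not yet read

def server_abort_idle_txn():
    # Timeout fires while the client is idle: the server emits an
    # ErrorResponse on its own initiative.
    server_to_client.append("ErrorResponse: idle-in-transaction timeout")

def client_query(query):
    # The client sends a query (the server replies immediately in this toy)...
    server_to_client.append(f"Result of {query!r}")
    # ...and reads exactly ONE message, believing it to be the answer.
    return server_to_client.popleft()

server_abort_idle_txn()
first = client_query("SELECT 1")
print(first)                    # the stale ErrorResponse, not the result
print(list(server_to_client))   # the real result is still unread
```

From this point on, every response the client reads is off by one, which is why the thread leans toward FATAL: killing the session keeps the protocol state trivially consistent.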
I happen to think it's better anyway, because the cases I've seen where this would actually be useful involve badly-written applications that are not under the same administrative control as the database and supposedly have built-in connection poolers, except sometimes they seem to forget about some of the connections they themselves opened. The DBAs can't make the app developers fix the app; they just have to try to survive. Aborting the transaction is a step in the right direction but since the guy at the other end of the connection is actually or in effect dead, that just leaves you with an eternally idle connection. Now we can say "use TCP keepalives for that" but that only helps if the connection has actually been severed; if the guy at the other end is still technically there but is too brain-damaged to actually try to *use* the connection for anything, it doesn't help. Also, as I recently discovered, there are still a few machines out there that don't actually support TCP keepalives on a per-connection basis; you can only configure it system-wide, and that's not always granular enough. Anyway, if somebody really wants to go to the trouble of finding a way to make this work without killing off the connection, I think that would be cool and useful and whatever technology we develop there could doubtless could be applied to other situations. But I have a nervous feeling that might be a hard enough problem to sink the whole patch, which would be a shame, since the cases I've actually encountered would be better off with FATAL anyway. Just my $0.019999999999997 to go with Josh's $0.019999999999998. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I thought the reason why this hasn't been implemented before now is > that sending an ErrorResponse to the client will result in a loss of > protocol sync. Hmm ... you are right that this isn't as simple as an ereport(ERROR), but I'm not sure it's impossible. We could for instance put the backend into skip-till-Sync state so that it effectively ignored the next command message. Causing that to happen might be impracticably messy, though. I'm not sure whether cancel-transaction behavior is enough better than cancel-session to warrant extra effort here. regards, tom lane
On 2014-06-03 23:37:28 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I thought the reason why this hasn't been implemented before now is
>> that sending an ErrorResponse to the client will result in a loss of
>> protocol sync.
>
> Hmm ... you are right that this isn't as simple as an ereport(ERROR),
> but I'm not sure it's impossible.  We could for instance put the backend
> into skip-till-Sync state so that it effectively ignored the next command
> message.  Causing that to happen might be impracticably messy, though.

Isn't another problem here that we're reading from the client, possibly nested inside openssl? So we can't just longjmp out without possibly destroying openssl's internal state? I think that's solvable by first returning from openssl, signalling a short read, and only *then* jumping out. I remember making something like that work in a POC patch at least... But it's been a couple of years.

> I'm not sure whether cancel-transaction behavior is enough better than
> cancel-session to warrant extra effort here.

I don't think idle_in_transaction_timeout is worth this, but if we could implement it we could finally have recovery conflicts against idle in txn sessions not use FATAL...

Greetings,

Andres Freund

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 3, 2014 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I thought the reason why this hasn't been implemented before now is >> that sending an ErrorResponse to the client will result in a loss of >> protocol sync. > > Hmm ... you are right that this isn't as simple as an ereport(ERROR), > but I'm not sure it's impossible. We could for instance put the backend > into skip-till-Sync state so that it effectively ignored the next command > message. Causing that to happen might be impracticably messy, though. Another thing we could maybe do is AbortCurrentTransaction() and send the client a NoticeResponse saying "hey, expect all of your future commands to fail with complaints about the transaction being aborted". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I thought the reason why this hasn't been implemented before >>> now is that sending an ErrorResponse to the client will result >>> in a loss of protocol sync. >> >> Hmm ... you are right that this isn't as simple as an >> ereport(ERROR), but I'm not sure it's impossible. We could for >> instance put the backend into skip-till-Sync state so that it >> effectively ignored the next command message. Causing that to >> happen might be impracticably messy, though. > > Another thing we could maybe do is AbortCurrentTransaction() and > send the client a NoticeResponse saying "hey, expect all of your > future commands to fail with complaints about the transaction > being aborted". Wow, until I read through the thread on this patch and the old thread that Robert linked to I had forgotten the attempt by Andres to deal with this three and a half years ago. That patch died because of how complicated it was to do the right thing if the connection was left open. Personally, where I have seen a use for this, treating it as FATAL would be better anyway; but was OK with treating it as an ERROR if that didn't sink the patch. I guess that puts me in the same camp as Josh and Robert. Vik has submitted v1 and v2 of this patch; the only difference between them is that v1 makes a timeout FATAL and v2 makes it an ERROR (with appropriate corresponding changes to code comments, documentation, and message text). 
It's not hard to show the problems with an ERROR under v2, confirming that the simple approach of just changing the ereport severity is not enough:

    test=# set idle_in_transaction_timeout = '3s';
    SET
    test=# begin;
    BEGIN
    test=# select 1;
    ERROR: current transaction is aborted due to idle-in-transaction timeout
    test=# commit;
    ERROR: current transaction is aborted, commands ignored until end of transaction block
    test=# commit;
    ROLLBACK

The first thing is that I don't think a delay between the BEGIN and the SELECT should cause a timeout to trigger, but more importantly there should not be two ERROR responses to one SELECT statement.

I'm inclined to abandon the ERROR approach as not worth the effort and fragility, and focus on v1 of the patch. If we can't get to consensus on that, I think that this patch should be flagged "Returned with Feedback", noting that any follow-up version requires some way to deal with the issues raised regarding multiple ERROR messages.

Abridged for space, hopefully without losing the main points of the authors, so far:

Josh Berkus:
> My $0.019999999999998: terminating the connection is the
> preferred behavior.

Tom Lane:
> FWIW, I think aborting the transaction is probably better,
> especially if the patch is designed to do nothing to
> already-aborted transactions.  If the client is still there, it
> will see the abort as a failure in its next query, which is less
> likely to confuse it completely than a connection loss.  (I
> think, anyway.)

Álvaro Herrera:
> I think if we want this at all it should abort the xact, not drop
> the connection.

Andrew Dunstan:
> [quotes Tom's argument]
> Yes, I had the same thought.

David G Johnston:
> Default to dropping the connection but give administrators the
> capability to decide for themselves?

Robert Haas:
> Assuming that the state of play hasn't changed in some way I'm
> unaware of since 2010, I think the best argument for FATAL here
> is that it's what we can implement.  I happen to think it's
> better anyway, because the cases I've seen where this would
> actually be useful involve badly-written applications that are
> not under the same administrative control as the database and
> supposedly have built-in connection poolers, except sometimes
> they seem to forget about some of the connections they themselves
> opened.  The DBAs can't make the app developers fix the app; they
> just have to try to survive.  Aborting the transaction is a step
> in the right direction but since the guy at the other end of the
> connection is actually or in effect dead, that just leaves you
> with an eternally idle connection.

Tom Lane (reprise):
> I'm not sure whether cancel-transaction behavior is enough better
> than cancel-session to warrant extra effort here.

Andres Freund:
> [quotes Tom's latest (above)]
> I don't think idle_in_transaction_timeout is worth this, but if
> we could implement it we could finally have recovery conflicts
> against idle in txn sessions not use FATAL...

Kevin Grittner:
> Personally, where I have seen a use for this, treating it as
> FATAL would be better anyway

A few ideas were put forward on how a much more complicated patch could allow transaction rollback with an ERROR to work acceptably, but I think that would probably need to be a new patch in a later CF, so that is omitted here.

So, can we agree to use FATAL to terminate the connection?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/18/2014 11:50 AM, Kevin Grittner wrote:
> The first thing is that I don't think a delay between the BEGIN and
> the SELECT should cause a timeout to trigger, but more importantly
> there should not be two ERROR responses to one SELECT statement.

I do think a delay between BEGIN and SELECT should trigger the timeout. There are plenty of badly-written applications which "auto-begin", that is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it.

> I'm inclined to abandon the ERROR approach as not worth the effort
> and fragility, and focus on v1 of the patch.  If we can't get to
> consensus on that, I think that this patch should be flagged
> "Returned with Feedback", noting that any follow-up version
> requires some way to deal with the issues raised regarding multiple
> ERROR messages.

+1

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > There are plenty of badly-written applications which "auto-begin", that > is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or > not there's any additional work to do. This is a major source of IIT > and the timeout should not ignore it. Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem. It might be that we should slap such apps' wrists anyway, but given that we've gone to the trouble of working around the behavior at the system structural level, I'd be inclined to say not. What you'd be doing is preventing people who have to deal with such apps from using the timeout in any useful fashion. regards, tom lane
On 06/18/2014 12:32 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> There are plenty of badly-written applications which "auto-begin", that >> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or >> not there's any additional work to do. This is a major source of IIT >> and the timeout should not ignore it. > > Nonsense. We explicitly don't do anything useful until the first actual > command arrives, precisely to avoid that problem. Oh, we don't allocate a snapshot? If not, then no objection here. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 06/18/2014 12:32 PM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> There are plenty of badly-written applications which "auto-begin", that >>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or >>> not there's any additional work to do. This is a major source of IIT >>> and the timeout should not ignore it. >> >> Nonsense. We explicitly don't do anything useful until the first actual >> command arrives, precisely to avoid that problem. > > Oh, we don't allocate a snapshot? If not, then no objection here. The only problem I see is that it makes the semantics kind of weird and confusing. "Kill connections that are idle in transaction for too long" is a pretty clear spec; "kill connections that are idle in transaction except if they haven't executed any commands yet because we think you don't care about that case" is not quite as clear, and not really what the GUC name says, and maybe not what everybody wants, and maybe masterminding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: > On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus <josh@agliodbs.com> wrote: > > On 06/18/2014 12:32 PM, Tom Lane wrote: > >> Josh Berkus <josh@agliodbs.com> writes: > >>> There are plenty of badly-written applications which "auto-begin", that > >>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or > >>> not there's any additional work to do. This is a major source of IIT > >>> and the timeout should not ignore it. > >> > >> Nonsense. We explicitly don't do anything useful until the first actual > >> command arrives, precisely to avoid that problem. > > > > Oh, we don't allocate a snapshot? If not, then no objection here. > > The only problem I see is that it makes the semantics kind of weird > and confusing. "Kill connections that are idle in transaction for too > long" is a pretty clear spec; "kill connections that are idle in > transaction except if they haven't executed any commands yet because > we think you don't care about that case" is not quite as clear, and > not really what the GUC name says, and maybe not what everybody wants, > and maybe masterminding. "Kill connections that are idle in non-empty transaction block for too long" -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 06/18/2014 02:52 PM, Bruce Momjian wrote: > On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: >> The only problem I see is that it makes the semantics kind of weird >> and confusing. "Kill connections that are idle in transaction for too >> long" is a pretty clear spec; "kill connections that are idle in >> transaction except if they haven't executed any commands yet because >> we think you don't care about that case" is not quite as clear, and >> not really what the GUC name says, and maybe not what everybody wants, >> and maybe masterminding. > > "Kill connections that are idle in non-empty transaction block for too > long" Here's the POLS violation in this: "I have idle_in_transaction_timeout set to 10min, but according to pg_stat_activity I have six connections which are IIT for over an hour. What gives?" Robert's right, not killing the "BEGIN;" only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Tom is right in that killing them will cause some users to not use IIT_timeout when they should, or will set the timeout too high to be useful. So it seems like what we should do is NOT call sessions IIT if they've merely executed a BEGIN; and not done anything else. Then the timeout and pg_stat_activity would be consistent. Counter-argument: most app frameworks which do an automatic BEGIN; also do other stuff like SET TIMEZONE each time as well. Is this really a case worth worrying about? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-06-19 1:46 AM, Josh Berkus wrote: > Robert's right, not killing the "BEGIN;" only transactions is liable to > result in user confusion unless we label those sessions differently in > pg_stat_activity. Wouldn't they be labeled differently already? i.e. the last query would be "BEGIN". Unless your app tries to unsuccessfully use nested transactions, you would know why it hasn't been killed. .marko
On 06/18/2014 04:54 PM, Marko Tiikkaja wrote: > On 2014-06-19 1:46 AM, Josh Berkus wrote: >> Robert's right, not killing the "BEGIN;" only transactions is liable to >> result in user confusion unless we label those sessions differently in >> pg_stat_activity. > > Wouldn't they be labeled differently already? i.e. the last query would > be "BEGIN". Unless your app tries to unsuccessfully use nested > transactions, you would know why it hasn't been killed. That's pretty darned obscure for a casual user. *you* would know, and *I* would know, but 99.5% of our users would be very confused. Plus, if a session which has only issued a "BEGIN;" doesn't have a snapshot and isn't holding any locks, then I'd argue we shouldn't label it IIT in the first place because it's not doing any of the bad stuff we want to resolve by killing IITs. Effectively, it's just "idle". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Counter-argument: most app frameworks which do an automatic BEGIN; also > do other stuff like SET TIMEZONE each time as well. Is this really a > case worth worrying about? SET TIMEZONE doesn't result in taking a snapshot either. regards, tom lane
> On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
>> On 2014-06-19 1:46 AM, Josh Berkus wrote:
>>> Robert's right, not killing the "BEGIN;" only transactions is liable to
>>> result in user confusion unless we label those sessions differently in
>>> pg_stat_activity.
>>
>> Wouldn't they be labeled differently already? i.e. the last query would
>> be "BEGIN". Unless your app tries to unsuccessfully use nested
>> transactions, you would know why it hasn't been killed.
>
> That's pretty darned obscure for a casual user. *you* would know, and
> *I* would know, but 99.5% of our users would be very confused.
>
> Plus, if a session which has only issued a "BEGIN;" doesn't have a
> snapshot and isn't holding any locks, then I'd argue we shouldn't label
> it IIT in the first place because it's not doing any of the bad stuff we
> want to resolve by killing IITs. Effectively, it's just "idle".
+1
Since the BEGIN is not meaningfully interpreted until the first subsequent command (SET excepting, apparently - are there others?) is issued, no transaction has begun at this point, so "idle" and not "IIT" should be the reported state in pg_stat_activity.
Though I can understand some level of surprise if someone sees "idle" with a "BEGIN" (or SET TIMEZONE) as the last executed command - so maybe "idle before transaction" instead of "idle in transaction" - which hopefully will not be assumed to be controlled by the "idle_in_transaction_timeout" GUC. I presume that we have some way to distinguish this particular case and can report it accordingly. Even if that state endures after a SET TIMEZONE or similar session configuration command explaining that all those are "pre-transaction" shouldn't be too hard - though as a third option "idle initialized transaction" could be the state indicator.
Depending on how "idle in transaction (aborted)" gets resolved we could also consider "idle in transaction (initializing)" - but since the former, IMO, should be dropped (and thus matches the "in" specification of the GUC) naming the latter - which should not be dropped (I'll go with the stated opinion here for now) - with "in" is not advisable.
The current behavior is transparent to the casual user so sticking with "idle" does seem like the best choice to maintain the undocumented nature of the behavior.
David J.
Hello
pgbouncer has had idle_transaction_timeout defined for some years without any problems, so we can take this design:

idle_transaction_timeout
If client has been in "idle in transaction" state longer, it will be disconnected. [seconds]
Default: 0.0 (disabled)
Pavel
2014-06-19 1:46 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
> On 06/18/2014 02:52 PM, Bruce Momjian wrote:
>> On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
>>> The only problem I see is that it makes the semantics kind of weird
>>> and confusing. "Kill connections that are idle in transaction for too
>>> long" is a pretty clear spec; "kill connections that are idle in
>>> transaction except if they haven't executed any commands yet because
>>> we think you don't care about that case" is not quite as clear, and
>>> not really what the GUC name says, and maybe not what everybody wants,
>>> and maybe masterminding.
>>
>> "Kill connections that are idle in non-empty transaction block for too
>> long"
>
> Here's the POLS violation in this:
>
> "I have idle_in_transaction_timeout set to 10min, but according to
> pg_stat_activity I have six connections which are IIT for over an hour.
> What gives?"
>
> Robert's right, not killing the "BEGIN;" only transactions is liable to
> result in user confusion unless we label those sessions differently in
> pg_stat_activity. Tom is right in that killing them will cause some
> users to not use IIT_timeout when they should, or will set the timeout
> too high to be useful.
>
> So it seems like what we should do is NOT call sessions IIT if they've
> merely executed a BEGIN; and not done anything else. Then the timeout
> and pg_stat_activity would be consistent.
>
> Counter-argument: most app frameworks which do an automatic BEGIN; also
> do other stuff like SET TIMEZONE each time as well. Is this really a
> case worth worrying about?
At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: > > Hello > > pgbouncer has idle_transaction_timeout defined some years without any > problems, so we can take this design > > idle_transaction_timeout Let's steal the name too. :-) (I didn't realise there was precedent in pgbouncer, but given that the name is in use already, it'll be less confusing to use that name for Postgres as well.) -- Abhijit
Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: >> pgbouncer has idle_transaction_timeout defined some years >> without any problems, so we can take this design >> >> idle_transaction_timeout > > Let's steal the name too. :-) > > (I didn't realise there was precedent in pgbouncer, but given > that the name is in use already, it'll be less confusing to use > that name for Postgres as well.) I'm not sure whether using the same name as pgbouncer for the same functionality makes it less confusing or might lead to confusion about which layer is disconnecting the client. I'm leaning toward using the same name, but I'm really not sure. Does anyone else want to offer an opinion? Somehow I had thought that pg_stat_activity didn't show a connection as "idle in transaction" as soon as BEGIN was run -- I thought some subsequent statement was needed to put it into that state; but I see that I was wrong about that. As long as BEGIN causes a connection to show as "idle in transaction" I think the BEGIN should start the clock on this timeout, based on POLA. It wouldn't bother me to see us distinguish among early transaction states, but I'm not sure whether it's really worth the effort. We couldn't base it just on whether the transaction has acquired a snapshot, since it is not unusual for explicit LOCK statements at the front of the transaction to run before a snapshot is acquired, and we would want to terminate a transaction sitting idle and holding locks. Also, it seems like most are ok with the FATAL approach (as in v1 of this patch). Does anyone want to make an argument against proceeding with that, in light of discussion so far? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
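Kevin's observation is easy to reproduce from two psql sessions (a sketch; the pid shown is illustrative and the column layout is trimmed):

```sql
-- Session 1: start a transaction and issue nothing further.
BEGIN;

-- Session 2: a bare BEGIN is already enough for the first session
-- to be reported as "idle in transaction".
SELECT pid, state, query
  FROM pg_stat_activity
 WHERE state LIKE 'idle in transaction%';
--  pid  |        state        | query
-- ------+---------------------+-------
--  1234 | idle in transaction | BEGIN
```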
On 06/19/2014 05:42 PM, Kevin Grittner wrote: > Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: >> At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: > >>> pgbouncer has idle_transaction_timeout defined some years >>> without any problems, so we can take this design >>> >>> idle_transaction_timeout >> >> Let's steal the name too. :-) >> >> (I didn't realise there was precedent in pgbouncer, but given >> that the name is in use already, it'll be less confusing to use >> that name for Postgres as well.) > > I'm not sure whether using the same name as pgbouncer for the same > functionality makes it less confusing or might lead to confusion > about which layer is disconnecting the client. I'm leaning toward > using the same name, but I'm really not sure. Does anyone else > want to offer an opinion? I much prefer with "in" but it doesn't much matter. > Somehow I had thought that pg_stat_activity didn't show a > connection as "idle in transaction" as soon as BEGIN was run -- I > thought some subsequent statement was needed to put it into that > state; but I see that I was wrong about that. As long as BEGIN > causes a connection to show as "idle in transaction" I think the > BEGIN should start the clock on this timeout, based on POLA. For me, this is a separate patch. Whether it goes before or after this one doesn't make a big difference, I don't think. > It wouldn't bother me to see us distinguish among early transaction > states, but I'm not sure whether it's really worth the effort. We > couldn't base it just on whether the transaction has acquired a > snapshot, since it is not unusual for explicit LOCK statements at > the front of the transaction to run before a snapshot is acquired, > and we would want to terminate a transaction sitting idle and > holding locks. > > Also, it seems like most are ok with the FATAL approach (as in v1 > of this patch). Does anyone want to make an argument against > proceeding with that, in light of discussion so far? 
From my view, we have these outstanding issues: - the name - begin-only transactions - aborted transactions Those last two could arguably be considered identical. If the client issued a BEGIN, then the client is in a transaction and I don't think we should report otherwise. So then we need to decide why "idle in transaction (aborted)" gets killed but "idle in transaction (initiated)" doesn't. The v1 patch doesn't currently kill the former but there was question that it should. I still believe it shouldn't. Anyway, I'm willing to modify the patch in any way that consensus dictates. -- Vik
At 2014-06-19 08:42:01 -0700, kgrittn@ymail.com wrote: > > I'm not sure whether using the same name as pgbouncer for the same > functionality makes it less confusing or might lead to confusion > about which layer is disconnecting the client. I don't think the names of the respective configuration settings will significantly affect that confusion either way. (I can imagine people being confused if they're using pgbouncer without the timeout in front of a server with the timeout. But since they would have to have turned it on explicitly on the server, it can't be all THAT much of a surprise. Or at least not that hard to figure out.) > As long as BEGIN causes a connection to show as "idle in transaction" > I think the BEGIN should start the clock on this timeout, based on > POLA. Yes, agreed. I don't think it's worth doing anything more complicated. > Also, it seems like most are ok with the FATAL approach (as in v1 > of this patch). I don't have a strong opinion, but I think FATAL is the pragmatic approach. -- Abhijit
At 2014-06-19 17:53:17 +0200, vik.fearing@dalibo.com wrote: > > I much prefer with "in" but it doesn't much matter. If you look at similar settings like statement_timeout, lock_timeout, etc., what comes before the _timeout is a concrete *thing* that can timeout, or that a timeout can be applied to (e.g. wal_receiver). "What's timing out?" "A statement." But in "idle_in_transaction_timeout", "idle_in_transaction" is not a thing. It's a description of the state of a thing (the thing being a session in the FATAL variant of your patch, or a transaction in the ERROR variant). "What's timing out?" "An idle in transaction." "Huh?" Strictly speaking, by this logic, the consistent name for the setting in the FATAL variant would be "idle_in_transaction_session_timeout", while for the ERROR variant it would be "idle_transaction_timeout". Both those names pass the "What's timing out?" test. But "idle_in_transaction_timeout" alone doesn't, and that's why I can't bring myself to like it. And pgbouncer's use of "idle_transaction_timeout" is a weak precedent to continue to use the same name for the same functionality. Anyway, as you say, it doesn't matter so much. I promise I won't beat the nomenclature horse any more. I just wanted to explain my thinking once. Sorry for dragging it out. -- Abhijit
> At 2014-06-19 17:53:17 +0200, [hidden email] wrote:
>>
>> I much prefer with "in" but it doesn't much matter.
>
> If you look at similar settings like statement_timeout, lock_timeout,
> etc., what comes before the _timeout is a concrete *thing* that can
> timeout, or that a timeout can be applied to (e.g. wal_receiver).
>
> "What's timing out?" "A statement."
>
> But in "idle_in_transaction_timeout", "idle_in_transaction" is not a
> thing. It's a description of the state of a thing (the thing being a
> session in the FATAL variant of your patch, or a transaction in the
> ERROR variant).
>
> "What's timing out?" "An idle in transaction." "Huh?"
>
> Strictly speaking, by this logic, the consistent name for the setting in
> the FATAL variant would be "idle_in_transaction_session_timeout",
+1; I even suggested something similar (I omitted the "in") - though we hadn't come to a firm conclusion on what behavior we were going to implement at the time. Adding session reasonably precludes us from easily changing our mind later (or giving the user an option) but that doesn't seem likely anyway.
Advocating for the Devil (or a more robust, if probably complicated, solution):
"idle_in_transaction_timeout=10s"
"idle_in_transaction_target=session|transaction"
Would be a valid pair since the first intentionally would not want to specify a target - and thus does not fit within the pattern you defined.
"idle_transaction_timeout" is bad since idle is a valid state that is not being affected by this patch; whereas pgbouncer indeed drops truly idle connections.
David J.
View this message in context: Re: idle_in_transaction_timeout
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 06/19/2014 10:39 AM, David G Johnston wrote: > Advocating for the Devil (or a more robust, if probably complicated, > solution): > "idle_in_transaction_timeout=10s" > "idle_in_transaction_target=session|transaction" Per Kevin Grittner's assessment, aborting the transaction is currently a nonstarter. So no need for a 2nd GUC. Also, I really hope that nobody is setting this to 10s ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > Also, I really hope that nobody is setting this to 10s ... Well, we get to decide what the minimum allowed value is. What do you think that should be? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/19/2014 02:44 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: > >> Also, I really hope that nobody is setting this to 10s ... > > Well, we get to decide what the minimum allowed value is. What do > you think that should be? 1s? I don't think that setting it to 1s would ever be a good idea, but I don't want to tie users' hands if they know better than I do. Also, unless we use 1s granularity, users can't set it to values like 45s, which is more reasonable. I can't see any circumstance under which sub-second values would ever make sense. Now ... can we decrease CPU overhead (wakeups) if we only check once per minute? If so, there's a good argument for making 1min the minimum. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Now ... can we decrease CPU overhead (wakeups) if we only check once per > minute? If so, there's a good argument for making 1min the minimum. Polling wakeups are right out, and are unnecessary anyway. The utils/misc/timeout.c infrastructure calculates the time left till the closest timeout event. So I don't see a need to worry about that end of it. ISTM our realistic options are for seconds or msec as the unit. If it's msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, which seems like enough to me but maybe somebody thinks differently? Seconds are probably OK but I'm worried about somebody complaining that that's not enough resolution, especially as machines get faster. Whichever the unit, I don't see a reason to set a lower bound different from "one". You ask for a 1ms timeout, we'll give it to you, it's your problem whether that's sane in your environment. regards, tom lane
Tom, > ISTM our realistic options are for seconds or msec as the unit. If it's > msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > which seems like enough to me but maybe somebody thinks differently? > Seconds are probably OK but I'm worried about somebody complaining that > that's not enough resolution, especially as machines get faster. I can picture a 500ms timeout more readily than I can picture a 1000hr timeout. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Tom, > > > ISTM our realistic options are for seconds or msec as the unit. If it's > > msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > > which seems like enough to me but maybe somebody thinks differently? > > Seconds are probably OK but I'm worried about somebody complaining that > > that's not enough resolution, especially as machines get faster. > > I can picture a 500ms timeout more readily than I can picture a 1000hr > timeout. Agreed. 600 hours are upwards of 25 days. Dead tuples accumulated for that long would be a really serious problem, unless your database is almost totally idle. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 06/19/2014 06:33 PM, Josh Berkus wrote: > Tom, > >> ISTM our realistic options are for seconds or msec as the unit. If it's >> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, >> which seems like enough to me but maybe somebody thinks differently? >> Seconds are probably OK but I'm worried about somebody complaining that >> that's not enough resolution, especially as machines get faster. > I can picture a 500ms timeout more readily than I can picture a 1000hr > timeout. > As long as we can specify the units, and don't have to say 1000 to mean 1 second, I agree. I would normally expect this to be set in terms of minutes rather than millisecs. cheers andrew
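For reference, GUC time values already accept unit suffixes, so whichever base unit is chosen the raw number rarely has to be spelled out. For example, with the existing statement_timeout:

```sql
SET statement_timeout = '500ms';
SET statement_timeout = '45s';
SET statement_timeout = '10min';   -- stored internally in the GUC's base unit
SHOW statement_timeout;
```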
Andrew Dunstan <andrew@dunslane.net> wrote: > On 06/19/2014 06:33 PM, Josh Berkus wrote: >>> ISTM our realistic options are for seconds or msec as the unit. If it's >>> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, >>> which seems like enough to me but maybe somebody thinks differently? >>> Seconds are probably OK but I'm worried about somebody complaining that >>> that's not enough resolution, especially as machines get faster. >> I can picture a 500ms timeout more readily than I can picture a 1000hr >> timeout. > > As long as we can specify the units, and don't have to say 1000 to mean > 1 second, I agree. I would normally expect this to be set in terms of > minutes rather than millisecs. OK, so I think we want to see a patch based on v1 (FATAL approach) with a change of the name to idle_in_transaction_session_timeout and the units changed to milliseconds. I don't see why the remoteversion test shouldn't be changed to use 90500 now, too. I'll flip this to Waiting on Author for those changes. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/21/2014 08:23 PM, Kevin Grittner wrote: > OK, so I think we want to see a patch based on v1 (FATAL approach) > with a change of the name to idle_in_transaction_session_timeout > and the units changed to milliseconds. I don't see why the > remoteversion test shouldn't be changed to use 90500 now, too. The attached patch, rebased to current master, addresses all of these issues. > I'll flip this to Waiting on Author for those changes. And back to Needs Review. -- Vik
Attachment
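Put together, the behavior under review would look roughly like this from the client side (a sketch: the GUC name is the one the thread settled on, but the exact FATAL message wording is illustrative, taken from the patch under review rather than any released server):

```sql
SET idle_in_transaction_session_timeout = '10s';
BEGIN;
-- client goes idle inside the transaction for more than 10 seconds ...
SELECT 1;
-- FATAL:  terminating connection due to idle-in-transaction timeout
-- server closed the connection unexpectedly
```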
At 2014-06-21 23:54:58 +0200, vik.fearing@dalibo.com wrote: > > > I'll flip this to Waiting on Author for those changes. > > And back to Needs Review. I've marked it Ready for Committer after a quick read-through. -- Abhijit
Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > I've marked it Ready for Committer after a quick read-through. I took a pass through it with an eye toward committing it. I found a couple minor whitespace issues, where the patch didn't follow conventional indenting practice; I can fix that no problem. I found that as it stood, the patch reduced the number of user timeouts which could be registered; I have a fix for that which I hope will also prevent future problems in that regard. None of that would have held up pushing it. I found one substantive issue that had been missed in discussion, though. The patch modifies the postgres_fdw extension to make it automatically exempt from an attempt to set a limit like this on the server to which it connects. I'm not sure that's a good idea. Why should this type of connection be allowed to sit indefinitely with an idle open transaction? I'm inclined to omit this part of the patch:

+++ b/contrib/postgres_fdw/connection.c
@@ -343,6 +343,13 @@ configure_remote_session(PGconn *conn)
 		do_sql_command(conn, "SET extra_float_digits = 3");
 	else
 		do_sql_command(conn, "SET extra_float_digits = 2");
+
+	/*
+	 * Ensure the remote server doesn't kill us off if we remain idle in a
+	 * transaction for too long.
+	 */
+	if (remoteversion >= 90500)
+		do_sql_command(conn, "SET idle_in_transaction_session_timeout = 0");
 }
 
 /*

(Please forgive any mangling of the whitespace above by my email provider.) Thoughts? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-21 11:23:44 -0700, Kevin Grittner wrote: > Andrew Dunstan <andrew@dunslane.net> wrote: > > > > On 06/19/2014 06:33 PM, Josh Berkus wrote: > > >>> ISTM our realistic options are for seconds or msec as the unit. If it's > >>> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > >>> which seems like enough to me but maybe somebody thinks differently? > >>> Seconds are probably OK but I'm worried about somebody complaining that > >>> that's not enough resolution, especially as machines get faster. > >> I can picture a 500ms timeout more readily than I can picture a 1000hr > >> timeout. > > > > As long as we can specify the units, and don't have to say 1000 to mean > > 1 second, I agree. I would normally expect this to be set in terms of > > minutes rather than millisecs. > > > OK, so I think we want to see a patch based on v1 (FATAL approach) > with a change of the name to idle_in_transaction_session_timeout > and the units changed to milliseconds. The idea with the GUC name is that if we ever get support for cancelling transactions we can name that idle_in_transaction_transaction_timeout? That seems a bit awkward... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > The idea with the GUC name is that if we ever get support for > cancelling transactions we can name that > idle_in_transaction_transaction_timeout? > That seems a bit awkward... No, the argument was that for all the other *_timeout settings what came before _timeout was the thing that was being terminated. I think there were some votes in favor of the name on that basis, and none against. Feel free to give your reasons for supporting some other name. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > The idea with the GUC name is that if we ever get support for > > cancelling transactions we can name that > > idle_in_transaction_transaction_timeout? > > That seems a bit awkward... > > No, the argument was that for all the other *_timeout settings what > came before _timeout was the thing that was being terminated. I > think there were some votes in favor of the name on that basis, and > none against. Feel free to give your reasons for supporting some > other name. My reasons for not liking the current GUC name are hinted at above. I think we'll want a version of this that just fails the transaction once we have the infrastructure. So we should choose a name that allows for a complementary GUC. CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com suggested On 2014-06-19 10:39:48 -0700, David G Johnston wrote: > "idle_in_transaction_timeout=10s" > "idle_in_transaction_target=session|transaction" but I don't like that much. Not sure what'd be good, the best I currently can come up with is: idle_in_transaction_termination_timeout = idle_in_transaction_cancellation_timeout = Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
2014-06-22 19:47 GMT+02:00 Andres Freund <andres@2ndquadrant.com>:
> On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> The idea with the GUC name is that if we ever get support for
>>> cancelling transactions we can name that
>>> idle_in_transaction_transaction_timeout?
>>> That seems a bit awkward...
>>
>> No, the argument was that for all the other *_timeout settings what
>> came before _timeout was the thing that was being terminated. I
>> think there were some votes in favor of the name on that basis, and
>> none against. Feel free to give your reasons for supporting some
>> other name.
>
> My reasons for not liking the current GUC name are hinted at above. I think
> we'll want a version of this that just fails the transaction once we
> have the infrastructure. So we should choose a name that allows for
> a complementary GUC.
>
> CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com
> suggested
>
> On 2014-06-19 10:39:48 -0700, David G Johnston wrote:
>> "idle_in_transaction_timeout=10s"
>> "idle_in_transaction_target=session|transaction"
>
> but I don't like that much. Not sure what'd be good, the best I
> currently can come up with is:
>
> idle_in_transaction_termination_timeout =
> idle_in_transaction_cancellation_timeout =

+1

Pavel
Andres Freund <andres@2ndquadrant.com> wrote: > I think we'll want a version of this that just fails the > transaction once we have the infrastructure. So we should choose > a name that allows for a complementary GUC. If we stick with the rule that what is to the left of _timeout is what is being cancelled, then a GUC to cancel a transaction which remains idle for too long could be called idle_transaction_timeout. Do you disagree with the general idea of following that pattern? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sunday, June 22, 2014, Kevin Grittner-5 [via PostgreSQL] <[hidden email]> wrote:
Andres Freund <[hidden email]> wrote:
> I think we'll want a version of this that just fails the
> transaction once we have the infrastructure. So we should choose
> a name that allows for a complementary GUC.
If we stick with the rule that what is to the left of _timeout is
what is being cancelled, the a GUC to cancel a transaction which
remains idle for too long could be called idle_transaction_timeout.
Do you disagree with the general idea of following that pattern?
If we ever do give the user an option, the non-specific name with a
separate type GUC could be used and this session-specific variable
deprecated. And disallow both to be active at the same time. Or
something else. I agree that idle_in_transaction_transaction would be
proper but troublesome for the alternative; crossing that bridge if we
ever get there seems reasonable in light of picking the best single
name for this specific feature.
Idle_transaction_timeout has already been discarded since truly idle transactions are not being affected, only those that are in transaction. The first quote above is limited to that subset as well.
David J.
View this message in context: Re: idle_in_transaction_timeout
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
At 2014-06-22 19:45:08 -0700, david.g.johnston@gmail.com wrote:
>
> On Sunday, June 22, 2014, Kevin Grittner-5 [via PostgreSQL] <
> ml-node+s1045698n5808309h3@n5.nabble.com> wrote:
>
> > If we stick with the rule that what is to the left of _timeout is
> > what is being cancelled, then a GUC to cancel a transaction which
> > remains idle for too long could be called idle_transaction_timeout.

I (somewhat reluctantly) agree with Kevin that
"idle_in_transaction_session_timeout" (for FATAL) and
"idle_transaction_timeout" (for ERROR) would work.

The only other alternative I see is to use "idle_transaction_timeout"
now (even when we're killing the session) and later introduce another
setting named "idle_transaction_timeout_keep_session" (default false)
or something like that. (I'd prefer an extra boolean to something set
to 'session' or 'transaction'.)

> Idle_transaction_timeout has already been discarded since truly idle
> transactions are not being affected, only those that are in
> transaction.

I have no idea what this means.

-- Abhijit
On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > I think we'll want a version of this that just fails the
> > transaction once we have the infrastructure. So we should choose
> > a name that allows for a complementary GUC.
>
> If we stick with the rule that what is to the left of _timeout is
> what is being cancelled, then a GUC to cancel a transaction which
> remains idle for too long could be called idle_transaction_timeout.
>
> Do you disagree with the general idea of following that pattern?

I think that'd be rather confusing. For one it'd need to be
idle_in_transaction_timeout, which already seems less clear (because
the transaction belongs to idle), and for another that distinction
seems to be too subtle for users.

The reason I suggested
idle_in_transaction_termination/cancellation_timeout is that that maps
nicely to pg_terminate/cancel_backend() and is rather descriptive.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>> > I think we'll want a version of this that just fails the
>> > transaction once we have the infrastructure. So we should choose
>> > a name that allows for a complementary GUC.
>>
>> If we stick with the rule that what is to the left of _timeout is
>> what is being cancelled, then a GUC to cancel a transaction which
>> remains idle for too long could be called idle_transaction_timeout.
>>
>> Do you disagree with the general idea of following that pattern?
>
> I think that'd be rather confusing. For one it'd need to be
> idle_in_transaction_timeout which already seems less clear (because the
> transaction belongs to idle) and for another that distinction seems to
> be too subtle for users.
>
> The reason I suggested
> idle_in_transaction_termination/cancellation_timeout is that that maps
> nicely to pg_terminate/cancel_backend() and is rather descriptive.

Maybe we can remove IIT_termination_timeout when we've implemented
IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
is still useful even in that case. *If* it's not useful, I think we
don't need to have those two parameters and can just define one
parameter IIT_timeout. That's quite simple and it's similar to the
current style of statement_timeout and lock_timeout (IOW, we don't have
something like statement_termination_timeout and
lock_termination_timeout).

Regards,

--
Fujii Masao
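[Archive note: the single-parameter style Fujii describes can be
sketched as below. The idle_in_transaction_timeout name and the '5min'
value are only the proposal under discussion in this thread, not a
released GUC.]

```sql
-- Existing timeouts are set per-session; whatever precedes _timeout
-- is the thing that gets cancelled:
SET statement_timeout = '30s';  -- cancels an over-long statement
SET lock_timeout = '2s';        -- cancels an over-long lock wait

-- The proposed setting would follow the same single-GUC pattern
-- (name and behavior are exactly what this thread is debating):
SET idle_in_transaction_timeout = '5min';
```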
On 2014-06-23 20:29:17 +0900, Fujii Masao wrote:
> On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
> >> Andres Freund <andres@2ndquadrant.com> wrote:
> >>
> >> > I think we'll want a version of this that just fails the
> >> > transaction once we have the infrastructure. So we should choose
> >> > a name that allows for a complementary GUC.
> >>
> >> If we stick with the rule that what is to the left of _timeout is
> >> what is being cancelled, then a GUC to cancel a transaction which
> >> remains idle for too long could be called idle_transaction_timeout.
> >>
> >> Do you disagree with the general idea of following that pattern?
> >
> > I think that'd be rather confusing. For one it'd need to be
> > idle_in_transaction_timeout which already seems less clear (because the
> > transaction belongs to idle) and for another that distinction seems to
> > be too subtle for users.
> >
> > The reason I suggested
> > idle_in_transaction_termination/cancellation_timeout is that that maps
> > nicely to pg_terminate/cancel_backend() and is rather descriptive.
>
> Maybe we can remove IIT_termination_timeout when we've implemented
> IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
> is still useful even in that case.

I think both can actually be sensible depending on the use case. It's
also not nice to remove a feature without need when people have started
to rely on it. For a web app, termination is probably more sensible.
For interactive clients, cancellation.

> *If* it's not useful, I think we don't need to have those two
> parameters and can just define one parameter IIT_timeout. That's
> quite simple and it's similar to the current style of
> statement_timeout and lock_timeout (IOW, we don't have something like
> statement_termination_timeout and lock_termination_timeout).

I don't think those really are comparable. A long idle in transaction
state pretty much always indicates a problematic interaction with
postgres.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 07:47 PM, Andres Freund wrote:
> On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> The idea with the GUC name is that if we ever get support for
>>> cancelling transactions we can name that
>>> idle_in_transaction_transaction_timeout?
>>> That seems a bit awkward...
>>
>> No, the argument was that for all the other *_timeout settings what
>> came before _timeout was the thing that was being terminated. I
>> think there were some votes in favor of the name on that basis, and
>> none against. Feel free to give your reasons for supporting some
>> other name.
>
> My reasons for not liking the current GUC name are hinted at above. I think
> we'll want a version of this that just fails the transaction once we
> have the infrastructure. So we should choose a name that allows for
> a complementary GUC.
> CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com
> suggested
> On 2014-06-19 10:39:48 -0700, David G Johnston wrote:
>> "idle_in_transaction_timeout=10s"
>> "idle_in_transaction_target=session|transaction"
>
> but I don't like that much. Not sure what'd be good, the best I
> currently can come up with is:
> idle_in_transaction_termination_timeout =
> idle_in_transaction_cancellation_timeout =

Except the transaction wouldn't be cancelled, it would be aborted.
idle_in_transaction_abortion_timeout seems a little... weird.

--
Vik
On 2014-06-23 13:33:46 +0200, Vik Fearing wrote:
> On 06/22/2014 07:47 PM, Andres Freund wrote:
> > On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
> >> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > but I don't like that much. Not sure what'd be good, the best I
> > currently can come up with is:
> > idle_in_transaction_termination_timeout =
> > idle_in_transaction_cancellation_timeout =
>
> Except the transaction wouldn't be cancelled, it would be aborted.

That ship has sailed with pg_cancel_backend(), no?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 09:02 PM, Abhijit Menon-Sen wrote:
> I (somewhat reluctantly) agree with Kevin that
> "idle_in_transaction_session_timeout" (for FATAL) and
> "idle_transaction_timeout" (for ERROR) would work.

Given that an IIT timeout has been a TODO for at least 6 years before
being addressed, I'm not convinced that we need to worry about what an
eventual error vs. fatal timeout should be named or how it should be
scoped. Let's attack that when someone actually shows an inclination
to work on it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-23 20:29:17 +0900, Fujii Masao wrote:
>> On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
>>>> Andres Freund <andres@2ndquadrant.com> wrote:
>>>>
>>>>> I think we'll want a version of this that just fails the
>>>>> transaction once we have the infrastructure. So we should choose
>>>>> a name that allows for a complementary GUC.

How about choosing a name for that, if we ever get there, where what
precedes _timeout describes what is being terminated? Like the idle
transaction itself, rather than the session which shows as being in
"idle in transaction" status because of the idle transaction which is
using the session?

>>>> If we stick with the rule that what is to the left of _timeout is
>>>> what is being cancelled, then a GUC to cancel a transaction which
>>>> remains idle for too long could be called idle_transaction_timeout.
>>>>
>>>> Do you disagree with the general idea of following that pattern?
>>>
>>> I think that'd be rather confusing. For one it'd need to be
>>> idle_in_transaction_timeout

Why? We're cancelling an idle transaction, not an "idle in
transaction", whatever that is.

>>> which already seems less clear (because the transaction belongs
>>> to idle)

I have no idea what that means.

>>> and for another that distinction seems to be too subtle for users.

The difference between an "idle in transaction session" and an
"idle transaction" is too subtle for someone preparing to terminate
one of those?

>>> The reason I suggested
>>> idle_in_transaction_termination/cancellation_timeout is that that maps
>>> nicely to pg_terminate/cancel_backend() and is rather descriptive.

I've always hated that naming because it is so confusing and doesn't
describe what is being terminated. What makes it obvious that
"terminate" is something you do to a session rather than a transaction?
What makes it obvious that "cancel" is something you do to a
transaction but not to a session? Now if those were named something
like pg_rollback_backend_transaction() and pg_close_backend_session()
(or anything else where the names actually made clear what was
happening) then borrowing terminology would be a stronger argument. It
still seems to me to be a weaker argument than continuing the pattern
we have for *_timeout GUCs, though.

>> Maybe we can remove IIT_termination_timeout when we've implemented
>> IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
>> is still useful even in that case.

In my experience, for the same reasons Robert gave much earlier in
the thread, if both were available the one which would be more
appropriate for cases I've seen would be the one that closed the
session with a FATAL message in the log. If we were only going to
have one or the other, the one that terminated the session is the
one that *I* would prefer to see.

> A long idle in transaction state pretty much always indicates a
> problematic interaction with postgres.

True. Which makes me wonder whether we shouldn't default this to
something non-zero -- even if it is 5 or 10 days. It also gets me
thinking about whether a good follow-on patch would be a timeout
for prepared transactions. I would have trouble counting how many
nasty production problems I've seen which would have been prevented
by having both time out after a few days.

BTW, since nobody has commented on the issue of the postgres_fdw
automatically exempting itself from the timeout, I will plan on
removing that when the naming argument reaches something resembling
a conclusion and I go to push this ... unless someone objects
before that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
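[Archive note: the external-script workaround mentioned at the top of
the thread amounts to something like the query below. It is a sketch:
it assumes the 9.2+ pg_stat_activity columns (pid, state,
state_change), and the 10-minute threshold is an arbitrary example.]

```sql
-- Rough external equivalent of the proposed timeout, run periodically
-- by a monitor: terminate any backend that has been sitting
-- "idle in transaction" for more than 10 minutes.
-- (This is the "terminate the session" behavior; aborting only the
-- transaction is the part the thread says lacks infrastructure.)
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND state_change < now() - interval '10 minutes';
```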
>>> I think that'd be rather confusing. For one it'd need to be
>>> idle_in_transaction_timeout
>
> Why? We're cancelling an idle transaction, not an "idle in
> transaction", whatever that is.
The confusion derives from the fact we are affecting a session whose state is "idle in transaction", not one that is idle. We are then, for this discussion, choosing to either kill the entire session or just the currently active transaction. After "idle_in_transaction" there is an unstated "session" being mentally placed by myself and probably others. Following that is then either "session" or "transaction" to denote what is being affected should the timeout interval come to pass.
Discarding that, probably flawed, mental model makes "idle_transaction_timeout" seem fine. "idle_in_transaction_session_timeout" would indeed be a natural complement to this.
I do not expect this concept, should it come to pass, to be that difficult to document or for someone to learn.
Along with others, I still see no reason to avoid "IIT_session_timeout" at this point.
David J.
>> A long idle in transaction state pretty much always indicates a
>> problematic interaction with postgres.
>
> True. Which makes me wonder whether we shouldn't default this to
> something non-zero -- even if it is 5 or 10 days.
I guess it depends on how parental we want to be. But if we go that route wouldn't a more harsh/in-your-face default make more sense? Something in Minutes, not Days...say '5min'...
David J.
On 2014-06-23 13:19:47 -0700, Kevin Grittner wrote:
> >>> which already seems less clear (because the transaction belongs
> >>> to idle)
>
> I have no idea what that means.

It's "idle_in_transaction"_session_timeout. Not
"idle_in"_transaction_session_timeout.

> >>> and for another that distinction seems to be too subtle for users.
>
> The difference between an "idle in transaction session" and an
> "idle transaction" is too subtle for someone preparing to terminate
> one of those?

Yes. To me that's an academic distinction. As a nonnative speaker it
looks pretty much random that one has an "in" in it and the other
doesn't. Maybe I'm just having a grammar fail, but there doesn't seem
to be much sense in it.

> >>> The reason I suggested
> >>> idle_in_transaction_termination/cancellation_timeout is that that maps
> >>> nicely to pg_terminate/cancel_backend() and is rather descriptive.
>
> I've always hated that naming because it is so confusing and
> doesn't describe what is being terminated.

Well, it's what's there. And it doesn't seem to cause much confusion.
I don't see how it gets less confusing by introducing different
terminology.

> In my experience, for the same reasons Robert gave much earlier in
> the thread, if both were available the one which would be more
> appropriate for cases I've seen would be the one that closed the
> session with a FATAL message in the log. If we were only going to
> have one or the other, the one that terminated the session is the
> one that *I* would prefer to see.

I think it's just different use cases. For OLTP clients you probably
primarily want to FATAL the session. But for interactive sessions
that's much less the case. If some DBA, or even more so an analytics
guy, has a transaction open he'll have to live with the transaction
being aborted. But you won't be liked for needlessly removing the 10
multi-GB temporary tables with intermediate results. And don't you say
protecting against that isn't necessary - I've seen databases slow to
a crawl because some data analyst went home over the weekend with an
open transaction after a meeting went on longer than planned.

> > A long idle in transaction state pretty much always indicates a
> > problematic interaction with postgres.
>
> True. Which makes me wonder whether we shouldn't default this to
> something non-zero -- even if it is 5 or 10 days.

-1. Can't really say why though. Just seems a bit odd.

> It also gets me
> thinking about whether a good follow-on patch would be a timeout
> for prepared transactions.

I think that might be useful although I think it'd be relatively hard
to implement. I guess that for now monitoring pg_prepared_xacts will
have to suffice.

> BTW, since nobody has commented on the issue of the postgres_fdw
> automatically exempting itself from the timeout, I will plan on
> removing that when the naming argument reaches something resembling
> a conclusion and I go to push this ... unless someone objects
> before that.

Sounds good. I haven't yet seen a justification for it so far...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 05:11 PM, Kevin Grittner wrote:
> I found one substantive issue that had been missed in discussion,
> though. The patch modifies the postgres_fdw extension to make it
> automatically exempt from an attempt to set a limit like this on
> the server to which it connects. I'm not sure that's a good idea.
> Why should this type of connection be allowed to sit indefinitely
> with an idle open transaction? I'm inclined to omit this part of
> the patch

My reasoning for doing it the way I did is that if a transaction touches
a foreign table and then goes bumbling along with other things, the
transaction is active but the connection to the remote server remains
idle in transaction. If it hits the timeout, when the local transaction
goes to commit it errors out and you lose all your work.

If the local transaction is actually idle in transaction and the local
server doesn't have a timeout, we're no worse off than before this patch.

--
Vik
On Tue, Jun 24, 2014 at 9:20 AM, Vik Fearing [via PostgreSQL] <[hidden email]> wrote:

> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
> > I found one substantive issue that had been missed in discussion,
> > though. The patch modifies the postgres_fdw extension to make it
> > automatically exempt from an attempt to set a limit like this on
> > the server to which it connects. I'm not sure that's a good idea.
> > Why should this type of connection be allowed to sit indefinitely
> > with an idle open transaction? I'm inclined to omit this part of
> > the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.
Going off of this reading alone, wouldn't we have to allow the client to set the timeout on the fdw_server - to zero - to ensure reasonable operation? If the client has a process that requires 10 minutes to complete, and the foreign server has a default 5 minute timeout, if the client does not disable the timeout on the server wouldn't the foreign server always cause the process to abort?
David J.
David G Johnston <david.g.johnston@gmail.com> wrote:
> Vik Fearing [via PostgreSQL] <[hidden email]> wrote:
>> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>>> I found one substantive issue that had been missed in discussion,
>>> though. The patch modifies the postgres_fdw extension to make it
>>> automatically exempt from an attempt to set a limit like this on
>>> the server to which it connects. I'm not sure that's a good idea.
>>> Why should this type of connection be allowed to sit indefinitely
>>> with an idle open transaction? I'm inclined to omit this part of
>>> the patch
>>
>> My reasoning for doing it the way I did is that if a transaction touches
>> a foreign table and then goes bumbling along with other things the
>> transaction is active but the connection to the remote server remains
>> idle in transaction. If it hits the timeout, when the local transaction
>> goes to commit it errors out and you lose all your work.
>>
>> If the local transaction is actually idle in transaction and the local
>> server doesn't have a timeout, we're no worse off than before this patch.
>
> Going off of this reading alone wouldn't we have to allow the
> client to set the timeout on the fdw_server - to zero - to ensure
> reasonable operation? If the client has a process that requires
> 10 minutes to complete, and the foreign server has a default 5
> minute timeout, if the client does not disable the timeout on the
> server wouldn't the foreign server always cause the process to
> abort?

That's what Vik did in his patch, and what I was questioning. I think
he might be right, but I want to think about it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/24/2014 03:29 PM, David G Johnston wrote:
> On Tue, Jun 24, 2014 at 9:20 AM, Vik Fearing [via PostgreSQL] <[hidden
> email]> wrote:
>
>     On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>     > I found one substantive issue that had been missed in discussion,
>     > though. The patch modifies the postgres_fdw extension to make it
>     > automatically exempt from an attempt to set a limit like this on
>     > the server to which it connects. I'm not sure that's a good idea.
>     > Why should this type of connection be allowed to sit indefinitely
>     > with an idle open transaction? I'm inclined to omit this part of
>     > the patch
>
>     My reasoning for doing it the way I did is that if a transaction
>     touches a foreign table and then goes bumbling along with other
>     things the transaction is active but the connection to the remote
>     server remains idle in transaction. If it hits the timeout, when
>     the local transaction goes to commit it errors out and you lose
>     all your work.
>
>     If the local transaction is actually idle in transaction and the
>     local server doesn't have a timeout, we're no worse off than
>     before this patch.
>
> Going off of this reading alone wouldn't we have to allow the client to
> set the timeout on the fdw_server - to zero - to ensure reasonable
> operation?

That's what the patch currently does.

> If the client has a process that requires 10 minutes to
> complete, and the foreign server has a default 5 minute timeout, if the
> client does not disable the timeout on the server wouldn't the foreign
> server always cause the process to abort?

Yes.

--
Vik
On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>> I found one substantive issue that had been missed in discussion,
>> though. The patch modifies the postgres_fdw extension to make it
>> automatically exempt from an attempt to set a limit like this on
>> the server to which it connects. I'm not sure that's a good idea.
>> Why should this type of connection be allowed to sit indefinitely
>> with an idle open transaction? I'm inclined to omit this part of
>> the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.

I think we are. First, the correct timeout is a matter of
remote-server-policy, not local-server-policy. If the remote server
wants to boot people with long-running idle transactions, it's entitled
to do that, and postgres_fdw shouldn't assume that it's "special". The
local server policy may be different, and may not even have been
configured by the same person. Second, setting another GUC at every
session start adds overhead for all users of postgres_fdw.

Now, it might be that postgres_fdw should have a facility to allow
arbitrary options to be set on the foreign side at each connection
startup. Then that could be used here if someone wants this behavior.
But I don't think we should hard-code it, because it could also be NOT
what someone wants.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-06-24 10:04:03 -0400, Robert Haas wrote:
> On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> > My reasoning for doing it the way I did is that if a transaction touches
> > a foreign table and then goes bumbling along with other things the
> > transaction is active but the connection to the remote server remains
> > idle in transaction. If it hits the timeout, when the local transaction
> > goes to commit it errors out and you lose all your work.
> >
> > If the local transaction is actually idle in transaction and the local
> > server doesn't have a timeout, we're no worse off than before this patch.
>
> I think we are. First, the correct timeout is a matter of
> remote-server-policy, not local-server-policy. If the remote server
> wants to boot people with long-running idle transactions, it's
> entitled to do that, and postgres_fdw shouldn't assume that it's
> "special". The local server policy may be different, and may not even
> have been configured by the same person. Second, setting another GUC
> at every session start adds overhead for all users of postgres_fdw.

+1

> Now, it might be that postgres_fdw should have a facility to allow
> arbitrary options to be set on the foreign side at each connection
> startup. Then that could be used here if someone wants this behavior.
> But I don't think we should hard-code it, because it could also be NOT
> what someone wants.

I think options=-c idle_in_transaction_timeout=0 in the server config
should already do the trick.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
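[Archive note: concretely, Andres's suggestion would look something
like the statement below. This is a hypothetical sketch: remote_pg is
an assumed server name, the GUC name is the one under discussion, and
it assumes postgres_fdw passes the standard libpq "options" connection
parameter through to the remote side.]

```sql
-- Opt one foreign server out of the remote idle-in-transaction timeout
-- explicitly, instead of postgres_fdw hard-coding the exemption.
ALTER SERVER remote_pg
    OPTIONS (ADD options '-c idle_in_transaction_timeout=0');
```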
On Tue, Jun 24, 2014 at 10:05 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <[hidden email]> wrote:
> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>> I found one substantive issue that had been missed in discussion,
>> though. The patch modifies the postgres_fdw extension to make it
>> automatically exempt from an attempt to set a limit like this on
>> the server to which it connects. I'm not sure that's a good idea.
>> Why should this type of connection be allowed to sit indefinitely
>> with an idle open transaction? I'm inclined to omit this part of
>> the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.

I think we are. First, the correct timeout is a matter of
remote-server-policy, not local-server-policy. If the remote server
wants to boot people with long-running idle transactions, it's
entitled to do that, and postgres_fdw shouldn't assume that it's
"special". The local server policy may be different, and may not even
have been configured by the same person. Second, setting another GUC
at every session start adds overhead for all users of postgres_fdw.
Now, it might be that postgres_fdw should have a facility to allow
arbitrary options to be set on the foreign side at each connection
startup. Then that could be used here if someone wants this behavior.
But I don't think we should hard-code it, because it could also be NOT
what someone wants.
The missing ability is that while the user only cares about the one logical session we are dealing with two physical sessions in a parent-child relationship where the child session state does not match that of its parent. For me, this whole line of thought is based upon the logical "idle_in_transaction" - did the application really mean to leave this hanging?
Say that 90% of the time disabling the timeout will be the correct course of action; making the user do this explicitly does not seem reasonable. And if "doesn't matter" is the current state when the foreign server is configured no setting will be passed. Then if the remote server does institute a timeout all the relevant configurations will need to be changed.
ISTM that the additional overhead in this case would be very small in percentage terms; at least enough so that usability would be my default choice.
I have no problem allowing for user-specified behavior but the default of disabling the timeout seems reasonable. I am doubting that actually synchronizing the parent and child sessions, so that the child reports the same status as the parent, is a valid option - though it would be the "best" solution since the child would only report IIT if the parent was IIT.
For me, a meaningful default and usability are trumping the unknown performance degradation. I can go either way on allowing the local definition to specify its own non-zero timeout but it probably isn't worth the effort. The foreign server administrator ultimately will have to be aware of which users are connecting via FDW and address his "long-running transaction" concerns in a more nuanced way than this parameter allows. In effect this becomes an 80% solution because it is not (all that) useful on the remote end of a FDW connection; though at least the local end can make proper use of it to protect both servers.
David J.
On 06/24/2014 04:04 PM, Robert Haas wrote:
>> If the local transaction is actually idle in transaction and the local
>> server doesn't have a timeout, we're no worse off than before this patch.
>
> I think we are. First, the correct timeout is a matter of
> remote-server-policy, not local-server-policy. If the remote server
> wants to boot people with long-running idle transactions, it's
> entitled to do that, and postgres_fdw shouldn't assume that it's
> "special".

So how would the local transaction ever get its work done? What option
does it have to tell the remote server that it isn't actually idling,
it just doesn't need to use the remote connection for a while? Once the
remote times out, the local transaction is doomed (and won't even know
it until it tries to commit). If we don't allow the fdw to be special,
then the local transaction can't run at all. Ever.

The point of the patch is to allow the DBA to knock off broken clients,
but this isn't a broken client, it just looks like one.

--
Vik
On Tue, Jun 24, 2014 at 10:50 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 06/24/2014 04:04 PM, Robert Haas wrote: >>> If the local transaction is actually idle in transaction and the local >>> server doesn't have a timeout, we're no worse off than before this patch. >> >> I think we are. First, the correct timeout is a matter of >> remote-server-policy, not local-server-policy. If the remote server >> wants to boot people with long-running idle transactions, it's >> entitled to do that, and postgres_fdw shouldn't assume that it's >> "special". > > So how would the local transaction ever get its work done? What option > does it have to tell the remote server that it isn't actually idling, it > just doesn't need to use the remote connection for a while? It *is* idling. You're going to get bloat, and lock contention, and so on, just as you would for any other idle session. I mean, you could make this assumption about any session: I'm not done with the transaction yet, e.g. I'm waiting for user input before deciding what to do next. That doesn't mean that the DBA doesn't want to kill it. > The point of the patch is to allow the DBA to knock off broken clients, > but this isn't a broken client, it just looks like one. If it walks like a duck, and quacks like a duck, it's a duck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 24, 2014 at 11:11 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
> On Tue, Jun 24, 2014 at 10:50 AM, Vik Fearing <[hidden email]> wrote:
>> On 06/24/2014 04:04 PM, Robert Haas wrote:
>>>> If the local transaction is actually idle in transaction and the local
>>>> server doesn't have a timeout, we're no worse off than before this patch.
>>>
>>> I think we are. First, the correct timeout is a matter of
>>> remote-server-policy, not local-server-policy. If the remote server
>>> wants to boot people with long-running idle transactions, it's
>>> entitled to do that, and postgres_fdw shouldn't assume that it's
>>> "special".
>>
>> So how would the local transaction ever get its work done? What option
>> does it have to tell the remote server that it isn't actually idling, it
>> just doesn't need to use the remote connection for a while?
>
> It *is* idling. You're going to get bloat, and lock contention, and
> so on, just as you would for any other idle session.
If an application is making use of the foreign server directly then there is the option to commit after using the foreign server, while saving the relevant data for the main transaction. But if you make use of API functions there can only be a single transaction encompassing both the local and foreign servers. But even then, if the user needs a logical super-transaction across both servers - even though the bulk of the work occurs locally - that option to commit is then removed regardless of client usage.
IMO this tool is too blunt to properly allow servers to self-manage fdw-initiated transactions/sessions; and allowing it to be used is asking for end-user confusion and frustration.
OTOH, requiring the administrator of the foreign server to issue an ALTER ROLE fdw_user SET idle_in_transaction_session_timeout = 0; would be fairly easy to justify. Allowing them to distinguish between known long-running and problematic transactions and those that are expected to execute quickly has value as well.
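A sketch of what that per-role override could look like, assuming the GUC name proposed in this thread, millisecond units, and hypothetical role names:

```sql
-- Disable the timeout for the role the FDW connects as, so that
-- fdw-initiated sessions are never killed ("fdw_user" is hypothetical):
ALTER ROLE fdw_user SET idle_in_transaction_session_timeout = 0;

-- Ordinary application roles can meanwhile keep a strict limit,
-- here ten minutes expressed in milliseconds:
ALTER ROLE app_user SET idle_in_transaction_session_timeout = 600000;
```

As with any ALTER ROLE ... SET, the per-role value takes effect on new sessions, so existing connections keep whatever value they started with.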
Ultimately you give the users power and then just need to make sure we provide sufficient documentation suggestions on how best to configure the two servers in various typical usage scenarios.
David J.
On 06/23/2014 03:52 PM, Andres Freund wrote:
> On 2014-06-23 13:19:47 -0700, Kevin Grittner wrote:
>>>>> which already seems less clear (because the transaction belongs
>>>>> to idle)
>>
>> I have no idea what that means.
>
> It's "idle_in_transaction"_session_timeout. Not
> "idle_in"_transaction_session_timeout.
>
>>>>> and for another that distinction seems to be too subtle for users.
>>
>> The difference between an "idle in transaction session" and an
>> "idle transaction" is too subtle for someone preparing to terminate
>> one of those?
>
> Yes. To me that's an academic distinction. As a nonnative speaker it
> looks pretty much random that one has an "in" in it and the other
> doesn't. Maybe I'm just having a grammar fail, but there doesn't seem to
> be much sense in it.

As a native speaker, I find the distinction elusive as well. If someone was actually planning to commit transaction cancel, I'd object to it. And frankly, it doesn't make any sense to have two independent timeouts anyway. Only one of them would ever be invoked, whichever one came first. If you really want to plan for a feature I doubt anyone is going to write, the appropriate two GUCs are: idle_transaction_timeout: ## ms idle_transaction_timeout_action: cancel | terminate However, since I'm not convinced that anyone is ever going to write the "cancel" version, can we please just leave the 2nd GUC out for now?

>>> A long idle in transaction state pretty much always indicates a
>>> problematic interaction with postgres.
>>
>> True. Which makes me wonder whether we shouldn't default this to
>> something non-zero -- even if it is 5 or 10 days.

I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would trip up some users who just need really long pg_dumps. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/24/2014 06:43 PM, Josh Berkus wrote: >>>> A long idle in transaction state pretty much always indicates a >>>> problematic interaction with postgres. >>> >>> True. Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > > I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would > trip up some users who just need really long pg_dumps. Why would pg_dump be idle for 24 hours? -- Vik
2014-06-24 18:43 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
>>> A long idle in transaction state pretty much always indicates a
>>> problematic interaction with postgres.
>>
>> True. Which makes me wonder whether we shouldn't default this to
>> something non-zero -- even if it is 5 or 10 days.
>
> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would
> trip up some users who just need really long pg_dumps.

Long transactions should not be a problem - this should break a transaction only when it does >>nothing<< for a long time.

Regards

Pavel
On 06/24/2014 07:50 AM, Vik Fearing wrote: > On 06/24/2014 04:04 PM, Robert Haas wrote: >>> If the local transaction is actually idle in transaction and the local >>> server doesn't have a timeout, we're no worse off than before this patch. >> >> I think we are. First, the correct timeout is a matter of >> remote-server-policy, not local-server-policy. If the remote server >> wants to boot people with long-running idle transactions, it's >> entitled to do that, and postgres_fdw shouldn't assume that it's >> "special". > > So how would the local transaction ever get its work done? What option > does it have to tell the remote server that it isn't actually idling, it > just doesn't need to use the remote connection for a while? > > Once the remote times out, the local transaction is doomed (and won't > even know it until it tries to commit). If we don't allow the fdw to be > special, then the local transaction can't run at all. Ever. I'm unclear on how the FDW could be special. From the point of the remote server, how does it even know that it's receiving an FDW connection and not some other kind of connection? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > On 06/24/2014 07:50 AM, Vik Fearing wrote: >> Once the remote times out, the local transaction is doomed (and won't >> even know it until it tries to commit). If we don't allow the fdw to be >> special, then the local transaction can't run at all. Ever. > I'm unclear on how the FDW could be special. From the point of the > remote server, how does it even know that it's receiving an FDW > connection and not some other kind of connection? One way you could do it is to use a user id that's only for FDW connections, and do an ALTER ROLE on that id to set the appropriate timeout. Personally I'm violently against having postgres_fdw mess with this setting; for one thing, the proposed coding would prevent DBAs from controlling the timeout as they see fit, because it would override any ALTER ROLE or other remote-side setting. It doesn't satisfy the POLA either. postgres_fdw does not for example override statement_timeout; why should it override this timeout? regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: > On 06/23/2014 03:52 PM, Andres Freund wrote: >> True. Which makes me wonder whether we shouldn't default this to >> something non-zero -- even if it is 5 or 10 days. > I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would > trip up some users who just need really long pg_dumps. FWIW, I do not think we should have a nonzero default for this. We could not safely set it to any value that would be small enough to be really useful in the field. BTW, has anyone thought about the interaction of this feature with prepared transactions? I wonder whether there shouldn't be a similar but separately-settable maximum time for a transaction to stay in the prepared state. If we could set a nonzero default on that, perhaps on the order of a few minutes, we could solve the ancient bugaboo that "prepared transactions are too dangerous to enable by default". regards, tom lane
On 06/24/2014 07:17 PM, Tom Lane wrote: > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I did not think about that, but I could probably cook up a patch for it. I don't believe it belongs in this patch, though. -- Vik
On 2014-06-24 10:17:49 -0700, Tom Lane wrote: > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I'd very much like that feature, but I'm not sure how to implement it. Which process would do that check? We currently only allow rollbacks from the corresponding database... The best idea I have is to do it via autovacuum. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-06-24 10:17:49 -0700, Tom Lane wrote: >> BTW, has anyone thought about the interaction of this feature with >> prepared transactions? I wonder whether there shouldn't be a similar but >> separately-settable maximum time for a transaction to stay in the prepared >> state. If we could set a nonzero default on that, perhaps on the order of >> a few minutes, we could solve the ancient bugaboo that "prepared >> transactions are too dangerous to enable by default". > I'd very much like that feature, but I'm not sure how to implement > it. Which process would do that check? We currently only allow rollbacks > from the corresponding database... > The best idea I have is to do it via autovacuum. I did not actually have any plan in mind when I wrote that, but your mention of autovacuum suggests an idea for it: consider the code that kicks autovacuum off a table when somebody wants exclusive lock. In the same way, we could teach processes that want a lock that conflicts with a prepared xact that they can kill the prepared xact if it's more than X seconds old. The other way in which old prepared xacts are dangerous is in blocking cleanup of dead tuples, and I agree with your thought that maybe autovacuum is the place to deal with that. I don't know whether we'd really need both mechanisms, or if just one would be enough. In either case, this wouldn't directly be a timeout but rather a "license to kill" once a prepared xact exceeds the threshold and is getting in somebody's way. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >>> Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > >> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that >> would trip up some users who just need really long pg_dumps. > > FWIW, I do not think we should have a nonzero default for this. > We could not safely set it to any value that would be small enough > to be really useful in the field. I have seen production environments where users asked for help when performance had gradually degraded to a fraction of what it was, due to a connection sitting "idle in transaction" for several weeks. Even a timeout of five or ten days would have saved a lot of pain. What concerns me on the other side is that I've been known to start a long-running conversion or data fix on a Friday and check the results on Monday before committing. Something like that might sit for a day or two with little or no concurrent activity to cause a problem. It would be a real forehead-slapper to have forgotten to set a longer timeout before starting the run on Friday. A five day timeout seems likely to prevent extreme pain in the former circumstances while not being likely to mess up ad hoc bulk activity like the latter. Of course, if I were managing a cluster and was knowingly and consciously setting a value, it would probably be more like 5min. If I have actually set such a policy I am much less likely to forget it when it needs to be extended or disabled, and far less likely to be mad at anyone else if it cancels my work. > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. 
> If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I thought about it enough to mention it briefly. I haven't taken it further than to note that it would be a great follow-up patch once this is in. I'm not sure that a few minutes would be sufficient, though. Theoretically, a crash of the transaction manager, or one of the other data stores managed by it, or even a WAN connection to one of the servers, should cause the transaction manager to finish things up after recovery from the problem. I think that a default would need to allow sufficient time for that, so we can have some confidence that the transaction manager has actually lost track of it. If I were configuring this for a real production environment, I would be in mind of frequently having seen WAN outages of several hours, and a few which lasted two or three days. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-06-24 10:17:49 -0700, Tom Lane wrote: > >> BTW, has anyone thought about the interaction of this feature with > >> prepared transactions? I wonder whether there shouldn't be a similar but > >> separately-settable maximum time for a transaction to stay in the prepared > >> state. If we could set a nonzero default on that, perhaps on the order of > >> a few minutes, we could solve the ancient bugaboo that "prepared > >> transactions are too dangerous to enable by default". > > > I'd very much like that feature, but I'm not sure how to implement > > it. Which process would do that check? We currently only allow rollbacks > > from the corresponding database... > > The best idea I have is to do it via autovacuum. > > I did not actually have any plan in mind when I wrote that, but your > mention of autovacuum suggests an idea for it: consider the code that > kicks autovacuum off a table when somebody wants exclusive lock. > In the same way, we could teach processes that want a lock that conflicts > with a prepared xact that they can kill the prepared xact if it's more > than X seconds old. > > The other way in which old prepared xacts are dangerous is in blocking > cleanup of dead tuples, and I agree with your thought that maybe > autovacuum is the place to deal with that. I don't know whether we'd > really need both mechanisms, or if just one would be enough. > > In either case, this wouldn't directly be a timeout but rather a "license > to kill" once a prepared xact exceeds the threshold and is getting in > somebody's way. Why isn't this what we want for idle-in-transaction sessions..? Sounds like exactly what I'd want, at least. Don't kill it off unless it's blocking something or preventing xmin progression... 
Indeed, we have specifically implemented a Nagios check which does exactly this - it looks to see if any idle-in-transaction process is blocking something else, and if it's been idle for too long it gets killed. We don't have prepared transactions enabled, so we haven't had to address that. We do have a check which alerts on (but doesn't kill, yet) idle-in-transaction processes which have been idle for a long time. Thanks, Stephen
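A sketch of the kind of query such a monitoring check might run; the thresholds are illustrative, and it assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes the pid, state, and state_change columns:

```sql
-- Alert on sessions that have been idle in transaction for over five minutes.
SELECT pid, usename, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';

-- A companion script could then knock the worst offenders off:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '1 hour';
```

This is essentially the external-script approach the original post mentions; the in-core timeout would make the second step unnecessary and leave a trace in the server log.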
On 06/24/2014 10:17 AM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 06/23/2014 03:52 PM, Andres Freund wrote: >>> True. Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > >> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would >> trip up some users who just need really long pg_dumps. > > FWIW, I do not think we should have a nonzero default for this. > We could not safely set it to any value that would be small enough > to be really useful in the field. 48 hours would actually be a useful value; I've dealt multiple times with newbie users who had a transaction which had been open for a week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sun, Jun 22, 2014 at 6:54 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 06/21/2014 08:23 PM, Kevin Grittner wrote: >> OK, so I think we want to see a patch based on v1 (FATAL approach) >> with a change of the name to idle_in_transaction_session_timeout >> and the units changed to milliseconds. I don't see why the >> remoteversion test shouldn't be changed to use 90500 now, too. > > The attached patch, rebased to current master, addresses all of these > issues. Sorry if this has already been discussed before.... Why is IIT timeout turned on only when send_ready_for_query is true? I was thinking it should be turned on every time a message is received. Imagine the case where the session is in idle-in-transaction state and a client gets stuck after sending Parse message and before sending Bind message. This case would also cause long transaction problem and should be resolved by IIT timeout. But IIT timeout that this patch adds cannot handle this case because it's enabled only when send_ready_for_query is true. Thought? Regards, -- Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes: > Why is IIT timeout turned on only when send_ready_for_query is true? > I was thinking it should be turned on every time a message is received. > Imagine the case where the session is in idle-in-transaction state and > a client gets stuck after sending Parse message and before sending Bind > message. This case would also cause long transaction problem and should > be resolved by IIT timeout. But IIT timeout that this patch adds cannot > handle this case because it's enabled only when send_ready_for_query is > true. Thought? I think you just moved the goalposts to the next county. The point of this feature, AFAICS, is to detect clients that are failing to issue another query or close their transaction as a result of bad client logic. It's not to protect against network glitches. Moreover, there would be no way to implement a timeout like that without adding a gettimeofday() call after every packet receipt, which is overhead we do not need and cannot afford. I don't think this feature should add *any* gettimeofday's beyond the ones that are already there. regards, tom lane
On Thu, Jun 26, 2014 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> Why is IIT timeout turned on only when send_ready_for_query is true? >> I was thinking it should be turned on every time a message is received. >> Imagine the case where the session is in idle-in-transaction state and >> a client gets stuck after sending Parse message and before sending Bind >> message. This case would also cause long transaction problem and should >> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >> handle this case because it's enabled only when send_ready_for_query is >> true. Thought? > > I think you just moved the goalposts to the next county. > > The point of this feature, AFAICS, is to detect clients that are failing > to issue another query or close their transaction as a result of bad > client logic. It's not to protect against network glitches. If so, the document should explain that clearly. Otherwise users may misunderstand this parameter and try to use it to guard against long transactions caused by network glitches or client freezes. > Moreover, there would be no way to implement a timeout like that without > adding a gettimeofday() call after every packet receipt, which is overhead > we do not need and cannot afford. Hmm.. right. I was thinking of just calling enable_timeout_after() every time a message is received. But it calls gettimeofday() and causes overhead. > I don't think this feature should add > *any* gettimeofday's beyond the ones that are already there. But, ISTM that the patch adds enable_timeout_after() which calls GetCurrentTimestamp()->gettimeofday() before receiving a new message when send_ready_for_query is true. When send_ready_for_query is true, it's expected that the next message takes time to arrive at the server, so calling gettimeofday() in that case might not be so harmful, though. Regards, -- Fujii Masao
On 06/26/2014 06:45 AM, Fujii Masao wrote: >> The point of this feature, AFAICS, is to detect clients that are failing >> > to issue another query or close their transaction as a result of bad >> > client logic. It's not to protect against network glitches. > > If so, the document should explain that cleanly. Otherwise users may > misunderstand this parameter and try to use it to protect even long transaction > generated by network glitches or client freeze. What does pg_stat_activity say for those cases? If it's able to say "idle in transaction", then this patch covers it. If it isn't, then that seems like a different patch. -- Vik
On Wed, Jun 25, 2014 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> Why is IIT timeout turned on only when send_ready_for_query is true? >> I was thinking it should be turned on every time a message is received. >> Imagine the case where the session is in idle-in-transaction state and >> a client gets stuck after sending Parse message and before sending Bind >> message. This case would also cause long transaction problem and should >> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >> handle this case because it's enabled only when send_ready_for_query is >> true. Thought? > > I think you just moved the goalposts to the next county. > > The point of this feature, AFAICS, is to detect clients that are failing > to issue another query or close their transaction as a result of bad > client logic. It's not to protect against network glitches. Hmm, so there's no reason a client, after sending one protocol message, might not pause before sending the next protocol message? That seems like a surprising argument. Someone couldn't Parse and then wait before sending Bind and Execute, or Parse and Bind and then wait before sending Execute? > Moreover, there would be no way to implement a timeout like that without > adding a gettimeofday() call after every packet receipt, which is overhead > we do not need and cannot afford. I don't think this feature should add > *any* gettimeofday's beyond the ones that are already there. That's an especially good point if we think that this feature will be enabled commonly or by default - but as Fujii Masao says, it might be tricky to avoid. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 25, 2014 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Fujii Masao <masao.fujii@gmail.com> writes: >>> Why is IIT timeout turned on only when send_ready_for_query is true? >>> I was thinking it should be turned on every time a message is received. >>> Imagine the case where the session is in idle-in-transaction state and >>> a client gets stuck after sending Parse message and before sending Bind >>> message. This case would also cause long transaction problem and should >>> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >>> handle this case because it's enabled only when send_ready_for_query is >>> true. Thought? >> >> I think you just moved the goalposts to the next county. >> >> The point of this feature, AFAICS, is to detect clients that are failing >> to issue another query or close their transaction as a result of bad >> client logic. It's not to protect against network glitches. > > Hmm, so there's no reason a client, after sending one protocol > message, might not pause before sending the next protocol message? > That seems like a surprising argument. Someone couldn't Parse and > then wait before sending Bind and Execute, or Parse and Bind and then > wait before sending Execute? > > >> Moreover, there would be no way to implement a timeout like that without >> adding a gettimeofday() call after every packet receipt, which is overhead >> we do not need and cannot afford. I don't think this feature should add >> *any* gettimeofday's beyond the ones that are already there. > > That's an especially good point if we think that this feature will be > enabled commonly or by default - but as Fujii Masao says, it might be > tricky to avoid. :-( I think that this patch, as it stands, is a clear win if the postgres_fdw part is excluded. 
Remaining points of disagreement seem to be the postgres_fdw changes, whether a default value measured in days might be better than a default of off, and whether it's worth the extra work of covering more cases. The preponderance of opinion seems to be in favor of excluding the postgres_fdw changes, with Tom being violently opposed to including them. I consider the idea of the FDW ignoring the server setting dead. As for expanding the protected area of code, it seems sane to ask whoever wants to extend the protection in that direction to propose a patch. My sense is that there is more support for a default of a small number of days than a default of never, but that seems far from settled. I propose to push this as it stands except for the postgres_fdw part. The default is easy enough to change if we reach consensus, and expanding the scope can be a new patch in a new CF. Objections? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > I propose to push this as it stands except for the postgres_fdw > part. The default is easy enough to change if we reach consensus, > and expanding the scope can be a new patch in a new CF. > Objections? There's been enough noise about the external definition that I think no one has bothered to worry about whether the patch is actually implemented sanely. I do not think it is: specifically, the notion that we will call ereport(FATAL) directly from a signal handler does not fill me with warm fuzzies. While the scope of code in which the signal is enabled is relatively narrow, that doesn't mean there's no problem. Consider in particular the case where we are using SSL: this will mean that we take control away from OpenSSL, which might be in the midst of doing something if we're unlucky timing-wise, and then we'll re-entrantly invoke it to try to send an error message to the client. That seems pretty unsafe. Another risky flow of control is if ReadCommand throws an ordinary error and then the timeout interrupt happens while we're somewhere in the transaction abort logic (the sigsetjmp stanza in postgres.c). I'd be happier if this were implemented in the more traditional style where the signal handler just sets a volatile flag variable, which would be consulted at determinate places in the mainline logic. Or possibly it could be made safe if we only let it throw the error directly when ImmediateInterruptOK is true (compare the handling of notify/catchup interrupts). regards, tom lane
On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote: >>> Moreover, there would be no way to implement a timeout like that without >>> adding a gettimeofday() call after every packet receipt, which is overhead >>> we do not need and cannot afford. I don't think this feature should add >>> *any* gettimeofday's beyond the ones that are already there. >> >> That's an especially good point if we think that this feature will be >> enabled commonly or by default - but as Fujii Masao says, it might be >> tricky to avoid. :-( > > I think that this patch, as it stands, is a clear win if the > postgres_fdw part is excluded. Remaining points of disagreement > seem to be the postgres_fdw, whether a default value measured in > days might be better than a default of off, and whether it's worth > the extra work of covering more. The preponderance of opinion > seems to be in favor of excluding the postgres_fdw changes, with > Tom being violently opposed to including it. I consider the idea > of the FDW ignoring the server setting dead. Expanding the > protected area of code seems like it would be sane to ask whoever > wants to extend the protection in that direction to propose a > patch. My sense is that there is more support for a default of a > small number of days than a default of never, but that seems far > from settled. > > I propose to push this as it stands except for the postgres_fdw > part. The default is easy enough to change if we reach consensus, > and expanding the scope can be a new patch in a new CF. > Objections? Yeah, I think someone should do some analysis of whether this is adding gettimeofday() calls, and how many, and what the performance implications are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> I propose to push this as it stands except for the postgres_fdw
>> part.  The default is easy enough to change if we reach consensus,
>> and expanding the scope can be a new patch in a new CF.
>> Objections?

> Yeah, I think someone should do some analysis of whether this is
> adding gettimeofday() calls, and how many, and what the performance
> implications are.

I believe that as the patch stands, we'd incur one new gettimeofday()
per query-inside-a-transaction, inside the enable_timeout_after() call.
(I think the disable_timeout() call would not result in a gettimeofday
call, since there would be no remaining live timeout events.)

We could possibly refactor enough to share the clock reading with the call
done in pgstat_report_activity.  Not sure how ugly that would be or
whether it's worth the trouble.  Note that in the not-a-transaction-block
case, we already have got two gettimeofday calls in this sequence, one in
pgstat_report_stat and one in pgstat_report_activity :-(

			regards, tom lane
On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> >> I propose to push this as it stands except for the postgres_fdw
> >> part.  The default is easy enough to change if we reach consensus,
> >> and expanding the scope can be a new patch in a new CF.
> >> Objections?
>
> > Yeah, I think someone should do some analysis of whether this is
> > adding gettimeofday() calls, and how many, and what the performance
> > implications are.
>
> I believe that as the patch stands, we'd incur one new gettimeofday()
> per query-inside-a-transaction, inside the enable_timeout_after() call.
> (I think the disable_timeout() call would not result in a gettimeofday
> call, since there would be no remaining live timeout events.)
>
> We could possibly refactor enough to share the clock reading with the call
> done in pgstat_report_activity.  Not sure how ugly that would be or
> whether it's worth the trouble.  Note that in the not-a-transaction-block
> case, we already have got two gettimeofday calls in this sequence, one in
> pgstat_report_stat and one in pgstat_report_activity :-(

I've seen several high throughput production servers where code around
gettimeofday is in the top three profile entries - so I'd be hesitant to
add more there.  Especially as the majority of people here seems to think
we should enable this by default.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Yeah, I think someone should do some analysis of whether this is
>>> adding gettimeofday() calls, and how many, and what the performance
>>> implications are.

>> I believe that as the patch stands, we'd incur one new gettimeofday()
>> per query-inside-a-transaction, inside the enable_timeout_after() call.
>> (I think the disable_timeout() call would not result in a gettimeofday
>> call, since there would be no remaining live timeout events.)
>>
>> We could possibly refactor enough to share the clock reading with the call
>> done in pgstat_report_activity.  Not sure how ugly that would be or
>> whether it's worth the trouble.  Note that in the not-a-transaction-block
>> case, we already have got two gettimeofday calls in this sequence, one in
>> pgstat_report_stat and one in pgstat_report_activity :-(

> I've seen several high throughput production servers where code around
> gettimeofday is in the top three profile entries - so I'd be hesitant to
> add more there.  Especially as the majority of people here seems to think
> we should enable this by default.

Note that we'd presumably also be adding two kernel calls associated
with setting/killing the SIGALRM timer.  I'm not sure how much those
cost, but it likely wouldn't be negligible compared to the gettimeofday
cost.

OTOH, one should not forget that there's also going to be a client round
trip involved here, so it's possible this is all down in the noise
compared to that.  But we ought to try to quantify it rather than just
hope for the best.

A thought that comes to mind in connection with that is whether we
shouldn't be doing the ReadyForQuery call (which I believe includes
fflush'ing the previous query response out to the client) before rather
than after all this statistics housekeeping.  Then at least we'd have a
shot at spending these cycles in parallel with the network I/O and
client think-time, rather than serializing it all.

			regards, tom lane
On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
> I do not think it is: specifically, the notion
> that we will call ereport(FATAL) directly from a signal handler
> does not fill me with warm fuzzies.

Aren't we already pretty much doing that for
SIGTERM/pg_terminate_backend() and recovery conflict interrupts?  If we
get a SIGTERM while reading a command die() will set ProcDiePending and
call ProcessInterrupts() after disabling some other interrupts.  Then the
latter will FATAL out.

Imo the idle timeout handler pretty much needs a copy of die(), just
setting a different variable than (or in addition to?) ProcDiePending.

BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
at least set whereToSendOutput = DestNone when FATALing while reading
(potentially via openssl)?  The current behaviour is imo both a protocol
violation and dangerous because of what you explained?

> I'd be happier if this were implemented in the more traditional
> style where the signal handler just sets a volatile flag variable,
> which would be consulted at determinate places in the mainline logic.
> Or possibly it could be made safe if we only let it throw the error
> directly when ImmediateInterruptOK is true (compare the handling
> of notify/catchup interrupts).

Hm.  That sounds approximately like what I've written above.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-29 17:28:06 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> Yeah, I think someone should do some analysis of whether this is
> >>> adding gettimeofday() calls, and how many, and what the performance
> >>> implications are.
>
> >> I believe that as the patch stands, we'd incur one new gettimeofday()
> >> per query-inside-a-transaction, inside the enable_timeout_after() call.
> >> (I think the disable_timeout() call would not result in a gettimeofday
> >> call, since there would be no remaining live timeout events.)
> >>
> >> We could possibly refactor enough to share the clock reading with the call
> >> done in pgstat_report_activity.  Not sure how ugly that would be or
> >> whether it's worth the trouble.  Note that in the not-a-transaction-block
> >> case, we already have got two gettimeofday calls in this sequence, one in
> >> pgstat_report_stat and one in pgstat_report_activity :-(
>
> > I've seen several high throughput production servers where code around
> > gettimeofday is in the top three profile entries - so I'd be hesitant to
> > add more there.  Especially as the majority of people here seems to think
> > we should enable this by default.
>
> Note that we'd presumably also be adding two kernel calls associated
> with setting/killing the SIGALRM timer.  I'm not sure how much those
> cost, but it likely wouldn't be negligible compared to the gettimeofday
> cost.

It's probably higher, at least if we get around to replacing
gettimeofday() with clock_gettime() :(

So, I've traced a SELECT 1.  We're currently doing:

1) gettimeofday() in SetCurrentStatementStartTimestamp
2) gettimeofday() in pgstat_report_activity()
3) gettimeofday() for enable_timeout_after (id=STATEMENT_TIMEOUT)
4) setitimer() for schedule_alarm for STATEMENT_TIMEOUT
5) gettimeofday() for pgstat_report_activity()

Interestingly recvfrom(), setitimer(), sendto() are the only calls to
actually fully hit the kernel via syscalls (i.e. visible via strace).

The performance difference of setting up statement_timeout=10s for a
pgbench run that does:

\setrandom aid 1 1000000
SELECT * FROM pgbench_accounts WHERE aid = :aid;

is 164850.368336 (no statement_timeout) vs 157866.924725
(statement_timeout=10s).  That's the best of 10 10s runs.  For SELECT 1
it's 242846.348628 vs 236764.177593.

Not too bad.  Absolutely bleeding edge kernel/libc though; I seem to
recall different results with earlier libc/kernel combos.

I think statement_timeout's overhead should be fairly similar to what's
proposed for idle_in_transaction_timeout?

> A thought that comes to mind in connection with that is whether we
> shouldn't be doing the ReadyForQuery call (which I believe includes
> fflush'ing the previous query response out to the client) before
> rather than after all this statistics housekeeping.  Then at least
> we'd have a shot at spending these cycles in parallel with the
> network I/O and client think-time, rather than serializing it all.

Worth a try.  It'd also be rather neat to consolidate the first three
gettimeofday()'s above.  Afaics they should all be consolidated via
GetCurrentTransactionStartTimestamp()...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
>> I do not think it is: specifically, the notion
>> that we will call ereport(FATAL) directly from a signal handler
>> does not fill me with warm fuzzies.

> Aren't we already pretty much doing that for
> SIGTERM/pg_terminate_backend() and recovery conflict interrupts?

We do that *only at safe spots*, ie CHECK_FOR_INTERRUPTS.  Or at least
I sure hope so.  Otherwise interrupting, eg, malloc will lead to much
unhappiness.

> BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
> at least set whereToSendOutput = DestNone when FATALing while reading
> (potentially via openssl)?

Uh, no, that would pretty much destroy the point of trying to report an
error message at all.  We do restrict the immediate interrupt to occur
only while we're actually doing a recv(), see prepare_for_client_read
and client_read_ended.  I grant that in the case of an SSL connection,
that could be in the middle of some sort of SSL handshake, so it might
not be completely safe protocol-wise --- but it's not likely to mean
instant data structure corruption.

			regards, tom lane
On 2014-06-29 19:13:55 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
> >> I do not think it is: specifically, the notion
> >> that we will call ereport(FATAL) directly from a signal handler
> >> does not fill me with warm fuzzies.
>
> > Aren't we already pretty much doing that for
> > SIGTERM/pg_terminate_backend() and recovery conflict interrupts?
>
> We do that *only at safe spots*, ie CHECK_FOR_INTERRUPTS.  Or at least
> I sure hope so.  Otherwise interrupting, eg, malloc will lead to much
> unhappiness.

I was specifically thinking about us immediately reacting to those while
we're reading from the client.  We're indeed doing that directly:

#1  0x000000000076648a in proc_exit (code=1) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:143
#2  0x00000000008bcbf7 in errfinish (dummy=0) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:555
#3  0x00000000007909f7 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:2869
#4  0x0000000000790469 in die (postgres_signal_arg=15) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:2604
#5  <signal handler called>
#6  0x00007fb7c8ca0ebb in __libc_recv (fd=10, buf=0xd5f240 <PqRecvBuffer>, n=8192, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:29
#7  0x000000000066a07c in secure_read (port=0x1a29d30, ptr=0xd5f240 <PqRecvBuffer>, len=8192) at /home/andres/src/postgresql/src/backend/libpq/be-secure.c:317
#8  0x00000000006770b5 in pq_recvbuf () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:854
#9  0x000000000067714f in pq_getbyte () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:895
#10 0x000000000078d26b in SocketBackend (inBuf=0x7fffbc02bc30) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:335
#11 0x000000000078d659 in ReadCommand (inBuf=0x7fffbc02bc30) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:483

Note that we're *inside* recv() here.  Both paths to recv, without ssl
and with, have called prepare_for_client_read() which sets
ImmediateInterruptOK = true.  Right now I fail to see why it's safe to
do so, at least when inside openssl.

> > BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
> > at least set whereToSendOutput = DestNone when FATALing while reading
> > (potentially via openssl)?
>
> Uh, no, that would pretty much destroy the point of trying to report
> an error message at all.

I only mean that we should do so in scenarios where we're currently
reading from the client.  For die(), while we're reading from the client,
we're sending a message the client doesn't expect - and thus
unsurprisingly doesn't report.  The client will just report 'server
closed the connection unexpectedly'.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> Note that we're *inside* recv() here.

Well, actually, the recv() has probably failed with an EINTR at this
point, or else it will when/if control returns.

>> Uh, no, that would pretty much destroy the point of trying to report
>> an error message at all.

> I only mean that we should do so in scenarios where we're currently
> reading from the client.  For die(), while we're reading from the client,
> we're sending a message the client doesn't expect - and thus
> unsurprisingly doesn't report.

This is nonsense.  The client will see the error as a response to its
query.  Of course, if it gets an EPIPE it might complain about that first
-- but that would only be likely to happen with a multi-packet query
string, at least over a TCP connection.

Experimenting with this on a RHEL6 box, I do see the "FATAL: terminating
connection due to administrator command" message if I SIGTERM a backend
while idle and it's using a TCP connection; psql sees that as a response
next time it issues a command.  I do get the EPIPE behavior over a
Unix-socket connection, but I wonder if we shouldn't try to fix that.
It would make sense to see if there's data available before complaining
about the EPIPE.

Don't currently have an SSL configuration to experiment with.

			regards, tom lane
On 2014-06-29 19:51:05 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Note that we're *inside* recv() here.
>
> Well, actually, the recv() has probably failed with an EINTR at this
> point, or else it will when/if control returns.

Well, the point is that we'll be reentering ssl when sending the error
message, without having left it properly.  I.e. we're already hitting
the problem you've described.  Sure, if we're not FATALing, it'll return
EINTR after that.  But that's not really the point here.

I wonder if we should instead *not* set ImmediateInterruptOK = true and
do a CHECK_FOR_INTERRUPT in secure_read, after returning from openssl.
When the recv in my_sock_read() sets BIO_set_retry_read() because the
signal handler caused an EINTR to be returned openssl will return
control to the caller.  Which then can do the CHECK_FOR_INTERRUPT(),
jumping out without having to deal with openssl.

Probably has some problems with annoying platforms like windows though
:(.  Not sure how the signal emulation plays with recv() being
interrupted there.

> >> Uh, no, that would pretty much destroy the point of trying to report
> >> an error message at all.
>
> > I only mean that we should do so in scenarios where we're currently
> > reading from the client.  For die(), while we're reading from the client,
> > we're sending a message the client doesn't expect - and thus
> > unsurprisingly doesn't report.
>
> This is nonsense.  The client will see the error as a response to its
> query.

Man.  Don't be so quick to judge stuff you can't immediately follow or
find wrongly stated as 'nonsense'.

We're sending messages back to the client while the client isn't
expecting any from the server.  E.g. evidenced by the fact that libpq's
pqParseInput3() doesn't treat error messages during that phase as
errors...  It instead emits them via the notice hooks, expecting them to
be NOTICEs...  This e.g. means that there's no error message stored in
conn->errorMessage.  That happens to be somewhat ok in the case of
FATALs because the connection is killed afterwards so any confusion
won't be long lived, but you can't tell me that you'd e.g. find it
acceptable to send an ERROR there.

> Of course, if it gets an EPIPE it might complain about that first
> -- but that would only be likely to happen with a multi-packet query
> string, at least over a TCP connection.
>
> Experimenting with this on a RHEL6 box, I do see the "FATAL: terminating
> connection due to administrator command" message if I SIGTERM a backend
> while idle and it's using a TCP connection; psql sees that as a response
> next time it issues a command.  I do get the EPIPE behavior over a
> Unix-socket connection, but I wonder if we shouldn't try to fix that.
> It would make sense to see if there's data available before complaining
> about the EPIPE.

Even over unix sockets the data seems to be read - courtesy of
pqHandleSendFailure():

sendto(3, "Q\0\0\0\16SELECT 1;\0", 15, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110

The reason we don't print anything is that pqDropConnection(), which is
called by the second pqReadData() invocation in the loop, resets the
data positions:

conn->inStart = conn->inCursor = conn->inEnd = 0;

Moving the parseInput(conn) into the loop seems to "fix" it.  Haven't
analyzed why, but if FATALs arrive during a query they're printed twice.
I've seen that before...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 24, 2014 at 10:17:49AM -0700, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > On 06/23/2014 03:52 PM, Andres Freund wrote:
> >> True.  Which makes me wonder whether we shouldn't default this to
> >> something non-zero -- even if it is 5 or 10 days.
>
> > I'd go for even shorter: 48 hours.  I'd suggest 24 hours, but that would
> > trip up some users who just need really long pg_dumps.
>
> FWIW, I do not think we should have a nonzero default for this.
> We could not safely set it to any value that would be small enough
> to be really useful in the field.
>
> BTW, has anyone thought about the interaction of this feature with
> prepared transactions?  I wonder whether there shouldn't be a similar but
> separately-settable maximum time for a transaction to stay in the prepared
> state.  If we could set a nonzero default on that, perhaps on the order of
> a few minutes, we could solve the ancient bugaboo that "prepared
> transactions are too dangerous to enable by default".

Added as a TODO item.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +