Thread: idle_in_transaction_timeout
This patch implements a timeout for broken clients that idle in transaction. This is a TODO item.

When the timeout occurs, the backend commits suicide with a FATAL ereport. I thought about just aborting the transaction to free the locks, but decided that the client is clearly broken, so we might as well free up the connection as well.

The same behavior can be achieved by an external script that monitors pg_stat_activity, but by having it in core we get much finer tuning (it is session-settable) and also traces of it directly in the PostgreSQL logs, so it can be graphed by tools like pgBadger.

Unfortunately, no notification is sent to the client because there's no real way to do that right now.

--
Vik
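For reference, the external-script approach mentioned above is roughly the following (a sketch only; it assumes the 9.2+ pg_stat_activity columns `state` and `state_change`, and the 10-minute threshold is chosen purely for illustration):

```sql
-- Terminate every backend that has been idle in transaction for more
-- than 10 minutes.  Run periodically from cron or similar.
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND state_change < now() - interval '10 minutes';
```

The in-core timeout replaces this polling loop and, being session-settable, can be tuned per connection.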
Attachment
At 2014-06-03 15:06:11 +0200, vik.fearing@dalibo.com wrote:
> This patch implements a timeout for broken clients that idle in
> transaction.

I think this is a nice feature, but I suggest that (at the very least) the GUC should be named "idle_transaction_timeout".

> +        <para>
> +         Terminate any session that is idle in transaction for longer
> +         than the specified number of seconds. This not only allows any
> +         locks to be released, but also allows the connection slot to be
> +         reused. However, aborted idle in transaction sessions are not
> +         affected. A value of zero (the default) turns this off.
> +        </para>

I suggest:

    Terminate any session with an open transaction that has been idle
    for longer than the specified duration in seconds. This allows any
    locks to be released and the connection slot to be reused.

It's not clear to me what "However, aborted idle in transaction sessions are not affected" means.

    The default value of 0 means that such sessions will not be
    terminated.

--
Abhijit
On 06/03/2014 03:30 PM, Abhijit Menon-Sen wrote:
> At 2014-06-03 15:06:11 +0200, vik.fearing@dalibo.com wrote:
>> This patch implements a timeout for broken clients that idle in
>> transaction.
>
> I think this is a nice feature, but I suggest that (at the very least)
> the GUC should be named "idle_transaction_timeout".

I prefer the way I have it, but not enough to put up a fight if other people like your version better.

>> +        <para>
>> +         Terminate any session that is idle in transaction for longer
>> +         than the specified number of seconds. This not only allows any
>> +         locks to be released, but also allows the connection slot to be
>> +         reused. However, aborted idle in transaction sessions are not
>> +         affected. A value of zero (the default) turns this off.
>> +        </para>
>
> I suggest:
>
>     Terminate any session with an open transaction that has been idle
>     for longer than the specified duration in seconds. This allows any
>     locks to be released and the connection slot to be reused.
>
> It's not clear to me what "However, aborted idle in transaction sessions
> are not affected" means.
>
>     The default value of 0 means that such sessions will not be
>     terminated.

How about this?

    Terminate any session with an open transaction that has been idle
    for longer than the specified duration in seconds. This allows any
    locks to be released and the connection slot to be reused.

    Sessions in the state "idle in transaction (aborted)" occupy a
    connection slot but because they hold no locks, they are not
    considered by this parameter.

    The default value of 0 means that such sessions will not be
    terminated.

--
Vik
Vik Fearing wrote
> On 06/03/2014 03:30 PM, Abhijit Menon-Sen wrote:
>> At 2014-06-03 15:06:11 +0200, vik.fearing@ wrote:
>>> This patch implements a timeout for broken clients that idle in
>>> transaction.
>> I think this is a nice feature, but I suggest that (at the very least)
>> the GUC should be named "idle_transaction_timeout".
>
> I prefer the way I have it, but not enough to put up a fight if other
> people like your version better.
>
>>> +        <para>
>>> +         Terminate any session that is idle in transaction for longer
>>> +         than the specified number of seconds. This not only allows
>>> +         any locks to be released, but also allows the connection slot
>>> +         to be reused. However, aborted idle in transaction sessions
>>> +         are not affected. A value of zero (the default) turns this
>>> +         off.
>>> +        </para>
>> I suggest:
>>
>>     Terminate any session with an open transaction that has been idle
>>     for longer than the specified duration in seconds. This allows any
>>     locks to be released and the connection slot to be reused.
>>
>> It's not clear to me what "However, aborted idle in transaction sessions
>> are not affected" means.
>>
>>     The default value of 0 means that such sessions will not be
>>     terminated.
>
> How about this?
>
>     Terminate any session with an open transaction that has been idle
>     for longer than the specified duration in seconds. This allows any
>     locks to be released and the connection slot to be reused.
>
>     Sessions in the state "idle in transaction (aborted)" occupy a
>     connection slot but because they hold no locks, they are not
>     considered by this parameter.
>
>     The default value of 0 means that such sessions will not be
>     terminated.
> --
> Vik

I do not see any reason for an aborted idle in transaction session to be treated differently. Given the similarity to "statement_timeout", using similar wording for both would be advised.
*Statement Timeout:*

    Abort any statement that takes more than the specified number of
    milliseconds, starting from the time the command arrives at the
    server from the client. If log_min_error_statement is set to ERROR
    or lower, the statement that timed out will also be logged. A value
    of zero (the default) turns this off. Setting statement_timeout in
    postgresql.conf is not recommended because it would affect all
    sessions.

*Idle Transaction Timeout:*

    Disconnect any session that has been "idle in transaction"
    (including aborted) for more than the specified number of
    milliseconds, starting from {however such is determined}.

    A value of zero (the default) turns this off.

    Typical usage would be to set this to a small positive number in
    postgresql.conf and require sessions that expect long-running, idle
    transactions to set it back to zero or some reasonably high value.

While seconds is the better unit of measure, I don't think that justifies making this different from statement_timeout. In either case users can specify their own units.

Since we are killing the entire session, not just the transaction, a better label would be: idle_transaction_session_timeout

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/idle-in-transaction-timeout-tp5805859p5805904.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
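Concretely, the usage pattern sketched above would look something like this (hypothetical, since both the GUC name and its unit were still under debate in this thread):

```sql
-- postgresql.conf: a small default for everyone (milliseconds, as with
-- statement_timeout):
--   idle_in_transaction_timeout = 60000
--
-- A session that legitimately expects long idle-in-transaction periods
-- then opts back out:
SET idle_in_transaction_timeout = 0;
```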
Vik,

> When the timeout occurs, the backend commits suicide with a FATAL
> ereport. I thought about just aborting the transaction to free the
> locks but decided that the client is clearly broken so might as well
> free up the connection as well.

Out of curiosity, how much harder would it have been just to abort the transaction? I think breaking the connection is probably the right behavior, but before folks start arguing it out, I wanted to know if aborting the transaction is even a reasonable thing to do.

Argument in favor of breaking the connection: most of the time, IITs are caused by poor error-handling or garbage-collection code on the client side, which has already abandoned the application session but hadn't let go of the database handle. Since the application is never going to reuse the handle, terminating it is the right thing to do.

Argument in favor of just aborting the transaction: client connection management software may not be able to deal cleanly with the broken connection and may halt operation completely, especially if the client finds out the connection is gone when they try to start their next transaction on that connection.

My $0.019999999999998: terminating the connection is the preferred behavior.

> The same behavior can be achieved by an external script that monitors
> pg_stat_activity but by having it in core we get much finer tuning (it
> is session settable) and also traces of it directly in the PostgreSQL
> logs so it can be graphed by things like pgbadger.
>
> Unfortunately, no notification is sent to the client because there's no
> real way to do that right now.

Well, if you abort the connection, presumably the client finds out when they try to send a query ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Out of curiosity, how much harder would it have been just to abort the > transaction? I think breaking the connection is probably the right > behavior, but before folks start arguing it out, I wanted to know if > aborting the transaction is even a reasonable thing to do. FWIW, I think aborting the transaction is probably better, especially if the patch is designed to do nothing to already-aborted transactions. If the client is still there, it will see the abort as a failure in its next query, which is less likely to confuse it completely than a connection loss. (I think, anyway.) The argument that we might want to close the connection to free up connection slots doesn't seem to me to hold water as long as we allow a client that *isn't* inside a transaction to sit on an idle connection forever. Perhaps there is room for a second timeout that limits how long you can sit idle independently of being in a transaction, but that isn't this patch. regards, tom lane
Josh Berkus wrote: > Argument in favor of breaking the connection: most of the time, IITs are > caused by poor error-handling or garbage-collection code on the client > side, which has already abandoned the application session but hadn't let > go of the database handle. Since the application is never going to > reuse the handle, terminating it is the right thing to do. In some frameworks where garbage-collection is not involved with things like this, what happens is that the connection object is reused later. If you just drop the connection, the right error message will never reach the application, but if you abort the xact, the next BEGIN issued by the framework will receive the "connection aborted due to idle-in-transaction session" message, which I assume is what this patch would have sent. Therefore I think if we want this at all it should abort the xact, not drop the connection. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 06/03/2014 05:53 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) > > The argument that we might want to close the connection to free up > connection slots doesn't seem to me to hold water as long as we allow > a client that *isn't* inside a transaction to sit on an idle connection > forever. Perhaps there is room for a second timeout that limits how > long you can sit idle independently of being in a transaction, but that > isn't this patch. > > Yes, I had the same thought. cheers andrew
On 06/03/2014 02:53 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) Personally, I think we'll get about equal amounts of confusion either way. > The argument that we might want to close the connection to free up > connection slots doesn't seem to me to hold water as long as we allow > a client that *isn't* inside a transaction to sit on an idle connection > forever. Perhaps there is room for a second timeout that limits how > long you can sit idle independently of being in a transaction, but that > isn't this patch. Like I said, I'm marginally in favor of terminating the connection, but I'd be completely satisfied with a timeout which aborted the transaction instead. Assuming that change doesn't derail this patch and keep it from getting into 9.5, of course. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/04/2014 01:38 AM, Josh Berkus wrote: > On 06/03/2014 02:53 PM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> Out of curiosity, how much harder would it have been just to abort the >>> transaction? I think breaking the connection is probably the right >>> behavior, but before folks start arguing it out, I wanted to know if >>> aborting the transaction is even a reasonable thing to do. >> FWIW, I think aborting the transaction is probably better, especially >> if the patch is designed to do nothing to already-aborted transactions. >> If the client is still there, it will see the abort as a failure in its >> next query, which is less likely to confuse it completely than a >> connection loss. (I think, anyway.) > Personally, I think we'll get about equal amounts of confusion either way. > >> The argument that we might want to close the connection to free up >> connection slots doesn't seem to me to hold water as long as we allow >> a client that *isn't* inside a transaction to sit on an idle connection >> forever. Perhaps there is room for a second timeout that limits how >> long you can sit idle independently of being in a transaction, but that >> isn't this patch. > Like I said, I'm marginally in favor of terminating the connection, but > I'd be completely satisfied with a timeout which aborted the transaction > instead. Assuming that change doesn't derail this patch and keep it > from getting into 9.5, of course. The change is as simple as changing the ereport from FATAL to ERROR. Attached is a new patch doing it that way. -- Vik
Attachment
> On 06/03/2014 02:53 PM, Tom Lane wrote:
>> Josh Berkus <[hidden email]> writes:
>>> Out of curiosity, how much harder would it have been just to abort the
>>> transaction?  I think breaking the connection is probably the right
>>> behavior, but before folks start arguing it out, I wanted to know if
>>> aborting the transaction is even a reasonable thing to do.
>>
>> FWIW, I think aborting the transaction is probably better, especially
>> if the patch is designed to do nothing to already-aborted transactions.
>> If the client is still there, it will see the abort as a failure in its
>> next query, which is less likely to confuse it completely than a
>> connection loss.  (I think, anyway.)
>
> Personally, I think we'll get about equal amounts of confusion either way.
>
>> The argument that we might want to close the connection to free up
>> connection slots doesn't seem to me to hold water as long as we allow
>> a client that *isn't* inside a transaction to sit on an idle connection
>> forever.  Perhaps there is room for a second timeout that limits how
>> long you can sit idle independently of being in a transaction, but that
>> isn't this patch.
>
> Like I said, I'm marginally in favor of terminating the connection, but
> I'd be completely satisfied with a timeout which aborted the transaction
> instead.  Assuming that change doesn't derail this patch and keep it
> from getting into 9.5, of course.

Default to dropping the connection but give administrators the capability to decide for themselves?

I still haven't heard an argument for why this wouldn't apply to aborted idle-in-transactions. I get the focus is on releasing locks, but if the transaction fails and still hangs around forever it is just as broken as one that doesn't fail and hangs around forever. Even if you limit the end result to only aborting the transaction, the end-user will likely want to distinguish between their transaction failing and their failed transaction remaining idle too long - if only to avoid the situation where they make the transaction no longer fail but still hit the timeout.

Whether a connection is a resource the DBA wants to restore (at the expense of better server-client communication) is a parental decision we shouldn't force on them (feelings about GUCs aside). The decision to force the release of locks - the primary purpose of the patch - is made by simply using the setting in the first place.

David J.
On 06/04/2014 02:01 AM, David G Johnston wrote:
> Default to dropping the connection but give administrators the
> capability to decide for themselves?

Meh.

> I still haven't heard an argument for why this wouldn't apply to aborted
> idle-in-transactions. I get the focus is on releasing locks but if the
> transaction fails but still hangs around forever it is just as broken as
> one that doesn't fail and hangs around forever.

My main concern was with locks and blocking VACUUM. Aborted transactions don't do either of those things. The correct solution is to terminate aborted transactions, too, or not terminate anything and abort the idle ones.

> Even if you limit the end result to only aborting the transaction the
> end-user will likely want to distinguish between their transaction
> failing and their failed transaction remaining idle too long - if only
> to avoid the situation where they make the transaction no longer fail
> but still hit the timeout.

But hitting the timeout *is* failing.

With the new patch, the first query will say that the transaction was aborted due to timeout. Subsequent queries will do as they've always done.

--
Vik
On Tue, Jun 3, 2014 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Out of curiosity, how much harder would it have been just to abort the >> transaction? I think breaking the connection is probably the right >> behavior, but before folks start arguing it out, I wanted to know if >> aborting the transaction is even a reasonable thing to do. > > FWIW, I think aborting the transaction is probably better, especially > if the patch is designed to do nothing to already-aborted transactions. > If the client is still there, it will see the abort as a failure in its > next query, which is less likely to confuse it completely than a > connection loss. (I think, anyway.) I thought the reason why this hasn't been implemented before now is that sending an ErrorResponse to the client will result in a loss of protocol sync. Sure, when the client sends the next query, they'll then read the ErrorResponse. So far so good. The problem is that they *won't* read whatever we send back as a response to their query, because they think they already have, when in reality they've only read the ErrorResponse sent much earlier. This, at least as I've understood it, is the reason why recovery conflicts kill off idle sessions entirely instead of just aborting the transaction. Andres tried to fix that problem a few years ago without much luck; the most relevant post to this particular issue seems to be: http://www.postgresql.org/message-id/23631.1292521603@sss.pgh.pa.us Assuming that the state of play hasn't changed in some way I'm unaware of since 2010, I think the best argument for FATAL here is that it's what we can implement. 
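The desync Robert describes can be modeled with a toy message queue (an illustration only, not the real frontend/backend protocol): the server pushes an unsolicited ErrorResponse while the client is idle, and the client's next request/response cycle consumes that stale error instead of the real reply.

```python
from collections import deque

server_to_client = deque()  # messages the server has sent, not yet read

def server_abort_idle_txn():
    # Timeout fires while the client is idle: the server emits an
    # ErrorResponse on its own initiative.
    server_to_client.append("ErrorResponse: idle-in-transaction timeout")

def client_query(query):
    # The client sends a query (the server replies immediately in this toy)...
    server_to_client.append(f"Result of {query!r}")
    # ...and reads exactly ONE message, believing it to be the answer.
    return server_to_client.popleft()

server_abort_idle_txn()
first = client_query("SELECT 1")
print(first)                    # the stale ErrorResponse, not the result
print(list(server_to_client))   # the real result is still unread
```

From this point on, every response the client reads is off by one, which is why the thread leans toward FATAL: killing the session keeps the protocol state trivially consistent.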
I happen to think it's better anyway, because the cases I've seen where this would actually be useful involve badly-written applications that are not under the same administrative control as the database and supposedly have built-in connection poolers, except sometimes they seem to forget about some of the connections they themselves opened. The DBAs can't make the app developers fix the app; they just have to try to survive. Aborting the transaction is a step in the right direction but since the guy at the other end of the connection is actually or in effect dead, that just leaves you with an eternally idle connection. Now we can say "use TCP keepalives for that" but that only helps if the connection has actually been severed; if the guy at the other end is still technically there but is too brain-damaged to actually try to *use* the connection for anything, it doesn't help. Also, as I recently discovered, there are still a few machines out there that don't actually support TCP keepalives on a per-connection basis; you can only configure it system-wide, and that's not always granular enough. Anyway, if somebody really wants to go to the trouble of finding a way to make this work without killing off the connection, I think that would be cool and useful and whatever technology we develop there could doubtless could be applied to other situations. But I have a nervous feeling that might be a hard enough problem to sink the whole patch, which would be a shame, since the cases I've actually encountered would be better off with FATAL anyway. Just my $0.019999999999997 to go with Josh's $0.019999999999998. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I thought the reason why this hasn't been implemented before now is > that sending an ErrorResponse to the client will result in a loss of > protocol sync. Hmm ... you are right that this isn't as simple as an ereport(ERROR), but I'm not sure it's impossible. We could for instance put the backend into skip-till-Sync state so that it effectively ignored the next command message. Causing that to happen might be impracticably messy, though. I'm not sure whether cancel-transaction behavior is enough better than cancel-session to warrant extra effort here. regards, tom lane
On 2014-06-03 23:37:28 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I thought the reason why this hasn't been implemented before now is
>> that sending an ErrorResponse to the client will result in a loss of
>> protocol sync.
>
> Hmm ... you are right that this isn't as simple as an ereport(ERROR),
> but I'm not sure it's impossible.  We could for instance put the backend
> into skip-till-Sync state so that it effectively ignored the next command
> message.  Causing that to happen might be impracticably messy, though.

Isn't another problem here that we're reading from the client, possibly nested inside openssl? So we can't just longjmp out without possibly destroying openssl's internal state? I think that's solvable by first returning from openssl, signalling a short read, and only *then* jumping out. I remember making something like that work in a POC patch at least... But it's been a couple of years.

> I'm not sure whether cancel-transaction behavior is enough better than
> cancel-session to warrant extra effort here.

I don't think idle_in_transaction_timeout is worth this, but if we could implement it we could finally have recovery conflicts against idle in txn sessions not use FATAL...

Greetings,

Andres Freund

--
Andres Freund
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 3, 2014 at 11:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I thought the reason why this hasn't been implemented before now is >> that sending an ErrorResponse to the client will result in a loss of >> protocol sync. > > Hmm ... you are right that this isn't as simple as an ereport(ERROR), > but I'm not sure it's impossible. We could for instance put the backend > into skip-till-Sync state so that it effectively ignored the next command > message. Causing that to happen might be impracticably messy, though. Another thing we could maybe do is AbortCurrentTransaction() and send the client a NoticeResponse saying "hey, expect all of your future commands to fail with complaints about the transaction being aborted". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I thought the reason why this hasn't been implemented before >>> now is that sending an ErrorResponse to the client will result >>> in a loss of protocol sync. >> >> Hmm ... you are right that this isn't as simple as an >> ereport(ERROR), but I'm not sure it's impossible. We could for >> instance put the backend into skip-till-Sync state so that it >> effectively ignored the next command message. Causing that to >> happen might be impracticably messy, though. > > Another thing we could maybe do is AbortCurrentTransaction() and > send the client a NoticeResponse saying "hey, expect all of your > future commands to fail with complaints about the transaction > being aborted". Wow, until I read through the thread on this patch and the old thread that Robert linked to I had forgotten the attempt by Andres to deal with this three and a half years ago. That patch died because of how complicated it was to do the right thing if the connection was left open. Personally, where I have seen a use for this, treating it as FATAL would be better anyway; but was OK with treating it as an ERROR if that didn't sink the patch. I guess that puts me in the same camp as Josh and Robert. Vik has submitted v1 and v2 of this patch; the only difference between them is that v1 makes a timeout FATAL and v2 makes it an ERROR (with appropriate corresponding changes to code comments, documentation, and message text). 
It's not hard to show the problems with an ERROR under v2, confirming that the simple approach of just changing the ereport severity is not enough:

    test=# set idle_in_transaction_timeout = '3s';
    SET
    test=# begin;
    BEGIN
    test=# select 1;
    ERROR: current transaction is aborted due to idle-in-transaction timeout
    test=# commit;
    ERROR: current transaction is aborted, commands ignored until end of transaction block
    test=# commit;
    ROLLBACK

The first thing is that I don't think a delay between the BEGIN and the SELECT should cause a timeout to trigger, but more importantly there should not be two ERROR responses to one SELECT statement.

I'm inclined to abandon the ERROR approach as not worth the effort and fragility, and focus on v1 of the patch. If we can't get to consensus on that, I think that this patch should be flagged "Returned with Feedback", noting that any follow-up version requires some way to deal with the issues raised regarding multiple ERROR messages.

Abridged for space, hopefully without losing the main points of the authors, so far:

Josh Berkus:
> My $0.019999999999998: terminating the connection is the
> preferred behavior.

Tom Lane:
> FWIW, I think aborting the transaction is probably better,
> especially if the patch is designed to do nothing to
> already-aborted transactions.  If the client is still there, it
> will see the abort as a failure in its next query, which is less
> likely to confuse it completely than a connection loss.  (I
> think, anyway.)

Álvaro Herrera:
> I think if we want this at all it should abort the xact, not drop
> the connection.

Andrew Dunstan:
> [quotes Tom's argument]
> Yes, I had the same thought.

David G Johnston:
> Default to dropping the connection but give administrators the
> capability to decide for themselves?

Robert Haas:
> Assuming that the state of play hasn't changed in some way I'm
> unaware of since 2010, I think the best argument for FATAL here
> is that it's what we can implement.  I happen to think it's
> better anyway, because the cases I've seen where this would
> actually be useful involve badly-written applications that are
> not under the same administrative control as the database and
> supposedly have built-in connection poolers, except sometimes
> they seem to forget about some of the connections they themselves
> opened.  The DBAs can't make the app developers fix the app; they
> just have to try to survive.  Aborting the transaction is a step
> in the right direction but since the guy at the other end of the
> connection is actually or in effect dead, that just leaves you
> with an eternally idle connection.

Tom Lane (reprise):
> I'm not sure whether cancel-transaction behavior is enough better
> than cancel-session to warrant extra effort here.

Andres Freund:
> [quotes Tom's latest (above)]
> I don't think idle_in_transaction_timeout is worth this, but if
> we could implement it we could finally have recovery conflicts
> against idle in txn sessions not use FATAL...

Kevin Grittner:
> Personally, where I have seen a use for this, treating it as
> FATAL would be better anyway

A few ideas were put forward on how a much more complicated patch could allow transaction rollback with an ERROR to work acceptably, but I think that would probably need to be a new patch in a later CF, so that is omitted here.

So, can we agree to use FATAL to terminate the connection?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/18/2014 11:50 AM, Kevin Grittner wrote:
> The first thing is that I don't think a delay between the BEGIN and
> the SELECT should cause a timeout to trigger, but more importantly
> there should not be two ERROR responses to one SELECT statement.

I do think a delay between BEGIN and SELECT should trigger the timeout. There are plenty of badly-written applications which "auto-begin", that is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it.

> I'm inclined to abandon the ERROR approach as not worth the effort
> and fragility, and focus on v1 of the patch.  If we can't get to
> consensus on that, I think that this patch should be flagged
> "Returned with Feedback", noting that any follow-up version
> requires some way to deal with the issues raised regarding multiple
> ERROR messages.

+1

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > There are plenty of badly-written applications which "auto-begin", that > is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or > not there's any additional work to do. This is a major source of IIT > and the timeout should not ignore it. Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem. It might be that we should slap such apps' wrists anyway, but given that we've gone to the trouble of working around the behavior at the system structural level, I'd be inclined to say not. What you'd be doing is preventing people who have to deal with such apps from using the timeout in any useful fashion. regards, tom lane
On 06/18/2014 12:32 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> There are plenty of badly-written applications which "auto-begin", that >> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or >> not there's any additional work to do. This is a major source of IIT >> and the timeout should not ignore it. > > Nonsense. We explicitly don't do anything useful until the first actual > command arrives, precisely to avoid that problem. Oh, we don't allocate a snapshot? If not, then no objection here. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 06/18/2014 12:32 PM, Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: >>> There are plenty of badly-written applications which "auto-begin", that >>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or >>> not there's any additional work to do. This is a major source of IIT >>> and the timeout should not ignore it. >> >> Nonsense. We explicitly don't do anything useful until the first actual >> command arrives, precisely to avoid that problem. > > Oh, we don't allocate a snapshot? If not, then no objection here. The only problem I see is that it makes the semantics kind of weird and confusing. "Kill connections that are idle in transaction for too long" is a pretty clear spec; "kill connections that are idle in transaction except if they haven't executed any commands yet because we think you don't care about that case" is not quite as clear, and not really what the GUC name says, and maybe not what everybody wants, and maybe masterminding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: > On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus <josh@agliodbs.com> wrote: > > On 06/18/2014 12:32 PM, Tom Lane wrote: > >> Josh Berkus <josh@agliodbs.com> writes: > >>> There are plenty of badly-written applications which "auto-begin", that > >>> is, they issue a "BEGIN;" immediately after every "COMMIT;" whether or > >>> not there's any additional work to do. This is a major source of IIT > >>> and the timeout should not ignore it. > >> > >> Nonsense. We explicitly don't do anything useful until the first actual > >> command arrives, precisely to avoid that problem. > > > > Oh, we don't allocate a snapshot? If not, then no objection here. > > The only problem I see is that it makes the semantics kind of weird > and confusing. "Kill connections that are idle in transaction for too > long" is a pretty clear spec; "kill connections that are idle in > transaction except if they haven't executed any commands yet because > we think you don't care about that case" is not quite as clear, and > not really what the GUC name says, and maybe not what everybody wants, > and maybe masterminding. "Kill connections that are idle in non-empty transaction block for too long" -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 06/18/2014 02:52 PM, Bruce Momjian wrote: > On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: >> The only problem I see is that it makes the semantics kind of weird >> and confusing. "Kill connections that are idle in transaction for too >> long" is a pretty clear spec; "kill connections that are idle in >> transaction except if they haven't executed any commands yet because >> we think you don't care about that case" is not quite as clear, and >> not really what the GUC name says, and maybe not what everybody wants, >> and maybe masterminding. > > "Kill connections that are idle in non-empty transaction block for too > long" Here's the POLS violation in this: "I have idle_in_transaction_timeout set to 10min, but according to pg_stat_activity I have six connections which are IIT for over an hour. What gives?" Robert's right, not killing the "BEGIN;" only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Tom is right in that killing them will cause some users to not use IIT_timeout when they should, or will set the timeout too high to be useful. So it seems like what we should do is NOT call sessions IIT if they've merely executed a BEGIN; and not done anything else. Then the timeout and pg_stat_activity would be consistent. Counter-argument: most app frameworks which do an automatic BEGIN; also do other stuff like SET TIMEZONE each time as well. Is this really a case worth worrying about? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-06-19 1:46 AM, Josh Berkus wrote: > Robert's right, not killing the "BEGIN;" only transactions is liable to > result in user confusion unless we label those sessions differently in > pg_stat_activity. Wouldn't they be labeled differently already? i.e. the last query would be "BEGIN". Unless your app tries to unsuccessfully use nested transactions, you would know why it hasn't been killed. .marko
On 06/18/2014 04:54 PM, Marko Tiikkaja wrote: > On 2014-06-19 1:46 AM, Josh Berkus wrote: >> Robert's right, not killing the "BEGIN;" only transactions is liable to >> result in user confusion unless we label those sessions differently in >> pg_stat_activity. > > Wouldn't they be labeled differently already? i.e. the last query would > be "BEGIN". Unless your app tries to unsuccessfully use nested > transactions, you would know why it hasn't been killed. That's pretty darned obscure for a casual user. *you* would know, and *I* would know, but 99.5% of our users would be very confused. Plus, if a session which has only issued a "BEGIN;" doesn't have a snapshot and isn't holding any locks, then I'd argue we shouldn't label it IIT in the first place because it's not doing any of the bad stuff we want to resolve by killing IITs. Effectively, it's just "idle". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Counter-argument: most app frameworks which do an automatic BEGIN; also > do other stuff like SET TIMEZONE each time as well. Is this really a > case worth worrying about? SET TIMEZONE doesn't result in taking a snapshot either. regards, tom lane
> On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
>> On 2014-06-19 1:46 AM, Josh Berkus wrote:
>>> Robert's right, not killing the "BEGIN;" only transactions is liable to
>>> result in user confusion unless we label those sessions differently in
>>> pg_stat_activity.
>>
>> Wouldn't they be labeled differently already? i.e. the last query would
>> be "BEGIN". Unless your app tries to unsuccessfully use nested
>> transactions, you would know why it hasn't been killed.
>
> That's pretty darned obscure for a casual user. *you* would know, and
> *I* would know, but 99.5% of our users would be very confused.
>
> Plus, if a session which has only issued a "BEGIN;" doesn't have a
> snapshot and isn't holding any locks, then I'd argue we shouldn't label
> it IIT in the first place because it's not doing any of the bad stuff we
> want to resolve by killing IITs. Effectively, it's just "idle".
+1
Since the BEGIN is not meaningfully interpreted until the first subsequent command (SET excepting, apparently - are there others?) is issued, no transaction has begun at this point, so "idle" and not "IIT" should be the reported state in pg_stat_activity.
Though I can understand some level of surprise if someone sees "idle" with a "BEGIN" (or SET TIMEZONE) as the last executed command - so maybe "idle before transaction" instead of "idle in transaction" - which hopefully will not be assumed to be controlled by the "idle_in_transaction_timeout" GUC. I presume that we have some way to distinguish this particular case and can report it accordingly. Even if that state endures after a SET TIMEZONE or similar session configuration command explaining that all those are "pre-transaction" shouldn't be too hard - though as a third option "idle initialized transaction" could be the state indicator.
Depending on how "idle in transaction (aborted)" gets resolved we could also consider "idle in transaction (initializing)" - but since the former, IMO, should be dropped (and thus matches the "in" specification of the GUC) naming the latter - which should not be dropped (I'll go with the stated opinion here for now) - with "in" is not advisable.
The current behavior is transparent to the casual user so sticking with "idle" does seem like the best choice to maintain the undocumented nature of the behavior.
David J.
Hello
pgbouncer has had idle_transaction_timeout defined for some years without any problems, so we can take this design:

idle_transaction_timeout
If client has been in "idle in transaction" state longer, it will be disconnected. [seconds]
Default: 0.0 (disabled)
Pavel
2014-06-19 1:46 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
> On 06/18/2014 02:52 PM, Bruce Momjian wrote:
>> On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
>>> The only problem I see is that it makes the semantics kind of weird
>>> and confusing. "Kill connections that are idle in transaction for too
>>> long" is a pretty clear spec; "kill connections that are idle in
>>> transaction except if they haven't executed any commands yet because
>>> we think you don't care about that case" is not quite as clear, and
>>> not really what the GUC name says, and maybe not what everybody wants,
>>> and maybe masterminding.
>>
>> "Kill connections that are idle in non-empty transaction block for too
>> long"
>
> Here's the POLS violation in this:
>
> "I have idle_in_transaction_timeout set to 10min, but according to
> pg_stat_activity I have six connections which are IIT for over an hour.
> What gives?"
>
> Robert's right, not killing the "BEGIN;" only transactions is liable to
> result in user confusion unless we label those sessions differently in
> pg_stat_activity. Tom is right in that killing them will cause some
> users to not use IIT_timeout when they should, or will set the timeout
> too high to be useful.
>
> So it seems like what we should do is NOT call sessions IIT if they've
> merely executed a BEGIN; and not done anything else. Then the timeout
> and pg_stat_activity would be consistent.
>
> Counter-argument: most app frameworks which do an automatic BEGIN; also
> do other stuff like SET TIMEZONE each time as well. Is this really a
> case worth worrying about?
At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: > > Hello > > pgbouncer has idle_transaction_timeout defined some years without any > problems, so we can take this design > > idle_transaction_timeout Let's steal the name too. :-) (I didn't realise there was precedent in pgbouncer, but given that the name is in use already, it'll be less confusing to use that name for Postgres as well.) -- Abhijit
Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: >> pgbouncer has idle_transaction_timeout defined some years >> without any problems, so we can take this design >> >> idle_transaction_timeout > > Let's steal the name too. :-) > > (I didn't realise there was precedent in pgbouncer, but given > that the name is in use already, it'll be less confusing to use > that name for Postgres as well.) I'm not sure whether using the same name as pgbouncer for the same functionality makes it less confusing or might lead to confusion about which layer is disconnecting the client. I'm leaning toward using the same name, but I'm really not sure. Does anyone else want to offer an opinion? Somehow I had thought that pg_stat_activity didn't show a connection as "idle in transaction" as soon as BEGIN was run -- I thought some subsequent statement was needed to put it into that state; but I see that I was wrong about that. As long as BEGIN causes a connection to show as "idle in transaction" I think the BEGIN should start the clock on this timeout, based on POLA. It wouldn't bother me to see us distinguish among early transaction states, but I'm not sure whether it's really worth the effort. We couldn't base it just on whether the transaction has acquired a snapshot, since it is not unusual for explicit LOCK statements at the front of the transaction to run before a snapshot is acquired, and we would want to terminate a transaction sitting idle and holding locks. Also, it seems like most are ok with the FATAL approach (as in v1 of this patch). Does anyone want to make an argument against proceeding with that, in light of discussion so far? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
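Kevin's observation is easy to reproduce from two psql sessions (a sketch; the pid shown is illustrative and the column layout is trimmed):

```sql
-- Session 1: start a transaction and issue nothing further.
BEGIN;

-- Session 2: a bare BEGIN is already enough for the first session
-- to be reported as "idle in transaction".
SELECT pid, state, query
  FROM pg_stat_activity
 WHERE state LIKE 'idle in transaction%';
--  pid  |        state        | query
-- ------+---------------------+-------
--  1234 | idle in transaction | BEGIN
```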
On 06/19/2014 05:42 PM, Kevin Grittner wrote: > Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: >> At 2014-06-19 10:58:34 +0200, pavel.stehule@gmail.com wrote: > >>> pgbouncer has idle_transaction_timeout defined some years >>> without any problems, so we can take this design >>> >>> idle_transaction_timeout >> >> Let's steal the name too. :-) >> >> (I didn't realise there was precedent in pgbouncer, but given >> that the name is in use already, it'll be less confusing to use >> that name for Postgres as well.) > > I'm not sure whether using the same name as pgbouncer for the same > functionality makes it less confusing or might lead to confusion > about which layer is disconnecting the client. I'm leaning toward > using the same name, but I'm really not sure. Does anyone else > want to offer an opinion? I much prefer with "in" but it doesn't much matter. > Somehow I had thought that pg_stat_activity didn't show a > connection as "idle in transaction" as soon as BEGIN was run -- I > thought some subsequent statement was needed to put it into that > state; but I see that I was wrong about that. As long as BEGIN > causes a connection to show as "idle in transaction" I think the > BEGIN should start the clock on this timeout, based on POLA. For me, this is a separate patch. Whether it goes before or after this one doesn't make a big difference, I don't think. > It wouldn't bother me to see us distinguish among early transaction > states, but I'm not sure whether it's really worth the effort. We > couldn't base it just on whether the transaction has acquired a > snapshot, since it is not unusual for explicit LOCK statements at > the front of the transaction to run before a snapshot is acquired, > and we would want to terminate a transaction sitting idle and > holding locks. > > Also, it seems like most are ok with the FATAL approach (as in v1 > of this patch). Does anyone want to make an argument against > proceeding with that, in light of discussion so far? 
From my view, we have these outstanding issues: - the name - begin-only transactions - aborted transactions Those last two could arguably be considered identical. If the client issued a BEGIN, then the client is in a transaction and I don't think we should report otherwise. So then we need to decide why "idle in transaction (aborted)" gets killed but "idle in transaction (initiated)" doesn't. The v1 patch doesn't currently kill the former but there was question that it should. I still believe it shouldn't. Anyway, I'm willing to modify the patch in any way that consensus dictates. -- Vik
At 2014-06-19 08:42:01 -0700, kgrittn@ymail.com wrote: > > I'm not sure whether using the same name as pgbouncer for the same > functionality makes it less confusing or might lead to confusion > about which layer is disconnecting the client. I don't think the names of the respective configuration settings will significantly affect that confusion either way. (I can imagine people being confused if they're using pgbouncer without the timeout in front of a server with the timeout. But since they would have to have turned it on explicitly on the server, it can't be all THAT much of a surprise. Or at least not that hard to figure out.) > As long as BEGIN causes a connection to show as "idle in transaction" > I think the BEGIN should start the clock on this timeout, based on > POLA. Yes, agreed. I don't think it's worth doing anything more complicated. > Also, it seems like most are ok with the FATAL approach (as in v1 > of this patch). I don't have a strong opinion, but I think FATAL is the pragmatic approach. -- Abhijit
At 2014-06-19 17:53:17 +0200, vik.fearing@dalibo.com wrote: > > I much prefer with "in" but it doesn't much matter. If you look at similar settings like statement_timeout, lock_timeout, etc., what comes before the _timeout is a concrete *thing* that can timeout, or that a timeout can be applied to (e.g. wal_receiver). "What's timing out?" "A statement." But in "idle_in_transaction_timeout", "idle_in_transaction" is not a thing. It's a description of the state of a thing (the thing being a session in the FATAL variant of your patch, or a transaction in the ERROR variant). "What's timing out?" "An idle in transaction." "Huh?" Strictly speaking, by this logic, the consistent name for the setting in the FATAL variant would be "idle_in_transaction_session_timeout", while for the ERROR variant it would be "idle_transaction_timeout". Both those names pass the "What's timing out?" test. But "idle_in_transaction_timeout" alone doesn't, and that's why I can't bring myself to like it. And pgbouncer's use of "idle_transaction_timeout" is a weak precedent to continue to use the same name for the same functionality. Anyway, as you say, it doesn't matter so much. I promise I won't beat the nomenclature horse any more. I just wanted to explain my thinking once. Sorry for dragging it out. -- Abhijit
> At 2014-06-19 17:53:17 +0200, [hidden email] wrote:
>>
>> I much prefer with "in" but it doesn't much matter.
>
> If you look at similar settings like statement_timeout, lock_timeout,
> etc., what comes before the _timeout is a concrete *thing* that can
> timeout, or that a timeout can be applied to (e.g. wal_receiver).
>
> "What's timing out?" "A statement."
>
> But in "idle_in_transaction_timeout", "idle_in_transaction" is not a
> thing. It's a description of the state of a thing (the thing being a
> session in the FATAL variant of your patch, or a transaction in the
> ERROR variant).
>
> "What's timing out?" "An idle in transaction." "Huh?"
>
> Strictly speaking, by this logic, the consistent name for the setting in
> the FATAL variant would be "idle_in_transaction_session_timeout",
+1; I even suggested something similar (I omitted the "in") - though we hadn't come to a firm conclusion on what behavior we were going to implement at the time. Adding session reasonably precludes us from easily changing our mind later (or giving the user an option) but that doesn't seem likely anyway.
Advocating for the Devil (or a more robust, if probably complicated, solution):
"idle_in_transaction_timeout=10s"
"idle_in_transaction_target=session|transaction"
Would be a valid pair since the first intentionally would not want to specify a target - and thus does not fit within the pattern you defined.
"idle_transaction_timeout" is bad since idle is a valid state that is not being affected by this patch; whereas pgbouncer indeed drops truly idle connections.
David J.
View this message in context: Re: idle_in_transaction_timeout
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 06/19/2014 10:39 AM, David G Johnston wrote: > Advocating for the Devil (or a more robust, if probably complicated, > solution): > "idle_in_transaction_timeout=10s" > "idle_in_transaction_target=session|transaction" Per Kevin Grittner's assessment, aborting the transaction is currently a nonstarter. So no need for a 2nd GUC. Also, I really hope that nobody is setting this to 10s ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > Also, I really hope that nobody is setting this to 10s ... Well, we get to decide what the minimum allowed value is. What do you think that should be? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/19/2014 02:44 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: > >> Also, I really hope that nobody is setting this to 10s ... > > Well, we get to decide what the minimum allowed value is. What do > you think that should be? 1s? I don't think that setting it to 1s would ever be a good idea, but I don't want to tie users' hands if they know better than I do. Also, unless we use 1s granularity, users can't set it to values like 45s, which is more reasonable. I can't see any circumstance under which sub-second values would ever make sense. Now ... can we decrease CPU overhead (wakeups) if we only check once per minute? If so, there's a good argument for making 1min the minimum. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Now ... can we decrease CPU overhead (wakeups) if we only check once per > minute? If so, there's a good argument for making 1min the minimum. Polling wakeups are right out, and are unnecessary anyway. The utils/misc/timeout.c infrastructure calculates the time left till the closest timeout event. So I don't see a need to worry about that end of it. ISTM our realistic options are for seconds or msec as the unit. If it's msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, which seems like enough to me but maybe somebody thinks differently? Seconds are probably OK but I'm worried about somebody complaining that that's not enough resolution, especially as machines get faster. Whichever the unit, I don't see a reason to set a lower bound different from "one". You ask for a 1ms timeout, we'll give it to you, it's your problem whether that's sane in your environment. regards, tom lane
Tom, > ISTM our realistic options are for seconds or msec as the unit. If it's > msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > which seems like enough to me but maybe somebody thinks differently? > Seconds are probably OK but I'm worried about somebody complaining that > that's not enough resolution, especially as machines get faster. I can picture a 500ms timeout more readily than I can picture a 1000hr timeout. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Tom, > > > ISTM our realistic options are for seconds or msec as the unit. If it's > > msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > > which seems like enough to me but maybe somebody thinks differently? > > Seconds are probably OK but I'm worried about somebody complaining that > > that's not enough resolution, especially as machines get faster. > > I can picture a 500ms timeout more readily than I can picture a 1000hr > timeout. Agreed. 600 hours are upwards of 25 days. Dead tuples accumulated for that long would be a really serious problem, unless your database is almost totally idle. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 06/19/2014 06:33 PM, Josh Berkus wrote: > Tom, > >> ISTM our realistic options are for seconds or msec as the unit. If it's >> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, >> which seems like enough to me but maybe somebody thinks differently? >> Seconds are probably OK but I'm worried about somebody complaining that >> that's not enough resolution, especially as machines get faster. > I can picture a 500ms timeout more readily than I can picture a 1000hr > timeout. > As long as we can specify the units, and don't have to say 1000 to mean 1 second, I agree. I would normally expect this to be set in terms of minutes rather than millisecs. cheers andrew
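For reference, GUC time values already accept unit suffixes, so whichever base unit is chosen the raw number rarely has to be spelled out. For example, with the existing statement_timeout:

```sql
SET statement_timeout = '500ms';
SET statement_timeout = '45s';
SET statement_timeout = '10min';   -- stored internally in the GUC's base unit
SHOW statement_timeout;
```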
Andrew Dunstan <andrew@dunslane.net> wrote: > On 06/19/2014 06:33 PM, Josh Berkus wrote: >>> ISTM our realistic options are for seconds or msec as the unit. If it's >>> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, >>> which seems like enough to me but maybe somebody thinks differently? >>> Seconds are probably OK but I'm worried about somebody complaining that >>> that's not enough resolution, especially as machines get faster. >> I can picture a 500ms timeout more readily than I can picture a 1000hr >> timeout. > > As long as we can specify the units, and don't have to say 1000 to mean > 1 second, I agree. I would normally expect this to be set in terms of > minutes rather than millisecs. OK, so I think we want to see a patch based on v1 (FATAL approach) with a change of the name to idle_in_transaction_session_timeout and the units changed to milliseconds. I don't see why the remoteversion test shouldn't be changed to use 90500 now, too. I'll flip this to Waiting on Author for those changes. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/21/2014 08:23 PM, Kevin Grittner wrote: > OK, so I think we want to see a patch based on v1 (FATAL approach) > with a change of the name to idle_in_transaction_session_timeout > and the units changed to milliseconds. I don't see why the > remoteversion test shouldn't be changed to use 90500 now, too. The attached patch, rebased to current master, addresses all of these issues. > I'll flip this to Waiting on Author for those changes. And back to Needs Review. -- Vik
Attachment
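Put together, the behavior under review would look roughly like this from the client side (a sketch: the GUC name is the one the thread settled on, but the exact FATAL message wording is illustrative, taken from the patch under review rather than any released server):

```sql
SET idle_in_transaction_session_timeout = '10s';
BEGIN;
-- client goes idle inside the transaction for more than 10 seconds ...
SELECT 1;
-- FATAL:  terminating connection due to idle-in-transaction timeout
-- server closed the connection unexpectedly
```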
At 2014-06-21 23:54:58 +0200, vik.fearing@dalibo.com wrote: > > > I'll flip this to Waiting on Author for those changes. > > And back to Needs Review. I've marked it Ready for Committer after a quick read-through. -- Abhijit
Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > I've marked it Ready for Committer after a quick read-through. I took a pass through it with an eye toward committing it. I found a couple minor whitespace issues, where the patch didn't follow conventional indenting practice; I can fix that no problem. I found that as it stood, the patch reduced the number of user timeouts which could be registered; I have a fix for that which I hope will also prevent future problems in that regard. None of that would have held up pushing it. I found one substantive issue that had been missed in discussion, though. The patch modifies the postgres_fdw extension to make it automatically exempt from an attempt to set a limit like this on the server to which it connects. I'm not sure that's a good idea. Why should this type of connection be allowed to sit indefinitely with an idle open transaction? I'm inclined to omit this part of the patch:

+++ b/contrib/postgres_fdw/connection.c
@@ -343,6 +343,13 @@ configure_remote_session(PGconn *conn)
 		do_sql_command(conn, "SET extra_float_digits = 3");
 	else
 		do_sql_command(conn, "SET extra_float_digits = 2");
+
+	/*
+	 * Ensure the remote server doesn't kill us off if we remain idle in a
+	 * transaction for too long.
+	 */
+	if (remoteversion >= 90500)
+		do_sql_command(conn, "SET idle_in_transaction_session_timeout = 0");
 }
 
 /*

(Please forgive any mangling of the whitespace above by my email provider.) Thoughts? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-21 11:23:44 -0700, Kevin Grittner wrote: > Andrew Dunstan <andrew@dunslane.net> wrote: > > > > On 06/19/2014 06:33 PM, Josh Berkus wrote: > > >>> ISTM our realistic options are for seconds or msec as the unit. If it's > >>> msec, we'd be limited to INT_MAX msec or around 600 hours at the top end, > >>> which seems like enough to me but maybe somebody thinks differently? > >>> Seconds are probably OK but I'm worried about somebody complaining that > >>> that's not enough resolution, especially as machines get faster. > >> I can picture a 500ms timeout more readily than I can picture a 1000hr > >> timeout. > > > > As long as we can specify the units, and don't have to say 1000 to mean > > 1 second, I agree. I would normally expect this to be set in terms of > > minutes rather than millisecs. > > > OK, so I think we want to see a patch based on v1 (FATAL approach) > with a change of the name to idle_in_transaction_session_timeout > and the units changed to milliseconds. The idea with the GUC name is that if we ever get support for cancelling transactions we can name that idle_in_transaction_transaction_timeout? That seems a bit awkward... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > The idea with the GUC name is that if we ever get support for > cancelling transactions we can name that > idle_in_transaction_transaction_timeout? > That seems a bit awkward... No, the argument was that for all the other *_timeout settings what came before _timeout was the thing that was being terminated. I think there were some votes in favor of the name on that basis, and none against. Feel free to give your reasons for supporting some other name. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > The idea with the GUC name is that if we ever get support for > > cancelling transactions we can name that > > idle_in_transaction_transaction_timeout? > > That seems a bit awkward... > > No, the argument was that for all the other *_timeout settings what > came before _timeout was the thing that was being terminated. I > think there were some votes in favor of the name on that basis, and > none against. Feel free to give your reasons for supporting some > other name. My reasons for not liking the current GUC name are hinted at above. I think we'll want a version of this that just fails the transaction once we have the infrastructure. So we should choose a name that allows for a complementary GUC. CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com suggested On 2014-06-19 10:39:48 -0700, David G Johnston wrote: > "idle_in_transaction_timeout=10s" > "idle_in_transaction_target=session|transaction" but I don't like that much. Not sure what'd be good, the best I currently can come up with is: idle_in_transaction_termination_timeout = idle_in_transaction_cancellation_timeout = Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
2014-06-22 19:47 GMT+02:00 Andres Freund <andres@2ndquadrant.com>:
> On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> The idea with the GUC name is that if we ever get support for
>>> cancelling transactions we can name that
>>> idle_in_transaction_transaction_timeout?
>>> That seems a bit awkward...
>>
>> No, the argument was that for all the other *_timeout settings what
>> came before _timeout was the thing that was being terminated. I
>> think there were some votes in favor of the name on that basis, and
>> none against. Feel free to give your reasons for supporting some
>> other name.
>
> My reasons for not liking the current GUC name are hinted at above. I think
> we'll want a version of this that just fails the transaction once we
> have the infrastructure. So we should choose a name that allows for
> a complementary GUC.
>
> CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com
> suggested
>
> On 2014-06-19 10:39:48 -0700, David G Johnston wrote:
>> "idle_in_transaction_timeout=10s"
>> "idle_in_transaction_target=session|transaction"
>
> but I don't like that much. Not sure what'd be good, the best I
> currently can come up with is:
>
> idle_in_transaction_termination_timeout =
> idle_in_transaction_cancellation_timeout =

+1

Pavel
Andres Freund <andres@2ndquadrant.com> wrote: > I think we'll want a version of this that just fails the > transaction once we have the infrastructure. So we should choose > a name that allows for a complementary GUC. If we stick with the rule that what is to the left of _timeout is what is being cancelled, then a GUC to cancel a transaction which remains idle for too long could be called idle_transaction_timeout. Do you disagree with the general idea of following that pattern? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sunday, June 22, 2014, Kevin Grittner-5 [via PostgreSQL] <[hidden email]> wrote:
Andres Freund <[hidden email]> wrote:
> I think we'll want a version of this that just fails the
> transaction once we have the infrastructure. So we should choose
> a name that allows for a complementary GUC.
If we stick with the rule that what is to the left of _timeout is
what is being cancelled, the a GUC to cancel a transaction which
remains idle for too long could be called idle_transaction_timeout.
Do you disagree with the general idea of following that pattern?
If we ever do give the user an option, the non-specific name with a
separate type GUC could be used and this session-specific variable
deprecated. And disallow both to be active at the same time. Or
something else. I agree that idle_in_transaction_transaction would be
proper but troublesome for the alternative; crossing that bridge if we
ever get there seems reasonable in light of picking the best single
name for this specific feature.
Idle_transaction_timeout has already been discarded since truly idle transactions are not being affected, only those that are in transaction. The first quote above is limited to that subset as well.
David J.
View this message in context: Re: idle_in_transaction_timeout
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
At 2014-06-22 19:45:08 -0700, david.g.johnston@gmail.com wrote:
>
> On Sunday, June 22, 2014, Kevin Grittner-5 [via PostgreSQL] <
> ml-node+s1045698n5808309h3@n5.nabble.com> wrote:
>
> > If we stick with the rule that what is to the left of _timeout is
> > what is being cancelled, then a GUC to cancel a transaction which
> > remains idle for too long could be called idle_transaction_timeout.

I (somewhat reluctantly) agree with Kevin that
"idle_in_transaction_session_timeout" (for FATAL) and
"idle_transaction_timeout" (for ERROR) would work.

The only other alternative I see is to use "idle_transaction_timeout"
now (even when we're killing the session) and later introduce another
setting named "idle_transaction_timeout_keep_session" (default false)
or something like that. (I'd prefer an extra boolean to something set
to 'session' or 'transaction'.)

> Idle_transaction_timeout has already been discarded since truly idle
> transactions are not being affected, only those that are in
> transaction.

I have no idea what this means.

-- Abhijit
On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > I think we'll want a version of this that just fails the
> > transaction once we have the infrastructure. So we should choose
> > a name that allows for a complementary GUC.
>
> If we stick with the rule that what is to the left of _timeout is
> what is being cancelled, then a GUC to cancel a transaction which
> remains idle for too long could be called idle_transaction_timeout.
>
> Do you disagree with the general idea of following that pattern?

I think that'd be rather confusing. For one it'd need to be
idle_in_transaction_timeout, which already seems less clear (because
the transaction belongs to idle), and for another that distinction
seems to be too subtle for users.

The reason I suggested
idle_in_transaction_termination/cancellation_timeout is that that maps
nicely to pg_terminate/cancel_backend() and is rather descriptive.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>> > I think we'll want a version of this that just fails the
>> > transaction once we have the infrastructure. So we should choose
>> > a name that allows for a complementary GUC.
>>
>> If we stick with the rule that what is to the left of _timeout is
>> what is being cancelled, then a GUC to cancel a transaction which
>> remains idle for too long could be called idle_transaction_timeout.
>>
>> Do you disagree with the general idea of following that pattern?
>
> I think that'd be rather confusing. For one it'd need to be
> idle_in_transaction_timeout which already seems less clear (because the
> transaction belongs to idle) and for another that distinction seems to
> be too subtle for users.
>
> The reason I suggested
> idle_in_transaction_termination/cancellation_timeout is that that maps
> nicely to pg_terminate/cancel_backend() and is rather descriptive.

Maybe we can remove IIT_termination_timeout when we've implemented
IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
is still useful even in that case. *If* it's not useful, I think we
don't need to have those two parameters and can just define one
parameter IIT_timeout. That's quite simple and it's similar to the
current style of statement_timeout and lock_timeout (IOW, we don't have
something like statement_termination_timeout and
lock_termination_timeout).

Regards,

--
Fujii Masao
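[Archive note: the single-parameter style Fujii describes can be
sketched as below. The idle_in_transaction_timeout name and the '5min'
value are only the proposal under discussion in this thread, not a
released GUC.]

```sql
-- Existing timeouts are set per-session; whatever precedes _timeout
-- is the thing that gets cancelled:
SET statement_timeout = '30s';  -- cancels an over-long statement
SET lock_timeout = '2s';        -- cancels an over-long lock wait

-- The proposed setting would follow the same single-GUC pattern
-- (name and behavior are exactly what this thread is debating):
SET idle_in_transaction_timeout = '5min';
```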
On 2014-06-23 20:29:17 +0900, Fujii Masao wrote:
> On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
> >> Andres Freund <andres@2ndquadrant.com> wrote:
> >>
> >> > I think we'll want a version of this that just fails the
> >> > transaction once we have the infrastructure. So we should choose
> >> > a name that allows for a complementary GUC.
> >>
> >> If we stick with the rule that what is to the left of _timeout is
> >> what is being cancelled, then a GUC to cancel a transaction which
> >> remains idle for too long could be called idle_transaction_timeout.
> >>
> >> Do you disagree with the general idea of following that pattern?
> >
> > I think that'd be rather confusing. For one it'd need to be
> > idle_in_transaction_timeout which already seems less clear (because the
> > transaction belongs to idle) and for another that distinction seems to
> > be too subtle for users.
> >
> > The reason I suggested
> > idle_in_transaction_termination/cancellation_timeout is that that maps
> > nicely to pg_terminate/cancel_backend() and is rather descriptive.
>
> Maybe we can remove IIT_termination_timeout when we've implemented
> IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
> is still useful even in that case.

I think both can actually be sensible depending on the use case. It's
also not nice to remove a feature without need when people have started
to rely on it. For a web app, termination is probably more sensible.
For interactive clients, cancellation.

> *If* it's not useful, I think we don't need to have those two
> parameters and can just define one parameter IIT_timeout. That's
> quite simple and it's similar to the current style of
> statement_timeout and lock_timeout (IOW, we don't have something like
> statement_termination_timeout and lock_termination_timeout).

I don't think those really are comparable. A long idle in transaction
state pretty much always indicates a problematic interaction with
postgres.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 07:47 PM, Andres Freund wrote:
> On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
>> Andres Freund <andres@2ndquadrant.com> wrote:
>>
>>> The idea with the GUC name is that if we ever get support for
>>> cancelling transactions we can name that
>>> idle_in_transaction_transaction_timeout?
>>> That seems a bit awkward...
>>
>> No, the argument was that for all the other *_timeout settings what
>> came before _timeout was the thing that was being terminated. I
>> think there were some votes in favor of the name on that basis, and
>> none against. Feel free to give your reasons for supporting some
>> other name.
>
> My reasons for not liking the current GUC name are hinted at above. I think
> we'll want a version of this that just fails the transaction once we
> have the infrastructure. So we should choose a name that allows for
> a complementary GUC.
> CAKFQuwZCg2uur=tUdz_C2aUwBo87ofFGhn9Mx_HZ4QD-b8fe2Q@mail.gmail.com
> suggested
> On 2014-06-19 10:39:48 -0700, David G Johnston wrote:
>> "idle_in_transaction_timeout=10s"
>> "idle_in_transaction_target=session|transaction"
>
> but I don't like that much. Not sure what'd be good, the best I
> currently can come up with is:
> idle_in_transaction_termination_timeout =
> idle_in_transaction_cancellation_timeout =

Except the transaction wouldn't be cancelled, it would be aborted.
idle_in_transaction_abortion_timeout seems a little... weird.

--
Vik
On 2014-06-23 13:33:46 +0200, Vik Fearing wrote:
> On 06/22/2014 07:47 PM, Andres Freund wrote:
> > On 2014-06-22 09:27:24 -0700, Kevin Grittner wrote:
> >> Andres Freund <andres@2ndquadrant.com> wrote:
>
> > but I don't like that much. Not sure what'd be good, the best I
> > currently can come up with is:
> > idle_in_transaction_termination_timeout =
> > idle_in_transaction_cancellation_timeout =
>
> Except the transaction wouldn't be cancelled, it would be aborted.

That ship has sailed with pg_cancel_backend(), no?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 09:02 PM, Abhijit Menon-Sen wrote:
> I (somewhat reluctantly) agree with Kevin that
> "idle_in_transaction_session_timeout" (for FATAL) and
> "idle_transaction_timeout" (for ERROR) would work.

Given that an IIT timeout has been a TODO for at least 6 years before
being addressed, I'm not convinced that we need to worry about what an
eventual error vs. fatal timeout should be named or how it should be
scoped. Let's attack that when someone actually shows an inclination
to work on it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-06-23 20:29:17 +0900, Fujii Masao wrote:
>> On Mon, Jun 23, 2014 at 7:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> On 2014-06-22 19:03:32 -0700, Kevin Grittner wrote:
>>>> Andres Freund <andres@2ndquadrant.com> wrote:
>>>>
>>>>> I think we'll want a version of this that just fails the
>>>>> transaction once we have the infrastructure. So we should choose
>>>>> a name that allows for a complementary GUC.

How about choosing a name for that, if we ever get there, where what
precedes _timeout describes what is being terminated? Like the idle
transaction itself, rather than the session which shows as being in
"idle in transaction" status because of the idle transaction which is
using the session?

>>>> If we stick with the rule that what is to the left of _timeout is
>>>> what is being cancelled, then a GUC to cancel a transaction which
>>>> remains idle for too long could be called idle_transaction_timeout.
>>>>
>>>> Do you disagree with the general idea of following that pattern?
>>>
>>> I think that'd be rather confusing. For one it'd need to be
>>> idle_in_transaction_timeout

Why? We're cancelling an idle transaction, not an "idle in
transaction", whatever that is.

>>> which already seems less clear (because the transaction belongs
>>> to idle)

I have no idea what that means.

>>> and for another that distinction seems to be too subtle for users.

The difference between an "idle in transaction session" and an
"idle transaction" is too subtle for someone preparing to terminate
one of those?

>>> The reason I suggested
>>> idle_in_transaction_termination/cancellation_timeout is that that maps
>>> nicely to pg_terminate/cancel_backend() and is rather descriptive.

I've always hated that naming because it is so confusing and doesn't
describe what is being terminated. What makes it obvious that
"terminate" is something you do to a session rather than a transaction?
What makes it obvious that "cancel" is something you do to a
transaction but not to a session? Now if those were named something
like pg_rollback_backend_transaction() and pg_close_backend_session()
(or anything else where the names actually made clear what was
happening) then borrowing terminology would be a stronger argument. It
still seems to me to be a weaker argument than continuing the pattern
we have for *_timeout GUCs, though.

>> Maybe we can remove IIT_termination_timeout when we've implemented
>> IIT_cancellation_timeout. Right? I'm not sure if IIT_termination_timeout
>> is still useful even in that case.

In my experience, for the same reasons Robert gave much earlier in
the thread, if both were available the one which would be more
appropriate for cases I've seen would be the one that closed the
session with a FATAL message in the log. If we were only going to
have one or the other, the one that terminated the session is the
one that *I* would prefer to see.

> A long idle in transaction state pretty much always indicates a
> problematic interaction with postgres.

True. Which makes me wonder whether we shouldn't default this to
something non-zero -- even if it is 5 or 10 days. It also gets me
thinking about whether a good follow-on patch would be a timeout
for prepared transactions. I would have trouble counting how many
nasty production problems I've seen which would have been prevented
by having both time out after a few days.

BTW, since nobody has commented on the issue of the postgres_fdw
automatically exempting itself from the timeout, I will plan on
removing that when the naming argument reaches something resembling
a conclusion and I go to push this ... unless someone objects
before that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
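[Archive note: the external-script workaround mentioned at the top of
the thread amounts to something like the query below. It is a sketch:
it assumes the 9.2+ pg_stat_activity columns (pid, state,
state_change), and the 10-minute threshold is an arbitrary example.]

```sql
-- Rough external equivalent of the proposed timeout, run periodically
-- by a monitor: terminate any backend that has been sitting
-- "idle in transaction" for more than 10 minutes.
-- (This is the "terminate the session" behavior; aborting only the
-- transaction is the part the thread says lacks infrastructure.)
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND state_change < now() - interval '10 minutes';
```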
>>> I think that'd be rather confusing. For one it'd need to be
>>> idle_in_transaction_timeout
>
> Why? We're cancelling an idle transaction, not an "idle in
> transaction", whatever that is.
The confusion derives from the fact we are affecting a session whose state is "idle in transaction", not one that is idle. We are then, for this discussion, choosing to either kill the entire session or just the currently active transaction. After "idle_in_transaction" there is an unstated "session" being mentally placed by myself and probably others. Following that is then either "session" or "transaction" to denote what is being affected should the timeout interval come to pass.
Discarding that, probably flawed, mental model makes "idle_transaction_timeout" seem fine. "idle_in_transaction_session_timeout" would indeed be a natural complement to this.
I do not expect this concept, should it come to pass, to be that difficult to document or for someone to learn.
Along with others, I still see no reason to avoid "IIT_session_timeout" at this point.
David J.
>> A long idle in transaction state pretty much always indicates a
>> problematic interaction with postgres.
>
> True. Which makes me wonder whether we shouldn't default this to
> something non-zero -- even if it is 5 or 10 days.
I guess it depends on how parental we want to be. But if we go that route wouldn't a more harsh/in-your-face default make more sense? Something in Minutes, not Days...say '5min'...
David J.
On 2014-06-23 13:19:47 -0700, Kevin Grittner wrote:
> >>> which already seems less clear (because the transaction belongs
> >>> to idle)
>
> I have no idea what that means.

It's "idle_in_transaction"_session_timeout. Not
"idle_in"_transaction_session_timeout.

> >>> and for another that distinction seems to be too subtle for users.
>
> The difference between an "idle in transaction session" and an
> "idle transaction" is too subtle for someone preparing to terminate
> one of those?

Yes. To me that's an academic distinction. As a nonnative speaker it
looks pretty much random that one has an "in" in it and the other
doesn't. Maybe I'm just having a grammar fail, but there doesn't seem
to be much sense in it.

> >>> The reason I suggested
> >>> idle_in_transaction_termination/cancellation_timeout is that that maps
> >>> nicely to pg_terminate/cancel_backend() and is rather descriptive.
>
> I've always hated that naming because it is so confusing and
> doesn't describe what is being terminated.

Well, it's what's there. And it doesn't seem to cause much confusion.
I don't see how it gets less confusing by introducing different
terminology.

> In my experience, for the same reasons Robert gave much earlier in
> the thread, if both were available the one which would be more
> appropriate for cases I've seen would be the one that closed the
> session with a FATAL message in the log. If we were only going to
> have one or the other, the one that terminated the session is the
> one that *I* would prefer to see.

I think it's just different use cases. For OLTP clients you probably
primarily want to FATAL the session. But for interactive sessions
that's much less the case. If some DBA, or even more so an analytics
guy, has a transaction open he'll have to live with the transaction
being aborted. But you won't be liked for needlessly removing the 10
multi-GB temporary tables with intermediate results. And don't you say
protecting against that isn't necessary - I've seen databases slow to
a crawl because some data analyst went home over the weekend with an
open transaction after a meeting went on longer than planned.

> > A long idle in transaction state pretty much always indicates a
> > problematic interaction with postgres.
>
> True. Which makes me wonder whether we shouldn't default this to
> something non-zero -- even if it is 5 or 10 days.

-1. Can't really say why though. Just seems a bit odd.

> It also gets me
> thinking about whether a good follow-on patch would be a timeout
> for prepared transactions.

I think that might be useful although I think it'd be relatively hard
to implement. I guess that for now monitoring pg_prepared_xacts will
have to suffice.

> BTW, since nobody has commented on the issue of the postgres_fdw
> automatically exempting itself from the timeout, I will plan on
> removing that when the naming argument reaches something resembling
> a conclusion and I go to push this ... unless someone objects
> before that.

Sounds good. I haven't yet seen a justification for it so far...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/22/2014 05:11 PM, Kevin Grittner wrote:
> I found one substantive issue that had been missed in discussion,
> though. The patch modifies the postgres_fdw extension to make it
> automatically exempt from an attempt to set a limit like this on
> the server to which it connects. I'm not sure that's a good idea.
> Why should this type of connection be allowed to sit indefinitely
> with an idle open transaction? I'm inclined to omit this part of
> the patch

My reasoning for doing it the way I did is that if a transaction touches
a foreign table and then goes bumbling along with other things, the
transaction is active but the connection to the remote server remains
idle in transaction. If it hits the timeout, when the local transaction
goes to commit it errors out and you lose all your work.

If the local transaction is actually idle in transaction and the local
server doesn't have a timeout, we're no worse off than before this patch.

--
Vik
On Tue, Jun 24, 2014 at 9:20 AM, Vik Fearing [via PostgreSQL] <[hidden email]> wrote:

> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
> > I found one substantive issue that had been missed in discussion,
> > though. The patch modifies the postgres_fdw extension to make it
> > automatically exempt from an attempt to set a limit like this on
> > the server to which it connects. I'm not sure that's a good idea.
> > Why should this type of connection be allowed to sit indefinitely
> > with an idle open transaction? I'm inclined to omit this part of
> > the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.
Going off of this reading alone, wouldn't we have to allow the client to set the timeout on the fdw_server - to zero - to ensure reasonable operation? If the client has a process that requires 10 minutes to complete, and the foreign server has a default 5 minute timeout, if the client does not disable the timeout on the server wouldn't the foreign server always cause the process to abort?
David J.
David G Johnston <david.g.johnston@gmail.com> wrote:
> Vik Fearing [via PostgreSQL] <[hidden email]> wrote:
>> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>>> I found one substantive issue that had been missed in discussion,
>>> though. The patch modifies the postgres_fdw extension to make it
>>> automatically exempt from an attempt to set a limit like this on
>>> the server to which it connects. I'm not sure that's a good idea.
>>> Why should this type of connection be allowed to sit indefinitely
>>> with an idle open transaction? I'm inclined to omit this part of
>>> the patch
>>
>> My reasoning for doing it the way I did is that if a transaction touches
>> a foreign table and then goes bumbling along with other things the
>> transaction is active but the connection to the remote server remains
>> idle in transaction. If it hits the timeout, when the local transaction
>> goes to commit it errors out and you lose all your work.
>>
>> If the local transaction is actually idle in transaction and the local
>> server doesn't have a timeout, we're no worse off than before this patch.
>
> Going off of this reading alone wouldn't we have to allow the
> client to set the timeout on the fdw_server - to zero - to ensure
> reasonable operation? If the client has a process that requires
> 10 minutes to complete, and the foreign server has a default 5
> minute timeout, if the client does not disable the timeout on the
> server wouldn't the foreign server always cause the process to
> abort?

That's what Vik did in his patch, and what I was questioning. I think
he might be right, but I want to think about it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/24/2014 03:29 PM, David G Johnston wrote:
> On Tue, Jun 24, 2014 at 9:20 AM, Vik Fearing [via PostgreSQL] <[hidden
> email]> wrote:
>
>     On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>     > I found one substantive issue that had been missed in discussion,
>     > though. The patch modifies the postgres_fdw extension to make it
>     > automatically exempt from an attempt to set a limit like this on
>     > the server to which it connects. I'm not sure that's a good idea.
>     > Why should this type of connection be allowed to sit indefinitely
>     > with an idle open transaction? I'm inclined to omit this part of
>     > the patch
>
>     My reasoning for doing it the way I did is that if a transaction
>     touches a foreign table and then goes bumbling along with other
>     things the transaction is active but the connection to the remote
>     server remains idle in transaction. If it hits the timeout, when
>     the local transaction goes to commit it errors out and you lose
>     all your work.
>
>     If the local transaction is actually idle in transaction and the
>     local server doesn't have a timeout, we're no worse off than
>     before this patch.
>
> Going off of this reading alone wouldn't we have to allow the client to
> set the timeout on the fdw_server - to zero - to ensure reasonable
> operation?

That's what the patch currently does.

> If the client has a process that requires 10 minutes to
> complete, and the foreign server has a default 5 minute timeout, if the
> client does not disable the timeout on the server wouldn't the foreign
> server always cause the process to abort?

Yes.

--
Vik
On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>> I found one substantive issue that had been missed in discussion,
>> though. The patch modifies the postgres_fdw extension to make it
>> automatically exempt from an attempt to set a limit like this on
>> the server to which it connects. I'm not sure that's a good idea.
>> Why should this type of connection be allowed to sit indefinitely
>> with an idle open transaction? I'm inclined to omit this part of
>> the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.

I think we are. First, the correct timeout is a matter of
remote-server-policy, not local-server-policy. If the remote server
wants to boot people with long-running idle transactions, it's entitled
to do that, and postgres_fdw shouldn't assume that it's "special". The
local server policy may be different, and may not even have been
configured by the same person. Second, setting another GUC at every
session start adds overhead for all users of postgres_fdw.

Now, it might be that postgres_fdw should have a facility to allow
arbitrary options to be set on the foreign side at each connection
startup. Then that could be used here if someone wants this behavior.
But I don't think we should hard-code it, because it could also be NOT
what someone wants.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-06-24 10:04:03 -0400, Robert Haas wrote:
> On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:
> > My reasoning for doing it the way I did is that if a transaction touches
> > a foreign table and then goes bumbling along with other things the
> > transaction is active but the connection to the remote server remains
> > idle in transaction. If it hits the timeout, when the local transaction
> > goes to commit it errors out and you lose all your work.
> >
> > If the local transaction is actually idle in transaction and the local
> > server doesn't have a timeout, we're no worse off than before this patch.
>
> I think we are. First, the correct timeout is a matter of
> remote-server-policy, not local-server-policy. If the remote server
> wants to boot people with long-running idle transactions, it's
> entitled to do that, and postgres_fdw shouldn't assume that it's
> "special". The local server policy may be different, and may not even
> have been configured by the same person. Second, setting another GUC
> at every session start adds overhead for all users of postgres_fdw.

+1

> Now, it might be that postgres_fdw should have a facility to allow
> arbitrary options to be set on the foreign side at each connection
> startup. Then that could be used here if someone wants this behavior.
> But I don't think we should hard-code it, because it could also be NOT
> what someone wants.

I think options=-c idle_in_transaction_timeout=0 in the server config
should already do the trick.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
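[Archive note: concretely, Andres's suggestion would look something
like the statement below. This is a hypothetical sketch: remote_pg is
an assumed server name, the GUC name is the one under discussion, and
it assumes postgres_fdw passes the standard libpq "options" connection
parameter through to the remote side.]

```sql
-- Opt one foreign server out of the remote idle-in-transaction timeout
-- explicitly, instead of postgres_fdw hard-coding the exemption.
ALTER SERVER remote_pg
    OPTIONS (ADD options '-c idle_in_transaction_timeout=0');
```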
On Tue, Jun 24, 2014 at 10:05 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
On Tue, Jun 24, 2014 at 9:18 AM, Vik Fearing <[hidden email]> wrote:
> On 06/22/2014 05:11 PM, Kevin Grittner wrote:
>> I found one substantive issue that had been missed in discussion,
>> though. The patch modifies the postgres_fdw extension to make it
>> automatically exempt from an attempt to set a limit like this on
>> the server to which it connects. I'm not sure that's a good idea.
>> Why should this type of connection be allowed to sit indefinitely
>> with an idle open transaction? I'm inclined to omit this part of
>> the patch
>
> My reasoning for doing it the way I did is that if a transaction touches
> a foreign table and then goes bumbling along with other things the
> transaction is active but the connection to the remote server remains
> idle in transaction. If it hits the timeout, when the local transaction
> goes to commit it errors out and you lose all your work.
>
> If the local transaction is actually idle in transaction and the local
> server doesn't have a timeout, we're no worse off than before this patch.

I think we are. First, the correct timeout is a matter of
remote-server-policy, not local-server-policy. If the remote server
wants to boot people with long-running idle transactions, it's
entitled to do that, and postgres_fdw shouldn't assume that it's
"special". The local server policy may be different, and may not even
have been configured by the same person. Second, setting another GUC
at every session start adds overhead for all users of postgres_fdw.
Now, it might be that postgres_fdw should have a facility to allow
arbitrary options to be set on the foreign side at each connection
startup. Then that could be used here if someone wants this behavior.
But I don't think we should hard-code it, because it could also be NOT
what someone wants.
The missing ability is that while the user only cares about the one logical session we are dealing with two physical sessions in a parent-child relationship where the child session state does not match that of its parent. For me, this whole line of thought is based upon the logical "idle_in_transaction" - did the application really mean to leave this hanging?
Say that 90% of the time disabling the timeout will be the correct course of action; making the user do this explicitly does not seem reasonable. And if "doesn't matter" is the current state when the foreign server is configured no setting will be passed. Then if the remote server does institute a timeout all the relevant configurations will need to be changed.
ISTM that the additional overhead in this case would be very small in percentage terms; at least enough so that usability would be my default choice.
I have no problem allowing for user-specified behavior but the default of disabling the timeout seems reasonable. I am doubting that actually synchronizing the parent and child sessions, so that the child reports the same status as the parent, is a valid option - though it would be the "best" solution since the child would only report IIT if the parent was IIT.
For me, a meaningful default and usability are trumping the unknown performance degradation. I can go either way on allowing the local definition to specify its own non-zero timeout but it probably isn't worth the effort. The foreign server administrator ultimately will have to be aware of which users are connecting via FDW and address his "long-running transaction" concerns in a more nuanced way than this parameter allows. In effect this becomes an 80% solution because it is not (all that) useful on the remote end of a FDW connection; though at least the local end can make proper use of it to protect both servers.
David J.
On 06/24/2014 04:04 PM, Robert Haas wrote:
>> If the local transaction is actually idle in transaction and the local
>> server doesn't have a timeout, we're no worse off than before this patch.
>
> I think we are. First, the correct timeout is a matter of
> remote-server-policy, not local-server-policy. If the remote server
> wants to boot people with long-running idle transactions, it's
> entitled to do that, and postgres_fdw shouldn't assume that it's
> "special".

So how would the local transaction ever get its work done? What option
does it have to tell the remote server that it isn't actually idling,
it just doesn't need to use the remote connection for a while? Once the
remote times out, the local transaction is doomed (and won't even know
it until it tries to commit). If we don't allow the fdw to be special,
then the local transaction can't run at all. Ever.

The point of the patch is to allow the DBA to knock off broken clients,
but this isn't a broken client, it just looks like one.

--
Vik
On Tue, Jun 24, 2014 at 10:50 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 06/24/2014 04:04 PM, Robert Haas wrote: >>> If the local transaction is actually idle in transaction and the local >>> server doesn't have a timeout, we're no worse off than before this patch. >> >> I think we are. First, the correct timeout is a matter of >> remote-server-policy, not local-server-policy. If the remote server >> wants to boot people with long-running idle transactions, it's >> entitled to do that, and postgres_fdw shouldn't assume that it's >> "special". > > So how would the local transaction ever get its work done? What option > does it have to tell the remote server that it isn't actually idling, it > just doesn't need to use the remote connection for a while? It *is* idling. You're going to get bloat, and lock contention, and so on, just as you would for any other idle session. I mean, you could make this assumption about any session: I'm not done with the transaction yet, e.g. I'm waiting for user input before deciding what to do next. That doesn't mean that the DBA doesn't want to kill it. > The point of the patch is to allow the DBA to knock off broken clients, > but this isn't a broken client, it just looks like one. If it walks like a duck, and quacks like a duck, it's a duck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Jun 24, 2014 at 11:11 AM, Robert Haas [via PostgreSQL] <[hidden email]> wrote:
> On Tue, Jun 24, 2014 at 10:50 AM, Vik Fearing <[hidden email]> wrote:
>> On 06/24/2014 04:04 PM, Robert Haas wrote:
>>>> If the local transaction is actually idle in transaction and the local
>>>> server doesn't have a timeout, we're no worse off than before this patch.
>>>
>>> I think we are. First, the correct timeout is a matter of
>>> remote-server-policy, not local-server-policy. If the remote server
>>> wants to boot people with long-running idle transactions, it's
>>> entitled to do that, and postgres_fdw shouldn't assume that it's
>>> "special".
>>
>> So how would the local transaction ever get its work done? What option
>> does it have to tell the remote server that it isn't actually idling, it
>> just doesn't need to use the remote connection for a while?
>
> It *is* idling. You're going to get bloat, and lock contention, and
> so on, just as you would for any other idle session.
If an application is making use of the foreign server directly then there is the option to commit after using the foreign server, while saving the relevant data for the main transaction. But if you make use of API functions there can only be a single transaction encompassing both the local and foreign servers. But even then, if the user needs a logical super-transaction across both servers - even though the bulk of the work occurs locally - that option to commit is then removed regardless of client usage.
IMO this tool is too blunt to properly allow servers to self-manage fdw-initiated transactions/sessions; and allowing it to be used is asking for end-user confusion and frustration.
OTOH, requiring the administrator of the foreign server to issue an ALTER ROLE fdw_user SET idle_in_transaction_session_timeout = 0; would be fairly easy to justify. Allowing them to distinguish between known long-running and problematic transactions and those that are expected to execute quickly has value as well.
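A sketch of what that per-role override could look like, assuming the GUC name proposed in this thread, millisecond units, and hypothetical role names:

```sql
-- Disable the timeout for the role the FDW connects as, so that
-- fdw-initiated sessions are never killed ("fdw_user" is hypothetical):
ALTER ROLE fdw_user SET idle_in_transaction_session_timeout = 0;

-- Ordinary application roles can meanwhile keep a strict limit,
-- here ten minutes expressed in milliseconds:
ALTER ROLE app_user SET idle_in_transaction_session_timeout = 600000;
```

As with any ALTER ROLE ... SET, the per-role value takes effect on new sessions, so existing connections keep whatever value they started with.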
Ultimately you give the users power and then just need to make sure we provide sufficient documentation suggestions on how best to configure the two servers in various typical usage scenarios.
David J.
On 06/23/2014 03:52 PM, Andres Freund wrote:
> On 2014-06-23 13:19:47 -0700, Kevin Grittner wrote:
>>>>> which already seems less clear (because the transaction belongs
>>>>> to idle)
>>
>> I have no idea what that means.
>
> It's "idle_in_transaction"_session_timeout. Not
> "idle_in"_transaction_session_timeout.
>
>>>>> and for another that distinction seems to be too subtle for users.
>>
>> The difference between an "idle in transaction session" and an
>> "idle transaction" is too subtle for someone preparing to terminate
>> one of those?
>
> Yes. To me that's an academic distinction. As a nonnative speaker it
> looks pretty much random that one has an "in" in it and the other
> doesn't. Maybe I'm just having a grammar fail, but there doesn't seem to
> be much sense in it.

As a native speaker, I find the distinction elusive as well. If someone was actually planning to commit transaction cancel, I'd object to it. And frankly, it doesn't make any sense to have two independent timeouts anyway. Only one of them would ever be invoked, whichever one came first. If you really want to plan for a feature I doubt anyone is going to write, the appropriate two GUCs are: idle_transaction_timeout: ## ms idle_transaction_timeout_action: cancel | terminate However, since I'm not convinced that anyone is ever going to write the "cancel" version, can we please just leave the 2nd GUC out for now?

>>> A long idle in transaction state pretty much always indicates a
>>> problematic interaction with postgres.
>>
>> True. Which makes me wonder whether we shouldn't default this to
>> something non-zero -- even if it is 5 or 10 days.

I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would trip up some users who just need really long pg_dumps. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/24/2014 06:43 PM, Josh Berkus wrote: >>>> A long idle in transaction state pretty much always indicates a >>>> problematic interaction with postgres. >>> >>> True. Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > > I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would > trip up some users who just need really long pg_dumps. Why would pg_dump be idle for 24 hours? -- Vik
2014-06-24 18:43 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
>>> A long idle in transaction state pretty much always indicates a
>>> problematic interaction with postgres.
>>
>> True. Which makes me wonder whether we shouldn't default this to
>> something non-zero -- even if it is 5 or 10 days.
>
> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would
> trip up some users who just need really long pg_dumps.

Long transactions should not be a problem - this should break a transaction only when it does >>nothing<< for a long time.

Regards

Pavel
On 06/24/2014 07:50 AM, Vik Fearing wrote: > On 06/24/2014 04:04 PM, Robert Haas wrote: >>> If the local transaction is actually idle in transaction and the local >>> server doesn't have a timeout, we're no worse off than before this patch. >> >> I think we are. First, the correct timeout is a matter of >> remote-server-policy, not local-server-policy. If the remote server >> wants to boot people with long-running idle transactions, it's >> entitled to do that, and postgres_fdw shouldn't assume that it's >> "special". > > So how would the local transaction ever get its work done? What option > does it have to tell the remote server that it isn't actually idling, it > just doesn't need to use the remote connection for a while? > > Once the remote times out, the local transaction is doomed (and won't > even know it until it tries to commit). If we don't allow the fdw to be > special, then the local transaction can't run at all. Ever. I'm unclear on how the FDW could be special. From the point of the remote server, how does it even know that it's receiving an FDW connection and not some other kind of connection? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > On 06/24/2014 07:50 AM, Vik Fearing wrote: >> Once the remote times out, the local transaction is doomed (and won't >> even know it until it tries to commit). If we don't allow the fdw to be >> special, then the local transaction can't run at all. Ever. > I'm unclear on how the FDW could be special. From the point of the > remote server, how does it even know that it's receiving an FDW > connection and not some other kind of connection? One way you could do it is to use a user id that's only for FDW connections, and do an ALTER ROLE on that id to set the appropriate timeout. Personally I'm violently against having postgres_fdw mess with this setting; for one thing, the proposed coding would prevent DBAs from controlling the timeout as they see fit, because it would override any ALTER ROLE or other remote-side setting. It doesn't satisfy the POLA either. postgres_fdw does not for example override statement_timeout; why should it override this timeout? regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: > On 06/23/2014 03:52 PM, Andres Freund wrote: >> True. Which makes me wonder whether we shouldn't default this to >> something non-zero -- even if it is 5 or 10 days. > I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would > trip up some users who just need really long pg_dumps. FWIW, I do not think we should have a nonzero default for this. We could not safely set it to any value that would be small enough to be really useful in the field. BTW, has anyone thought about the interaction of this feature with prepared transactions? I wonder whether there shouldn't be a similar but separately-settable maximum time for a transaction to stay in the prepared state. If we could set a nonzero default on that, perhaps on the order of a few minutes, we could solve the ancient bugaboo that "prepared transactions are too dangerous to enable by default". regards, tom lane
On 06/24/2014 07:17 PM, Tom Lane wrote: > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I did not think about that, but I could probably cook up a patch for it. I don't believe it belongs in this patch, though. -- Vik
On 2014-06-24 10:17:49 -0700, Tom Lane wrote: > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I'd very much like that feature, but I'm not sure how to implement it. Which process would do that check? We currently only allow rollbacks from the corresponding database... The best idea I have is to do it via autovacuum. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-06-24 10:17:49 -0700, Tom Lane wrote: >> BTW, has anyone thought about the interaction of this feature with >> prepared transactions? I wonder whether there shouldn't be a similar but >> separately-settable maximum time for a transaction to stay in the prepared >> state. If we could set a nonzero default on that, perhaps on the order of >> a few minutes, we could solve the ancient bugaboo that "prepared >> transactions are too dangerous to enable by default". > I'd very much like that feature, but I'm not sure how to implement > it. Which process would do that check? We currently only allow rollbacks > from the corresponding database... > The best idea I have is to do it via autovacuum. I did not actually have any plan in mind when I wrote that, but your mention of autovacuum suggests an idea for it: consider the code that kicks autovacuum off a table when somebody wants exclusive lock. In the same way, we could teach processes that want a lock that conflicts with a prepared xact that they can kill the prepared xact if it's more than X seconds old. The other way in which old prepared xacts are dangerous is in blocking cleanup of dead tuples, and I agree with your thought that maybe autovacuum is the place to deal with that. I don't know whether we'd really need both mechanisms, or if just one would be enough. In either case, this wouldn't directly be a timeout but rather a "license to kill" once a prepared xact exceeds the threshold and is getting in somebody's way. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >>> Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > >> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that >> would trip up some users who just need really long pg_dumps. > > FWIW, I do not think we should have a nonzero default for this. > We could not safely set it to any value that would be small enough > to be really useful in the field. I have seen production environments where users asked for help when performance had gradually degraded to a fraction of what it was, due to a connection sitting "idle in transaction" for several weeks. Even a timeout of five or ten days would have saved a lot of pain. What concerns me on the other side is that I've been known to start a long-running conversion or data fix on a Friday and check the results on Monday before committing. Something like that might sit for a day or two with little or no concurrent activity to cause a problem. It would be a real forehead-slapper to have forgotten to set a longer timeout before starting the run on Friday. A five day timeout seems likely to prevent extreme pain in the former circumstances while not being likely to mess up ad hoc bulk activity like the latter. Of course, if I were managing a cluster and was knowingly and consciously setting a value, it would probably be more like 5min. If I have actually set such a policy I am much less likely to forget it when it needs to be extended or disabled, and far less likely to be mad at anyone else if it cancels my work. > BTW, has anyone thought about the interaction of this feature with > prepared transactions? I wonder whether there shouldn't be a similar but > separately-settable maximum time for a transaction to stay in the prepared > state. 
> If we could set a nonzero default on that, perhaps on the order of > a few minutes, we could solve the ancient bugaboo that "prepared > transactions are too dangerous to enable by default". I thought about it enough to mention it briefly. I haven't taken it further than to note that it would be a great follow-up patch once this is in. I'm not sure that a few minutes would be sufficient, though. Theoretically, a crash of the transaction manager, or one of the other data stores managed by it, or even a WAN connection to one of the servers, should cause the transaction manager to finish things up after recovery from the problem. I think that a default would need to allow sufficient time for that, so we can have some confidence that the transaction manager has actually lost track of it. If I were configuring this for a real production environment, I would be in mind of frequently having seen WAN outages of several hours, and a few which lasted two or three days. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-06-24 10:17:49 -0700, Tom Lane wrote: > >> BTW, has anyone thought about the interaction of this feature with > >> prepared transactions? I wonder whether there shouldn't be a similar but > >> separately-settable maximum time for a transaction to stay in the prepared > >> state. If we could set a nonzero default on that, perhaps on the order of > >> a few minutes, we could solve the ancient bugaboo that "prepared > >> transactions are too dangerous to enable by default". > > > I'd very much like that feature, but I'm not sure how to implement > > it. Which process would do that check? We currently only allow rollbacks > > from the corresponding database... > > The best idea I have is to do it via autovacuum. > > I did not actually have any plan in mind when I wrote that, but your > mention of autovacuum suggests an idea for it: consider the code that > kicks autovacuum off a table when somebody wants exclusive lock. > In the same way, we could teach processes that want a lock that conflicts > with a prepared xact that they can kill the prepared xact if it's more > than X seconds old. > > The other way in which old prepared xacts are dangerous is in blocking > cleanup of dead tuples, and I agree with your thought that maybe > autovacuum is the place to deal with that. I don't know whether we'd > really need both mechanisms, or if just one would be enough. > > In either case, this wouldn't directly be a timeout but rather a "license > to kill" once a prepared xact exceeds the threshold and is getting in > somebody's way. Why isn't this what we want for idle-in-transaction sessions..? Sounds like exactly what I'd want, at least. Don't kill it off unless it's blocking something or preventing xmin progression... 
Indeed, we have specifically implemented a Nagios check which does exactly this - it looks to see if any idle-in-transaction process is blocking something else, and if it's been idle for too long it gets killed. We don't have prepared transactions enabled, so we haven't had to address that. We do have a check which alerts on (but doesn't kill, yet) idle-in-transaction processes which have been idle for a long time. Thanks, Stephen
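A sketch of the kind of query such a monitoring check might run; the thresholds are illustrative, and it assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes the pid, state, and state_change columns:

```sql
-- Alert on sessions that have been idle in transaction for over five minutes.
SELECT pid, usename, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '5 minutes';

-- A companion script could then knock the worst offenders off:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '1 hour';
```

This is essentially the external-script approach the original post mentions; the in-core timeout would make the second step unnecessary and leave a trace in the server log.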
On 06/24/2014 10:17 AM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 06/23/2014 03:52 PM, Andres Freund wrote: >>> True. Which makes me wonder whether we shouldn't default this to >>> something non-zero -- even if it is 5 or 10 days. > >> I'd go for even shorter: 48 hours. I'd suggest 24 hours, but that would >> trip up some users who just need really long pg_dumps. > > FWIW, I do not think we should have a nonzero default for this. > We could not safely set it to any value that would be small enough > to be really useful in the field. 48 hours would actually be a useful value; I've dealt multiple times with newbie users who had a transaction which had been open for a week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Sun, Jun 22, 2014 at 6:54 AM, Vik Fearing <vik.fearing@dalibo.com> wrote: > On 06/21/2014 08:23 PM, Kevin Grittner wrote: >> OK, so I think we want to see a patch based on v1 (FATAL approach) >> with a change of the name to idle_in_transaction_session_timeout >> and the units changed to milliseconds. I don't see why the >> remoteversion test shouldn't be changed to use 90500 now, too. > > The attached patch, rebased to current master, addresses all of these > issues. Sorry if this has already been discussed before.... Why is IIT timeout turned on only when send_ready_for_query is true? I was thinking it should be turned on every time a message is received. Imagine the case where the session is in idle-in-transaction state and a client gets stuck after sending Parse message and before sending Bind message. This case would also cause long transaction problem and should be resolved by IIT timeout. But IIT timeout that this patch adds cannot handle this case because it's enabled only when send_ready_for_query is true. Thought? Regards, -- Fujii Masao
Fujii Masao <masao.fujii@gmail.com> writes: > Why is IIT timeout turned on only when send_ready_for_query is true? > I was thinking it should be turned on every time a message is received. > Imagine the case where the session is in idle-in-transaction state and > a client gets stuck after sending Parse message and before sending Bind > message. This case would also cause long transaction problem and should > be resolved by IIT timeout. But IIT timeout that this patch adds cannot > handle this case because it's enabled only when send_ready_for_query is > true. Thought? I think you just moved the goalposts to the next county. The point of this feature, AFAICS, is to detect clients that are failing to issue another query or close their transaction as a result of bad client logic. It's not to protect against network glitches. Moreover, there would be no way to implement a timeout like that without adding a gettimeofday() call after every packet receipt, which is overhead we do not need and cannot afford. I don't think this feature should add *any* gettimeofday's beyond the ones that are already there. regards, tom lane
On Thu, Jun 26, 2014 at 12:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> Why is IIT timeout turned on only when send_ready_for_query is true? >> I was thinking it should be turned on every time a message is received. >> Imagine the case where the session is in idle-in-transaction state and >> a client gets stuck after sending Parse message and before sending Bind >> message. This case would also cause long transaction problem and should >> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >> handle this case because it's enabled only when send_ready_for_query is >> true. Thought? > > I think you just moved the goalposts to the next county. > > The point of this feature, AFAICS, is to detect clients that are failing > to issue another query or close their transaction as a result of bad > client logic. It's not to protect against network glitches. If so, the document should explain that clearly. Otherwise users may misunderstand this parameter and try to use it to guard against long transactions caused by network glitches or client freezes. > Moreover, there would be no way to implement a timeout like that without > adding a gettimeofday() call after every packet receipt, which is overhead > we do not need and cannot afford. Hmm.. right. I was thinking of just calling enable_timeout_after() every time a message is received. But it calls gettimeofday() and causes overhead. > I don't think this feature should add > *any* gettimeofday's beyond the ones that are already there. But, ISTM that the patch adds enable_timeout_after() which calls GetCurrentTimestamp()->gettimeofday() before receiving a new message when send_ready_for_query is true. When send_ready_for_query is true, it's expected that the next message takes time to arrive at the server, so calling gettimeofday() in that case might not be so harmful, though. Regards, -- Fujii Masao
On 06/26/2014 06:45 AM, Fujii Masao wrote: >> The point of this feature, AFAICS, is to detect clients that are failing >> > to issue another query or close their transaction as a result of bad >> > client logic. It's not to protect against network glitches. > > If so, the document should explain that cleanly. Otherwise users may > misunderstand this parameter and try to use it to protect even long transaction > generated by network glitches or client freeze. What does pg_stat_activity say for those cases? If it's able to say "idle in transaction", then this patch covers it. If it isn't, then that seems like a different patch. -- Vik
On Wed, Jun 25, 2014 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Fujii Masao <masao.fujii@gmail.com> writes: >> Why is IIT timeout turned on only when send_ready_for_query is true? >> I was thinking it should be turned on every time a message is received. >> Imagine the case where the session is in idle-in-transaction state and >> a client gets stuck after sending Parse message and before sending Bind >> message. This case would also cause long transaction problem and should >> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >> handle this case because it's enabled only when send_ready_for_query is >> true. Thought? > > I think you just moved the goalposts to the next county. > > The point of this feature, AFAICS, is to detect clients that are failing > to issue another query or close their transaction as a result of bad > client logic. It's not to protect against network glitches. Hmm, so there's no reason a client, after sending one protocol message, might not pause before sending the next protocol message? That seems like a surprising argument. Someone couldn't Parse and then wait before sending Bind and Execute, or Parse and Bind and then wait before sending Execute? > Moreover, there would be no way to implement a timeout like that without > adding a gettimeofday() call after every packet receipt, which is overhead > we do not need and cannot afford. I don't think this feature should add > *any* gettimeofday's beyond the ones that are already there. That's an especially good point if we think that this feature will be enabled commonly or by default - but as Fujii Masao says, it might be tricky to avoid. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 25, 2014 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Fujii Masao <masao.fujii@gmail.com> writes: >>> Why is IIT timeout turned on only when send_ready_for_query is true? >>> I was thinking it should be turned on every time a message is received. >>> Imagine the case where the session is in idle-in-transaction state and >>> a client gets stuck after sending Parse message and before sending Bind >>> message. This case would also cause long transaction problem and should >>> be resolved by IIT timeout. But IIT timeout that this patch adds cannot >>> handle this case because it's enabled only when send_ready_for_query is >>> true. Thought? >> >> I think you just moved the goalposts to the next county. >> >> The point of this feature, AFAICS, is to detect clients that are failing >> to issue another query or close their transaction as a result of bad >> client logic. It's not to protect against network glitches. > > Hmm, so there's no reason a client, after sending one protocol > message, might not pause before sending the next protocol message? > That seems like a surprising argument. Someone couldn't Parse and > then wait before sending Bind and Execute, or Parse and Bind and then > wait before sending Execute? > > >> Moreover, there would be no way to implement a timeout like that without >> adding a gettimeofday() call after every packet receipt, which is overhead >> we do not need and cannot afford. I don't think this feature should add >> *any* gettimeofday's beyond the ones that are already there. > > That's an especially good point if we think that this feature will be > enabled commonly or by default - but as Fujii Masao says, it might be > tricky to avoid. :-( I think that this patch, as it stands, is a clear win if the postgres_fdw part is excluded. 
Remaining points of disagreement seem to be the postgres_fdw changes, whether a default value measured in days might be better than a default of off, and whether it's worth the extra work of covering more cases. The preponderance of opinion seems to be in favor of excluding the postgres_fdw changes, with Tom being violently opposed to including them. I consider the idea of the FDW ignoring the server setting dead. As for expanding the protected area of code, it seems sane to ask whoever wants to extend the protection in that direction to propose a patch. My sense is that there is more support for a default of a small number of days than a default of never, but that seems far from settled. I propose to push this as it stands except for the postgres_fdw part. The default is easy enough to change if we reach consensus, and expanding the scope can be a new patch in a new CF. Objections? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Kevin Grittner <kgrittn@ymail.com> writes: > I propose to push this as it stands except for the postgres_fdw > part. The default is easy enough to change if we reach consensus, > and expanding the scope can be a new patch in a new CF. > Objections? There's been enough noise about the external definition that I think no one has bothered to worry about whether the patch is actually implemented sanely. I do not think it is: specifically, the notion that we will call ereport(FATAL) directly from a signal handler does not fill me with warm fuzzies. While the scope of code in which the signal is enabled is relatively narrow, that doesn't mean there's no problem. Consider in particular the case where we are using SSL: this will mean that we take control away from OpenSSL, which might be in the midst of doing something if we're unlucky timing-wise, and then we'll re-entrantly invoke it to try to send an error message to the client. That seems pretty unsafe. Another risky flow of control is if ReadCommand throws an ordinary error and then the timeout interrupt happens while we're somewhere in the transaction abort logic (the sigsetjmp stanza in postgres.c). I'd be happier if this were implemented in the more traditional style where the signal handler just sets a volatile flag variable, which would be consulted at determinate places in the mainline logic. Or possibly it could be made safe if we only let it throw the error directly when ImmediateInterruptOK is true (compare the handling of notify/catchup interrupts). regards, tom lane
On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote: >>> Moreover, there would be no way to implement a timeout like that without >>> adding a gettimeofday() call after every packet receipt, which is overhead >>> we do not need and cannot afford. I don't think this feature should add >>> *any* gettimeofday's beyond the ones that are already there. >> >> That's an especially good point if we think that this feature will be >> enabled commonly or by default - but as Fujii Masao says, it might be >> tricky to avoid. :-( > > I think that this patch, as it stands, is a clear win if the > postgres_fdw part is excluded. Remaining points of disagreement > seem to be the postgres_fdw, whether a default value measured in > days might be better than a default of off, and whether it's worth > the extra work of covering more. The preponderance of opinion > seems to be in favor of excluding the postgres_fdw changes, with > Tom being violently opposed to including it. I consider the idea > of the FDW ignoring the server setting dead. Expanding the > protected area of code seems like it would be sane to ask whoever > wants to extend the protection in that direction to propose a > patch. My sense is that there is more support for a default of a > small number of days than a default of never, but that seems far > from settled. > > I propose to push this as it stands except for the postgres_fdw > part. The default is easy enough to change if we reach consensus, > and expanding the scope can be a new patch in a new CF. > Objections? Yeah, I think someone should do some analysis of whether this is adding gettimeofday() calls, and how many, and what the performance implications are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> I propose to push this as it stands except for the postgres_fdw
>> part.  The default is easy enough to change if we reach consensus,
>> and expanding the scope can be a new patch in a new CF.
>> Objections?

> Yeah, I think someone should do some analysis of whether this is
> adding gettimeofday() calls, and how many, and what the performance
> implications are.

I believe that as the patch stands, we'd incur one new gettimeofday()
per query-inside-a-transaction, inside the enable_timeout_after() call.
(I think the disable_timeout() call would not result in a gettimeofday
call, since there would be no remaining live timeout events.)

We could possibly refactor enough to share the clock reading with the call
done in pgstat_report_activity.  Not sure how ugly that would be or
whether it's worth the trouble.  Note that in the not-a-transaction-block
case, we already have got two gettimeofday calls in this sequence, one in
pgstat_report_stat and one in pgstat_report_activity :-(

			regards, tom lane
On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sun, Jun 29, 2014 at 12:32 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> >> I propose to push this as it stands except for the postgres_fdw
> >> part.  The default is easy enough to change if we reach consensus,
> >> and expanding the scope can be a new patch in a new CF.
> >> Objections?
>
> > Yeah, I think someone should do some analysis of whether this is
> > adding gettimeofday() calls, and how many, and what the performance
> > implications are.
>
> I believe that as the patch stands, we'd incur one new gettimeofday()
> per query-inside-a-transaction, inside the enable_timeout_after() call.
> (I think the disable_timeout() call would not result in a gettimeofday
> call, since there would be no remaining live timeout events.)
>
> We could possibly refactor enough to share the clock reading with the call
> done in pgstat_report_activity.  Not sure how ugly that would be or
> whether it's worth the trouble.  Note that in the not-a-transaction-block
> case, we already have got two gettimeofday calls in this sequence, one in
> pgstat_report_stat and one in pgstat_report_activity :-(

I've seen several high throughput production servers where code around
gettimeofday is in the top three profile entries - so I'd be hesitant to
add more there.  Especially as the majority of people here seems to think
we should enable this by default.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Yeah, I think someone should do some analysis of whether this is
>>> adding gettimeofday() calls, and how many, and what the performance
>>> implications are.

>> I believe that as the patch stands, we'd incur one new gettimeofday()
>> per query-inside-a-transaction, inside the enable_timeout_after() call.
>> (I think the disable_timeout() call would not result in a gettimeofday
>> call, since there would be no remaining live timeout events.)
>>
>> We could possibly refactor enough to share the clock reading with the call
>> done in pgstat_report_activity.  Not sure how ugly that would be or
>> whether it's worth the trouble.  Note that in the not-a-transaction-block
>> case, we already have got two gettimeofday calls in this sequence, one in
>> pgstat_report_stat and one in pgstat_report_activity :-(

> I've seen several high throughput production servers where code around
> gettimeofday is in the top three profile entries - so I'd be hesitant to
> add more there.  Especially as the majority of people here seems to think
> we should enable this by default.

Note that we'd presumably also be adding two kernel calls associated
with setting/killing the SIGALRM timer.  I'm not sure how much those
cost, but it likely wouldn't be negligible compared to the gettimeofday
cost.

OTOH, one should not forget that there's also going to be a client round
trip involved here, so it's possible this is all down in the noise
compared to that.  But we ought to try to quantify it rather than just
hope for the best.

A thought that comes to mind in connection with that is whether we
shouldn't be doing the ReadyForQuery call (which I believe includes
fflush'ing the previous query response out to the client) before rather
than after all this statistics housekeeping.  Then at least we'd have a
shot at spending these cycles in parallel with the network I/O and
client think-time, rather than serializing it all.

			regards, tom lane
On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
> I do not think it is: specifically, the notion
> that we will call ereport(FATAL) directly from a signal handler
> does not fill me with warm fuzzies.

Aren't we already pretty much doing that for
SIGTERM/pg_terminate_backend() and recovery conflict interrupts?  If we
get a SIGTERM while reading a command die() will set ProcDiePending and
call ProcessInterrupts() after disabling some other interrupts.  Then the
latter will FATAL out.

Imo the idle timeout handler pretty much needs a copy of die(), just
setting a different variable than (or in addition to?) ProcDiePending.

BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
at least set whereToSendOutput = DestNone when FATALing while reading
(potentially via openssl)?  The current behaviour is imo both a protocol
violation and dangerous because of what you explained?

> I'd be happier if this were implemented in the more traditional
> style where the signal handler just sets a volatile flag variable,
> which would be consulted at determinate places in the mainline logic.
> Or possibly it could be made safe if we only let it throw the error
> directly when ImmediateInterruptOK is true (compare the handling
> of notify/catchup interrupts).

Hm.  That sounds approximately like what I've written above.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 2014-06-29 17:28:06 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-29 15:48:15 -0400, Tom Lane wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> Yeah, I think someone should do some analysis of whether this is
> >>> adding gettimeofday() calls, and how many, and what the performance
> >>> implications are.
>
> >> I believe that as the patch stands, we'd incur one new gettimeofday()
> >> per query-inside-a-transaction, inside the enable_timeout_after() call.
> >> (I think the disable_timeout() call would not result in a gettimeofday
> >> call, since there would be no remaining live timeout events.)
> >>
> >> We could possibly refactor enough to share the clock reading with the call
> >> done in pgstat_report_activity.  Not sure how ugly that would be or
> >> whether it's worth the trouble.  Note that in the not-a-transaction-block
> >> case, we already have got two gettimeofday calls in this sequence, one in
> >> pgstat_report_stat and one in pgstat_report_activity :-(
>
> > I've seen several high throughput production servers where code around
> > gettimeofday is in the top three profile entries - so I'd be hesitant to
> > add more there.  Especially as the majority of people here seems to think
> > we should enable this by default.
>
> Note that we'd presumably also be adding two kernel calls associated
> with setting/killing the SIGALRM timer.  I'm not sure how much those
> cost, but it likely wouldn't be negligible compared to the gettimeofday
> cost.

It's probably higher, at least if we get around to replacing
gettimeofday() with clock_gettime() :(

So, I've traced a SELECT 1.  We're currently doing:

1) gettimeofday() in SetCurrentStatementStartTimestamp
2) gettimeofday() in pgstat_report_activity()
3) gettimeofday() for enable_timeout_after (id=STATEMENT_TIMEOUT)
4) setitimer() for schedule_alarm for STATEMENT_TIMEOUT
5) gettimeofday() for pgstat_report_activity()

Interestingly recvfrom(), setitimer(), sendto() are the only calls to
actually fully hit the kernel via syscalls (i.e. visible via strace).

The performance difference of setting up statement_timeout=10s for a
pgbench run that does:

\setrandom aid 1 1000000
SELECT * FROM pgbench_accounts WHERE aid = :aid;

is 164850.368336 (no statement_timeout) vs 157866.924725
(statement_timeout=10s).  That's the best of 10 10s runs.  For SELECT 1
it's 242846.348628 vs 236764.177593.

Not too bad.  Absolutely bleeding edge kernel/libc though; I seem to
recall different results with earlier libc/kernel combos.

I think statement_timeout's overhead should be fairly similar to what's
proposed for idle_in_transaction_timeout?

> A thought that comes to mind in connection with that is whether we
> shouldn't be doing the ReadyForQuery call (which I believe includes
> fflush'ing the previous query response out to the client) before
> rather than after all this statistics housekeeping.  Then at least
> we'd have a shot at spending these cycles in parallel with the
> network I/O and client think-time, rather than serializing it all.

Worth a try.  It'd also be rather neat to consolidate the first three
gettimeofday()'s above.  Afaics they should all be consolidated via
GetCurrentTransactionStartTimestamp()...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
>> I do not think it is: specifically, the notion
>> that we will call ereport(FATAL) directly from a signal handler
>> does not fill me with warm fuzzies.

> Aren't we already pretty much doing that for
> SIGTERM/pg_terminate_backend() and recovery conflict interrupts?

We do that *only at safe spots*, ie CHECK_FOR_INTERRUPTS.  Or at least
I sure hope so.  Otherwise interrupting, eg, malloc will lead to much
unhappiness.

> BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
> at least set whereToSendOutput = DestNone when FATALing while reading
> (potentially via openssl)?

Uh, no, that would pretty much destroy the point of trying to report an
error message at all.  We do restrict the immediate interrupt to occur
only while we're actually doing a recv(), see prepare_for_client_read
and client_read_ended.  I grant that in the case of an SSL connection,
that could be in the middle of some sort of SSL handshake, so it might
not be completely safe protocol-wise --- but it's not likely to mean
instant data structure corruption.

			regards, tom lane
On 2014-06-29 19:13:55 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-06-29 12:53:56 -0400, Tom Lane wrote:
> >> I do not think it is: specifically, the notion
> >> that we will call ereport(FATAL) directly from a signal handler
> >> does not fill me with warm fuzzies.
>
> > Aren't we already pretty much doing that for
> > SIGTERM/pg_terminate_backend() and recovery conflict interrupts?
>
> We do that *only at safe spots*, ie CHECK_FOR_INTERRUPTS.  Or at least
> I sure hope so.  Otherwise interrupting, eg, malloc will lead to much
> unhappiness.

I was specifically thinking about us immediately reacting to those while
we're reading from the client.  We're indeed doing that directly:

#1  0x000000000076648a in proc_exit (code=1) at /home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:143
#2  0x00000000008bcbf7 in errfinish (dummy=0) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:555
#3  0x00000000007909f7 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:2869
#4  0x0000000000790469 in die (postgres_signal_arg=15) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:2604
#5  <signal handler called>
#6  0x00007fb7c8ca0ebb in __libc_recv (fd=10, buf=0xd5f240 <PqRecvBuffer>, n=8192, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:29
#7  0x000000000066a07c in secure_read (port=0x1a29d30, ptr=0xd5f240 <PqRecvBuffer>, len=8192) at /home/andres/src/postgresql/src/backend/libpq/be-secure.c:317
#8  0x00000000006770b5 in pq_recvbuf () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:854
#9  0x000000000067714f in pq_getbyte () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:895
#10 0x000000000078d26b in SocketBackend (inBuf=0x7fffbc02bc30) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:335
#11 0x000000000078d659 in ReadCommand (inBuf=0x7fffbc02bc30) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:483

Note that we're *inside* recv() here.  Both paths to recv, without ssl
and with, have called prepare_for_client_read() which sets
ImmediateInterruptOK = true.  Right now I fail to see why it's safe to
do so, at least when inside openssl.

> > BUT: why is what ProcessInterrupts() is doing safe?  Shouldn't it always
> > at least set whereToSendOutput = DestNone when FATALing while reading
> > (potentially via openssl)?
>
> Uh, no, that would pretty much destroy the point of trying to report
> an error message at all.

I only mean that we should do so in scenarios where we're currently
reading from the client.  For die(), while we're reading from the client,
we're sending a message the client doesn't expect - and thus
unsurprisingly doesn't report.  The client will just report 'server
closed the connection unexpectedly'.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> Note that we're *inside* recv() here.

Well, actually, the recv() has probably failed with an EINTR at this
point, or else it will when/if control returns.

>> Uh, no, that would pretty much destroy the point of trying to report
>> an error message at all.

> I only mean that we should do so in scenarios where we're currently
> reading from the client.  For die(), while we're reading from the client,
> we're sending a message the client doesn't expect - and thus
> unsurprisingly doesn't report.

This is nonsense.  The client will see the error as a response to its
query.  Of course, if it gets an EPIPE it might complain about that first
-- but that would only be likely to happen with a multi-packet query
string, at least over a TCP connection.

Experimenting with this on a RHEL6 box, I do see the "FATAL: terminating
connection due to administrator command" message if I SIGTERM a backend
while idle and it's using a TCP connection; psql sees that as a response
next time it issues a command.  I do get the EPIPE behavior over a
Unix-socket connection, but I wonder if we shouldn't try to fix that.
It would make sense to see if there's data available before complaining
about the EPIPE.

Don't currently have an SSL configuration to experiment with.

			regards, tom lane
On 2014-06-29 19:51:05 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > Note that we're *inside* recv() here.
>
> Well, actually, the recv() has probably failed with an EINTR at this
> point, or else it will when/if control returns.

Well, the point is that we'll be reentering ssl when sending the error
message, without having left it properly.  I.e. we're already hitting
the problem you've described.  Sure, if we're not FATALing, it'll return
EINTR after that.  But that's not really the point here.

I wonder if we should instead *not* set ImmediateInterruptOK = true and
do a CHECK_FOR_INTERRUPT in secure_read, after returning from openssl.
When the recv in my_sock_read() sets BIO_set_retry_read() because the
signal handler caused an EINTR to be returned openssl will return
control to the caller.  Which then can do the CHECK_FOR_INTERRUPT(),
jumping out without having to deal with openssl.

Probably has some problems with annoying platforms like windows though
:(.  Not sure how the signal emulation plays with recv() being
interrupted there.

> >> Uh, no, that would pretty much destroy the point of trying to report
> >> an error message at all.
>
> > I only mean that we should do so in scenarios where we're currently
> > reading from the client.  For die(), while we're reading from the client,
> > we're sending a message the client doesn't expect - and thus
> > unsurprisingly doesn't report.
>
> This is nonsense.  The client will see the error as a response to its
> query.

Man.  Don't be so quick to judge stuff you can't immediately follow or
find wrongly stated as 'nonsense'.

We're sending messages back to the client while the client isn't
expecting any from the server.  E.g. evidenced by the fact that libpq's
pqParseInput3() doesn't treat error messages during that phase as
errors...  It instead emits them via the notice hooks, expecting them to
be NOTICEs...  This e.g. means that there's no error message stored in
conn->errorMessage.  That happens to be somewhat ok in the case of
FATALs because the connection is killed afterwards so any confusion
won't be long lived, but you can't tell me that you'd e.g. find it
acceptable to send an ERROR there.

> Of course, if it gets an EPIPE it might complain about that first
> -- but that would only be likely to happen with a multi-packet query
> string, at least over a TCP connection.
>
> Experimenting with this on a RHEL6 box, I do see the "FATAL: terminating
> connection due to administrator command" message if I SIGTERM a backend
> while idle and it's using a TCP connection; psql sees that as a response
> next time it issues a command.  I do get the EPIPE behavior over a
> Unix-socket connection, but I wonder if we shouldn't try to fix that.
> It would make sense to see if there's data available before complaining
> about the EPIPE.

Even over unix sockets the data seems to be read - courtesy of
pqHandleSendFailure():

sendto(3, "Q\0\0\0\16SELECT 1;\0", 15, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110

The reason we don't print anything is that pqDropConnection(), which is
called by the second pqReadData() invocation in the loop, resets the
data positions:

conn->inStart = conn->inCursor = conn->inEnd = 0;

Moving the parseInput(conn) into the loop seems to "fix" it.  Haven't
analyzed why, but if FATALs arrive during a query they're printed twice.
I've seen that before...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 24, 2014 at 10:17:49AM -0700, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > On 06/23/2014 03:52 PM, Andres Freund wrote:
> >> True.  Which makes me wonder whether we shouldn't default this to
> >> something non-zero -- even if it is 5 or 10 days.
>
> > I'd go for even shorter: 48 hours.  I'd suggest 24 hours, but that would
> > trip up some users who just need really long pg_dumps.
>
> FWIW, I do not think we should have a nonzero default for this.
> We could not safely set it to any value that would be small enough
> to be really useful in the field.
>
> BTW, has anyone thought about the interaction of this feature with
> prepared transactions?  I wonder whether there shouldn't be a similar but
> separately-settable maximum time for a transaction to stay in the prepared
> state.  If we could set a nonzero default on that, perhaps on the order of
> a few minutes, we could solve the ancient bugaboo that "prepared
> transactions are too dangerous to enable by default".

Added as a TODO item.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +