Thread: Implementing setQueryTimeout()
The discussion around TCP keepalives reminded me that I have wanted to implement setQueryTimeout() support for a while. Here's my thoughts on it so far. My motivation here is that we have an environment where having threads hang for extended periods when doing DB queries is not good. We must have them fail the query within a relatively short time, or the entire system can grind to a halt due to a lack of threads. This is the same regardless of the cause of the delay - slow query, or a resultset that's taking a long time to transfer, or the server just died and TCP hasn't killed the connection yet, or there is a slow query ahead of us on the same connection. In essense the requirement is: calling Statement.execute() should either return a result or throw an exception within some relatively short period of time. In my case the timeout is on the order of 15 seconds or less. ... There are two main approaches I see: (1) Map query timeout to server-side statement_timeout. Add a timeout parameter to QueryExecutor methods. The protocol layer remembers the current setting of statement_timeout and issues an appropriate "SET statement_timeout" as necessary before submitting each query. Advantages: Gives nice failure characteristics (query is cancelled, connection remains usable) Disadvantages: Doesn't help with anything but slow queries on the server side, relies on server-side query cancellation due to timeout happening reasonably promptly. Client code that sets statement_timeout itself can confuse it. (2) Run a separate timer thread. Start a timer in Statement.execute() before submitting the query to the protocol layer. If the timer expires, close the low-level DB connection (from the timer thread) which should cause an IOException in the guts of the protocol layer where the query executing thread is blocked on network I/O, eventually propagating up as a fatal SQLException to the caller. The assumption here is that if a thread is blocked inside Statement.execute(), the eventual cause of that is that something is blocked on network I/O. I think that's generally true? Advantages: Should catch all cases regardless of cause. Disadvantages: Draconian failure mode - a query timeout means the connection is dead. ... I would like to implement (2) but I can see that killing the connection on timeout may not be desirable in all cases. Independant of approach, I'd also like to implement a connection parameter that provides a default query timeout for all statements that don't override it (this would include internal driver queries e.g. OID lookups, metadata queries). Any thoughts on this? -O
Oliver Jowett <oliver@opencloud.com> writes: > (2) Run a separate timer thread. Start a timer in Statement.execute() > before submitting the query to the protocol layer. If the timer expires, > close the low-level DB connection (from the timer thread) which should > cause an IOException in the guts of the protocol layer where the query > executing thread is blocked on network I/O, eventually propagating up as > a fatal SQLException to the caller. > I would like to implement (2) but I can see that killing the connection > on timeout may not be desirable in all cases. That seems pretty darn horrid, actually. If the reason for the slow response is server overload, this technique will make things rapidly *worse*. In the first place it does nothing to prevent the server from continuing to compute the too-slow query (and perhaps even committing it). In the second place, having to establish a new connection will eat a lot of cycles you really don't want to waste. In the third place, once you do establish a new connection it will be competing for cycles with the still-running query in the original backend. Iterate a few times and you'll have a self-inflicted denial of service. I agree with having a timer thread, I think, just not with what you want to do when the timer fires. Can't you do something like sending a query cancel request when you time out? It might be that you need to decouple queries from connections a bit more, so that a query can fail and "let go" of a connection, while the connection object has to wait for its query to be cancelled before returning to the pool of available connections. regards, tom lane
Tom Lane wrote: > That seems pretty darn horrid, actually. If the reason for the slow > response is server overload, this technique will make things rapidly > *worse*. In the first place it does nothing to prevent the server from > continuing to compute the too-slow query (and perhaps even committing > it). In the second place, having to establish a new connection will eat > a lot of cycles you really don't want to waste. In the third place, > once you do establish a new connection it will be competing for cycles > with the still-running query in the original backend. Iterate a few > times and you'll have a self-inflicted denial of service. Except for the problem of the query continuing to run, these problems seem to be common to anything that throws an exception to the client on timeout. The client is going to have to give up on that query regardless of how we actually implement the timeout, so the server is doing extra "useless" work anyway. If the surrounding logic is not smart enough to throttle itself, then you're hosed either way. > I agree with having a timer thread, I think, just not with what you want > to do when the timer fires. Can't you do something like sending a query > cancel request when you time out? I could do that, but if the problem is actually that the server or network has died it will not help things (the cancel is just going to fail.. eventually). Maybe two timeouts? One causes query cancel; if the cancel doesn't actually happen by the time the second timeout occurs, the connection gets closed. In fact that could be done by simply doing both of the approaches I suggested with different timeouts - set statement_timeout, and if you're still waiting for a response some time after statement_timeout should have fired, you kill the connection? This has the advantage that you don't have to juggle an extra connection to do the cancel. > It might be that you need to decouple queries from connections a bit > more, so that a query can fail and "let go" of a connection, while the > connection object has to wait for its query to be cancelled before > returning to the pool of available connections. This is going to be a lot more work than I have time to do unfortunately :( The main problems are that: (1) the current code stores a fair amount of protocol state in what are essentially local variables - so you can't just unwind the stack and throw an exception to the caller at an arbitary point without losing important protocol state. The current code is very careful to wait until it is at a known point in the protocol (ReadyForQuery, IIRC) before returning. So there would be quite a lot of rework needed here. (2) there is no simple way to nondestructively interrupt a blocking I/O call deep in the protocol code; and a rewrite to allow this (a) is a lot of work and (b) would probably require that we drop support for older Java versions. It also seems a bit problematic in this scenario: - client get connection from pool - client runs query - query throws SQLException due to timeout; at the protocol level we are still waiting for the query to cancel - client logs exception, cleans up, returns the connection to the pool - connection gets reused - new query gets backed up behind the cancelling query Essentially you've now got a useless connection in the pool.. In fact, in this scenario it's likely the pool will be running some cleanup SQL before handing it out to a new client, and that will fail - so the connection will probably be discarded entirely anyway! (There is no way in the JDBC API to say to a client or connection pool "this connection is busy, please don't use it right now") -O
"Oliver Jowett" <oliver@opencloud.com> writes: > Tom Lane wrote: > >> That seems pretty darn horrid, actually. If the reason for the slow >> response is server overload, this technique will make things rapidly >> *worse*. In the first place it does nothing to prevent the server from >> continuing to compute the too-slow query (and perhaps even committing >> it). In the second place, having to establish a new connection will eat >> a lot of cycles you really don't want to waste. In the third place, >> once you do establish a new connection it will be competing for cycles >> with the still-running query in the original backend. Iterate a few >> times and you'll have a self-inflicted denial of service. > > Except for the problem of the query continuing to run, these problems seem to > be common to anything that throws an exception to the client on timeout. The > client is going to have to give up on that query regardless of how we actually > implement the timeout, so the server is doing extra "useless" work anyway. If > the surrounding logic is not smart enough to throttle itself, then you're hosed > either way. > >> I agree with having a timer thread, I think, just not with what you want >> to do when the timer fires. Can't you do something like sending a query >> cancel request when you time out? > > I could do that, but if the problem is actually that the server or network has > died it will not help things (the cancel is just going to fail.. eventually). I think you have to tackle this as two problems. The usual case is going to be a long query which you want to cancel. Cancelling is normally quick and you can report an error with the query just as if the database had encountered some other error. The problem of a broken network connection or down server is another case. For most users without a failover server I think triggering an error in this case would actually do more harm than good. Even with a failover in my experience you really want a manual or out-of-band mechanism to trigger failover lest you get false positives or double-failures. > (2) there is no simple way to nondestructively interrupt a blocking I/O call > deep in the protocol code; and a rewrite to allow this (a) is a lot of work and > (b) would probably require that we drop support for older Java versions. Ouch. That's frightening. I'm not sure there's any reasonable way to implement a statement timeout without some way to interrupt the read it's blocking on. Unless there some kind of select(2) equivalent that can allow you to block only for a limited amount of time and then regain control? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Feb 17, 2008, at 5:40 PM, Oliver Jowett wrote: > (1) Map query timeout to server-side statement_timeout. Add a > timeout parameter to QueryExecutor methods. The protocol layer > remembers the current setting of statement_timeout and issues an > appropriate "SET statement_timeout" as necessary before submitting > each query. > > Advantages: Gives nice failure characteristics (query is cancelled, > connection remains usable) > Disadvantages: Doesn't help with anything but slow queries on the > server side, relies on server-side query cancellation due to timeout > happening reasonably promptly. Client code that sets > statement_timeout itself can confuse it. i think this is a very reasonable approach. it seems like it should handle the majority of the cases. -pete -- (peter.royal|osi)@pobox.com - http://fotap.org/~osi
Attachment
Gregory Stark wrote: > The problem of a broken network connection or down server is another case. For > most users without a failover server I think triggering an error in this case > would actually do more harm than good. Even with a failover in my experience > you really want a manual or out-of-band mechanism to trigger failover lest you > get false positives or double-failures. It is not about triggering failover in my scenario, it is about limiting the time a thread can spend blocked waiting on a query. The scenario is call processing where access to the DB is not critical - you can do something acceptable without DB access. However, what you do *not* have the luxury of doing is hanging around for 5 minutes waiting for a connection to time out before you make a decision about what to do with the call! Instead you want a rapid error, mark the DB as down, and future queries proceed as if the DB has failed until a separate check process manages to run a test query against the DB successfully. There is a secondary problem that you do not have an unbounded number of threads and connections to do your processing on, so tying up threads and connections blocking on a slowly dying TCP connection is a real problem - under any real load you will exhaust your thread pool (or connection pool) in a matter of seconds. Then everything stops. This is bad. I can implement something in my local copy of the driver to support this, but it'd be nice if I can do something more generally useful too.. I'd argue that "close connection on network timeout if setQueryTimeout() is set" is better than "do absolutely nothing special if setQueryTimeout() is set", anyway. If you don't want queries being aborted because they are taking too long to complete.. don't set a query timeout! -O
Oliver Jowett <oliver@opencloud.com> writes: > I'd argue that "close connection on network timeout if setQueryTimeout() > is set" is better than "do absolutely nothing special if > setQueryTimeout() is set", anyway. If you don't want queries being > aborted because they are taking too long to complete.. don't set a query > timeout! I think it's a serious, serious conceptual error to tie network timeouts to query timeouts. Maybe your specific application needs them to be the same, but implementing something that forces them to be the same is a good way to guarantee that your work won't be generally useful or acceptable for the mainstream driver. regards, tom lane
Tom Lane wrote: > Oliver Jowett <oliver@opencloud.com> writes: >> I'd argue that "close connection on network timeout if setQueryTimeout() >> is set" is better than "do absolutely nothing special if >> setQueryTimeout() is set", anyway. If you don't want queries being >> aborted because they are taking too long to complete.. don't set a query >> timeout! > > I think it's a serious, serious conceptual error to tie network timeouts > to query timeouts. Why? I don't understand this argument. setQueryTimeout() is documented as: > Sets the number of seconds the driver will wait for a Statement object to execute to the given number of seconds. If thelimit is exceeded, an SQLException is thrown. That would seem to cover any sort of delay, not just "it is a slow query on the server side". If the Statement can't execute in N seconds, return an error. That could mean slow network - slow query - whatever. I don't see why clients that want timeouts care about the cause of the slowness - all they care about is that they get control back within some reasonable amount of time, surely? What's the use case where you would use setQueryTimeout() but don't actually mean "give me control back in N seconds"? If a client is postgresql-aware, it could always set statement_timeout itself if it really does mean "don't take more than N seconds of server time" instead of "please return within N seconds". -O
Oliver Jowett <oliver@opencloud.com> writes: > Tom Lane wrote: >> I think it's a serious, serious conceptual error to tie network timeouts >> to query timeouts. > Why? I don't understand this argument. Because the failure mechanisms that you're worried about are completely different in the two cases, and don't necessarily have similar timeout requirements (to say nothing of the appropriate recovery actions). regards, tom lane
Tom Lane wrote: > Oliver Jowett <oliver@opencloud.com> writes: >> Tom Lane wrote: >>> I think it's a serious, serious conceptual error to tie network timeouts >>> to query timeouts. > >> Why? I don't understand this argument. > > Because the failure mechanisms that you're worried about are completely > different in the two cases, and don't necessarily have similar timeout > requirements (to say nothing of the appropriate recovery actions). Which is fair enough .. but I have a single JDBC method to work with here. I could implement something driver-specific but it seems a little silly to have timeout behaviour in the driver but have the standard method unimplemented. Note that you could implement different recovery behaviour by looking at the SQLState of the thrown exception easily enough. -O
Based on feedback so far here's attempt #2. The main thing I got out of the feedback is that statement_timeout seems to be enough for most people. Unfortunately statement_timeout is not sufficient for what I need, so my changes will end up doing more than that. Here's an attempt at a compromise: Add 4 new connection parameters, associated connection / statement values and accessors on the postgresql extension interfaces: - softQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 - hardQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 - softQueryMargin: -1=disabled, >=0 = margin in ms, default 0 - hardQueryMargin: -1=disabled, >=0 = margin in ms, default 60s The soft query timeout (if enabled) makes the driver set statement_timeout before executing a query, which in most cases will result in a SQLException being reported if the timeout is reached (but this is not guaranteed). The hard query timeout (if enabled) makes the driver forcibly close the connection after that timeout if the query has not completed, which will result in a fatal SQLException due to an IOException from the blocked query thread. The setQueryTimeout(N) logic then looks something like this: > if (N == 0) { > softQueryTimeout = hardQueryTimeout = 0; > return; > } > > if (softQueryMargin == -1 && hardQueryMargin == -1) > throw new PSQLException("not implemented"); > > if (softQueryMargin != -1) > softQueryTimeout = max(1,N*1000+softQueryMargin) > else > softQueryTimeout = 0; > > if (hardQueryMargin != -1) > hardQueryTimeout = max(1,N*1000+hardQueryMargin) > else > hardQueryTimeout = 0; The net effect is that if you call "setQueryTimeout(N)" by default you get an attempt at query cancellation after N seconds and a hard close of the connection after N+60 seconds. Any comments on this iteration? Too configurable? Not configurable enough? Are the defaults sensible? -O
On Feb 18, 2008, at 8:39 PM, Oliver Jowett wrote: > Unfortunately statement_timeout is not sufficient for what I need, > so my changes will end up doing more than that. Here's an attempt at > a compromise: > > Add 4 new connection parameters, associated connection / statement > values and accessors on the postgresql extension interfaces: this seems complicated. can setQueryTimeout set statement_timeout, and then can any additional logic be implemented externally? i agree with tom that tying queries actually running long and network failures isn't appropriate. -pete -- (peter.royal|osi)@pobox.com - http://fotap.org/~osi
Attachment
Oliver Jowett wrote: > setQueryTimeout() is documented as: > >> Sets the number of seconds the driver will wait for a Statement object >> to execute to the given number of seconds. If the limit is exceeded, >> an SQLException is thrown. > > That would seem to cover any sort of delay, not just "it is a slow query > on the server side". If the Statement can't execute in N seconds, return > an error. That could mean slow network - slow query - whatever. > > I don't see why clients that want timeouts care about the cause of the > slowness - all they care about is that they get control back within > some reasonable amount of time, surely? What's the use case where you > would use setQueryTimeout() but don't actually mean "give me control > back in N seconds"? The documentation is unclear. But i think it is intended to limit the execution time on the server. And i do care about the cause of the slowness. If i call setQueryTimeout(), i expect to be able to use the connection after a timeout, including everything i had SET before, prepared statements etc. A network timeout or anything worse is handled in an entirely different part of the code. That is not the responsibility of the driver, and doesn't event need to be part of the database related code at all. Till
Till Toenges wrote: > A network timeout or anything worse is handled in an entirely different > part of the code. That is not the responsibility of the driver, and > doesn't event need to be part of the database related code at all. I'm not sure what you mean here - if the connection to the DB fails, the driver will certainly handle that and tell you about it via a SQLException! -O
Oliver Jowett wrote: > - softQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 > - hardQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 > - softQueryMargin: -1=disabled, >=0 = margin in ms, default 0 > - hardQueryMargin: -1=disabled, >=0 = margin in ms, default 60s I had to think about how you mean this. Doesn't this reduce to just one useful parameter, namely hardQueryMargin? If hardQueryMargin is set, then kill the connection after setQueryTimeout() + hardQueryMargin seconds? All other cases would be covered by setting appropriate values for these two. And i'm still not quite convinced that anything but "softQueryTimeout" should be implemented by the driver, but that's just my personal opinion. How about starting with the simple case (using set), and then see how it turns out in the real world? Till
Till Toenges wrote: > Oliver Jowett wrote: >> - softQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 >> - hardQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 >> - softQueryMargin: -1=disabled, >=0 = margin in ms, default 0 >> - hardQueryMargin: -1=disabled, >=0 = margin in ms, default 60s > > I had to think about how you mean this. Doesn't this reduce to just one > useful parameter, namely hardQueryMargin? If hardQueryMargin is set, > then kill the connection after setQueryTimeout() + hardQueryMargin > seconds? All other cases would be covered by setting appropriate values > for these two. The two Margins are there for applications that are not aware of the extensions and just uses setQueryTimeout(). If your code is aware of the two timeouts, it can set them directly. Or they can be set at the connection level and inherited by all users, even ones that don't use setQueryTimeout at all. > And i'm still not quite convinced that anything but "softQueryTimeout" > should be implemented by the driver, but that's just my personal > opinion. How about starting with the simple case (using set), and then > see how it turns out in the real world? The real world case I am dealing with needs the hardQueryTimeout behaviour. If I'm going to implement anything first, it's that. -O
Oliver Jowett wrote: >> A network timeout or anything worse is handled in an entirely >> different part of the code. That is not the responsibility of the >> driver, and doesn't event need to be part of the database related code >> at all. > > I'm not sure what you mean here - if the connection to the DB fails, the > driver will certainly handle that and tell you about it via a SQLException! Sorry. Of course the driver must handle failed connections. I meant the code in my application, which handles the two cases (timeout but usable connection vs broken connection for whatever reason) differently. Till
Till Toenges wrote: > Oliver Jowett wrote: >>> A network timeout or anything worse is handled in an entirely >>> different part of the code. That is not the responsibility of the >>> driver, and doesn't event need to be part of the database related >>> code at all. >> >> I'm not sure what you mean here - if the connection to the DB fails, >> the driver will certainly handle that and tell you about it via a >> SQLException! > > Sorry. Of course the driver must handle failed connections. I meant the > code in my application, which handles the two cases (timeout but usable > connection vs broken connection for whatever reason) differently. You can check the SQLState of any exception thrown to tell the difference, right? -O
Oliver Jowett wrote: > Till Toenges wrote: >> Oliver Jowett wrote: >>>> A network timeout or anything worse is handled in an entirely >>>> different part of the code. That is not the responsibility of the >>>> driver, and doesn't event need to be part of the database related >>>> code at all. >>> >>> I'm not sure what you mean here - if the connection to the DB fails, >>> the driver will certainly handle that and tell you about it via a >>> SQLException! >> >> Sorry. Of course the driver must handle failed connections. I meant >> the code in my application, which handles the two cases (timeout but >> usable connection vs broken connection for whatever reason) differently. > > You can check the SQLState of any exception thrown to tell the > difference, right? Just to clarify: - If the soft query timeout goes off and the server cancels the query, you will get a server-generated SQLException saying the query was cancelled. This will be some non-fatal SQLState and generally looks like any other SQL error generated by the server. The connection and statement will still be live (and in fact IIRC the transaction is still live and you can rollback to a savepoint, etc, if you like) - If the hard query timeout goes off, you will get a SQLException generated by the driver that has whatever the SQLState for connection errors is (I can't remember offhand) - but it will look very similar to the exception you get for any other connection problem when running a query. Basically the "hard query" timeout just introduces a new cause of connection errors - "server didn't respond within the timeout". It will otherwise behave like any other network-level connection error. -O
Oliver Jowett wrote: >>> Sorry. Of course the driver must handle failed connections. I meant >>> the code in my application, which handles the two cases (timeout but >>> usable connection vs broken connection for whatever reason) differently. >> >> You can check the SQLState of any exception thrown to tell the >> difference, right? Yes, i can check that. But if the connection is closed at this point, that's no longer useful, except for diagnostics. This is, of course, just the case where setQueryTimeout() is really "hard". > Just to clarify: > > - If the soft query timeout goes off and the server cancels the query, > you will get a server-generated SQLException saying the query was > cancelled. This will be some non-fatal SQLState and generally looks like > any other SQL error generated by the server. The connection and > statement will still be live (and in fact IIRC the transaction is still > live and you can rollback to a savepoint, etc, if you like) > > - If the hard query timeout goes off, you will get a SQLException > generated by the driver that has whatever the SQLState for connection > errors is (I can't remember offhand) - but it will look very similar to > the exception you get for any other connection problem when running a > query. > > Basically the "hard query" timeout just introduces a new cause of > connection errors - "server didn't respond within the timeout". It will > otherwise behave like any other network-level connection error. The distinction between hard and soft timeouts makes sense in this way. Please just make it so that hard timeouts are off by default. Till
"Oliver Jowett" <oliver@opencloud.com> writes: > The real world case I am dealing with needs the hardQueryTimeout behaviour. If > I'm going to implement anything first, it's that. I still don't understand why this has to be in the driver. Your application is going to have to be able to handle creating a new connection in response to this error so why can't it just also be the one to close the connection? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On Tue, 19 Feb 2008, Oliver Jowett wrote: > Add 4 new connection parameters, associated connection / statement values and > accessors on the postgresql extension interfaces: > > - softQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 > - hardQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 > - softQueryMargin: -1=disabled, >=0 = margin in ms, default 0 > - hardQueryMargin: -1=disabled, >=0 = margin in ms, default 60s > > Any comments on this iteration? Too configurable? Not configurable enough? > Are the defaults sensible? > Sounds too configurable. I don't understand the use case for softQueryMargin at all right now. Since hardQueryTimeout + hardQueryMargin should be > softQueryTimeout + softQueryMargin for a reasonable configuration, we don't need all these parameters. Don't you just want: 1) defaultQueryTimeout: 0=disabled, >0 timeout in ms 2) hardTimeoutMargin: -1 disabled, >=0 hard timeout above query timeout Kris Jurka
Kris Jurka wrote: > On Tue, 19 Feb 2008, Oliver Jowett wrote: > >> Add 4 new connection parameters, associated connection / statement >> values and accessors on the postgresql extension interfaces: >> >> - softQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 >> - hardQueryTimeout: 0=disabled, >0 = timeout in ms, default 0 >> - softQueryMargin: -1=disabled, >=0 = margin in ms, default 0 >> - hardQueryMargin: -1=disabled, >=0 = margin in ms, default 60s >> >> Any comments on this iteration? Too configurable? Not configurable >> enough? Are the defaults sensible? >> > > Sounds too configurable. I don't understand the use case for > softQueryMargin at all right now. Since hardQueryTimeout + > hardQueryMargin should be > softQueryTimeout + softQueryMargin for a > reasonable configuration, we don't need all these parameters. Don't you > just want: > > 1) defaultQueryTimeout: 0=disabled, >0 timeout in ms > 2) hardTimeoutMargin: -1 disabled, >=0 hard timeout above query timeout +1 from me, sounds easier to grasp, and looks like it would be everything Oliver needs, too. Oliver? setQueryTimeout() could by default only set "soft" timeout (hardTimeoutMargin set to -1) If hardTimeoutMargin is set to >= 0 you will get hard timeouts at setQueryTimeout + margin. Margin can be 0 of course. If you want to have soft/hard timeout for all queries, you set one or both parameters. Any use cases not working with just these two parameters? Best Regards Michael Paesold