Thread: statement_timeout affects query results fetching?

statement_timeout affects query results fetching?

From: Shay Rojansky
Hi everyone, I'm seeing some strange behavior and wanted to confirm it.

When executing a query that selects a long result set, if the code processing the results takes its time (i.e. more than statement_timeout), a timeout occurs. My expectation was that statement_timeout only affects query *processing*, and does not cover the frontend actually processing the result.
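
To make it concrete, here is roughly what I'm seeing, reproduced from psql (the row count is just an illustration; any result set that takes longer than the timeout to produce, send and absorb shows the same thing):

SET statement_timeout = '2s';

-- Each row is trivial to compute, but producing, sending and absorbing
-- 100 million of them takes far longer than two seconds, so the statement
-- is cancelled part-way through returning results:
SELECT i FROM generate_series(1, 100000000) AS s(i);
-- ERROR:  canceling statement due to statement timeout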

First, I wanted to confirm that this is the case (and not some sort of bug in my code).

If this is the case, is this intended? Here are some points:
* It makes statement_timeout very difficult to use; clients do very diverse things with database results, and it may be very difficult (or impossible) to estimate how much time they should take (e.g. random load factors on the client machine or on some other machine receiving results).
* It makes it impossible to specifically detect problematic *queries* which take too long to execute (as opposed to the time taken to process their results).

If you do insist that this behavior is correct, a documentation update for statement_timeout might make this clearer.

Thanks,

Shay

Re: statement_timeout affects query results fetching?

From: Robert Haas
On Sat, Aug 8, 2015 at 5:31 AM, Shay Rojansky <roji@roji.org> wrote:
> Hi everyone, I'm seeing some strange behavior and wanted to confirm it.
>
> When executing a query that selects a long result set, if the code
> processing the results takes its time (i.e. more than statement_timeout), a
> timeout occurs. My expectation was that statement_timeout only affects query
> *processing*, and does not cover the frontend actually processing the
> result.
>
> First, I wanted to confirm that this is the case (and not some sort of bug
> in my code).
>
> If this is the case, is this intended? Here are some points:
> * It makes statement_timeout very difficult to use; clients do very diverse
> things with database results, and it may be very difficult (or impossible) to
> estimate how much time they should take (e.g. random load factors on the
> client machine or on some other machine receiving results).
> * It makes it impossible to specifically detect problematic *queries* which
> take too long to execute (as opposed to the time taken to process their
> results).
>
> If you do insist that this behavior is correct, a documentation update for
> statement_timeout might make this clearer.

I think the issue here is that we start returning rows to the client
while the query is still executing.

Suppose each row of output takes 100 ms of CPU time to generate and
there are 1,000,000 rows.  Then it's quite conceivable that we could
be under the statement_timeout when we start returning rows to the
client, but over the statement_timeout by the time we finish -- and
the user would probably want statement_timeout to kick in in that
case, because that's a lotta CPU time.
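
For scale, with those made-up numbers:

-- 1,000,000 rows at 100 ms of CPU apiece:
SELECT 1000000 * interval '100 milliseconds';   -- 27:46:40, i.e. nearly 28 hours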

I suppose we could try to toll statement_timeout while we're blocked
waiting for the client, but nobody wrote the code for that yet.  And
it would mean that you can't use statement_timeout to prevent xmin
from lagging, which could be why you set it in the first place.  There
might also be some usability difficulties: pg_stat_activity shows the
time the query started, so if you know what statement_timeout is you
can tell how close it is to being killed.  If some of the time isn't
counted, then you can't tell any more.
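
For example, assuming a 30-second statement_timeout (the value is just for illustration):

-- How long each active statement has been running, and how much of the
-- assumed 30-second statement_timeout it has left:
SELECT pid,
       now() - query_start AS running_for,
       interval '30 seconds' - (now() - query_start) AS timeout_headroom
FROM pg_stat_activity
WHERE state = 'active';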

Another approach (which I think might be better) is to have GUCs like
statement_cpu_limit and statement_io_limit that kill a query when it
uses more than the configured amount of that resource.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: statement_timeout affects query results fetching?

From: Tom Lane
Shay Rojansky <roji@roji.org> writes:
> Hi everyone, I'm seeing some strange behavior and wanted to confirm it.
> When executing a query that selects a long result set, if the code
> processing the results takes its time (i.e. more than statement_timeout),
> a timeout occurs. My expectation was that statement_timeout only affects
> query *processing*, and does not cover the frontend actually processing the
> result.

Are you using a cursor, or something like that?

libpq ordinarily absorbs the result set as fast as possible and then hands
it back to the application as one blob; the time subsequently spent by the
application looking at the blob doesn't count against statement_timeout.

As Robert says, statement_timeout *does* include time spent sending the
result set to the client, and we're not going to change that, because it
would be too hard to disentangle calculation from I/O.  So if the client
isn't prompt about absorbing all the data then you have to factor that
into your setting.  But it would be a slightly unusual usage pattern
AFAIK.
        regards, tom lane



Re: statement_timeout affects query results fetching?

From: Shay Rojansky
Thanks for your responses.

I'm not using cursors or anything fancy. The expected behavior (as far as I can tell) for a .NET database driver is to read one row at a time from the database and make it available. There's even a standard API option for fetching data on a column-by-column basis, which allows the user to avoid holding the entire row in memory (imagine rows with megabyte-sized columns). This makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing the result set as fast as possible mean that a lot of memory is wasted on the client side? I'd be interested in your take on this.

I can definitely appreciate the complexity of changing this behavior. I think that some use cases (such as mine) would benefit from a timeout on the time until the first row is sent; this would make it possible to put an upper cap on things like query complexity, for example.

Shay

On Sat, Aug 8, 2015 at 5:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shay Rojansky <roji@roji.org> writes:
> Hi everyone, I'm seeing some strange behavior and wanted to confirm it.
> When executing a query that selects a long result set, if the code
> processing the results takes its time (i.e. more than statement_timeout),
> a timeout occurs. My expectation was that statement_timeout only affects
> query *processing*, and does not cover the frontend actually processing the
> result.

Are you using a cursor, or something like that?

libpq ordinarily absorbs the result set as fast as possible and then hands
it back to the application as one blob; the time subsequently spent by the
application looking at the blob doesn't count against statement_timeout.

As Robert says, statement_timeout *does* include time spent sending the
result set to the client, and we're not going to change that, because it
would be too hard to disentangle calculation from I/O.  So if the client
isn't prompt about absorbing all the data then you have to factor that
into your setting.  But it would be a slightly unusual usage pattern
AFAIK.

                        regards, tom lane

Re: statement_timeout affects query results fetching?

From: Shay Rojansky
I'd also recommend adding a sentence about this aspect of statement_timeout in the docs to prevent confusion...

On Sat, Aug 8, 2015 at 5:30 PM, Shay Rojansky <roji@roji.org> wrote:
Thanks for your responses.

I'm not using cursors or anything fancy. The expected behavior (as far as I can tell) for a .NET database driver is to read one row at a time from the database and make it available. There's even a standard API option for fetching data on a column-by-column basis, which allows the user to avoid holding the entire row in memory (imagine rows with megabyte-sized columns). This makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing the result set as fast as possible mean that a lot of memory is wasted on the client side? I'd be interested in your take on this.

I can definitely appreciate the complexity of changing this behavior. I think that some use cases (such as mine) would benefit from a timeout on the time until the first row is sent; this would make it possible to put an upper cap on things like query complexity, for example.

Shay

On Sat, Aug 8, 2015 at 5:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Shay Rojansky <roji@roji.org> writes:
> Hi everyone, I'm seeing some strange behavior and wanted to confirm it.
> When executing a query that selects a long result set, if the code
> processing the results takes its time (i.e. more than statement_timeout),
> a timeout occurs. My expectation was that statement_timeout only affects
> query *processing*, and does not cover the frontend actually processing the
> result.

Are you using a cursor, or something like that?

libpq ordinarily absorbs the result set as fast as possible and then hands
it back to the application as one blob; the time subsequently spent by the
application looking at the blob doesn't count against statement_timeout.

As Robert says, statement_timeout *does* include time spent sending the
result set to the client, and we're not going to change that, because it
would be too hard to disentangle calculation from I/O.  So if the client
isn't prompt about absorbing all the data then you have to factor that
into your setting.  But it would be a slightly unusual usage pattern
AFAIK.

                        regards, tom lane


Re: statement_timeout affects query results fetching?

From: Robert Haas
On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky <roji@roji.org> wrote:
> the entire row in memory (imagine rows with megabyte-sized columns). This
> makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing
> the result set as fast as possible mean that a lot of memory is wasted on
> the client side?

It sure does.

> I can definitely appreciate the complexity of changing this behavior. I
> think that some usage cases (such as mine) would benefit from a timeout on
> the time until the first row is sent, this would allow to put an upper cap
> on stuff like query complexity, for example.

Unfortunately, it would not do any such thing.  It's possible for the
first row to be returned really really fast and then for an arbitrary
amount of time to pass and computation to happen before all the rows
are returned.  A plan can have a startup cost of zero and a run cost
of a billion (or a trillion).  This kind of scenario isn't even
particularly uncommon.  You just need a plan that looks like this:

Nested Loop
-> Nested Loop
  -> Nested Loop
    -> Seq Scan
    -> Index Scan
  -> Index Scan
-> Index Scan

You can just keep pushing more nested loop/index scans on there and
the first row will still pop out quite fast.  But if the seq-scanned
table is large, generating the whole result set can take a long, long
time.

Even worse, you could have a case like this:

SELECT somefunc(a) FROM foo;

Now suppose somefunc is normally very quick, but if a = 42 then it
does pg_sleep() in a loop until the world ends.   You're going to have
however many rows of foo have a != 42 pop out near-instantaneously,
and then it will go into the tank and not come out until the meaning
of life, the universe, and everything is finally revealed.
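
To make that concrete, somefunc could be something like this (purely hypothetical, of course):

CREATE FUNCTION somefunc(a int) RETURNS int AS $$
BEGIN
    IF a = 42 THEN
        LOOP
            PERFORM pg_sleep(3600);   -- never comes back for this input
        END LOOP;
    END IF;
    RETURN a * 2;                     -- normally very quick
END;
$$ LANGUAGE plpgsql;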

That second case is a little extreme, and a little artificial, but the
point is this: just as you don't want the client to have to buffer the
results in memory, the server doesn't either.  It's not the case that
the server computes the rows and sends them to you.  Each one is sent
to you as it is computed, and in the normal case, at the time the
first row is sent, only a small percentage of the total work of the
query has been performed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: statement_timeout affects query results fetching?

From: Shay Rojansky
Thanks for the explanation Robert, that makes total sense. However, it seems like the utility of PG's statement_timeout is much more limited than I thought.

In case you're interested, I dug a little further and it seems that Microsoft's client for SQL Server implements the following timeout (source):

cumulative time-out (for all network packets that are read during the invocation of a method) for all network reads during command execution or processing of the results. A time-out can still occur after the first row is returned, and does not include user processing time, only network read time.

Since it doesn't seem possible to have a clean query-processing-only timeout at the backend, we may be better off doing something similar to the above and enforcing timeouts on the client only. Any further thoughts on this would be appreciated.

On Sun, Aug 9, 2015 at 5:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky <roji@roji.org> wrote:
> the entire row in memory (imagine rows with megabyte-sized columns). This
> makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing
> the result set as fast as possible mean that a lot of memory is wasted on
> the client side?

It sure does.

> I can definitely appreciate the complexity of changing this behavior. I
> think that some usage cases (such as mine) would benefit from a timeout on
> the time until the first row is sent, this would allow to put an upper cap
> on stuff like query complexity, for example.

Unfortunately, it would not do any such thing.  It's possible for the
first row to be returned really really fast and then for an arbitrary
amount of time to pass and computation to happen before all the rows
are returned.  A plan can have a startup cost of zero and a run cost
of a billion (or a trillion).  This kind of scenario isn't even
particularly uncommon.  You just need a plan that looks like this:

Nested Loop
-> Nested Loop
  -> Nested Loop
    -> Seq Scan
    -> Index Scan
  -> Index Scan
-> Index Scan

You can just keep pushing more nested loop/index scans on there and
the first row will still pop out quite fast.  But if the seq-scanned
table is large, generating the whole result set can take a long, long
time.

Even worse, you could have a case like this:

SELECT somefunc(a) FROM foo;

Now suppose somefunc is normally very quick, but if a = 42 then it
does pg_sleep() in a loop until the world ends.   You're going to have
however many rows of foo have a != 42 pop out near-instantaneously,
and then it will go into the tank and not come out until the meaning
of life, the universe, and everything is finally revealed.

That second case is a little extreme, and a little artificial, but the
point is this: just as you don't want the client to have to buffer the
results in memory, the server doesn't either.  It's not the case that
the server computes the rows and sends them to you.  Each one is sent
to you as it is computed, and in the normal case, at the time the
first row is sent, only a small percentage of the total work of the
query has been performed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: statement_timeout affects query results fetching?

From: Robert Haas
On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky <roji@roji.org> wrote:
> Thanks for the explanation Robert, that makes total sense. However, it seems
> like the utility of PG's statement_timeout is much more limited than I
> thought.
>
> In case you're interested, I dug a little further and it seems that
> Microsoft's client for SQL Server implements the following timeout (source):
>
> cumulative time-out (for all network packets that are read during the
> invocation of a method) for all network reads during command execution or
> processing of the results. A time-out can still occur after the first row is
> returned, and does not include user processing time, only network read time.
>
> Since it doesn't seem possible to have a clean query-processing-only timeout
> at the backend, we may be better off doing something similar to the above
> and enforce timeouts on the client only. Any further thoughts on this would
> be appreciated.

An alternative you may want to consider is using the Execute message
with a non-zero row count and reading all of the returned rows as they
come back, buffering them in memory.  When those have all been
consumed, issue another Execute message and get some more rows.
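
At the SQL level the equivalent pattern looks roughly like this (the table name and batch size are made up; a driver would do the same thing with a portal and Execute messages rather than an explicit cursor):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
FETCH 1000 FROM big_cur;   -- each FETCH is its own statement as far as
FETCH 1000 FROM big_cur;   -- statement_timeout is concerned
CLOSE big_cur;
COMMIT;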

AFAICS, the biggest problem with this is that there's no good way to
bound the number of rows returned by size rather than by number, which
has been complained about before by somebody else in a situation
similar to yours.  Another problem is that I believe it will cause
cursor_tuple_fraction to kick in, which may change query plans.  But
it does have the advantage that the query will be suspended from the
server's point of view, which I *think* will toll statement_timeout.
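
If the plan change turns out to matter, there is at least a knob to play with:

-- Ask the planner to optimize cursor queries for total runtime, as it would
-- for a plain query, instead of assuming only a fraction of the rows will
-- ever be fetched:
SET cursor_tuple_fraction = 1.0;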

You might also consider exposing some knobs to the user, so that they
can set the number of rows fetched in one go, and let that be all the
rows or only some of them.

We really need a better way of doing this, but I think this is the way
other drivers are handling it now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: statement_timeout affects query results fetching?

From: Shay Rojansky
Thanks (once again!) for the valuable suggestions Robert.

The idea of chunking/buffering via cursors has been raised before for another purpose: allowing multiple queries to run "concurrently" at the API level (where concurrently means interleaving reads of their result sets). This would imply exposing the number of rows fetched to the user, as you suggested. However, I don't think there's a way we can remove the API option to *not* buffer (as I said before, ADO.NET even provides a standard API feature for reading column-by-column), and therefore the general problem remains...

I think the right solution for us at the driver level would be to switch to driver-enforced timeouts, i.e. to no longer use statement_timeout but look at socket read times instead. I'll look into doing that for our next version.

Thanks for all your thoughts!

On Mon, Aug 10, 2015 at 2:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky <roji@roji.org> wrote:
> Thanks for the explanation Robert, that makes total sense. However, it seems
> like the utility of PG's statement_timeout is much more limited than I
> thought.
>
> In case you're interested, I dug a little further and it seems that
> Microsoft's client for SQL Server implements the following timeout (source):
>
> cumulative time-out (for all network packets that are read during the
> invocation of a method) for all network reads during command execution or
> processing of the results. A time-out can still occur after the first row is
> returned, and does not include user processing time, only network read time.
>
> Since it doesn't seem possible to have a clean query-processing-only timeout
> at the backend, we may be better off doing something similar to the above
> and enforce timeouts on the client only. Any further thoughts on this would
> be appreciated.

An alternative you may want to consider is using the Execute message
with a non-zero row count and reading all of the returned rows as they
come back, buffering them in memory.  When those have all been
consumed, issue another Execute message and get some more rows.

AFAICS, the biggest problem with this is that there's no good way to
bound the number of rows returned by size rather than by number, which
has been complained about before by somebody else in a situation
similar to yours.  Another problem is that I believe it will cause
cursor_tuple_fraction to kick in, which may change query plans.  But
it does have the advantage that the query will be suspended from the
server's point of view, which I *think* will toll statement_timeout.

You might also consider exposing some knobs to the user, so that they
can set the number of rows fetched in one go, and let that be all the
rows or only some of them.

We really need a better way of doing this, but I think this is the way
other drivers are handling it now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company