Re: statement_timeout affects query results fetching? - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: statement_timeout affects query results fetching?
Date
Msg-id CADT4RqDhytLw5Tk+XJ6BwJOyn=-e-RkPZ4tNjWVd+U-f18WH1w@mail.gmail.com
In response to Re: statement_timeout affects query results fetching?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thanks (once again!) for the valuable suggestions, Robert.

The idea of chunking/buffering via cursors has been raised before for another purpose: allowing multiple queries to run "concurrently" at the API level (where concurrently means interleaving the reads of their result sets). This would imply exposing the number of rows fetched to the user, as you suggested. However, I don't think we can remove the API option to *not* buffer (as I said before, ADO.NET even provides a standard API feature for reading column-by-column), so the general problem remains...

I think the right solution for us at the driver level is to switch to driver-enforced timeouts, i.e. to stop using statement_timeout and instead enforce a timeout on socket reads. I'll look into doing that for our next version.
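[To make the idea concrete, here is a minimal sketch (in Python rather than the driver's actual language) of a driver-enforced timeout: one cumulative read budget covers all socket reads during a command, so each individual read may wait at most the remaining budget. The class name and structure are purely illustrative, not Npgsql code.]

```python
import socket
import time

class ReadDeadline:
    """Cumulative network-read timeout: one time budget covers every
    read performed while consuming a command's results, rather than
    restarting the clock on each read."""

    def __init__(self, sock: socket.socket, timeout: float):
        self.sock = sock
        self.deadline = time.monotonic() + timeout

    def recv(self, n: int) -> bytes:
        remaining = self.deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("command timeout exceeded while reading results")
        # Each individual read may wait at most the remaining budget.
        self.sock.settimeout(remaining)
        data = self.sock.recv(n)
        if not data:
            raise ConnectionError("server closed the connection")
        return data
```

[The driver would wrap its connection socket in something like this before streaming the result set; as in the SQL Server client behavior quoted earlier, user processing time between reads is deliberately not counted against the budget.]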

Thanks for all your thoughts!

On Mon, Aug 10, 2015 at 2:30 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky <roji@roji.org> wrote:
> Thanks for the explanation Robert, that makes total sense. However, it seems
> like the utility of PG's statement_timeout is much more limited than I
> thought.
>
> In case you're interested, I dug a little further and it seems that
> Microsoft's client for SQL Server implements the following timeout (source):
>
> cumulative time-out (for all network packets that are read during the
> invocation of a method) for all network reads during command execution or
> processing of the results. A time-out can still occur after the first row is
> returned, and does not include user processing time, only network read time.
>
> Since it doesn't seem possible to have a clean query-processing-only timeout
> at the backend, we may be better off doing something similar to the above
> and enforce timeouts on the client only. Any further thoughts on this would
> be appreciated.

An alternative you may want to consider is using the Execute message
with a non-zero row count and reading all of the returned rows as they
come back, buffering them in memory.  When those have all been
consumed, issue another Execute message and get some more rows.
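[For reference, the message in question is the extended-query-protocol Execute message: tag 'E', an Int32 length (which counts itself), the portal name as a NUL-terminated string, and an Int32 maximum row count, where 0 means "all rows". A small illustrative builder, not taken from any driver:]

```python
import struct

def build_execute(portal: str = "", max_rows: int = 0) -> bytes:
    """Build a frontend Execute ('E') message.

    max_rows limits how many rows the server returns before suspending
    the portal (it answers PortalSuspended instead of CommandComplete);
    0 means return all rows.
    """
    name = portal.encode("ascii") + b"\x00"
    body = name + struct.pack("!i", max_rows)
    # The length field counts itself (4 bytes) plus the body.
    return b"E" + struct.pack("!i", 4 + len(body)) + body
```

[On seeing PortalSuspended, the client buffers what it has and sends another Execute for the same portal to resume.]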

AFAICS, the biggest problem with this is that there's no good way to
bound the number of rows returned by size rather than by number, which
has been complained about before by somebody else in a situation
similar to yours.  Another problem is that I believe it will cause
cursor_tuple_fraction to kick in, which may change query plans.  But
it does have the advantage that the query will be suspended from the
server's point of view, which I *think* will toll statement_timeout.

You might also consider exposing some knobs to the user, so that they
can set the number of rows fetched in one go, and let that be all the
rows or only some of them.

We really need a better way of doing this, but I think this is the way
other drivers are handling it now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
