
From: Robert Haas
Subject: Re: statement_timeout affects query results fetching?
Date:
Msg-id: CA+Tgmoafyv8ObYiGYBoJJoevcvgbBxiWNyc3unGG=Z2GvqkNhQ@mail.gmail.com
In response to: Re: statement_timeout affects query results fetching? (Shay Rojansky <roji@roji.org>)
List: pgsql-hackers
On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky <roji@roji.org> wrote:
> the entire row in memory (imagine rows with megabyte-sized columns). This
> makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing
> the result set as fast as possible mean that a lot of memory is wasted on
> the client side?

It sure does.
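
Purely as a sketch, one way for a client to cap that is to pull the
result in batches through a cursor instead of absorbing the whole
result set at once (the table and cursor names below are made up):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM some_large_table;
FETCH 1000 FROM big_cur;  -- repeat until no more rows come back
CLOSE big_cur;
COMMIT;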

> I can definitely appreciate the complexity of changing this behavior. I
> think that some usage cases (such as mine) would benefit from a timeout on
> the time until the first row is sent, this would allow to put an upper cap
> on stuff like query complexity, for example.

Unfortunately, it would not do any such thing.  It's possible for the
first row to be returned really really fast and then for an arbitrary
amount of time to pass and computation to happen before all the rows
are returned.  A plan can have a startup cost of zero and a run cost
of a billion (or a trillion).  This kind of scenario isn't even
particularly uncommon.  You just need a plan that looks like this:

Nested Loop
-> Nested Loop
   -> Nested Loop
      -> Seq Scan
      -> Index Scan
   -> Index Scan
-> Index Scan

You can just keep pushing more nested loop/index scans on there and
the first row will still pop out quite fast.  But if the seq-scanned
table is large, generating the whole result set can take a long, long
time.
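
For illustration, a chain of joins shaped roughly like this (table and
column names invented) can come out as exactly that kind of plan when
the inner tables have indexes on their join columns:

EXPLAIN
SELECT *
FROM big_t b
JOIN t1 ON t1.id = b.t1_id
JOIN t2 ON t2.id = t1.t2_id
JOIN t3 ON t3.id = t2.t3_id;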

Even worse, you could have a case like this:

SELECT somefunc(a) FROM foo;

Now suppose somefunc is normally very quick, but if a = 42 then it
does pg_sleep() in a loop until the world ends.   You're going to have
however many rows of foo have a != 42 pop out near-instantaneously,
and then it will go into the tank and not come out until the meaning
of life, the universe, and everything is finally revealed.
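
For concreteness, somefunc could be as simple as this made-up plpgsql:

CREATE FUNCTION somefunc(a int) RETURNS int AS $$
BEGIN
    IF a = 42 THEN
        LOOP
            PERFORM pg_sleep(3600);  -- effectively never returns
        END LOOP;
    END IF;
    RETURN a;  -- the normal, fast path
END;
$$ LANGUAGE plpgsql;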

That second case is a little extreme, and a little artificial, but the
point is this: just as you don't want the client to have to buffer the
results in memory, the server doesn't either.  It's not the case that
the server computes all the rows and then sends them to you.  Each one is sent
to you as it is computed, and in the normal case, at the time the
first row is sent, only a small percentage of the total work of the
query has been performed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


