Re: Fetch zero result rows when executing a query? - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: Fetch zero result rows when executing a query?
Date
Msg-id CADT4RqBdKTWR8gmpO509Bwonhxfu3RU=Kp1aRz39kwkR8BkFow@mail.gmail.com
Whole thread Raw
In response to Fetch zero result rows when executing a query?  (Shay Rojansky <roji@roji.org>)
Responses Re: Fetch zero result rows when executing a query?  (Marko Tiikkaja <marko@joh.to>)
Re: Fetch zero result rows when executing a query?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
First a general comment:

> Then the driver writers that need these special API behaviors are
> reasonably expected to contribute to adding them to backend products that
> do not already have them.  The database developers are not going to take on
> responsibility for the API decisions of others; and features deemed (or
> that in reality are) of marginal usefulness are likely to be omitted -
> intentionally or otherwise - from the official (in this case libpq)
> protocol.

I absolutely agree with you there, I'm not trying to get anybody to implement
something I need (i.e. fetch 0 rows). This is more of a general discussion as
to whether that feature *makes sense* to you as a protocol feature (which doesn't
seem to be the case, as some of you guys want to deprecate the whole max_rows
thing).

>> I'll clarify just a little... I am indeed talking about the PostgreSQL
>> network protocol, and not about query optimization (with LIMIT or omitting
>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
>> through which the user indicates they're not interested in any result rows
>> whether those exist or not.

> ExecuteNonQuery returns an integer while row-returning queries do not.
> I'd argue that the API states that the user is declaring that the query
> they are executing does not return any actual rows - just a count of
> affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this discussion
so I didn't mention it.

>> For the situation where a user does ExecuteNonQuery but the query returns
>> result rows, the driver can save the needless network transfers. We can
>> definitely say it's the user's fault for providing a query with a resultset
>> to ExecuteNonQuery, but we *do* have the user's clear intention that no
>> rows be fetched so why not act on it. I agree this isn't a terribly
>> important optimization, the trigger for this question was first and
>> foremost curiosity: it seems strange the protocol allows you to specify
>> max_rows for any value other than 0.

> Yes, it does seem strange and, like Marko said, ideally would be
> deprecated.  The fact that it cannot handle "zero rows" seems like an
> unnecessary limitation and I cannot image that any values other than 0 and
> all would be of practical usage.  In the case of zero returning instead the
> number of rows would be more useful than simply refusing to return anything
> so even if something like this is needed the current implementation is
> flawed.

Just to be precise: what is strange to me is that the max_rows feature exists
but has no 0 value. You and Marko are arguing that the whole feature should be
deprecated (i.e. always return all rows).

>> Here's a possible believable use-case which doesn't involve user neglect:
>> imagine some server-side function which has side-effects and also returns
>> some rows. In some situations the user is interested in the result rows,
>> but in others they only want the side-effect. The user would probably have
>> no control over the function, and their only way to *not* transfer the
>> result rows would be with a mechanism such as max_rows.

> Functions always return rows and so should not be executed using
> "ExecuteNonQuery".  In most cases action-oriented functions return a single
> result-status row so ignoring that row, while likely not advisable, is not
> exactly expensive.

Your description of functions doesn't hold for all functions, this is why I
tried to provide a usecase. It is possible for some function to both have a
side-effect (i.e. modify some table) *and* return a large number of rows. It
may be legitimate for a user to want to have the side-effect but not care
about the rows.  Ignoring one row isn't expensive, ignoring many could be.

> The basic question here becomes - the executor already must generate, in
> memory, all of the rows so is there a way to properly interact with the
> server where you can request the number of rows that were generated but not
> be obligated to actually pull them down to the client.  This doesn't seem
> like an unreasonable request but assuming that it is not currently possible
> (of which I have little clue) then the question becomes who cares enough to
> design and implement such a protocol enhancement.

OK.

>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>> LIMIT 1? The two seem to be identical, except that one is expressed in the
>> SQL query and the other at the network protocol level.

> The planner does not have access to network protocol level? options while
> it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

> Expecting users to use an API without knowledge or control of the SQL that
> is being executed seems like a stretch to me.  Expecting the driver to
> simply provide an easy way to access data from the common SQL idioms a user
> might use seems like a reasonable goal and puts leaves the smarts in the
> purvue of the planner.  The optimization you require doesn't seem
> unreasonable but also doesn't seem especially compelling - nor matter how
> many people might be using ADO.NET (which provides no indication that they
> are trying to use APIs that are incompatible with the queries that they are
> sending.

Fair enough, I don't disagree with the above. The idea is less "no knowledge or
control of the SQL", but rather a complicated prepared statement that is executed
in some places to fetch all rows, and in others to fetch only 1 (maybe this is the
most compelling usecase).

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Next
From: Marko Tiikkaja
Date:
Subject: Re: Fetch zero result rows when executing a query?