Re: Fetch zero result rows when executing a query? - Mailing list pgsql-hackers

From David G Johnston
Subject Re: Fetch zero result rows when executing a query?
Date
Msg-id CAKFQuwZec+d0cz9qZAuujYTYhgLwmBDJJUY+oyE5MR3qgrsN8Q@mail.gmail.com
Whole thread Raw
In response to Re: Fetch zero result rows when executing a query?  (Shay Rojansky <roji@roji.org>)
List pgsql-hackers
On Sat, Feb 7, 2015 at 10:41 AM, Shay Rojansky [via PostgreSQL] <[hidden email]> wrote:
Sorry everyone, I was unexpectedly very busy and couldn't respond earlier... My apologies.

I'll clarify just a little... I am indeed talking about the PostgreSQL network protocol, and not about query optimization (with LIMIT or omitting RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
​ ​
through which the user indicates they're not interested in any result rows whether those exist or not.

​​ExecuteNonQuery returns an integer while row-returning queries ​do not.  I'd argue that the API states that the user is declaring that the query they are executing does not return any actual rows - just a count of affected rows - not that they do not care to see what rows are returned.
 
For the situation where a user does ExecuteNonQuery but the query returns result rows, the driver can save the needless network transfers. We can definitely say it's the user's fault for providing a query with a resultset to ExecuteNonQuery, but we *do* have the user's clear intention that no rows be fetched so why not act on it. I agree this isn't a terribly important optimization, the trigger for this question was first and foremost curiosity: it seems strange the protocol allows you to specify max_rows for any value other than 0.

​Yes, it does seem strange and, like Marko said, ideally would be deprecated.  The fact that it cannot handle "zero rows" seems like an unnecessary limitation and I cannot image that any values other than 0 and all would be of practical usage.  In the case of zero returning instead the number of rows would be more useful than simply refusing to return anything so even if something like this is needed the current implementation is flawed.


Here's a possible believable use-case which doesn't involve user neglect: imagine some server-side function which has side-effects and also returns some rows. In some situations the user is interested in the result rows, but in others they only want the side-effect. The user would probably have no control over the function, and their only way to *not* transfer the result rows would be with a mechanism such as max_rows.

​Functions always return rows and so should not be executed using "ExecuteNonQuery".  In most cases action-oriented functions return a single result-status row so ignoring that row, while likely not advisable, is not exactly expensive.​


Marko, regarding your general criticism of max_rows:

> This seems to be a common pattern, and I think it's a *huge* mistake to specify maxrows=1 and/or ignore rows after the first one in
> the driver layer.  If the user says "give me the only row returned by this query", the interface should check that only one row is in
> reality returned by the query.  If the query returns more than one row, the user made a mistake in formulating the query and she
> probably wants to know about it.  If she genuinely doesn't care about the rows after the first one, she can always specify LIMIT 1.

> For a sad example, look at PL/PgSQL's  SELECT .. INTO ..; it's not terribly difficult to write a query which returns more than one
> row *by mistake* and have something really bad happen later on since it went undetected during testing because you just
> happened to get the expected row back first.  And when you do want to specifically enforce it for e.g. security critical code,
> you have to resort to really ugly hacks like window functions.

There are some problems with what you say... First, the ADO.NET API provides a SingleRow API option which explicitly provides exactly this. This API option doesn't at all mean that there *should* be only one row (i.e. an error should be raised if otherwise), but simply that any other rows beyond the first should be discarded. So regardless of what we think best practices are on this, this behavior is mandated/specified by a major API. 

​"​The resultset may contain multiple rows, which are ignored by ExecuteScalar"; so ignore them.

The basic question here becomes - the executor already must generate, in memory, all of the rows so is there a way to properly interact with the server where you can request the number of rows that were generated but not be obligated to actually pull them down to the client.  This doesn't seem like an unreasonable request but assuming that it is not currently possible (of which I have little clue) then the question becomes who cares enough to design and implement such a protocol enhancement.
 

More to the point, doesn't max_rows=1 have exactly the same dangers as LIMIT 1? The two seem to be identical, except that one is expressed in the SQL query and the other at the network protocol level.

​The planner does not have access to network protocol level​ options while it does know about LIMIT.

 
The way I see it, if the user specifies one of them without specifying ORDER BY, they are explicitly saying they don't care which row comes out. And if their testing code fails because this is wrong, then they've made a mistake - IMHo this isn't a reason to kill the entire feature.

In general, in my view it's beneficial to separate between the SQL queries and the features that the driver is supposed to provide in its API. The SQL may be written or managed by one entity, reused in many places (some of which want all rows and others which want only 1).


​Then the driver writers that need these special API behaviors are reasonably expected to contribute to adding them to backend products that do not already have them.  The database developers are not going to take on responsibility for the API decisions of others; and features deemed (or that in reality are) of marginal usefulness are likely to be omitted - intentionally or otherwise - from the official (in this case libpq) protocol.

Expecting users to use an API without knowledge or control of the SQL that is being executed seems like a stretch to me.  Expecting the driver to simply provide an easy way to access data from the common SQL idioms a user might use seems like a reasonable goal and puts leaves the smarts in the purvue of the planner.  The optimization you require doesn't seem unreasonable but also doesn't seem especially compelling - nor matter how many people might be using ADO.NET (which provides no indication that they are trying to use APIs that are incompatible with the queries that they are sending.

David J.




View this message in context: Re: Fetch zero result rows when executing a query?
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

pgsql-hackers by date:

Previous
From: Shay Rojansky
Date:
Subject: Re: Fetch zero result rows when executing a query?
Next
From: Robert Haas
Date:
Subject: perplexing error message