Thread: Fetch zero result rows when executing a query?

Fetch zero result rows when executing a query?

From
Shay Rojansky
Date:
Sorry if this has been asked before, couldn't find any mention...

I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to find a way to execute a query but without fetching any rows. The Execute message has a maximum result-row count, but zero is documented to mean "fetch all rows".

The use case would be sending a query which might modify or might not (e.g. UPDATE), but we know that the user is uninterested in any result row.

My current workaround is to specify maxrows=1, was wondering if I missed a better alternative.

Thanks,

Shay

Re: Fetch zero result rows when executing a query?

From
Andres Freund
Date:
Hi,

On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:
> Sorry if this has been asked before, couldn't find any mention...
> 
> I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
> find a way to execute a query but without fetching any rows. The Execute
> message has a maximum result-row count, but zero is documented to mean
> "fetch all rows".
> 
> The use case would be sending a query which might modify or might not (e.g.
> UPDATE), but we know that the user is uninterested in any result row.
> 
> My current workaround is to specify maxrows=1, was wondering if I missed a
> better alternative.

Is this really a relevant optimization? If the user doesn't want
results, RETURNING shouldn't be specified... Sure, sometimes the same
query will be reused over cases where you want the results and those
where you don't, but I doubt this is worthy of optimization.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Fetch zero result rows when executing a query?

From
Jim Nasby
Date:
On 2/3/15 5:26 AM, Shay Rojansky wrote:
> Sorry if this has been asked before, couldn't find any mention...
>
> I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying
> to find a way to execute a query but without fetching any rows. The
> Execute message has a maximum result-row count, but zero is documented
> to mean "fetch all rows".
>
> The use case would be sending a query which might modify or might not
> (e.g. UPDATE), but we know that the user is uninterested in any result row.
>
> My current workaround is to specify maxrows=1, was wondering if I missed
> a better alternative.

You might be able to add something like WHERE FALSE to the RETURNING 
clause, but I agree with Andres; this seems like premature optimization.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Fetch zero result rows when executing a query?

From
"Stephen R. van den Berg"
Date:
Shay Rojansky wrote:
>I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
>find a way to execute a query but without fetching any rows. The Execute
>message has a maximum result-row count, but zero is documented to mean
>"fetch all rows".

>The use case would be sending a query which might modify or might not (e.g.
>UPDATE), but we know that the user is uninterested in any result row.

>My current workaround is to specify maxrows=1, was wondering if I missed a
>better alternative.

If you know beforehand the query generates at most one row, then
specifying maxrows=0 is best.
If you know beforehand the query might generate more than one row (SELECT)
yet you also know that you are not interested in those, then maxrows=1
is best; then again, modifying the query to include a LIMIT 1 is even
better, in which case maxrows can be zero again.
maxrows=1 for UPDATE/DELETE/INSERT does not make sense, use maxrows=0.
-- 
Stephen.



Re: Fetch zero result rows when executing a query?

From
"Stephen R. van den Berg"
Date:
Andres Freund wrote:
>On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:
>> find a way to execute a query but without fetching any rows. The Execute
>> message has a maximum result-row count, but zero is documented to mean
>> "fetch all rows".

>Is this really a relevant optimization? If the user doesn't want

I believe he's talking about the network protocol of postgreSQL, not
about query optimisation (as you do).
-- 
Stephen.



Re: Fetch zero result rows when executing a query?

From
Andres Freund
Date:
On 2015-02-04 12:17:23 +0100, Stephen R. van den Berg wrote:
> Andres Freund wrote:
> >On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:
> >> find a way to execute a query but without fetching any rows. The Execute
> >> message has a maximum result-row count, but zero is documented to mean
> >> "fetch all rows".
> 
> >Is this really a relevant optimization? If the user doesn't want
> 
> I believe he's talking about the network protocol of postgreSQL, not
> about query optimisation (as you do).

I actually am not. There's no rows returned by a UPDATE without a
RETURNING. So the whole question doesn't really make much sense without
RETURNING... And I think it's premature optimization to ignore
RETURNING's results in the driver, even if the user doesn't ask for
them.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:
> If you know beforehand the query might generate more than one row (SELECT)
> yet you also know that you are not interested in those, then maxrows=1
> is best; then again, modifying the query to include a LIMIT 1 is even
> better, in which case maxrows can be zero again.

This seems to be a common pattern, and I think it's a *huge* mistake to 
specify maxrows=1 and/or ignore rows after the first one in the driver 
layer.  If the user says "give me the only row returned by this query", 
the interface should check that only one row is in reality returned by 
the query.  If the query returns more than one row, the user made a 
mistake in formulating the query and she probably wants to know about 
it.  If she genuinely doesn't care about the rows after the first one, 
she can always specify LIMIT 1.

For a sad example, look at PL/PgSQL's  SELECT .. INTO ..; it's not 
terribly difficult to write a query which returns more than one row *by 
mistake* and have something really bad happen later on since it went 
undetected during testing because you just happened to get the expected 
row back first.  And when you do want to specifically enforce it for 
e.g. security critical code, you have to resort to really ugly hacks 
like window functions.


.m



Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:
> Andres Freund wrote:
>> On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:
>>> find a way to execute a query but without fetching any rows. The Execute
>>> message has a maximum result-row count, but zero is documented to mean
>>> "fetch all rows".
>
>> Is this really a relevant optimization? If the user doesn't want
>
> I believe he's talking about the network protocol of postgreSQL, not
> about query optimisation (as you do).

I don't believe so.  If this is not about optimization, why can't the 
driver just ignore the rows from the server?


.m



Re: Fetch zero result rows when executing a query?

From
Andres Freund
Date:
On 2015-02-04 12:23:51 +0100, Marko Tiikkaja wrote:
> On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:
> >If you know beforehand the query might generate more than one row (SELECT)
> >yet you also know that you are not interested in those, then maxrows=1
> >is best; then again, modifying the query to include a LIMIT 1 is even
> >better, in which case maxrows can be zero again.
> 
> This seems to be a common pattern, and I think it's a *huge* mistake to
> specify maxrows=1 and/or ignore rows after the first one in the driver
> layer.  If the user says "give me the only row returned by this query", the
> interface should check that only one row is in reality returned by the
> query

I don't think these are what this thread is about. It's about a UPDATE
(=> no LIMIT) where the user uses a driver interface that doesn't return
rows generated by the UPDATE (the above error check doesn't make sense).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Fetch zero result rows when executing a query?

From
Andres Freund
Date:
On 2015-02-04 12:25:04 +0100, Marko Tiikkaja wrote:
> On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:
> >Andres Freund wrote:
> >>On 2015-02-03 12:26:33 +0100, Shay Rojansky wrote:
> >>>find a way to execute a query but without fetching any rows. The Execute
> >>>message has a maximum result-row count, but zero is documented to mean
> >>>"fetch all rows".
> >
> >>Is this really a relevant optimization? If the user doesn't want
> >
> >I believe he's talking about the network protocol of postgreSQL, not
> >about query optimisation (as you do).
> 
> I don't believe so.

Did you read the original post?

> I'm working on the Npgsql, the .NET driver for PostgreSQL, and am trying to
> find a way to execute a query but without fetching any rows. The Execute
> message has a maximum result-row count, but zero is documented to mean
> "fetch all rows".

That's pretty unambiguously about a driver and the network protocol.

> If this is not about optimization, why can't the driver just ignore
> the rows from the server?

IIUC Shay wants to avoid the network traffic incurred by that. I think
that's premature, but it's certainly not completely bogus.

Greetings,

Andres Freund



Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2/4/15 12:31 PM, Andres Freund wrote:
> On 2015-02-04 12:25:04 +0100, Marko Tiikkaja wrote:
>> On 2/4/15 12:17 PM, Stephen R. van den Berg wrote:
>>> I believe he's talking about the network protocol of postgreSQL, not
>>> about query optimisation (as you do).
>>
>> I don't believe so.
>
> Did you read the original post?

Yes, I did.  Now I realize that my quoted part there was very poorly worded.

What I meant is that "I don't believe [you] were talking about query 
optimization".  I hope the rest of my message makes more sense to you in 
that light.


.m



Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2/4/15 12:27 PM, Andres Freund wrote:
> On 2015-02-04 12:23:51 +0100, Marko Tiikkaja wrote:
>> On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:
>>> If you know beforehand the query might generate more than one row (SELECT)
>>> yet you also know that you are not interested in those, then maxrows=1
>>> is best; then again, modifying the query to include a LIMIT 1 is even
>>> better, in which case maxrows can be zero again.
>>
>> This seems to be a common pattern, and I think it's a *huge* mistake to
>> specify maxrows=1 and/or ignore rows after the first one in the driver
>> layer.  If the user says "give me the only row returned by this query", the
>> interface should check that only one row is in reality returned by the
>> query
>
> I don't think these are what this thread is about. It's about a UPDATE
> (=> no LIMIT) where the user uses a driver interface that doesn't return
> rows generated by the UPDATE (the above error check doesn't make sense).

No, this wasn't what OP was on about.  But I was merely responding to 
the quoted paragraph, which suggested that maxrows=1 would be something 
to consider for SELECT.  Which I really strongly believe is not, and I'm 
hoping we can eliminate it from all interfaces by 2025.

So slightly off-topic, for which I apologize.


.m



Re: Fetch zero result rows when executing a query?

From
"Stephen R. van den Berg"
Date:
Marko Tiikkaja wrote:
>On 2/4/15 12:13 PM, Stephen R. van den Berg wrote:
>>If you know beforehand the query might generate more than one row (SELECT)
>>yet you also know that you are not interested in those, then maxrows=1
>>is best; then again, modifying the query to include a LIMIT 1 is even
>>better, in which case maxrows can be zero again.

>This seems to be a common pattern, and I think it's a *huge* mistake
>to specify maxrows=1 and/or ignore rows after the first one in the
>driver layer.  If the user says "give me the only row returned by

I guess it depends on the amount of backpressure you can excert on
the sql "programmer".
If you are writing a driver which just has to run applications
written by third parties in the most efficient way, the outline
I gave above is best.
If the driver has a way to communicate with the one writing
the SQL, then giving warnings is better.

At the driver level, you generally have to assume that communicating
with the programmer is not possible anymore unless the driver
API allows for appropriate backpressure (and if this
results in numerous warnings, they still might go largely
unnoticed on production sites).
-- 
Stephen.



Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2/4/15 12:36 PM, Stephen R. van den Berg wrote:
> Marko Tiikkaja wrote:
>> This seems to be a common pattern, and I think it's a *huge* mistake
>> to specify maxrows=1 and/or ignore rows after the first one in the
>> driver layer.  If the user says "give me the only row returned by
>
> I guess it depends on the amount of backpressure you can excert on
> the sql "programmer".
> If you are writing a driver which just has to run applications
> written by third parties in the most efficient way, the outline
> I gave above is best.
> If the driver has a way to communicate with the one writing
> the SQL, then giving warnings is better.

Screw warnings, that should be an error.  "I tried to execute your 
query, but something went wrong."

Anyway, this is getting really off topic now, and that's my fault, so I 
should shut up.


.m



Re: Fetch zero result rows when executing a query?

From
David G Johnston
Date:
Shay Rojansky wrote
> The use case would be sending a query which might modify or might not
> (e.g.
> UPDATE), but we know that the user is uninterested in any result row.

How do you intend to gain this knowledge if the query doesn't structure
itself so that it does or does not return actual rows?  Can you give a real
concrete example for people to get their heads around?

David J.



--
View this message in context:
http://postgresql.nabble.com/Fetch-zero-result-rows-when-executing-a-query-tp5836537p5836740.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Fetch zero result rows when executing a query?

From
Shay Rojansky
Date:
Sorry everyone, I was unexpectedly very busy and couldn't respond earlier... My apologies.

I'll clarify just a little... I am indeed talking about the PostgreSQL network protocol, and not about query optimization (with LIMIT or omitting RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery, through which the user indicates they're not interested in any result rows whether those exist or not. For the situation where a user does ExecuteNonQuery but the query returns result rows, the driver can save the needless network transfers. We can definitely say it's the user's fault for providing a query with a resultset to ExecuteNonQuery, but we *do* have the user's clear intention that no rows be fetched so why not act on it. I agree this isn't a terribly important optimization, the trigger for this question was first and foremost curiosity: it seems strange the protocol allows you to specify max_rows for any value other than 0.

Here's a possible believable use-case which doesn't involve user neglect: imagine some server-side function which has side-effects and also returns some rows. In some situations the user is interested in the result rows, but in others they only want the side-effect. The user would probably have no control over the function, and their only way to *not* transfer the result rows would be with a mechanism such as max_rows.

Marko, regarding your general criticism of max_rows:

> This seems to be a common pattern, and I think it's a *huge* mistake to specify maxrows=1 and/or ignore rows after the first one in
> the driver layer.  If the user says "give me the only row returned by this query", the interface should check that only one row is in
> reality returned by the query.  If the query returns more than one row, the user made a mistake in formulating the query and she
> probably wants to know about it.  If she genuinely doesn't care about the rows after the first one, she can always specify LIMIT 1.

> For a sad example, look at PL/PgSQL's  SELECT .. INTO ..; it's not terribly difficult to write a query which returns more than one
> row *by mistake* and have something really bad happen later on since it went undetected during testing because you just
> happened to get the expected row back first.  And when you do want to specifically enforce it for e.g. security critical code,
> you have to resort to really ugly hacks like window functions.

There are some problems with what you say... First, the ADO.NET API provides a SingleRow API option which explicitly provides exactly this. This API option doesn't at all mean that there *should* be only one row (i.e. an error should be raised if otherwise), but simply that any other rows beyond the first should be discarded. So regardless of what we think best practices are on this, this behavior is mandated/specified by a major API.

More to the point, doesn't max_rows=1 have exactly the same dangers as LIMIT 1? The two seem to be identical, except that one is expressed in the SQL query and the other at the network protocol level. The way I see it, if the user specifies one of them without specifying ORDER BY, they are explicitly saying they don't care which row comes out. And if their testing code fails because this is wrong, then they've made a mistake - IMHo this isn't a reason to kill the entire feature.

In general, in my view it's beneficial to separate between the SQL queries and the features that the driver is supposed to provide in its API. The SQL may be written or managed by one entity, reused in many places (some of which want all rows and others which want only 1).

Re: Fetch zero result rows when executing a query?

From
David G Johnston
Date:
On Sat, Feb 7, 2015 at 10:41 AM, Shay Rojansky [via PostgreSQL] <[hidden email]> wrote:
Sorry everyone, I was unexpectedly very busy and couldn't respond earlier... My apologies.

I'll clarify just a little... I am indeed talking about the PostgreSQL network protocol, and not about query optimization (with LIMIT or omitting RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
​ ​
through which the user indicates they're not interested in any result rows whether those exist or not.

​​ExecuteNonQuery returns an integer while row-returning queries ​do not.  I'd argue that the API states that the user is declaring that the query they are executing does not return any actual rows - just a count of affected rows - not that they do not care to see what rows are returned.
 
For the situation where a user does ExecuteNonQuery but the query returns result rows, the driver can save the needless network transfers. We can definitely say it's the user's fault for providing a query with a resultset to ExecuteNonQuery, but we *do* have the user's clear intention that no rows be fetched so why not act on it. I agree this isn't a terribly important optimization, the trigger for this question was first and foremost curiosity: it seems strange the protocol allows you to specify max_rows for any value other than 0.

​Yes, it does seem strange and, like Marko said, ideally would be deprecated.  The fact that it cannot handle "zero rows" seems like an unnecessary limitation and I cannot image that any values other than 0 and all would be of practical usage.  In the case of zero returning instead the number of rows would be more useful than simply refusing to return anything so even if something like this is needed the current implementation is flawed.


Here's a possible believable use-case which doesn't involve user neglect: imagine some server-side function which has side-effects and also returns some rows. In some situations the user is interested in the result rows, but in others they only want the side-effect. The user would probably have no control over the function, and their only way to *not* transfer the result rows would be with a mechanism such as max_rows.

​Functions always return rows and so should not be executed using "ExecuteNonQuery".  In most cases action-oriented functions return a single result-status row so ignoring that row, while likely not advisable, is not exactly expensive.​


Marko, regarding your general criticism of max_rows:

> This seems to be a common pattern, and I think it's a *huge* mistake to specify maxrows=1 and/or ignore rows after the first one in
> the driver layer.  If the user says "give me the only row returned by this query", the interface should check that only one row is in
> reality returned by the query.  If the query returns more than one row, the user made a mistake in formulating the query and she
> probably wants to know about it.  If she genuinely doesn't care about the rows after the first one, she can always specify LIMIT 1.

> For a sad example, look at PL/PgSQL's  SELECT .. INTO ..; it's not terribly difficult to write a query which returns more than one
> row *by mistake* and have something really bad happen later on since it went undetected during testing because you just
> happened to get the expected row back first.  And when you do want to specifically enforce it for e.g. security critical code,
> you have to resort to really ugly hacks like window functions.

There are some problems with what you say... First, the ADO.NET API provides a SingleRow API option which explicitly provides exactly this. This API option doesn't at all mean that there *should* be only one row (i.e. an error should be raised if otherwise), but simply that any other rows beyond the first should be discarded. So regardless of what we think best practices are on this, this behavior is mandated/specified by a major API. 

​"​The resultset may contain multiple rows, which are ignored by ExecuteScalar"; so ignore them.

The basic question here becomes - the executor already must generate, in memory, all of the rows so is there a way to properly interact with the server where you can request the number of rows that were generated but not be obligated to actually pull them down to the client.  This doesn't seem like an unreasonable request but assuming that it is not currently possible (of which I have little clue) then the question becomes who cares enough to design and implement such a protocol enhancement.
 

More to the point, doesn't max_rows=1 have exactly the same dangers as LIMIT 1? The two seem to be identical, except that one is expressed in the SQL query and the other at the network protocol level.

​The planner does not have access to network protocol level​ options while it does know about LIMIT.

 
The way I see it, if the user specifies one of them without specifying ORDER BY, they are explicitly saying they don't care which row comes out. And if their testing code fails because this is wrong, then they've made a mistake - IMHo this isn't a reason to kill the entire feature.

In general, in my view it's beneficial to separate between the SQL queries and the features that the driver is supposed to provide in its API. The SQL may be written or managed by one entity, reused in many places (some of which want all rows and others which want only 1).


​Then the driver writers that need these special API behaviors are reasonably expected to contribute to adding them to backend products that do not already have them.  The database developers are not going to take on responsibility for the API decisions of others; and features deemed (or that in reality are) of marginal usefulness are likely to be omitted - intentionally or otherwise - from the official (in this case libpq) protocol.

Expecting users to use an API without knowledge or control of the SQL that is being executed seems like a stretch to me.  Expecting the driver to simply provide an easy way to access data from the common SQL idioms a user might use seems like a reasonable goal and puts leaves the smarts in the purvue of the planner.  The optimization you require doesn't seem unreasonable but also doesn't seem especially compelling - nor matter how many people might be using ADO.NET (which provides no indication that they are trying to use APIs that are incompatible with the queries that they are sending.

David J.




View this message in context: Re: Fetch zero result rows when executing a query?
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

Re: Fetch zero result rows when executing a query?

From
Shay Rojansky
Date:
First a general comment:

> Then the driver writers that need these special API behaviors are
> reasonably expected to contribute to adding them to backend products that
> do not already have them.  The database developers are not going to take on
> responsibility for the API decisions of others; and features deemed (or
> that in reality are) of marginal usefulness are likely to be omitted -
> intentionally or otherwise - from the official (in this case libpq)
> protocol.

I absolutely agree with you there, I'm not trying to get anybody to implement
something I need (i.e. fetch 0 rows). This is more of a general discussion as
to whether that feature *makes sense* to you as a protocol feature (which doesn't
seem to be the case, as some of you guys want to deprecate the whole max_rows
thing).

>> I'll clarify just a little... I am indeed talking about the PostgreSQL
>> network protocol, and not about query optimization (with LIMIT or omitting
>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
>> through which the user indicates they're not interested in any result rows
>> whether those exist or not.

> ExecuteNonQuery returns an integer while row-returning queries do not.
> I'd argue that the API states that the user is declaring that the query
> they are executing does not return any actual rows - just a count of
> affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this discussion
so I didn't mention it.

>> For the situation where a user does ExecuteNonQuery but the query returns
>> result rows, the driver can save the needless network transfers. We can
>> definitely say it's the user's fault for providing a query with a resultset
>> to ExecuteNonQuery, but we *do* have the user's clear intention that no
>> rows be fetched so why not act on it. I agree this isn't a terribly
>> important optimization, the trigger for this question was first and
>> foremost curiosity: it seems strange the protocol allows you to specify
>> max_rows for any value other than 0.

> Yes, it does seem strange and, like Marko said, ideally would be
> deprecated.  The fact that it cannot handle "zero rows" seems like an
> unnecessary limitation and I cannot image that any values other than 0 and
> all would be of practical usage.  In the case of zero returning instead the
> number of rows would be more useful than simply refusing to return anything
> so even if something like this is needed the current implementation is
> flawed.

Just to be precise: what is strange to me is that the max_rows feature exists
but has no 0 value. You and Marko are arguing that the whole feature should be
deprecated (i.e. always return all rows).

>> Here's a possible believable use-case which doesn't involve user neglect:
>> imagine some server-side function which has side-effects and also returns
>> some rows. In some situations the user is interested in the result rows,
>> but in others they only want the side-effect. The user would probably have
>> no control over the function, and their only way to *not* transfer the
>> result rows would be with a mechanism such as max_rows.

> Functions always return rows and so should not be executed using
> "ExecuteNonQuery".  In most cases action-oriented functions return a single
> result-status row so ignoring that row, while likely not advisable, is not
> exactly expensive.

Your description of functions doesn't hold for all functions, this is why I
tried to provide a usecase. It is possible for some function to both have a
side-effect (i.e. modify some table) *and* return a large number of rows. It
may be legitimate for a user to want to have the side-effect but not care
about the rows.  Ignoring one row isn't expensive, ignoring many could be.

> The basic question here becomes - the executor already must generate, in
> memory, all of the rows so is there a way to properly interact with the
> server where you can request the number of rows that were generated but not
> be obligated to actually pull them down to the client.  This doesn't seem
> like an unreasonable request but assuming that it is not currently possible
> (of which I have little clue) then the question becomes who cares enough to
> design and implement such a protocol enhancement.

OK.

>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>> LIMIT 1? The two seem to be identical, except that one is expressed in the
>> SQL query and the other at the network protocol level.

> The planner does not have access to network protocol level? options while
> it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

> Expecting users to use an API without knowledge or control of the SQL that
> is being executed seems like a stretch to me.  Expecting the driver to
> simply provide an easy way to access data from the common SQL idioms a user
> might use seems like a reasonable goal and puts leaves the smarts in the
> purvue of the planner.  The optimization you require doesn't seem
> unreasonable but also doesn't seem especially compelling - nor matter how
> many people might be using ADO.NET (which provides no indication that they
> are trying to use APIs that are incompatible with the queries that they are
> sending.

Fair enough, I don't disagree with the above. The idea is less "no knowledge or
control of the SQL", but rather a complicated prepared statement that is executed
in some places to fetch all rows, and in others to fetch only 1 (maybe this is the
most compelling usecase).

Re: Fetch zero result rows when executing a query?

From
Marko Tiikkaja
Date:
On 2015-02-08 09:56, Shay Rojansky wrote:
>>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>>> LIMIT 1? The two seem to be identical, except that one is expressed in
> the
>>> SQL query and the other at the network protocol level.
>
>> The planner does not have access to network protocol level? options while
>> it does know about LIMIT.
>
> That's an internal PostgreSQL matter (which granted, may impact efficiency).
> My comment about max_rows being equivalent to LIMIT was meant to address
> Marko's
> argument that max_rows is dangerous because any row might come out and tests
> may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

The point is that then the user gets to choose the behavior.  LIMIT 1 
without ORDER BY is very explicitly telling the reader of the code 
"there might be more than one row returned by this query, but I'm okay 
with getting only one of them, whichever it is".  And when the  LIMIT 1  is *not* there, you get the driver
automaticallychecking your queries 
 
for sanity.  If the driver always throws away the rows after the first 
one, it's difficult to go to behavior of enforcing that no more than one 
row was returned.

Anyway, like you said somewhere upthread, the interface the driver 
you're working on promises to implement right now can't be changed due 
to backwards compatibility concerns.  But I see new interfaces being 
created all the time, and they all make this same mistake.


.m



Re: Fetch zero result rows when executing a query?

From
Robert Haas
Date:
On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:
> Just to be precise: what is strange to me is that the max_rows feature
> exists
> but has no 0 value. You and Marko are arguing that the whole feature should
> be
> deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible.  Meanwhile, a sentinel value was needed to request
all rows, so they used 0.  If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature.  I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces.  Whether we think those interfaces are
well-designed or not, they are standardized.  When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant.  We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Fetch zero result rows when executing a query?

From
Shay Rojansky
Date:
Thanks for understanding Robert, that's more or less what I had in mind, I was mainly wondering if I were missing some deeper explanation for the absence of the possibility of requesting 0 rows.

Regardless of all of the above, it's really no big deal. I'll go ahead and use max_rows=1 for now, hopefully you guys don't decide to deprecate it.

Shay

On Tue, Feb 10, 2015 at 3:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:
> Just to be precise: what is strange to me is that the max_rows feature
> exists
> but has no 0 value. You and Marko are arguing that the whole feature should
> be
> deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible.  Meanwhile, a sentinel value was needed to request
all rows, so they used 0.  If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature.  I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces.  Whether we think those interfaces are
well-designed or not, they are standardized.  When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant.  We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Fetch zero result rows when executing a query?

From
Shay Rojansky
Date:
Sorry to revive this thread, I just had one additional thought...

To those advocating the deprecation of the max_rows parameter of Execute, there's another argument to consider. max_rows isn't just there in order to fetch, say, a single row of the result set and discard the rest (which is what I originally asked about). There's also the function of retrieving the resultset in chunks: getting 5 rows, then 10, etc. etc. Deprecating max_rows would leave the user/driver only with the option of retrieving the entire resultset in one go (for example, no option for the interleaving of rows from several resultsets). And the lack of the ability to execute and retrieve 0 rows hurts this scenario as well.

Just wanted to put it out there as another argument against deprecation.

On Wed, Feb 11, 2015 at 2:05 AM, Shay Rojansky <roji@roji.org> wrote:
Thanks for understanding Robert, that's more or less what I had in mind, I was mainly wondering if I were missing some deeper explanation for the absence of the possibility of requesting 0 rows.

Regardless of all of the above, it's really no big deal. I'll go ahead and use max_rows=1 for now, hopefully you guys don't decide to deprecate it.

Shay

On Tue, Feb 10, 2015 at 3:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 8, 2015 at 3:56 AM, Shay Rojansky <roji@roji.org> wrote:
> Just to be precise: what is strange to me is that the max_rows feature
> exists
> but has no 0 value. You and Marko are arguing that the whole feature should
> be
> deprecated (i.e. always return all rows).

I think the fact that it has no zero value is probably just a
historical accident; most likely, whoever designed it originally
(probably twenty years ago) didn't think about queries with
side-effects and therefore didn't consider that wanting 0 rows would
ever be sensible.  Meanwhile, a sentinel value was needed to request
all rows, so they used 0.  If they'd thought of it, they might have
picked -1 and we'd not be having this discussion.

FWIW, I'm in complete agreement that it would be good if we had this
feature.  I believe this is not the first report we've had of
PostgreSQL doing things in ways that mesh nicely with standardized
driver interfaces.  Whether we think those interfaces are
well-designed or not, they are standardized.  When people use $OTHERDB
and have a really great driver, and then they move to PostgreSQL and
get one with more warts, it does not encourage them to stick with
PostgreSQL.

.NET is not some fringe user community that we can dismiss as
irrelevant.  We need users of all languages to want to use PostgreSQL,
not just users of languages any one of us happens to personally like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company