Home > mailing lists

Re: Fetch zero result rows when executing a query? - Mailing list pgsql-hackers

From	Shay Rojansky
Subject	Re: Fetch zero result rows when executing a query?
Date	February 8, 2015 08:56:24
Msg-id	CADT4RqBdKTWR8gmpO509Bwonhxfu3RU=Kp1aRz39kwkR8BkFow@mail.gmail.com Whole thread
In response to	Fetch zero result rows when executing a query? (Shay Rojansky <roji@roji.org>)
Responses	Re: Fetch zero result rows when executing a query? Re: Fetch zero result rows when executing a query?
List	pgsql-hackers

Tree view

First a general comment:

> Then the driver writers that need these special API behaviors are

> reasonably expected to contribute to adding them to backend products that

> do not already have them. The database developers are not going to take on

> responsibility for the API decisions of others; and features deemed (or

> that in reality are) of marginal usefulness are likely to be omitted -

> intentionally or otherwise - from the official (in this case libpq)

> protocol.

I absolutely agree with you there, I'm not trying to get anybody to implement

something I need (i.e. fetch 0 rows). This is more of a general discussion as

to whether that feature *makes sense* to you as a protocol feature (which doesn't

seem to be the case, as some of you guys want to deprecate the whole max_rows

thing).

>> I'll clarify just a little... I am indeed talking about the PostgreSQL

>> network protocol, and not about query optimization (with LIMIT or omitting

>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery

>> through which the user indicates they're not interested in any result rows

>> whether those exist or not.

> ExecuteNonQuery returns an integer while row-returning queries do not.

> I'd argue that the API states that the user is declaring that the query

> they are executing does not return any actual rows - just a count of

> affected rows - not that they do not care to see what rows are returned.

That's true. IMHO the count of affected rows isn't relevant to this discussion

so I didn't mention it.

>> For the situation where a user does ExecuteNonQuery but the query returns

>> result rows, the driver can save the needless network transfers. We can

>> definitely say it's the user's fault for providing a query with a resultset

>> to ExecuteNonQuery, but we *do* have the user's clear intention that no

>> rows be fetched so why not act on it. I agree this isn't a terribly

>> important optimization, the trigger for this question was first and

>> foremost curiosity: it seems strange the protocol allows you to specify

>> max_rows for any value other than 0.

> Yes, it does seem strange and, like Marko said, ideally would be

> deprecated. The fact that it cannot handle "zero rows" seems like an

> unnecessary limitation and I cannot image that any values other than 0 and

> all would be of practical usage. In the case of zero returning instead the

> number of rows would be more useful than simply refusing to return anything

> so even if something like this is needed the current implementation is

> flawed.

Just to be precise: what is strange to me is that the max_rows feature exists

but has no 0 value. You and Marko are arguing that the whole feature should be

deprecated (i.e. always return all rows).

>> Here's a possible believable use-case which doesn't involve user neglect:

>> imagine some server-side function which has side-effects and also returns

>> some rows. In some situations the user is interested in the result rows,

>> but in others they only want the side-effect. The user would probably have

>> no control over the function, and their only way to *not* transfer the

>> result rows would be with a mechanism such as max_rows.

> Functions always return rows and so should not be executed using

> "ExecuteNonQuery". In most cases action-oriented functions return a single

> result-status row so ignoring that row, while likely not advisable, is not

> exactly expensive.

Your description of functions doesn't hold for all functions, this is why I

tried to provide a usecase. It is possible for some function to both have a

side-effect (i.e. modify some table) *and* return a large number of rows. It

may be legitimate for a user to want to have the side-effect but not care

about the rows. Ignoring one row isn't expensive, ignoring many could be.

> The basic question here becomes - the executor already must generate, in

> memory, all of the rows so is there a way to properly interact with the

> server where you can request the number of rows that were generated but not

> be obligated to actually pull them down to the client. This doesn't seem

> like an unreasonable request but assuming that it is not currently possible

> (of which I have little clue) then the question becomes who cares enough to

> design and implement such a protocol enhancement.

OK.

>> More to the point, doesn't max_rows=1 have exactly the same dangers as

>> LIMIT 1? The two seem to be identical, except that one is expressed in the

>> SQL query and the other at the network protocol level.

> The planner does not have access to network protocol level? options while

> it does know about LIMIT.

That's an internal PostgreSQL matter (which granted, may impact efficiency).

My comment about max_rows being equivalent to LIMIT was meant to address Marko's

argument that max_rows is dangerous because any row might come out and tests

may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).

> Expecting users to use an API without knowledge or control of the SQL that

> is being executed seems like a stretch to me. Expecting the driver to

> simply provide an easy way to access data from the common SQL idioms a user

> might use seems like a reasonable goal and puts leaves the smarts in the

> purvue of the planner. The optimization you require doesn't seem

> unreasonable but also doesn't seem especially compelling - nor matter how

> many people might be using ADO.NET (which provides no indication that they

> are trying to use APIs that are incompatible with the queries that they are

> sending.

Fair enough, I don't disagree with the above. The idea is less "no knowledge or

control of the SQL", but rather a complicated prepared statement that is executed

in some places to fetch all rows, and in others to fetch only 1 (maybe this is the

most compelling usecase).

pgsql-hackers by date:

From: Michael Paquier
Date: 08 February 2015, 05:54:29
Subject: Re: Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

From: Marko Tiikkaja
Date: 08 February 2015, 09:17:41
Subject: Re: Fetch zero result rows when executing a query?

Re: Fetch zero result rows when executing a query? - Mailing list pgsql-hackers

Previous

Next