Re: Fetch zero result rows when executing a query? - Mailing list pgsql-hackers
From | Shay Rojansky |
---|---|
Subject | Re: Fetch zero result rows when executing a query? |
Date | |
Msg-id | CADT4RqBdKTWR8gmpO509Bwonhxfu3RU=Kp1aRz39kwkR8BkFow@mail.gmail.com Whole thread Raw |
In response to | Fetch zero result rows when executing a query? (Shay Rojansky <roji@roji.org>) |
Responses |
Re: Fetch zero result rows when executing a query?
(Marko Tiikkaja <marko@joh.to>)
Re: Fetch zero result rows when executing a query? (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
First a general comment:
> Then the driver writers that need these special API behaviors are
> reasonably expected to contribute to adding them to backend products that
> do not already have them. The database developers are not going to take on
> responsibility for the API decisions of others; and features deemed (or
> that in reality are) of marginal usefulness are likely to be omitted -
> intentionally or otherwise - from the official (in this case libpq)
> protocol.
I absolutely agree with you there, I'm not trying to get anybody to implement
something I need (i.e. fetch 0 rows). This is more of a general discussion as
to whether that feature *makes sense* to you as a protocol feature (which doesn't
seem to be the case, as some of you guys want to deprecate the whole max_rows
thing).
>> I'll clarify just a little... I am indeed talking about the PostgreSQL
>> network protocol, and not about query optimization (with LIMIT or omitting
>> RETURNING etc.). I am implementing ADO.NET's ExecuteNonQuery
>> through which the user indicates they're not interested in any result rows
>> whether those exist or not.
> ExecuteNonQuery returns an integer while row-returning queries do not.
> I'd argue that the API states that the user is declaring that the query
> they are executing does not return any actual rows - just a count of
> affected rows - not that they do not care to see what rows are returned.
That's true. IMHO the count of affected rows isn't relevant to this discussion
so I didn't mention it.
>> For the situation where a user does ExecuteNonQuery but the query returns
>> result rows, the driver can save the needless network transfers. We can
>> definitely say it's the user's fault for providing a query with a resultset
>> to ExecuteNonQuery, but we *do* have the user's clear intention that no
>> rows be fetched so why not act on it. I agree this isn't a terribly
>> important optimization, the trigger for this question was first and
>> foremost curiosity: it seems strange the protocol allows you to specify
>> max_rows for any value other than 0.
> Yes, it does seem strange and, like Marko said, ideally would be
> deprecated. The fact that it cannot handle "zero rows" seems like an
> unnecessary limitation and I cannot image that any values other than 0 and
> all would be of practical usage. In the case of zero returning instead the
> number of rows would be more useful than simply refusing to return anything
> so even if something like this is needed the current implementation is
> flawed.
Just to be precise: what is strange to me is that the max_rows feature exists
but has no 0 value. You and Marko are arguing that the whole feature should be
deprecated (i.e. always return all rows).
>> Here's a possible believable use-case which doesn't involve user neglect:
>> imagine some server-side function which has side-effects and also returns
>> some rows. In some situations the user is interested in the result rows,
>> but in others they only want the side-effect. The user would probably have
>> no control over the function, and their only way to *not* transfer the
>> result rows would be with a mechanism such as max_rows.
> Functions always return rows and so should not be executed using
> "ExecuteNonQuery". In most cases action-oriented functions return a single
> result-status row so ignoring that row, while likely not advisable, is not
> exactly expensive.
Your description of functions doesn't hold for all functions, this is why I
tried to provide a usecase. It is possible for some function to both have a
side-effect (i.e. modify some table) *and* return a large number of rows. It
may be legitimate for a user to want to have the side-effect but not care
about the rows. Ignoring one row isn't expensive, ignoring many could be.
> The basic question here becomes - the executor already must generate, in
> memory, all of the rows so is there a way to properly interact with the
> server where you can request the number of rows that were generated but not
> be obligated to actually pull them down to the client. This doesn't seem
> like an unreasonable request but assuming that it is not currently possible
> (of which I have little clue) then the question becomes who cares enough to
> design and implement such a protocol enhancement.
OK.
>> More to the point, doesn't max_rows=1 have exactly the same dangers as
>> LIMIT 1? The two seem to be identical, except that one is expressed in the
>> SQL query and the other at the network protocol level.
> The planner does not have access to network protocol level? options while
> it does know about LIMIT.
That's an internal PostgreSQL matter (which granted, may impact efficiency).
My comment about max_rows being equivalent to LIMIT was meant to address Marko's
argument that max_rows is dangerous because any row might come out and tests
may pass accidentally (but that holds for LIMIT 1 as well, doesn't it).
> Expecting users to use an API without knowledge or control of the SQL that
> is being executed seems like a stretch to me. Expecting the driver to
> simply provide an easy way to access data from the common SQL idioms a user
> might use seems like a reasonable goal and puts leaves the smarts in the
> purvue of the planner. The optimization you require doesn't seem
> unreasonable but also doesn't seem especially compelling - nor matter how
> many people might be using ADO.NET (which provides no indication that they
> are trying to use APIs that are incompatible with the queries that they are
> sending.
Fair enough, I don't disagree with the above. The idea is less "no knowledge or
control of the SQL", but rather a complicated prepared statement that is executed
in some places to fetch all rows, and in others to fetch only 1 (maybe this is the
most compelling usecase).
pgsql-hackers by date: