Thread: Are PostgreSQL functions that return sets or tables evaluated lazilyor eagerly?
Are PostgreSQL functions that return sets or tables evaluated lazilyor eagerly?
I'm learning to write functions in PostgreSQL. When I got to the documentation chapter on cursors, I came across this interesting comment:
A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
Near the top of this page: 42.7. Cursors
That made me wonder where, specifically, this would be more efficient than a plain old function call.
I made up a little test function and call:
CREATE FUNCTION foo() RETURNS SETOF customers
LANGUAGE SQL AS $$ SELECT c.* FROM customers c CROSS JOIN customers x CROSS JOIN customers y;
$$;
SELECT * FROM foo() LIMIT 1;
The customers table I'm working with has 20,000 rows so with the cross joins that should be 8e+12 rows (which would take a while to fully evaluate!). The select statement at the end appears to confirm that the function is reading all rows (I had to cancel it after several seconds -- way more than to just return the first row)
That leads me to ask:
If (and under what circumstances) PostgreSQL evaluates functions lazily (returning rows as requested by the caller) or eagerly (evaluation all rows before returning the first one)?
Re: Are PostgreSQL functions that return sets or tables evaluated lazily or eagerly?
Gerald Britton <gerald.britton@gmail.com> writes: > I'm learning to write functions in PostgreSQL. When I got to the > documentation chapter on cursors, I came across this interesting comment: > A more interesting usage is to return a reference to a cursor that a > function has created, allowing the caller to read the rows. This provides > an efficient way to return large row sets from functions. This is more or less unrelated to your question, or at least to your example, because you didn't use a cursor. > I made up a little test function and call: > CREATE FUNCTION foo() RETURNS SETOF customers > LANGUAGE SQL AS $$ > SELECT c.* FROM customers c > CROSS JOIN customers x > CROSS JOIN customers y;$$; > SELECT * FROM foo() LIMIT 1; At the moment, set-returning functions in FROM are generally evaluated "eagerly", ie run to completion before the calling query does anything else. There's been some work recently towards improving that, but it's not done yet. This particular example could be improved, because SQL-language functions can potentially be inlined. Yours wasn't because it's volatile by default, and for somewhat arcane semantic reasons we won't inline volatiles. But if you were to mark that function stable it'd get inlined, and then it'd respond promptly to the LIMIT 1. Another point is that the rules are different for set-returning functions in a targetlist. For example, select foo() limit 1; is fast with your example despite the lack of inlining. That's because that case actually is done row-by-row, with the function yielding control after each row. So we have the infrastructure for that, it just hasn't been exploited in the function-in-FROM case. And, if you weren't confused yet: all of this depends on the particular PL that you're considering. Of the PLs included with core PG, I think only SQL-language functions can do the above --- the other ones are strictly execute-to-completion. I don't know offhand about third-party PLs. regards, tom lane
Re: Are PostgreSQL functions that return sets or tables evaluated lazily or eagerly?
>>>>> "Gerald" == Gerald Britton <gerald.britton@gmail.com> writes: Gerald> That leads me to ask: Gerald> If (and under what circumstances) PostgreSQL evaluates Gerald> functions lazily (returning rows as requested by the caller) or Gerald> eagerly (evaluation all rows before returning the first one)? This isn't trivial to answer because it depends significantly on the language the function is written in and how it was called. The first matching rule below controls what happens. 1. LANGUAGE SQL with inlining Table functions in language SQL are candidates for inlining, see https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions If an SQL-language function is inlined, then it behaves exactly as though the function body was written inline, which means it is evaluated as lazily as the query plan otherwise permits (for example, if there's a non-indexable ORDER BY clause, then clearly all the values have to be fetched before any are returned). 2. Table function called in the FROM clause Table functions in the FROM clause, e.g. SELECT ... FROM myfunc(); are always evaluated eagerly. 3. LANGUAGE SQL without inlining, in the select-list If the final statement of an SQL function is a plain select with no wCTEs, then it is evaluated lazily: the first fetch will execute everything up to the first row of the final select, and subsequently one row will be fetched at a time. If the final statement is a DML statement with a RETURNING clause, or contains wCTEs, then it is evaluated eagerly. 4. LANGUAGE C / INTERNAL C-language functions (and therefore internal functions too) can choose whether to use value-per-call mode or materialize mode. Materialize mode is always "eager", but value-per-call mode is sometimes still eager (as in case 2 above); it can only be lazy if no preceding rule forced it to be otherwise. Most built-in table functions use value-per-call mode (a notable exception being the functions in the tablefunc module). 5. LANGUAGE PLPGSQL, PLPERL, PLTCL Plpgsql, plperl, and pltcl functions are always evaluated eagerly. 6. LANGUAGE plpython Plpython functions that return an iterator run in value-per-call mode, with a "next" call on the iterator for each row. To what extent this is a lazy or eager evaluation depends on the python code. 7. Other PL languages For non-core PL languages the documentation or source code may indicate whether the language uses materialize mode or value-per-call mode. (Most languages are probably not well equipped to do value-per-call mode. One that does allow it is pl/lua, which runs table functions as coroutines.) -- Andrew (irc:RhodiumToad)
Re: Are PostgreSQL functions that return sets or tables evaluatedlazily or eagerly?
2. Table function called in the FROM clause
Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.
>>>>> "Gerald" == Gerald Britton <gerald.britton@gmail.com> writes:
Gerald> That leads me to ask:
Gerald> If (and under what circumstances) PostgreSQL evaluates
Gerald> functions lazily (returning rows as requested by the caller) or
Gerald> eagerly (evaluation all rows before returning the first one)?
This isn't trivial to answer because it depends significantly on the
language the function is written in and how it was called. The first
matching rule below controls what happens.
1. LANGUAGE SQL with inlining
Table functions in language SQL are candidates for inlining, see
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions
If an SQL-language function is inlined, then it behaves exactly as
though the function body was written inline, which means it is evaluated
as lazily as the query plan otherwise permits (for example, if there's a
non-indexable ORDER BY clause, then clearly all the values have to be
fetched before any are returned).
2. Table function called in the FROM clause
Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.
3. LANGUAGE SQL without inlining, in the select-list
If the final statement of an SQL function is a plain select with no
wCTEs, then it is evaluated lazily: the first fetch will execute
everything up to the first row of the final select, and subsequently one
row will be fetched at a time. If the final statement is a DML statement
with a RETURNING clause, or contains wCTEs, then it is evaluated
eagerly.
4. LANGUAGE C / INTERNAL
C-language functions (and therefore internal functions too) can choose
whether to use value-per-call mode or materialize mode. Materialize mode
is always "eager", but value-per-call mode is sometimes still eager (as
in case 2 above); it can only be lazy if no preceding rule forced it to
be otherwise.
Most built-in table functions use value-per-call mode (a notable
exception being the functions in the tablefunc module).
5. LANGUAGE PLPGSQL, PLPERL, PLTCL
Plpgsql, plperl, and pltcl functions are always evaluated eagerly.
6. LANGUAGE plpython
Plpython functions that return an iterator run in value-per-call mode,
with a "next" call on the iterator for each row. To what extent this is
a lazy or eager evaluation depends on the python code.
7. Other PL languages
For non-core PL languages the documentation or source code may indicate
whether the language uses materialize mode or value-per-call mode. (Most
languages are probably not well equipped to do value-per-call mode. One
that does allow it is pl/lua, which runs table functions as coroutines.)
--
Andrew (irc:RhodiumToad)
LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton
Re: Are PostgreSQL functions that return sets or tables evaluated lazily or eagerly?
Gerald Britton <gerald.britton@gmail.com> writes: > Back to where I started in my top post: I became interested in this due to > the doc note on returning a cursor and that it can be an efficient way to > handle large result sets. I suppose that implies lazy evaluation. Does > that mean that if I need plpgsql for a function for he language's power yet > want the results to be returned lazily, a cursor is the (only?) way to go? Nope. The docs' reference to a cursor only suggests that if you can express the function's result as a single SQL query, then opening a cursor for that query and returning the cursor name will work. But if you need plpgsql to express the computation, that's not a terribly helpful suggestion. If you'd like to see some actual movement on the missing feature about lazy evaluation in FROM, you could help test/review the pending patch about it: https://commitfest.postgresql.org/26/2372/ However, that still is only half of the problem, because you also need a PL that is prepared to cooperate, which I don't believe plpgsql is. I think (might be wrong) that a plpython function using "yield" can be made to compute its results lazily. regards, tom lane
Re: Are PostgreSQL functions that return sets or tables evaluatedlazily or eagerly?
Gerald Britton <gerald.britton@gmail.com> writes:
> Back to where I started in my top post: I became interested in this due to
> the doc note on returning a cursor and that it can be an efficient way to
> handle large result sets. I suppose that implies lazy evaluation. Does
> that mean that if I need plpgsql for a function for he language's power yet
> want the results to be returned lazily, a cursor is the (only?) way to go?
Nope. The docs' reference to a cursor only suggests that if you can
express the function's result as a single SQL query, then opening a
cursor for that query and returning the cursor name will work. But
if you need plpgsql to express the computation, that's not a terribly
helpful suggestion.
If you'd like to see some actual movement on the missing feature about
lazy evaluation in FROM, you could help test/review the pending patch
about it:
https://commitfest.postgresql.org/26/2372/
However, that still is only half of the problem, because you also need
a PL that is prepared to cooperate, which I don't believe plpgsql is.
I think (might be wrong) that a plpython function using "yield" can
be made to compute its results lazily.
regards, tom lane