Thread: Are PostgreSQL functions that return sets or tables evaluated lazilyor eagerly?

Are PostgreSQL functions that return sets or tables evaluated lazilyor eagerly?

From
Gerald Britton
Date:
Cross post from Stack Overflow:  https://stackoverflow.com/questions/59554144/are-postgresql-functions-that-return-sets-or-tables-evaluated-lazily-or-eagerly 

I'm learning to write functions in PostgreSQL. When I got to the documentation chapter on cursors, I came across this interesting comment:

A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.

Near the top of this page: 42.7. Cursors

That made me wonder where, specifically, this would be more efficient than a plain old function call.

I made up a little test function and call:

CREATE FUNCTION foo() RETURNS SETOF customers 
LANGUAGE SQL AS $$    SELECT c.* FROM customers c   CROSS JOIN customers x   CROSS JOIN customers y;
$$;

SELECT * FROM foo() LIMIT 1;

The customers table I'm working with has 20,000 rows so with the cross joins that should be 8e+12 rows (which would take a while to fully evaluate!). The select statement at the end appears to confirm that the function is reading all rows (I had to cancel it after several seconds -- way more than to just return the first row)

That leads me to ask:

If (and under what circumstances) PostgreSQL evaluates functions lazily (returning rows as requested by the caller) or eagerly (evaluation all rows before returning the first one)?


--
Gerald Britton, MCSE-DP, MVP
LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton
Gerald Britton <gerald.britton@gmail.com> writes:
> I'm learning to write functions in PostgreSQL. When I got to the
> documentation chapter on cursors, I came across this interesting comment:
> A more interesting usage is to return a reference to a cursor that a
> function has created, allowing the caller to read the rows. This provides
> an efficient way to return large row sets from functions.

This is more or less unrelated to your question, or at least to your
example, because you didn't use a cursor.

> I made up a little test function and call:

> CREATE FUNCTION foo() RETURNS SETOF customers
> LANGUAGE SQL AS $$
>     SELECT c.* FROM customers c
>     CROSS JOIN customers x
>     CROSS JOIN customers y;$$;
> SELECT * FROM foo() LIMIT 1;

At the moment, set-returning functions in FROM are generally evaluated
"eagerly", ie run to completion before the calling query does anything
else.  There's been some work recently towards improving that, but it's
not done yet.

This particular example could be improved, because SQL-language functions
can potentially be inlined.  Yours wasn't because it's volatile by
default, and for somewhat arcane semantic reasons we won't inline
volatiles.  But if you were to mark that function stable it'd get
inlined, and then it'd respond promptly to the LIMIT 1.

Another point is that the rules are different for set-returning functions
in a targetlist.  For example,

select foo() limit 1;

is fast with your example despite the lack of inlining.  That's because
that case actually is done row-by-row, with the function yielding control
after each row.  So we have the infrastructure for that, it just hasn't
been exploited in the function-in-FROM case.

And, if you weren't confused yet: all of this depends on the particular
PL that you're considering.  Of the PLs included with core PG, I think
only SQL-language functions can do the above --- the other ones are
strictly execute-to-completion.  I don't know offhand about third-party
PLs.

            regards, tom lane



>>>>> "Gerald" == Gerald Britton <gerald.britton@gmail.com> writes:

 Gerald> That leads me to ask:

 Gerald> If (and under what circumstances) PostgreSQL evaluates
 Gerald> functions lazily (returning rows as requested by the caller) or
 Gerald> eagerly (evaluation all rows before returning the first one)?

This isn't trivial to answer because it depends significantly on the
language the function is written in and how it was called. The first
matching rule below controls what happens.


1. LANGUAGE SQL with inlining

Table functions in language SQL are candidates for inlining, see
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions

If an SQL-language function is inlined, then it behaves exactly as
though the function body was written inline, which means it is evaluated
as lazily as the query plan otherwise permits (for example, if there's a
non-indexable ORDER BY clause, then clearly all the values have to be
fetched before any are returned).


2. Table function called in the FROM clause

Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.


3. LANGUAGE SQL without inlining, in the select-list

If the final statement of an SQL function is a plain select with no
wCTEs, then it is evaluated lazily: the first fetch will execute
everything up to the first row of the final select, and subsequently one
row will be fetched at a time. If the final statement is a DML statement
with a RETURNING clause, or contains wCTEs, then it is evaluated
eagerly.


4. LANGUAGE C / INTERNAL

C-language functions (and therefore internal functions too) can choose
whether to use value-per-call mode or materialize mode. Materialize mode
is always "eager", but value-per-call mode is sometimes still eager (as
in case 2 above); it can only be lazy if no preceding rule forced it to
be otherwise.

Most built-in table functions use value-per-call mode (a notable
exception being the functions in the tablefunc module).


5. LANGUAGE PLPGSQL, PLPERL, PLTCL

Plpgsql, plperl, and pltcl functions are always evaluated eagerly.


6. LANGUAGE plpython

Plpython functions that return an iterator run in value-per-call mode,
with a "next" call on the iterator for each row. To what extent this is
a lazy or eager evaluation depends on the python code.


7. Other PL languages

For non-core PL languages the documentation or source code may indicate
whether the language uses materialize mode or value-per-call mode. (Most
languages are probably not well equipped to do value-per-call mode. One
that does allow it is pl/lua, which runs table functions as coroutines.)


-- 
Andrew (irc:RhodiumToad)



Thank you all for the detailed explanations.  I think the most disappointing is this bit:

2. Table function called in the FROM clause
Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.  

Which more or less matches my toy example.  OTOH Tom mentioned that marking the function STABLE effectively makes it lazy since it is inlined and my testing confirms that for SQL language functions.  Alas not for PL/pgSQL functions, though.

Back to where I started in my top post:  I became interested in this due to the doc note on returning a cursor and that it can be an efficient way to handle large result sets.  I suppose that implies lazy evaluation.  Does that mean that if I need plpgsql for a function for he language's power yet want the results to be returned lazily, a cursor is the (only?) way to go?

On Fri, Jan 3, 2020 at 10:07 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Gerald" == Gerald Britton <gerald.britton@gmail.com> writes:

 Gerald> That leads me to ask:

 Gerald> If (and under what circumstances) PostgreSQL evaluates
 Gerald> functions lazily (returning rows as requested by the caller) or
 Gerald> eagerly (evaluation all rows before returning the first one)?

This isn't trivial to answer because it depends significantly on the
language the function is written in and how it was called. The first
matching rule below controls what happens.


1. LANGUAGE SQL with inlining

Table functions in language SQL are candidates for inlining, see
https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions

If an SQL-language function is inlined, then it behaves exactly as
though the function body was written inline, which means it is evaluated
as lazily as the query plan otherwise permits (for example, if there's a
non-indexable ORDER BY clause, then clearly all the values have to be
fetched before any are returned).


2. Table function called in the FROM clause

Table functions in the FROM clause, e.g. SELECT ... FROM myfunc();
are always evaluated eagerly.


3. LANGUAGE SQL without inlining, in the select-list

If the final statement of an SQL function is a plain select with no
wCTEs, then it is evaluated lazily: the first fetch will execute
everything up to the first row of the final select, and subsequently one
row will be fetched at a time. If the final statement is a DML statement
with a RETURNING clause, or contains wCTEs, then it is evaluated
eagerly.


4. LANGUAGE C / INTERNAL

C-language functions (and therefore internal functions too) can choose
whether to use value-per-call mode or materialize mode. Materialize mode
is always "eager", but value-per-call mode is sometimes still eager (as
in case 2 above); it can only be lazy if no preceding rule forced it to
be otherwise.

Most built-in table functions use value-per-call mode (a notable
exception being the functions in the tablefunc module).


5. LANGUAGE PLPGSQL, PLPERL, PLTCL

Plpgsql, plperl, and pltcl functions are always evaluated eagerly.


6. LANGUAGE plpython

Plpython functions that return an iterator run in value-per-call mode,
with a "next" call on the iterator for each row. To what extent this is
a lazy or eager evaluation depends on the python code.


7. Other PL languages

For non-core PL languages the documentation or source code may indicate
whether the language uses materialize mode or value-per-call mode. (Most
languages are probably not well equipped to do value-per-call mode. One
that does allow it is pl/lua, which runs table functions as coroutines.)


--
Andrew (irc:RhodiumToad)


--
Gerald Britton, MCSE-DP, MVP
LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton
Gerald Britton <gerald.britton@gmail.com> writes:
> Back to where I started in my top post:  I became interested in this due to
> the doc note on returning a cursor and that it can be an efficient way to
> handle large result sets.  I suppose that implies lazy evaluation.  Does
> that mean that if I need plpgsql for a function for he language's power yet
> want the results to be returned lazily, a cursor is the (only?) way to go?

Nope.  The docs' reference to a cursor only suggests that if you can
express the function's result as a single SQL query, then opening a
cursor for that query and returning the cursor name will work.  But
if you need plpgsql to express the computation, that's not a terribly
helpful suggestion.

If you'd like to see some actual movement on the missing feature about
lazy evaluation in FROM, you could help test/review the pending patch
about it:

https://commitfest.postgresql.org/26/2372/

However, that still is only half of the problem, because you also need
a PL that is prepared to cooperate, which I don't believe plpgsql is.
I think (might be wrong) that a plpython function using "yield" can
be made to compute its results lazily.

            regards, tom lane



Love to see "return next" work like python yield! Anyone working on that?

On Sun, Jan 5, 2020, 5:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gerald Britton <gerald.britton@gmail.com> writes:
> Back to where I started in my top post:  I became interested in this due to
> the doc note on returning a cursor and that it can be an efficient way to
> handle large result sets.  I suppose that implies lazy evaluation.  Does
> that mean that if I need plpgsql for a function for he language's power yet
> want the results to be returned lazily, a cursor is the (only?) way to go?

Nope.  The docs' reference to a cursor only suggests that if you can
express the function's result as a single SQL query, then opening a
cursor for that query and returning the cursor name will work.  But
if you need plpgsql to express the computation, that's not a terribly
helpful suggestion.

If you'd like to see some actual movement on the missing feature about
lazy evaluation in FROM, you could help test/review the pending patch
about it:

https://commitfest.postgresql.org/26/2372/

However, that still is only half of the problem, because you also need
a PL that is prepared to cooperate, which I don't believe plpgsql is.
I think (might be wrong) that a plpython function using "yield" can
be made to compute its results lazily.

                        regards, tom lane