Thread: [RFC] Set Returning Functions

[RFC] Set Returning Functions

From
Joe Conway
Date:
I've been reading past threads, studying backend code, reviewing Alex 
Pilosov's "cursor foo" patch (submitted last August/September, but never 
applied), and conversing off list with a few people regarding a possible 
implementation of Set Returning Functions (or SRF for short). Below is 
my proposal for how this might work. After discussion, and if there is 
no objection, I would like to work on this implementation with the hope 
that it could be in place for 7.3.


Proposal for set returning functions (SRF):
-----------------------------------------------------

The problem:
-----------------------------------------------------
Currently the ability to return multiple row, multiple column result 
sets from a function is quite limited. In fact, it is not possible to 
return multiple columns directly. It is possible to work around this 
limitation, but only in a clumsy way (see contrib/dblink for an 
example). Alternatively refcursors may be used, but they have their own 
set of issues, not the least of which is they cannot be used in view 
definitions or exist outside of explicit transactions.


The feature:
-----------------------------------------------------
The desired feature is the ability to return multiple row, multiple 
column result sets from a function, or set returning functions (SRF) for 
short.


Do we want this feature?
-----------------------------------------------------
Based on the many posts on this topic, I think the answer to this is a 
resounding yes.


How do we want the feature to behave?
-----------------------------------------------------
A SRF should behave similarly to any other table_ref (RangeTblEntry), 
i.e. as a tuple source in a FROM clause. Currently there are three 
primary kinds of RangeTblEntry: RTE_RELATION (ordinary relation), 
RTE_SUBQUERY (subquery in FROM), and RTE_JOIN (join). SRF would join 
this list and behave in much the same manner.


How do we want the feature implemented? (my proposal)
-----------------------------------------------------
1. Add a new table_ref node type:   - Current nodes are RangeVar, RangeSubselect, or JoinExpr   - Add new RangePortal
nodeas a possible table_ref. The RangePortal     node will be extented from the current Portal functionality.
 

2. Add support for three modes of operation to RangePortal:  a. Repeated calls -- this is the existing API for SRF, but
   implemented as a tuple source instead of as an expression.  b. Materialized results -- use a TupleStore to
materializethe     result set.  c. Return query -- use current Portal functionality, fetch entire     result set.
 

3. Add support to allow the RangePortal to materialize modes 1 and 3, if 
needed for a re-read.

4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be 
specified. This would default to mode a) for backward compatibility.

5. Ignore the current code which allows functions to return multiple 
results as expressions; we can leave it there, but deprecate it with the 
intention of eventual removal.

-----------------------------------------------------
Thoughts/comments would be much appreciated.

Thanks,

Joe



Re: [RFC] Set Returning Functions

From
"Christopher Kings-Lynne"
Date:
> Do we want this feature?
> -----------------------------------------------------
> Based on the many posts on this topic, I think the answer to this is a
> resounding yes.

Definitely!

> How do we want the feature to behave?
> -----------------------------------------------------
> A SRF should behave similarly to any other table_ref (RangeTblEntry),
> i.e. as a tuple source in a FROM clause. Currently there are three
> primary kinds of RangeTblEntry: RTE_RELATION (ordinary relation),
> RTE_SUBQUERY (subquery in FROM), and RTE_JOIN (join). SRF would join
> this list and behave in much the same manner.

Yes - I don't see any point in adhering to the SQL standard lame definition.
We can just make "CALL proc()" map to "SELECT * FROM proc()" in the parser
for compliance.

> How do we want the feature implemented? (my proposal)
> -----------------------------------------------------
> 1. Add a new table_ref node type:
>     - Current nodes are RangeVar, RangeSubselect, or JoinExpr
>     - Add new RangePortal node as a possible table_ref. The RangePortal
>       node will be extented from the current Portal functionality.
>
> 2. Add support for three modes of operation to RangePortal:
>    a. Repeated calls -- this is the existing API for SRF, but
>       implemented as a tuple source instead of as an expression.
>    b. Materialized results -- use a TupleStore to materialize the
>       result set.
>    c. Return query -- use current Portal functionality, fetch entire
>       result set.
>
> 3. Add support to allow the RangePortal to materialize modes 1 and 3, if
> needed for a re-read.

Looks cool.  That's stuff outta my league tho.

> 4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be
> specified. This would default to mode a) for backward compatibility.

Interesting idea.  Didn't occur to me that we could specify it on a
per-function level.  How do Oracle and Firebird do it?  What about the issue
of people maybe wanting different behaviours at different times?  ie.
statement level, rather than function level?

> 5. Ignore the current code which allows functions to return multiple
> results as expressions; we can leave it there, but deprecate it with the
> intention of eventual removal.

What does the current 'setof' pl/pgsql business actually _do_?

Chris



Re: [RFC] Set Returning Functions

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> 5. Ignore the current code which allows functions to return multiple
>> results as expressions; we can leave it there, but deprecate it with the
>> intention of eventual removal.

> What does the current 'setof' pl/pgsql business actually _do_?

plpgsql doesn't handle setof at all, AFAIR.  SQL-language functions do.
The gold is hidden in src/backend/executor/*.c.  The SQL function
executor (functions.c) suspends the query plan for the function's final
SELECT, and re-executes it to get one more result row each time it's
re-called.  That's okay as far as it goes; but look at what happens when
such a function is called from a SELECT targetlist.

The ExprMultipleResult flag from the function propagates up through
execQual.c, to ExecTargetList which forms a new result tuple for each
function result.  All the node executor routines that call ExecProject
have to be prepared to deal with that (eg, first if() in ExecScan).

This is all really messy, both in the implementation and in the
conception IMHO; for example, the behavior with multiple SRFs in the
same targetlist is really pretty stupid (and it was worse when the
code left Berkeley).  I'd like to deprecate and eventually remove the
whole feature.  SRFs in FROM (as table sources) make way more sense
than SRFs in targetlists.
        regards, tom lane