Thread: [RFC] Set Returning Functions
I've been reading past threads, studying backend code, reviewing Alex Pilosov's "cursor foo" patch (submitted last August/September, but never applied), and conversing off list with a few people regarding a possible implementation of Set Returning Functions (or SRF for short). Below is my proposal for how this might work. After discussion, and if there is no objection, I would like to work on this implementation with the hope that it could be in place for 7.3. Proposal for set returning functions (SRF): ----------------------------------------------------- The problem: ----------------------------------------------------- Currently the ability to return multiple row, multiple column result sets from a function is quite limited. In fact, it is not possible to return multiple columns directly. It is possible to work around this limitation, but only in a clumsy way (see contrib/dblink for an example). Alternatively refcursors may be used, but they have their own set of issues, not the least of which is they cannot be used in view definitions or exist outside of explicit transactions. The feature: ----------------------------------------------------- The desired feature is the ability to return multiple row, multiple column result sets from a function, or set returning functions (SRF) for short. Do we want this feature? ----------------------------------------------------- Based on the many posts on this topic, I think the answer to this is a resounding yes. How do we want the feature to behave? ----------------------------------------------------- A SRF should behave similarly to any other table_ref (RangeTblEntry), i.e. as a tuple source in a FROM clause. Currently there are three primary kinds of RangeTblEntry: RTE_RELATION (ordinary relation), RTE_SUBQUERY (subquery in FROM), and RTE_JOIN (join). SRF would join this list and behave in much the same manner. How do we want the feature implemented? (my proposal) ----------------------------------------------------- 1. Add a new table_ref node type: - Current nodes are RangeVar, RangeSubselect, or JoinExpr - Add new RangePortal nodeas a possible table_ref. The RangePortal node will be extented from the current Portal functionality. 2. Add support for three modes of operation to RangePortal: a. Repeated calls -- this is the existing API for SRF, but implemented as a tuple source instead of as an expression. b. Materialized results -- use a TupleStore to materializethe result set. c. Return query -- use current Portal functionality, fetch entire result set. 3. Add support to allow the RangePortal to materialize modes 1 and 3, if needed for a re-read. 4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be specified. This would default to mode a) for backward compatibility. 5. Ignore the current code which allows functions to return multiple results as expressions; we can leave it there, but deprecate it with the intention of eventual removal. ----------------------------------------------------- Thoughts/comments would be much appreciated. Thanks, Joe
> Do we want this feature? > ----------------------------------------------------- > Based on the many posts on this topic, I think the answer to this is a > resounding yes. Definitely! > How do we want the feature to behave? > ----------------------------------------------------- > A SRF should behave similarly to any other table_ref (RangeTblEntry), > i.e. as a tuple source in a FROM clause. Currently there are three > primary kinds of RangeTblEntry: RTE_RELATION (ordinary relation), > RTE_SUBQUERY (subquery in FROM), and RTE_JOIN (join). SRF would join > this list and behave in much the same manner. Yes - I don't see any point in adhering to the SQL standard lame definition. We can just make "CALL proc()" map to "SELECT * FROM proc()" in the parser for compliance. > How do we want the feature implemented? (my proposal) > ----------------------------------------------------- > 1. Add a new table_ref node type: > - Current nodes are RangeVar, RangeSubselect, or JoinExpr > - Add new RangePortal node as a possible table_ref. The RangePortal > node will be extented from the current Portal functionality. > > 2. Add support for three modes of operation to RangePortal: > a. Repeated calls -- this is the existing API for SRF, but > implemented as a tuple source instead of as an expression. > b. Materialized results -- use a TupleStore to materialize the > result set. > c. Return query -- use current Portal functionality, fetch entire > result set. > > 3. Add support to allow the RangePortal to materialize modes 1 and 3, if > needed for a re-read. Looks cool. That's stuff outta my league tho. > 4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be > specified. This would default to mode a) for backward compatibility. Interesting idea. Didn't occur to me that we could specify it on a per-function level. How do Oracle and Firebird do it? What about the issue of people maybe wanting different behaviours at different times? ie. statement level, rather than function level? > 5. Ignore the current code which allows functions to return multiple > results as expressions; we can leave it there, but deprecate it with the > intention of eventual removal. What does the current 'setof' pl/pgsql business actually _do_? Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: >> 5. Ignore the current code which allows functions to return multiple >> results as expressions; we can leave it there, but deprecate it with the >> intention of eventual removal. > What does the current 'setof' pl/pgsql business actually _do_? plpgsql doesn't handle setof at all, AFAIR. SQL-language functions do. The gold is hidden in src/backend/executor/*.c. The SQL function executor (functions.c) suspends the query plan for the function's final SELECT, and re-executes it to get one more result row each time it's re-called. That's okay as far as it goes; but look at what happens when such a function is called from a SELECT targetlist. The ExprMultipleResult flag from the function propagates up through execQual.c, to ExecTargetList which forms a new result tuple for each function result. All the node executor routines that call ExecProject have to be prepared to deal with that (eg, first if() in ExecScan). This is all really messy, both in the implementation and in the conception IMHO; for example, the behavior with multiple SRFs in the same targetlist is really pretty stupid (and it was worse when the code left Berkeley). I'd like to deprecate and eventually remove the whole feature. SRFs in FROM (as table sources) make way more sense than SRFs in targetlists. regards, tom lane