Thread: Set Returning Functions (SRF) - request for patch review and comment

Set Returning Functions (SRF) - request for patch review and comment

From
Joe Conway
Date:
I've been buried in the backend parser/planner/executor now for the last
2 weeks or so, and I now have a patch for a working implementation of
SRFs as RTEs (i.e. "SELECT tbl.* FROM myfunc() AS tbl"). I think I'm at
a good point to get review and comments. Not everything yet has been
implemented per my proposal (see:
http://fts.postgresql.org/db/mw/msg.html?mid=1077099 ) but most of the
support is in place.

How it currently works:
-----------------------
1. At this point, FROM clause SRFs are used as a row source in a manner
similar to the current API, i.e. one row at a time is produced without
materializing.

2. The SRF may be either marked as returning a set or not. A function
not marked as returning a set simply produces one row.

3. The SRF may either return a base data type (e.g. TEXT) or a composite
data type (e.g. pg_class). If the function returns a base data type, the
single result column is named for the function. If the function returns
a composite type, the result columns get the same names as the
individual attributes of the type.

4. The SRF *must* be aliased in the FROM clause. This is similar to the
requirement for a subselect used in the FROM clause.

5. example:
test=# CREATE TABLE foo (fooid int, foosubid int, fooname text, primary
key(fooid,foosubid));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
'foo_pkey' for table 'foo'
CREATE
test=# INSERT INTO foo VALUES(1,1,'Joe');
INSERT 16693 1
test=# INSERT INTO foo VALUES(1,2,'Ed');
INSERT 16694 1
test=# INSERT INTO foo VALUES(2,1,'Mary');
INSERT 16695 1
test=# CREATE FUNCTION getfoo(int) RETURNS setof foo AS 'SELECT * FROM
foo WHERE fooid = $1;' LANGUAGE SQL;
CREATE
test=# SELECT * FROM getfoo(1) AS t1;
  fooid | foosubid | fooname
-------+----------+---------
      1 |        1 | Joe
      1 |        2 | Ed
(2 rows)

test=# SELECT t1.fooname FROM getfoo(1) AS t1 WHERE t1.foosubid = 1;
  fooname
---------
  Joe
(1 row)

test=# select * from dblink_get_pkey('foo') as t1;
  dblink_get_pkey
-----------------
  fooid
  foosubid
(2 rows)

What still needs to be done:
----------------------------
1. Add a new table_ref node type - DONE
2. Add support for three modes of operation to RangePortal:
    a. Repeated calls -- DONE
    b. Materialized results -- partially complete
    c. Return query -- I'm starting to wonder how/if this is really
       different than a.) above
3. Add support to allow the RangePortal to materialize modes a and c,
    if needed for a re-read -- partially complete.
4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be
    specified -- not yet started.


Request for help:
-----------------
So far I've tested with SQL and C functions. I will also do some testing
with PLpgSQL functions. I need testing and feedback from users of the
other function PLs.

Review, comments, feedback, etc. are appreciated.

Thanks,

Joe



Attachment

Re: Set Returning Functions (SRF) - request for patch review and comment

From
"Christopher Kings-Lynne"
Date:
Feedback: you're a legend!

I'll try to patch my CVS and test it at some point...

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Joe Conway
> Sent: Tuesday, 7 May 2002 12:51 AM
> To: pgsql-hackers
> Subject: [HACKERS] Set Returning Functions (SRF) - request for patch
> review and comment
> 
> 
> I've been buried in the backend parser/planner/executor now for the last 
> 2 weeks or so, and I now have a patch for a working implementation of 
> SRFs as RTEs (i.e. "SELECT tbl.* FROM myfunc() AS tbl"). I think I'm at 
> a good point to get review and comments. Not everything yet has been 
> implemented per my proposal (see: 
> http://fts.postgresql.org/db/mw/msg.html?mid=1077099 ) but most of the 
> support is in place.
> 
> How it currently works:
> -----------------------
> 1. At this point, FROM clause SRFs are used as a row source in a manner 
> similar to the current API, i.e. one row at a time is produced without 
> materializing.
> 
> 2. The SRF may be either marked as returning a set or not. A function 
> not marked as returning a set simply produces one row.
> 
> 3. The SRF may either return a base data type (e.g. TEXT) or a composite 
> data type (e.g. pg_class). If the function returns a base data type, the 
> single result column is named for the function. If the function returns 
> a composite type, the result columns get the same names as the 
> individual attributes of the type.
> 
> 4. The SRF *must* be aliased in the FROM clause. This is similar to the 
> requirement for a subselect used in the FROM clause.
> 
> 5. example:
> test=# CREATE TABLE foo (fooid int, foosubid int, fooname text, primary 
> key(fooid,foosubid));
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 
> 'foo_pkey' for table 'foo'
> CREATE
> test=# INSERT INTO foo VALUES(1,1,'Joe');
> INSERT 16693 1
> test=# INSERT INTO foo VALUES(1,2,'Ed');
> INSERT 16694 1
> test=# INSERT INTO foo VALUES(2,1,'Mary');
> INSERT 16695 1
> test=# CREATE FUNCTION getfoo(int) RETURNS setof foo AS 'SELECT * FROM 
> foo WHERE fooid = $1;' LANGUAGE SQL;
> CREATE
> test=# SELECT * FROM getfoo(1) AS t1;
>   fooid | foosubid | fooname
> -------+----------+---------
>       1 |        1 | Joe
>       1 |        2 | Ed
> (2 rows)
> 
> test=# SELECT t1.fooname FROM getfoo(1) AS t1 WHERE t1.foosubid = 1;
>   fooname
> ---------
>   Joe
> (1 row)
> 
> test=# select * from dblink_get_pkey('foo') as t1;
>   dblink_get_pkey
> -----------------
>   fooid
>   foosubid
> (2 rows)
> 
> What still needs to be done:
> ----------------------------
> 1. Add a new table_ref node type - DONE
> 2. Add support for three modes of operation to RangePortal:
>     a. Repeated calls -- DONE
>     b. Materialized results -- partially complete
>     c. Return query -- I'm starting to wonder how/if this is really
>        different than a.) above
> 3. Add support to allow the RangePortal to materialize modes a and c,
>     if needed for a re-read -- partially complete.
> 4. Add a WITH keyword to CREATE FUNCTION, allowing SRF mode to be
>     specified -- not yet started.
> 
> 
> Request for help:
> -----------------
> So far I've tested with SQL and C functions. I will also do some testing 
> with PLpgSQL functions. I need testing and feedback from users of the 
> other function PLs.
> 
> Review, comments, feedback, etc. are appreciated.
> 
> Thanks,
> 
> Joe
> 
> 
> 



Joe Conway <mail@joeconway.com> writes:
> I've been buried in the backend parser/planner/executor now for the last 
> 2 weeks or so, and I now have a patch for a working implementation of 
> SRFs as RTEs (i.e. "SELECT tbl.* FROM myfunc() AS tbl"). I think I'm at 
> a good point to get review and comments.

A few random comments ---

> 4. The SRF *must* be aliased in the FROM clause. This is similar to the 
> requirement for a subselect used in the FROM clause.

This seems unnecessary; couldn't we use the function name as the default
alias name?  The reason we require an alias for a subselect is that
there's no obvious automatic choice for a subselect; but there is for a
function.

You may not want to hear this at this point ;-) but I'd be strongly
inclined to s/portal/function/ throughout the patch.  The implementation
doesn't seem to have anything to do with portals as defined by
portalmem.c, so using the same name just sounds like a recipe for
confusion.  (I think Alex started with that name because he intended
to allow fetches from cursors --- but you're not doing that, and if
someone were to add it later he'd probably want to use the name
RangePortal for that.)

The patch's approach to checking function execute permissions seems
wrong, because it only looks at the topmost node of the function
expression.  Considerselect ... from myfunc(1, sin(x))
Probably better to let init_fcache do the checking instead when the
expression is prepared for execution.

*** src/backend/executor/execQual.c    27 Apr 2002 03:45:03 -0000    1.91
--- src/backend/executor/execQual.c    5 May 2002 21:36:55 -0000
***************
*** 44,49 ****
--- 44,52 ---- #include "utils/fcache.h"  
+ Datum ExecEvalFunc(Expr *funcClause, ExprContext *econtext,
+              bool *isNull, ExprDoneCond *isDone);
+ 

(and a corresponding "extern" in some other .c file)  Naughty naughty...
this should be in a .h file.  But actually you should probably just be
calling ExecEvalExpr anyway, rather than hard-wiring the assumption that
the top node of the expression tree is a Func.  Most of the other places
that assume that could be fixed easily by using functions like
exprType() in place of direct field access.

I've been toying with eliminating Iter nodes, which don't seem to do
anything especially worthwhile --- it'd make a lot more sense to add
a "returnsSet" boolean in Func nodes.  Dunno if that simplifies life
for you.  If you take the above advice you may find you don't really
care anymore whether there's an Iter node in the tree.

ExecPortalReScan does not look like it works yet (in fact, it looks like
it will dump core).  This is important.  It also brings up the question
of how you are handling parameters passed into the function.  I think
there's a lot more to that than meets the eye.

I have been thinking that TupleStore ought to be extended to allow
fetching of existing entries without foreclosing the option of storing
more tuples.  This would allow you to operate "on the fly" without
necessarily having to fetch the entire function output on first call.
You fetch only as far as you've been requested to provide answers.
(Which would be a good thing; consider cases with LIMIT for example.)


Good to see you making progress on this; it's been a wishlist item
for a long time.
        regards, tom lane


Re: Set Returning Functions (SRF) - request for patch review

From
Joe Conway
Date:
Tom Lane wrote:
>>4. The SRF *must* be aliased in the FROM clause. This is similar to the 
>>requirement for a subselect used in the FROM clause.
> 
> This seems unnecessary; couldn't we use the function name as the default
> alias name?  The reason we require an alias for a subselect is that
> there's no obvious automatic choice for a subselect; but there is for a
> function.

Yeah, I was on the fence about this. The only problem I could see is 
when the function returns a base type, what do I use for the column 
alias? Is it OK to use the same alias for the relation and column?


> 
> You may not want to hear this at this point ;-) but I'd be strongly
> inclined to s/portal/function/ throughout the patch.  The implementation
> doesn't seem to have anything to do with portals as defined by
> portalmem.c, so using the same name just sounds like a recipe for

I was already thinking the same thing. It will be a real PITA, but I do 
believe it is the right thing to do.


> The patch's approach to checking function execute permissions seems
> wrong, because it only looks at the topmost node of the function
> expression.  Consider
>     select ... from myfunc(1, sin(x))
> Probably better to let init_fcache do the checking instead when the
> expression is prepared for execution.

OK.


> + Datum ExecEvalFunc(Expr *funcClause, ExprContext *econtext,
> +              bool *isNull, ExprDoneCond *isDone);
> + 
> 
> (and a corresponding "extern" in some other .c file)  Naughty naughty...
> this should be in a .h file.  But actually you should probably just be
> calling ExecEvalExpr anyway, rather than hard-wiring the assumption that
> the top node of the expression tree is a Func.  Most of the other places
> that assume that could be fixed easily by using functions like
> exprType() in place of direct field access.

OK.


> 
> I've been toying with eliminating Iter nodes, which don't seem to do
> anything especially worthwhile --- it'd make a lot more sense to add
> a "returnsSet" boolean in Func nodes.  Dunno if that simplifies life
> for you.  If you take the above advice you may find you don't really
> care anymore whether there's an Iter node in the tree.

Actually it gets in my way a bit, and I think I remember some 
discussions wrt removing it. But I wasn't sure how large the impact 
would be on the current API if I messed with it, so I thought I'd leave 
it for now. Do you think it's worth it to address this now?


> 
> ExecPortalReScan does not look like it works yet (in fact, it looks like
> it will dump core).  This is important.  It also brings up the question
> of how you are handling parameters passed into the function.  I think
> there's a lot more to that than meets the eye.

Yeah, I took a first shot at writing the rescan support, but have not 
yet begun to use/test it. I'd like to get the rest of the patch to an 
acceptable level first, then concentrate on get materialization and 
rescan working.


> 
> I have been thinking that TupleStore ought to be extended to allow
> fetching of existing entries without foreclosing the option of storing
> more tuples.  This would allow you to operate "on the fly" without
> necessarily having to fetch the entire function output on first call.
> You fetch only as far as you've been requested to provide answers.
> (Which would be a good thing; consider cases with LIMIT for example.)
> 

Hmm, I'll have to look at this more closely. When I get to the 
materialization/rescan stuff, I'll see if I can address this idea.

Thanks for the review and comments!

Joe



Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> This seems unnecessary; couldn't we use the function name as the default
>> alias name?  The reason we require an alias for a subselect is that
>> there's no obvious automatic choice for a subselect; but there is for a
>> function.

> Yeah, I was on the fence about this. The only problem I could see is 
> when the function returns a base type, what do I use for the column 
> alias? Is it OK to use the same alias for the relation and column?

Sure.  foo.foo is valid for a column foo in a table foo, so I don't
see a problem with it for a function.

>> You may not want to hear this at this point ;-) but I'd be strongly
>> inclined to s/portal/function/ throughout the patch.

> I was already thinking the same thing. It will be a real PITA, but I do 
> believe it is the right thing to do.

You could try doing the text substitution on the diff file and then
re-applying the diff to fresh sources.  Might get a couple of merge
failures, but should be a lot less painful than doing the edit directly
on the full sources.

>> I've been toying with eliminating Iter nodes, which don't seem to do
>> anything especially worthwhile --- it'd make a lot more sense to add
>> a "returnsSet" boolean in Func nodes.  Dunno if that simplifies life
>> for you.  If you take the above advice you may find you don't really
>> care anymore whether there's an Iter node in the tree.

> Actually it gets in my way a bit, and I think I remember some 
> discussions wrt removing it. But I wasn't sure how large the impact 
> would be on the current API if I messed with it, so I thought I'd leave 
> it for now. Do you think it's worth it to address this now?

Up to you; probably should wait to see if Iter is still in your way
after you do the other thing.  I think removing it and instead inserting
returnsSet booleans in Oper and Func nodes would be a pretty
straightforward exercise, but it'll mean touching even more stuff.
Might be best to do that as a separate patch.

>> ExecPortalReScan does not look like it works yet (in fact, it looks like
>> it will dump core).  This is important.  It also brings up the question
>> of how you are handling parameters passed into the function.  I think
>> there's a lot more to that than meets the eye.

> Yeah, I took a first shot at writing the rescan support, but have not 
> yet begun to use/test it. I'd like to get the rest of the patch to an 
> acceptable level first, then concentrate on get materialization and 
> rescan working.

Fair enough.  We should try to get the bulk of the patch applied soon
so that you don't have code drift problems.  The rescan issues should
not involve touching nearly as much code.
        regards, tom lane


Re: Set Returning Functions (SRF) - request for patch review and comment

From
Ian Barwick
Date:
On Monday 06 May 2002 18:51, Joe Conway wrote:
(...)
> Request for help:
> -----------------
> So far I've tested with SQL and C functions.
(...)

Can you post an example of a function in C?
(I'm trying out your patch from Friday).


Thanks,

Ian Barwick


Re: Set Returning Functions (SRF) - request for patch review

From
Joe Conway
Date:
Ian Barwick wrote:
> On Monday 06 May 2002 18:51, Joe Conway wrote:
> (...)
> 
>>Request for help:
>>-----------------
>>So far I've tested with SQL and C functions. 
> 
> (...)
> 
> Can you post an example of a function in C?
> (I'm trying out your patch from Friday).
> 
> 
> Thanks,
> 
> Ian Barwick

See contrib/dblink. The version in cvs HEAD has two that return sets -- 
dblink() which returns an int, and dblink_get_pkey() which returns text. 
I don't have an example that returns a composite type though. I'll make 
one of those for testing some time next week.

Joe




Re: Set Returning Functions (SRF) - request for patch review

From
Ian Barwick
Date:
>
> See contrib/dblink. The version in cvs HEAD has two that return sets --
> dblink() which returns an int, and dblink_get_pkey() which returns text.

Thanks, now I can see what I was doing wrong

Yours

Ian Barwick