Thread: functions returning sets

functions returning sets

From
Jeff Rogers
Date:
I have a set-returning function that takes a text string and returns multiple
values from it.  Its a small hack on the pgxml_xpath function from contrib/xml
that returns all matching nodes from an xml document rather than just a single
specified node.

I currently use it as follows:

create table xml_files (file text, doc text) ;
create view xnodes as select file, pgxml_xpath(doc, '/top/node') from
xml_files;

select * from xnodes ;

This gives me a table of all the matching nodes in all the documents.
However, the documentation says that using a SRF in the select list of a
query, but this capability is deprecated.  I can't figure out how to call this
function in the from clause with it referring to a column in a table, I get an
error like
ERROR:  FROM function expression may not refer to other relations of same
query level.  Is there another way to accomplish this?

Another useful way to call my function would be
select file from xml_files where 'foo' in pgxml_xpath(doc,'/top/node')
but that gives be a parse error.
select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node'))
parses, but it doesn't seem to give correct results.

Thanks
-J

PS: Here's the function.  This goes with pgxml.c, not pgxml_dom.c, bt could
probably be modified to work there as well.

PG_FUNCTION_INFO_V1(pgxml_xpath_all);

Datum
pgxml_xpath_all(PG_FUNCTION_ARGS) {
    /* called as pgxml_xpath(document,pathstr) */
    /* returns set of all matching results */

    XPath_Results *xpresults;
    text       *restext;
    MemoryContext oldContext;
    FuncCallContext  *funcctx;

    text       *t = PG_GETARG_TEXT_P(0);        /* document buffer */
    text       *t2 = PG_GETARG_TEXT_P(1);
    int32 ind;

    if (SRF_IS_FIRSTCALL()) {
        funcctx=SRF_FIRSTCALL_INIT();
        oldContext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        xpresults = build_xpath_results(t, t2);
        funcctx->user_fctx=xpresults;
        MemoryContextSwitchTo(oldContext);
    }

    funcctx = SRF_PERCALL_SETUP();
    xpresults=funcctx->user_fctx;

    ind=funcctx->call_cntr;

    if (xpresults  == NULL || ind >= xpresults->rescount) {
        if (xpresults != NULL) {
            pfree(xpresults->resbuf);
            pfree(xpresults);
        }
        SRF_RETURN_DONE(funcctx);
    }

    restext = (text *) palloc(xpresults->reslens[ind] + VARHDRSZ);
    memcpy(VARDATA(restext), xpresults->results[ind], xpresults->reslens[ind]);

    VARATT_SIZEP(restext) = xpresults->reslens[ind] + VARHDRSZ;

    SRF_RETURN_NEXT(funcctx,restext);
}





Re: functions returning sets

From
Tom Lane
Date:
Jeff Rogers <jrogers@findlaw.com> writes:
> However, the documentation says that using a SRF in the select list of
> a query, but this capability is deprecated.  I can't figure out how to
> call this function in the from clause with it referring to a column in
> a table, I get an error like
> ERROR:  FROM function expression may not refer to other relations of same
> query level.  Is there another way to accomplish this?

There isn't any good alternative at the moment (which is why SRFs in
select lists aren't deprecated yet).  There has been some discussion
about implementing SQL99's LATERAL clause to support this, but it's
not done yet.

> select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node'))
> parses, but it doesn't seem to give correct results.

That should work as far as I know.  Can you give more detail?

            regards, tom lane

Re: functions returning sets

From
Jeff Rogers
Date:
> > select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/nod
> e'))
> > parses, but it doesn't seem to give correct results.
>
> That should work as far as I know.  Can you give more detail?
>
>             regards, tom lane

Cut & paste from a window:

xml=# create table foo (id text, doc text) ;
CREATE TABLE
xml=# insert into foo values (1,'<top><node>a</node><node>b</node></top>');
INSERT 22106552 1
xml=# insert into foo values (2,'<top><node>a</node><node>b</node></top>');
INSERT 22106553 1
xml=# insert into foo values (3,'<top><node>a</node><node>b</node></top>');
INSERT 22106554 1
xml=# select id,pgxml_xpath(doc,'/top/node') from foo ;
 id | pgxml_xpath
----+-------------
 1  | a
 1  | b
 2  | a
 2  | b
 3  | a
 3  | b
(6 rows)

xml=# select id from foo where 'a' in (select pgxml_xpath(doc,'/top/node'));
 id
----
 1
 3
(2 rows)

I expect this latter to return all the ids, but it seems to only return every
other one.

-J


Re: functions returning sets

From
Tom Lane
Date:
Jeff Rogers <jrogers@findlaw.com> writes:
> I expect this latter to return all the ids, but it seems to only return every
> other one.

Hmm, this looks like a bug having to do with not resetting state
correctly for the next invocation of the SRF.  Hard to tell whether
the bug is in your code or the system's support for SRFs though.

Joe, did you see
http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
?  I don't have time to look at this right now ...

            regards, tom lane

Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
> Jeff Rogers <jrogers@findlaw.com> writes:
>>I expect this latter to return all the ids, but it seems to only return every
>>other one.
>
> Hmm, this looks like a bug having to do with not resetting state
> correctly for the next invocation of the SRF.  Hard to tell whether
> the bug is in your code or the system's support for SRFs though.
>
> Joe, did you see
> http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
> ?  I don't have time to look at this right now ...

I'll try to take a look in the next day or so -- hopefully tonight.

Joe



Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
> Jeff Rogers <jrogers@findlaw.com> writes:
>>I expect this latter to return all the ids, but it seems to only return every
>>other one.
>
> Hmm, this looks like a bug having to do with not resetting state
> correctly for the next invocation of the SRF.  Hard to tell whether
> the bug is in your code or the system's support for SRFs though.
>
> Joe, did you see
> http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php
> ?  I don't have time to look at this right now ...

I've not gotten to the bottom of this, but I can confirm the issue.

The way I would expect this query to be written does work:

regression=# select id from foo where 'a' in (select * from
pgxml_xpath(doc,'/top/node'));
  id
----
  1
  2
  3
(3 rows)

So the problem seems to be related specifically to "SRF returning setof
scalar in targetlist of subselect". Maybe some difference between
ExecMakeFunctionResult and ExecMakeTableFunctionResult? SRFs returning
complex types are not allowed in targetlists, but in this case there is
no restriction because the return type is scalar.

I'll keep digging as time allows.

Joe





Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
> Hmm, this looks like a bug having to do with not resetting state
> correctly for the next invocation of the SRF.  Hard to tell whether
> the bug is in your code or the system's support for SRFs though.

Looks like a bug in ExecScanSubPlan() around line 401:
8<-----------------------------------------------------
if (subLinkType == ANY_SUBLINK)
{
     /* combine across rows per OR semantics */
     if (rownull)
     *isNull = true;
     else if (DatumGetBool(rowresult))
     {
         result = BoolGetDatum(true);
         *isNull = false;
         break;    /* needn't look at any more rows */
     }
}
8<-----------------------------------------------------

If the function returns a set, that comment is wrong, isn't it? I
removed the "break" to run out all the available tuples and got the
correct result:

regression=# select id from foo where 'a' in (select
pgxml_xpath(doc,'/top/node'));
  id
----
  1
  2
  3
(3 rows)

Any guidance on the preferred fix? I don't see an obvious way to
determine if the plan includes a set returning function in
ExecScanSubPlan(), and it seems unfortunate to remove the optimization
for all other cases, especially since targetlist SRFs are deprecated.

Joe



Re: functions returning sets

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Any guidance on the preferred fix?

We cannot fix this by changing ExecScanSubPlan as you suggest.
That would amount to saying that all plans have to be run to completion,
which destroys LIMIT to name just one unpleasant consequence.

There is a mechanism available for functions to arrange to get cleanup
callbacks when a containing plan is shut down early --- see
RegisterExprContextCallback and ShutdownExprContext.  It looks like this
is not used by the existing SRF support, but I suspect it should be.

[ scratches head ... ]  Right at the moment I don't see where
ShutdownExprContext gets called during a ReScan operation.  I'm quite
sure it once was ... there may be another bug here ...

            regards, tom lane

Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>Any guidance on the preferred fix?
>
> We cannot fix this by changing ExecScanSubPlan as you suggest.
> That would amount to saying that all plans have to be run to completion,
> which destroys LIMIT to name just one unpleasant consequence.

I suspected as much.

> There is a mechanism available for functions to arrange to get cleanup
> callbacks when a containing plan is shut down early --- see
> RegisterExprContextCallback and ShutdownExprContext.  It looks like this
> is not used by the existing SRF support, but I suspect it should be.

OK -- I'll take a look.

> [ scratches head ... ]  Right at the moment I don't see where
> ShutdownExprContext gets called during a ReScan operation.  I'm quite
> sure it once was ... there may be another bug here ...
>

Thanks, I'll keep that in mind too.

Joe




Re: functions returning sets

From
Tom Lane
Date:
> Tom Lane wrote:
>> [ scratches head ... ]  Right at the moment I don't see where
>> ShutdownExprContext gets called during a ReScan operation.  I'm quite
>> sure it once was ... there may be another bug here ...

After further looking I've realized that memory is misserving me here;
having ReScan call ShutdownExprContext was not something that ever got
done.  Instead I have an entry on my personal todo list that says

: Need to invent ExprContextRescan and call it at appropriate places.
: Knowing where all the econtexts are seems like the hard part ... though
: maybe we only care about econtexts that might contain set-returning
: functions, which might limit it to the targetlist...

A perfectly clean solution would require being careful to reset *all*
econtexts, which might be thought rather a lot of work to support a
feature that's eventually going to be deprecated anyway (viz, SRFs
outside of FROM).  I'll see about the tlist-only case though.

            regards, tom lane

Re: functions returning sets

From
Tom Lane
Date:
I said:
> : Need to invent ExprContextRescan and call it at appropriate places.

> A perfectly clean solution would require being careful to reset *all*
> econtexts, which might be thought rather a lot of work to support a
> feature that's eventually going to be deprecated anyway (viz, SRFs
> outside of FROM).  I'll see about the tlist-only case though.

Actually, some study shows that we *only* support set-returning
functions in ExecProject(), which is only used with the ps_ExprContext
econtext of plan nodes, which makes it pretty easy to ensure that ReScan
gets everything it needs to.  I have committed a patch into 7.4 and HEAD
branches that ensures shutdown callback functions are called during
ReScan.  I've tested that the problem is fixed for SQL-language
functions; for example in the regression database do

create function foo(int) returns setof int as
'select unique1 from tenk1' language sql;

and compare the output of

select foo(1) limit 10;
select unique1 from tenk1 where unique1 in (select foo(1) limit 10);
select unique1 from tenk1 where unique1 in (select foo(ten) limit 10);

These should be the same (up to ordering) but are not without the patch.

We still need a code addition that uses an ExprState shutdown callback
to reset the state of an SRF that uses the SRF_XXX macros.  Joe, have
you had time to give that any thought?

            regards, tom lane

Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
> We still need a code addition that uses an ExprState shutdown callback
> to reset the state of an SRF that uses the SRF_XXX macros.  Joe, have
> you had time to give that any thought?
>

Yeah, I've gotten something started, but couldn't do much without the
shutdown getting called. I hadn't yet figured out the best way to make
that happen, so I'm glad you did ;-). I'll update to cvs tip and try
Jeff's function with my changes and your committed changes. Will get
back shortly with the results.

Joe



Re: functions returning sets

From
Joe Conway
Date:
Joe Conway wrote:
> Tom Lane wrote:
>> We still need a code addition that uses an ExprState shutdown callback
>> to reset the state of an SRF that uses the SRF_XXX macros.  Joe, have
>> you had time to give that any thought?
>
> Yeah, I've gotten something started, but couldn't do much without the
> shutdown getting called. I hadn't yet figured out the best way to make
> that happen, so I'm glad you did ;-). I'll update to cvs tip and try
> Jeff's function with my changes and your committed changes. Will get
> back shortly with the results.

OK, updated to cvs tip, and with the attached patch applied, all seems well:

regression=# select id from foo where 'a' in (select
pgxml_xpath(doc,'/top/node'));
  id
----
  1
  2
  3
(3 rows)

Any comment on the patch?


BTW, I am seeing:
[...]
test portals_p2           ... ok
test rules                ... FAILED
test foreign_key          ... ok
[...]
but it seems unrelated to this change -- caused by a redefinition of the
pg_stats view. I guess I need to initdb.

Thanks,

Joe

Index: src/backend/utils/fmgr/funcapi.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/fmgr/funcapi.c,v
retrieving revision 1.12
diff -c -r1.12 funcapi.c
*** src/backend/utils/fmgr/funcapi.c    29 Nov 2003 19:52:01 -0000    1.12
--- src/backend/utils/fmgr/funcapi.c    18 Dec 2003 01:51:37 -0000
***************
*** 17,23 ****
  #include "catalog/pg_type.h"
  #include "utils/syscache.h"

-
  /*
   * init_MultiFuncCall
   * Create an empty FuncCallContext data structure
--- 17,22 ----
***************
*** 58,63 ****
--- 57,64 ----
          retval->user_fctx = NULL;
          retval->attinmeta = NULL;
          retval->multi_call_memory_ctx = fcinfo->flinfo->fn_mcxt;
+         retval->flinfo = fcinfo->flinfo;
+         retval->shutdown_reg = false;

          /*
           * save the pointer for cross-call use
***************
*** 106,115 ****
   * Clean up after init_MultiFuncCall
   */
  void
! end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx)
  {
      /* unbind from fcinfo */
!     fcinfo->flinfo->fn_extra = NULL;

      /*
       * Caller is responsible to free up memory for individual struct
--- 107,119 ----
   * Clean up after init_MultiFuncCall
   */
  void
! end_MultiFuncCall(Datum arg)
  {
+     FuncCallContext *funcctx = (FuncCallContext *) DatumGetPointer(arg);
+     FmgrInfo *flinfo = funcctx->flinfo;
+
      /* unbind from fcinfo */
!     flinfo->fn_extra = NULL;

      /*
       * Caller is responsible to free up memory for individual struct
Index: src/include/funcapi.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/funcapi.h,v
retrieving revision 1.10
diff -c -r1.10 funcapi.h
*** src/include/funcapi.h    29 Nov 2003 22:40:53 -0000    1.10
--- src/include/funcapi.h    18 Dec 2003 20:17:07 -0000
***************
*** 111,116 ****
--- 111,126 ----
       */
      MemoryContext multi_call_memory_ctx;

+     /*
+      * pointer to FmgrInfo needed for shutdown
+      */
+     FmgrInfo   *flinfo;
+
+     /*
+      * true if registered shutdown callback
+      */
+     bool        shutdown_reg;
+
  } FuncCallContext;

  /*----------
***************
*** 203,209 ****
  /* from funcapi.c */
  extern FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS);
  extern FuncCallContext *per_MultiFuncCall(PG_FUNCTION_ARGS);
! extern void end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx);

  #define SRF_IS_FIRSTCALL() (fcinfo->flinfo->fn_extra == NULL)

--- 213,219 ----
  /* from funcapi.c */
  extern FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS);
  extern FuncCallContext *per_MultiFuncCall(PG_FUNCTION_ARGS);
! extern void end_MultiFuncCall(Datum arg);

  #define SRF_IS_FIRSTCALL() (fcinfo->flinfo->fn_extra == NULL)

***************
*** 217,231 ****
          (_funcctx)->call_cntr++; \
          rsi = (ReturnSetInfo *) fcinfo->resultinfo; \
          rsi->isDone = ExprMultipleResult; \
          PG_RETURN_DATUM(_result); \
      } while (0)

  #define  SRF_RETURN_DONE(_funcctx) \
      do { \
          ReturnSetInfo       *rsi; \
!         end_MultiFuncCall(fcinfo, _funcctx); \
          rsi = (ReturnSetInfo *) fcinfo->resultinfo; \
          rsi->isDone = ExprEndResult; \
          PG_RETURN_NULL(); \
      } while (0)

--- 227,252 ----
          (_funcctx)->call_cntr++; \
          rsi = (ReturnSetInfo *) fcinfo->resultinfo; \
          rsi->isDone = ExprMultipleResult; \
+         if (!_funcctx->shutdown_reg) \
+         { \
+             RegisterExprContextCallback(rsi->econtext, \
+                                         end_MultiFuncCall, \
+                                         PointerGetDatum(_funcctx)); \
+             _funcctx->shutdown_reg = true; \
+         } \
          PG_RETURN_DATUM(_result); \
      } while (0)

  #define  SRF_RETURN_DONE(_funcctx) \
      do { \
          ReturnSetInfo       *rsi; \
!         end_MultiFuncCall(PointerGetDatum(_funcctx)); \
          rsi = (ReturnSetInfo *) fcinfo->resultinfo; \
          rsi->isDone = ExprEndResult; \
+         UnregisterExprContextCallback(rsi->econtext, \
+                                       end_MultiFuncCall, \
+                                       PointerGetDatum(_funcctx)); \
+         _funcctx->shutdown_reg = false; \
          PG_RETURN_NULL(); \
      } while (0)


Re: functions returning sets

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Any comment on the patch?

It seems like a bad idea to change the contents of the SRF_XXX() macros
for 7.4.1; if we do that, any existing already-compiled-for-7.4 user
SRFs will be broken, and there's no easy way to catch the problem.
We don't normally require people to recompile user-defined functions for
dot releases anyway.

I would suggest leaving end_MultiFuncCall() with its existing API,
and adding a separate shutdown callback function that is registered
during init_MultiFuncCall and deregistered by end_MultiFuncCall.
(This should be workable without API change since init_MultiFuncCall
can get to the econtext via fcinfo->resultinfo.)

You may not even need to add any fields to FuncCallContext --- consider
passing the fcinfo pointer to the callback, rather than passing the
FuncCallContext pointer.

            regards, tom lane

Re: functions returning sets

From
Tom Lane
Date:
> You may not even need to add any fields to FuncCallContext --- consider
> passing the fcinfo pointer to the callback, rather than passing the
> FuncCallContext pointer.

Dept. of second thoughts: better pass the flinfo pointer, instead.
fcinfo might point to temporary space on the stack.

            regards, tom lane

Re: functions returning sets

From
Joe Conway
Date:
Tom Lane wrote:
>>You may not even need to add any fields to FuncCallContext --- consider
>>passing the fcinfo pointer to the callback, rather than passing the
>>FuncCallContext pointer.
>
> Dept. of second thoughts: better pass the flinfo pointer, instead.
> fcinfo might point to temporary space on the stack.

OK -- this one is a good bit simpler. Any more comments?

Joe

Index: src/backend/utils/fmgr/funcapi.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/fmgr/funcapi.c,v
retrieving revision 1.12
diff -c -r1.12 funcapi.c
*** src/backend/utils/fmgr/funcapi.c    29 Nov 2003 19:52:01 -0000    1.12
--- src/backend/utils/fmgr/funcapi.c    19 Dec 2003 00:01:46 -0000
***************
*** 17,22 ****
--- 17,23 ----
  #include "catalog/pg_type.h"
  #include "utils/syscache.h"

+ static void shutdown_MultiFuncCall(Datum arg);

  /*
   * init_MultiFuncCall
***************
*** 41,47 ****
      {
          /*
           * First call
!          *
           * Allocate suitably long-lived space and zero it
           */
          retval = (FuncCallContext *)
--- 42,51 ----
      {
          /*
           * First call
!          */
!         ReturnSetInfo       *rsi = (ReturnSetInfo *) fcinfo->resultinfo;
!
!         /*
           * Allocate suitably long-lived space and zero it
           */
          retval = (FuncCallContext *)
***************
*** 63,68 ****
--- 67,80 ----
           * save the pointer for cross-call use
           */
          fcinfo->flinfo->fn_extra = retval;
+
+         /*
+          * Ensure we will get shut down cleanly if the exprcontext is not
+          * run to completion.
+          */
+         RegisterExprContextCallback(rsi->econtext,
+                                     shutdown_MultiFuncCall,
+                                     PointerGetDatum(fcinfo->flinfo));
      }
      else
      {
***************
*** 108,115 ****
  void
  end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx)
  {
!     /* unbind from fcinfo */
!     fcinfo->flinfo->fn_extra = NULL;

      /*
       * Caller is responsible to free up memory for individual struct
--- 120,148 ----
  void
  end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx)
  {
!     ReturnSetInfo       *rsi = (ReturnSetInfo *) fcinfo->resultinfo;
!
!     /* Deregister the shutdown callback */
!     UnregisterExprContextCallback(rsi->econtext,
!                                   shutdown_MultiFuncCall,
!                                   PointerGetDatum(fcinfo->flinfo));
!
!     /* But use it to do the real work */
!     shutdown_MultiFuncCall(PointerGetDatum(fcinfo->flinfo));
! }
!
! /*
!  * shutdown_MultiFuncCall
!  * Shutdown function to clean up after init_MultiFuncCall
!  */
! static void
! shutdown_MultiFuncCall(Datum arg)
! {
!     FmgrInfo *flinfo = (FmgrInfo *) DatumGetPointer(arg);
!     FuncCallContext *funcctx = (FuncCallContext *) flinfo->fn_extra;
!
!     /* unbind from flinfo */
!     flinfo->fn_extra = NULL;

      /*
       * Caller is responsible to free up memory for individual struct

Re: functions returning sets

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> OK -- this one is a good bit simpler. Any more comments?

Looks good to me.  Please apply to 7.4 and HEAD.

            regards, tom lane