Thread: Accessing original TupleDesc from SRF

Accessing original TupleDesc from SRF

From
John Gray
Date:
Please correct me if I've got this wrong, but it appears from the SRF
API, that a SRF cannot readily refer to the TupleDesc to which it is
expected to conform (i.e. the TupleDesc derived from the FROM clause of
an original SELECT statement) because that is held in the executor state
and not copied or linked into the function context.

The reason I'm interested (and this might be a crazy idea) is that a
function might choose to adapt its output based on what it is asked for.
i.e. the attribute names and types which it is asked to provide might
have some significance to the function. 

The application in this case is the querying of an XML document (this
relates to the contrib/xml XPath functions) where you might want a
function which gives you a "virtual view" of the document. In order to
do so, you specify a query such as:

SELECT * FROM xmlquery_func('some text here') AS xq(document_author
text, document_publisher text, document_date text);

(this would likely be part of a subquery or join in practice.)

The function would see the requested attribute "document_author" and
translate that to '//document/author/text()' for use by the XPath
engine. This avoids having to have a function with varying arguments
-instead you have a 'virtual table' that generates only the attributes
requested.

Does this sound completely crazy?

Regards

John

-- 
John Gray    
Azuli IT    
www.azuli.co.uk    




Re: Accessing original TupleDesc from SRF

From
"Zeugswetter Andreas SB SD"
Date:
> Please correct me if I've got this wrong, but it appears from the SRF
> API, that a SRF cannot readily refer to the TupleDesc to which it is
> expected to conform (i.e. the TupleDesc derived from the FROM
> clause of
> an original SELECT statement) because that is held in the
> executor state
> and not copied or linked into the function context.
>
> The reason I'm interested (and this might be a crazy idea) is that a
> function might choose to adapt its output based on what it is
> asked for.
> i.e. the attribute names and types which it is asked to provide might
> have some significance to the function.
>
> The application in this case is the querying of an XML document (this
> relates to the contrib/xml XPath functions) where you might want a
> function which gives you a "virtual view" of the document. In order to
> do so, you specify a query such as:
>
> SELECT * FROM xmlquery_func('some text here') AS xq(document_author
> text, document_publisher text, document_date text);
>
> (this would likely be part of a subquery or join in practice.)
>
> The function would see the requested attribute "document_author" and
> translate that to '//document/author/text()' for use by the XPath
> engine. This avoids having to have a function with varying arguments
> -instead you have a 'virtual table' that generates only the attributes
> requested.
>
> Does this sound completely crazy?

Nope, sounds really useful.

Andreas


Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
John Gray <jgray@azuli.co.uk> writes:
> Please correct me if I've got this wrong, but it appears from the SRF
> API, that a SRF cannot readily refer to the TupleDesc to which it is
> expected to conform (i.e. the TupleDesc derived from the FROM clause of
> an original SELECT statement) because that is held in the executor state
> and not copied or linked into the function context.

> The reason I'm interested (and this might be a crazy idea) is that a
> function might choose to adapt its output based on what it is asked for.

Seems like a cool idea.

We could fairly readily add a field to ReturnSetInfo, but that would
only be available to functions defined as returning a set.  That'd
probably cover most useful cases but it still seems a bit unclean.

I suppose that ExecMakeTableFunctionResult could be changed to *always*
pass ReturnSetInfo, even if it's not expecting the function to return
a set.  That seems even less clean; but it would work, at least in the
current implementation.

Anyone have a better idea?
        regards, tom lane


Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
John Gray wrote:
> Please correct me if I've got this wrong, but it appears from the SRF
> API, that a SRF cannot readily refer to the TupleDesc to which it is
> expected to conform (i.e. the TupleDesc derived from the FROM clause of
> an original SELECT statement) because that is held in the executor state
> and not copied or linked into the function context.
> 

[snip]

> 
> Does this sound completely crazy?
> 

Not crazy at all. I asked the same question a few days ago:
http://archives.postgresql.org/pgsql-hackers/2002-08/msg01914.php

Tom suggested a workaround for my purpose, but I do agree that this is 
needed in the long run. I looked at it briefly, but there was no easy 
answer I could spot. I'll take another look today.

Joe




Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
Tom Lane wrote:
> John Gray <jgray@azuli.co.uk> writes:
> 
>>Please correct me if I've got this wrong, but it appears from the SRF
>>API, that a SRF cannot readily refer to the TupleDesc to which it is
>>expected to conform (i.e. the TupleDesc derived from the FROM clause of
>>an original SELECT statement) because that is held in the executor state
>>and not copied or linked into the function context.
> 
> 
>>The reason I'm interested (and this might be a crazy idea) is that a
>>function might choose to adapt its output based on what it is asked for.
> 
> 
> Seems like a cool idea.
> 
> We could fairly readily add a field to ReturnSetInfo, but that would
> only be available to functions defined as returning a set.  That'd
> probably cover most useful cases but it still seems a bit unclean.

I thought about that, but had the same concern.


> I suppose that ExecMakeTableFunctionResult could be changed to *always*
> pass ReturnSetInfo, even if it's not expecting the function to return
> a set.  That seems even less clean; but it would work, at least in the
> current implementation.

Hmmm. I hadn't thought about this possibility. Why is it unclean? Are 
there places where the lack of ReturnSetInfo is used to indicate that 
the function does not return a set? Doesn't seem like there should be.

> Anyone have a better idea?

I was looking to see if it could be added to FunctionCallInfoData, but 
you might find that more unclean still ;-).

Actually, I left off trying to figure out how to pass the columndef to 
ExecMakeFunctionResult in the first place. It wasn't obvious to me, and 
since you offered an easy alternative solution I stopped trying. Any 
suggestions? Preference of extending FunctionCallInfoData or 
ReturnSetInfo? I'd really like to do this for 7.3.

Joe



Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> John Gray wrote:
>> Does this sound completely crazy?

> Not crazy at all. I asked the same question a few days ago:
> http://archives.postgresql.org/pgsql-hackers/2002-08/msg01914.php

I've been thinking more about this, and wondering if we should not
only make the tupdesc available but rely more heavily on it than we
do.  Most of the C-coded functions do fairly substantial pushups to
construct tupdescs that are just going to duplicate what
nodeFunctionscan already has in its back pocket.  They could save some
time by just picking that up and using it.

On the other hand, your experience yesterday with debugging a mismatched
function declaration suggests that it's still a good idea to make the
functions build the tupdesc they think they are returning.
        regards, tom lane


Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
>> I suppose that ExecMakeTableFunctionResult could be changed to *always*
>> pass ReturnSetInfo, even if it's not expecting the function to return
>> a set.  That seems even less clean; but it would work, at least in the
>> current implementation.

> Hmmm. I hadn't thought about this possibility. Why is it unclean? Are 
> there places where the lack of ReturnSetInfo is used to indicate that 
> the function does not return a set? Doesn't seem like there should be.

Probably not.  If the function itself doesn't know whether it should
return a set, it can always look at the FmgrInfo struct to find out.

> Actually, I left off trying to figure out how to pass the columndef to 
> ExecMakeFunctionResult in the first place.

That was hard yesterday, but it's easy today because nodeFunctionscan
isn't using ExecEvalExpr anymore --- we'd only have to add one more
parameter to ExecMakeTableFunctionResult and we're there.

> Preference of extending FunctionCallInfoData or ReturnSetInfo?

Definitely ReturnSetInfo.  If we put it in FunctionCallInfoData then
that's an extra word to zero for *every* fmgr function call, not only
table functions.

One thing to notice is that a table function that's depending on this
info being available will have to punt if it's invoked via
ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
That doesn't bother me too much, but maybe others will see it
differently.
        regards, tom lane


Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
Tom Lane wrote:
> I've been thinking more about this, and wondering if we should not
> only make the tupdesc available but rely more heavily on it than we
> do.  Most of the C-coded functions do fairly substantial pushups to
> construct tupdescs that are just going to duplicate what
> nodeFunctionscan already has in its back pocket.  They could save some
> time by just picking that up and using it.
> 
> On the other hand, your experience yesterday with debugging a mismatched
> function declaration suggests that it's still a good idea to make the
> functions build the tupdesc they think they are returning.

In a function which *can* know what the tupledec should look like based 
on independent information (contrib/tablefunc.c:crosstab), or based on a 
priori knowledge (guc.c:show_all_settings), then the passed in tupdesc 
could be used by the function to validate that it has been acceptably 
declared (for named types) or called (for anonymous types).

But it is also interesting to let the function try to adapt to the way 
in which it has been called, and punt if it can't deal with it. And in 
some cases, like John's example, there *is* no other way.

Joe




Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>Actually, I left off trying to figure out how to pass the columndef to 
>>ExecMakeFunctionResult in the first place.
> 
> That was hard yesterday, but it's easy today because nodeFunctionscan
> isn't using ExecEvalExpr anymore --- we'd only have to add one more
> parameter to ExecMakeTableFunctionResult and we're there.

I didn't even realize you had changed that! Things move quickly around 
here ;-). I'll take a look this morning.


>>Preference of extending FunctionCallInfoData or ReturnSetInfo?
> 
> Definitely ReturnSetInfo.  If we put it in FunctionCallInfoData then
> that's an extra word to zero for *every* fmgr function call, not only
> table functions.

OK.


> One thing to notice is that a table function that's depending on this
> info being available will have to punt if it's invoked via
> ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
> That doesn't bother me too much, but maybe others will see it
> differently.

It's an important safety tip, but it doesn't bother me either. I think 
you have suggested before that SRFs in SELECT targetlists should be 
deprecated/removed. I'd take that one step further and say that SELECT 
targetlists should only allow a scalar result, but obviously there are 
some backwards compatibility issues there.

Joe




Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
>> On the other hand, your experience yesterday with debugging a mismatched
>> function declaration suggests that it's still a good idea to make the
>> functions build the tupdesc they think they are returning.

> In a function which *can* know what the tupledec should look like based 
> on independent information (contrib/tablefunc.c:crosstab), or based on a 
> priori knowledge (guc.c:show_all_settings), then the passed in tupdesc 
> could be used by the function to validate that it has been acceptably 
> declared (for named types) or called (for anonymous types).

Yeah, I had also considered the idea of pushing the responsibility of
verifying the tupdesc matching out to the function (ie, nodeFunctionscan
wouldn't call tupdesc_mismatch anymore, but the function could).

I think this is a bad idea on balance though; it would save few cycles
and probably create lots more debugging headaches like the one you had.
        regards, tom lane


Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>Preference of extending FunctionCallInfoData or ReturnSetInfo?
>
> Definitely ReturnSetInfo.  If we put it in FunctionCallInfoData then
> that's an extra word to zero for *every* fmgr function call, not only
> table functions.

Attached adds:
   + TupleDesc queryDesc; /* descriptor for planned query */
to ReturnSetInfo, and populates ReturnSetInfo for every function call to
  ExecMakeTableFunctionResult, not just when fn_retset.

> One thing to notice is that a table function that's depending on this
> info being available will have to punt if it's invoked via
> ExecMakeFunctionResult (ie, it's being called in a SELECT targetlist).
> That doesn't bother me too much, but maybe others will see it
> differently.

I haven't done it yet, but I suppose this should be documented in
xfunc.sgml. With this patch the behavior of a function called through
ExecMakeFunctionResult will be:

if (fn_retset)
      ReturnSetInfo is populated but queryDesc is set to NULL

if (!fn_retset)
      ReturnSetInfo is NULL

If there are no objections, please apply.

Thanks,

Joe
Index: src/backend/executor/execQual.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/execQual.c,v
retrieving revision 1.102
diff -c -r1.102 execQual.c
*** src/backend/executor/execQual.c    30 Aug 2002 00:28:41 -0000    1.102
--- src/backend/executor/execQual.c    30 Aug 2002 19:30:38 -0000
***************
*** 709,714 ****
--- 709,715 ----
          rsinfo.econtext = econtext;
          rsinfo.allowedModes = (int) SFRM_ValuePerCall;
          rsinfo.returnMode = SFRM_ValuePerCall;
+         rsinfo.queryDesc = NULL;
          /* isDone is filled below */
          rsinfo.setResult = NULL;
          rsinfo.setDesc = NULL;
***************
*** 851,856 ****
--- 852,858 ----
  Tuplestorestate *
  ExecMakeTableFunctionResult(Expr *funcexpr,
                              ExprContext *econtext,
+                             TupleDesc queryDesc,
                              TupleDesc *returnDesc)
  {
      Tuplestorestate *tupstore = NULL;
***************
*** 859,865 ****
      List       *argList;
      FunctionCachePtr fcache;
      FunctionCallInfoData fcinfo;
!     ReturnSetInfo rsinfo;        /* for functions returning sets */
      ExprDoneCond argDone;
      MemoryContext callerContext;
      MemoryContext oldcontext;
--- 861,867 ----
      List       *argList;
      FunctionCachePtr fcache;
      FunctionCallInfoData fcinfo;
!     ReturnSetInfo rsinfo;
      ExprDoneCond argDone;
      MemoryContext callerContext;
      MemoryContext oldcontext;
***************
*** 918,938 ****
      }

      /*
!      * If function returns set, prepare a resultinfo node for
!      * communication
       */
!     if (fcache->func.fn_retset)
!     {
!         fcinfo.resultinfo = (Node *) &rsinfo;
!         rsinfo.type = T_ReturnSetInfo;
!         rsinfo.econtext = econtext;
!         rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);
!     }
!     /* we set these fields always since we examine them below */
      rsinfo.returnMode = SFRM_ValuePerCall;
      /* isDone is filled below */
      rsinfo.setResult = NULL;
      rsinfo.setDesc = NULL;

      /*
       * Switch to short-lived context for calling the function.
--- 920,942 ----
      }

      /*
!      * prepare a resultinfo node for communication
       */
!     fcinfo.resultinfo = (Node *) &rsinfo;
!     rsinfo.type = T_ReturnSetInfo;
!     rsinfo.econtext = econtext;
!     rsinfo.queryDesc = queryDesc;
      rsinfo.returnMode = SFRM_ValuePerCall;
      /* isDone is filled below */
      rsinfo.setResult = NULL;
      rsinfo.setDesc = NULL;
+
+     /*
+      * If function returns set, we need to tell it what modes of return
+      * we will accept
+      */
+     if (fcache->func.fn_retset)
+         rsinfo.allowedModes = (int) (SFRM_ValuePerCall | SFRM_Materialize);

      /*
       * Switch to short-lived context for calling the function.
Index: src/backend/executor/nodeFunctionscan.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/nodeFunctionscan.c,v
retrieving revision 1.8
diff -c -r1.8 nodeFunctionscan.c
*** src/backend/executor/nodeFunctionscan.c    30 Aug 2002 00:28:41 -0000    1.8
--- src/backend/executor/nodeFunctionscan.c    30 Aug 2002 19:01:25 -0000
***************
*** 79,84 ****
--- 79,85 ----
          scanstate->tuplestorestate = tuplestorestate =
              ExecMakeTableFunctionResult((Expr *) scanstate->funcexpr,
                                          econtext,
+                                         scanstate->tupdesc,
                                          &funcTupdesc);

          /*
Index: src/include/executor/executor.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/executor/executor.h,v
retrieving revision 1.75
diff -c -r1.75 executor.h
*** src/include/executor/executor.h    30 Aug 2002 00:28:41 -0000    1.75
--- src/include/executor/executor.h    30 Aug 2002 19:02:02 -0000
***************
*** 82,87 ****
--- 82,88 ----
                         ExprDoneCond *isDone);
  extern Tuplestorestate *ExecMakeTableFunctionResult(Expr *funcexpr,
                                                      ExprContext *econtext,
+                                                     TupleDesc queryDesc,
                                                      TupleDesc *returnDesc);
  extern Datum ExecEvalExpr(Node *expression, ExprContext *econtext,
               bool *isNull, ExprDoneCond *isDone);
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/include/nodes/execnodes.h,v
retrieving revision 1.73
diff -c -r1.73 execnodes.h
*** src/include/nodes/execnodes.h    30 Aug 2002 00:28:41 -0000    1.73
--- src/include/nodes/execnodes.h    30 Aug 2002 19:29:36 -0000
***************
*** 152,157 ****
--- 152,158 ----
      SetFunctionReturnMode    returnMode;    /* actual return mode */
      ExprDoneCond isDone;        /* status for ValuePerCall mode */
      Tuplestorestate *setResult;    /* return object for Materialize mode */
+     TupleDesc    queryDesc;        /* descriptor for planned query */
      TupleDesc    setDesc;        /* descriptor for Materialize mode */
  } ReturnSetInfo;


Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Attached adds:
>    + TupleDesc queryDesc; /* descriptor for planned query */
> to ReturnSetInfo, and populates ReturnSetInfo for every function call to 
>   ExecMakeTableFunctionResult, not just when fn_retset.

I thought "expectedDesc" was a more sensible choice of name, so I made
it that.  Otherwise, patch applied.

> I haven't done it yet, but I suppose this should be documented in 
> xfunc.sgml.

Actually, most of what's in src/backend/utils/fmgr/README should be
transposed into xfunc.sgml someday.
        regards, tom lane


Re: Accessing original TupleDesc from SRF

From
Joe Conway
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>>Attached adds:
>>   + TupleDesc queryDesc; /* descriptor for planned query */
>>to ReturnSetInfo, and populates ReturnSetInfo for every function call to
>>  ExecMakeTableFunctionResult, not just when fn_retset.
>
> I thought "expectedDesc" was a more sensible choice of name, so I made
> it that.  Otherwise, patch applied.


I was trying to actually use this new feature today, and ran into a
little bug in nodeFunctionscan.c that prevented it from actually working.

For anonymous composite types, ExecInitFunctionScan builds the tuple
description using:
   tupdesc = BuildDescForRelation(coldeflist);

But BuildDescForRelation leaves initializes the tupdesc like this:
   desc = CreateTemplateTupleDesc(natts, UNDEFOID);

The UNDEFOID later causes an assertion failure in heap_formtuple when
you try to use the tupdesc to build a tuple.

Attached is a very small patch to fix.

>>I haven't done it yet, but I suppose this should be documented in
>>xfunc.sgml.
> Actually, most of what's in src/backend/utils/fmgr/README should be
> transposed into xfunc.sgml someday.

After beta starts I'll work on migrating this to the sgml docs.

Joe

Index: src/backend/executor/nodeFunctionscan.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/executor/nodeFunctionscan.c,v
retrieving revision 1.9
diff -c -r1.9 nodeFunctionscan.c
*** src/backend/executor/nodeFunctionscan.c    30 Aug 2002 23:59:46 -0000    1.9
--- src/backend/executor/nodeFunctionscan.c    31 Aug 2002 18:01:48 -0000
***************
*** 226,231 ****
--- 226,232 ----
          List *coldeflist = rte->coldeflist;

          tupdesc = BuildDescForRelation(coldeflist);
+         tupdesc->tdhasoid = WITHOUTOID;
      }
      else
          elog(ERROR, "Unknown kind of return type specified for function");

Re: Accessing original TupleDesc from SRF

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> But BuildDescForRelation leaves initializes the tupdesc like this:
>    desc = CreateTemplateTupleDesc(natts, UNDEFOID);

> The UNDEFOID later causes an assertion failure in heap_formtuple when 
> you try to use the tupdesc to build a tuple.

So far, I haven't seen any reason to think that that three-way has-OID
stuff accomplishes anything but causing trouble ... but I've applied
this patch for the moment.  I hope to get around to reviewing the
HeapTupleHeader hacks later in the weekend.
        regards, tom lane