Thread: functions returning sets
I have a set-returning function that takes a text string and returns multiple values from it. Its a small hack on the pgxml_xpath function from contrib/xml that returns all matching nodes from an xml document rather than just a single specified node. I currently use it as follows: create table xml_files (file text, doc text) ; create view xnodes as select file, pgxml_xpath(doc, '/top/node') from xml_files; select * from xnodes ; This gives me a table of all the matching nodes in all the documents. However, the documentation says that using a SRF in the select list of a query, but this capability is deprecated. I can't figure out how to call this function in the from clause with it referring to a column in a table, I get an error like ERROR: FROM function expression may not refer to other relations of same query level. Is there another way to accomplish this? Another useful way to call my function would be select file from xml_files where 'foo' in pgxml_xpath(doc,'/top/node') but that gives be a parse error. select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node')) parses, but it doesn't seem to give correct results. Thanks -J PS: Here's the function. This goes with pgxml.c, not pgxml_dom.c, bt could probably be modified to work there as well. PG_FUNCTION_INFO_V1(pgxml_xpath_all); Datum pgxml_xpath_all(PG_FUNCTION_ARGS) { /* called as pgxml_xpath(document,pathstr) */ /* returns set of all matching results */ XPath_Results *xpresults; text *restext; MemoryContext oldContext; FuncCallContext *funcctx; text *t = PG_GETARG_TEXT_P(0); /* document buffer */ text *t2 = PG_GETARG_TEXT_P(1); int32 ind; if (SRF_IS_FIRSTCALL()) { funcctx=SRF_FIRSTCALL_INIT(); oldContext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); xpresults = build_xpath_results(t, t2); funcctx->user_fctx=xpresults; MemoryContextSwitchTo(oldContext); } funcctx = SRF_PERCALL_SETUP(); xpresults=funcctx->user_fctx; ind=funcctx->call_cntr; if (xpresults == NULL || ind >= xpresults->rescount) { if (xpresults != NULL) { pfree(xpresults->resbuf); pfree(xpresults); } SRF_RETURN_DONE(funcctx); } restext = (text *) palloc(xpresults->reslens[ind] + VARHDRSZ); memcpy(VARDATA(restext), xpresults->results[ind], xpresults->reslens[ind]); VARATT_SIZEP(restext) = xpresults->reslens[ind] + VARHDRSZ; SRF_RETURN_NEXT(funcctx,restext); }
Jeff Rogers <jrogers@findlaw.com> writes: > However, the documentation says that using a SRF in the select list of > a query, but this capability is deprecated. I can't figure out how to > call this function in the from clause with it referring to a column in > a table, I get an error like > ERROR: FROM function expression may not refer to other relations of same > query level. Is there another way to accomplish this? There isn't any good alternative at the moment (which is why SRFs in select lists aren't deprecated yet). There has been some discussion about implementing SQL99's LATERAL clause to support this, but it's not done yet. > select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/node')) > parses, but it doesn't seem to give correct results. That should work as far as I know. Can you give more detail? regards, tom lane
> > select file from xml_files where 'foo' in (select pgxml_xpath(doc,'/top/nod > e')) > > parses, but it doesn't seem to give correct results. > > That should work as far as I know. Can you give more detail? > > regards, tom lane Cut & paste from a window: xml=# create table foo (id text, doc text) ; CREATE TABLE xml=# insert into foo values (1,'<top><node>a</node><node>b</node></top>'); INSERT 22106552 1 xml=# insert into foo values (2,'<top><node>a</node><node>b</node></top>'); INSERT 22106553 1 xml=# insert into foo values (3,'<top><node>a</node><node>b</node></top>'); INSERT 22106554 1 xml=# select id,pgxml_xpath(doc,'/top/node') from foo ; id | pgxml_xpath ----+------------- 1 | a 1 | b 2 | a 2 | b 3 | a 3 | b (6 rows) xml=# select id from foo where 'a' in (select pgxml_xpath(doc,'/top/node')); id ---- 1 3 (2 rows) I expect this latter to return all the ids, but it seems to only return every other one. -J
Jeff Rogers <jrogers@findlaw.com> writes: > I expect this latter to return all the ids, but it seems to only return every > other one. Hmm, this looks like a bug having to do with not resetting state correctly for the next invocation of the SRF. Hard to tell whether the bug is in your code or the system's support for SRFs though. Joe, did you see http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php ? I don't have time to look at this right now ... regards, tom lane
Tom Lane wrote: > Jeff Rogers <jrogers@findlaw.com> writes: >>I expect this latter to return all the ids, but it seems to only return every >>other one. > > Hmm, this looks like a bug having to do with not resetting state > correctly for the next invocation of the SRF. Hard to tell whether > the bug is in your code or the system's support for SRFs though. > > Joe, did you see > http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php > ? I don't have time to look at this right now ... I'll try to take a look in the next day or so -- hopefully tonight. Joe
Tom Lane wrote: > Jeff Rogers <jrogers@findlaw.com> writes: >>I expect this latter to return all the ids, but it seems to only return every >>other one. > > Hmm, this looks like a bug having to do with not resetting state > correctly for the next invocation of the SRF. Hard to tell whether > the bug is in your code or the system's support for SRFs though. > > Joe, did you see > http://archives.postgresql.org/pgsql-general/2003-12/msg00709.php > ? I don't have time to look at this right now ... I've not gotten to the bottom of this, but I can confirm the issue. The way I would expect this query to be written does work: regression=# select id from foo where 'a' in (select * from pgxml_xpath(doc,'/top/node')); id ---- 1 2 3 (3 rows) So the problem seems to be related specifically to "SRF returning setof scalar in targetlist of subselect". Maybe some difference between ExecMakeFunctionResult and ExecMakeTableFunctionResult? SRFs returning complex types are not allowed in targetlists, but in this case there is no restriction because the return type is scalar. I'll keep digging as time allows. Joe
Tom Lane wrote: > Hmm, this looks like a bug having to do with not resetting state > correctly for the next invocation of the SRF. Hard to tell whether > the bug is in your code or the system's support for SRFs though. Looks like a bug in ExecScanSubPlan() around line 401: 8<----------------------------------------------------- if (subLinkType == ANY_SUBLINK) { /* combine across rows per OR semantics */ if (rownull) *isNull = true; else if (DatumGetBool(rowresult)) { result = BoolGetDatum(true); *isNull = false; break; /* needn't look at any more rows */ } } 8<----------------------------------------------------- If the function returns a set, that comment is wrong, isn't it? I removed the "break" to run out all the available tuples and got the correct result: regression=# select id from foo where 'a' in (select pgxml_xpath(doc,'/top/node')); id ---- 1 2 3 (3 rows) Any guidance on the preferred fix? I don't see an obvious way to determine if the plan includes a set returning function in ExecScanSubPlan(), and it seems unfortunate to remove the optimization for all other cases, especially since targetlist SRFs are deprecated. Joe
Joe Conway <mail@joeconway.com> writes: > Any guidance on the preferred fix? We cannot fix this by changing ExecScanSubPlan as you suggest. That would amount to saying that all plans have to be run to completion, which destroys LIMIT to name just one unpleasant consequence. There is a mechanism available for functions to arrange to get cleanup callbacks when a containing plan is shut down early --- see RegisterExprContextCallback and ShutdownExprContext. It looks like this is not used by the existing SRF support, but I suspect it should be. [ scratches head ... ] Right at the moment I don't see where ShutdownExprContext gets called during a ReScan operation. I'm quite sure it once was ... there may be another bug here ... regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >>Any guidance on the preferred fix? > > We cannot fix this by changing ExecScanSubPlan as you suggest. > That would amount to saying that all plans have to be run to completion, > which destroys LIMIT to name just one unpleasant consequence. I suspected as much. > There is a mechanism available for functions to arrange to get cleanup > callbacks when a containing plan is shut down early --- see > RegisterExprContextCallback and ShutdownExprContext. It looks like this > is not used by the existing SRF support, but I suspect it should be. OK -- I'll take a look. > [ scratches head ... ] Right at the moment I don't see where > ShutdownExprContext gets called during a ReScan operation. I'm quite > sure it once was ... there may be another bug here ... > Thanks, I'll keep that in mind too. Joe
> Tom Lane wrote: >> [ scratches head ... ] Right at the moment I don't see where >> ShutdownExprContext gets called during a ReScan operation. I'm quite >> sure it once was ... there may be another bug here ... After further looking I've realized that memory is misserving me here; having ReScan call ShutdownExprContext was not something that ever got done. Instead I have an entry on my personal todo list that says : Need to invent ExprContextRescan and call it at appropriate places. : Knowing where all the econtexts are seems like the hard part ... though : maybe we only care about econtexts that might contain set-returning : functions, which might limit it to the targetlist... A perfectly clean solution would require being careful to reset *all* econtexts, which might be thought rather a lot of work to support a feature that's eventually going to be deprecated anyway (viz, SRFs outside of FROM). I'll see about the tlist-only case though. regards, tom lane
I said: > : Need to invent ExprContextRescan and call it at appropriate places. > A perfectly clean solution would require being careful to reset *all* > econtexts, which might be thought rather a lot of work to support a > feature that's eventually going to be deprecated anyway (viz, SRFs > outside of FROM). I'll see about the tlist-only case though. Actually, some study shows that we *only* support set-returning functions in ExecProject(), which is only used with the ps_ExprContext econtext of plan nodes, which makes it pretty easy to ensure that ReScan gets everything it needs to. I have committed a patch into 7.4 and HEAD branches that ensures shutdown callback functions are called during ReScan. I've tested that the problem is fixed for SQL-language functions; for example in the regression database do create function foo(int) returns setof int as 'select unique1 from tenk1' language sql; and compare the output of select foo(1) limit 10; select unique1 from tenk1 where unique1 in (select foo(1) limit 10); select unique1 from tenk1 where unique1 in (select foo(ten) limit 10); These should be the same (up to ordering) but are not without the patch. We still need a code addition that uses an ExprState shutdown callback to reset the state of an SRF that uses the SRF_XXX macros. Joe, have you had time to give that any thought? regards, tom lane
Tom Lane wrote: > We still need a code addition that uses an ExprState shutdown callback > to reset the state of an SRF that uses the SRF_XXX macros. Joe, have > you had time to give that any thought? > Yeah, I've gotten something started, but couldn't do much without the shutdown getting called. I hadn't yet figured out the best way to make that happen, so I'm glad you did ;-). I'll update to cvs tip and try Jeff's function with my changes and your committed changes. Will get back shortly with the results. Joe
Joe Conway wrote: > Tom Lane wrote: >> We still need a code addition that uses an ExprState shutdown callback >> to reset the state of an SRF that uses the SRF_XXX macros. Joe, have >> you had time to give that any thought? > > Yeah, I've gotten something started, but couldn't do much without the > shutdown getting called. I hadn't yet figured out the best way to make > that happen, so I'm glad you did ;-). I'll update to cvs tip and try > Jeff's function with my changes and your committed changes. Will get > back shortly with the results. OK, updated to cvs tip, and with the attached patch applied, all seems well: regression=# select id from foo where 'a' in (select pgxml_xpath(doc,'/top/node')); id ---- 1 2 3 (3 rows) Any comment on the patch? BTW, I am seeing: [...] test portals_p2 ... ok test rules ... FAILED test foreign_key ... ok [...] but it seems unrelated to this change -- caused by a redefinition of the pg_stats view. I guess I need to initdb. Thanks, Joe Index: src/backend/utils/fmgr/funcapi.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/fmgr/funcapi.c,v retrieving revision 1.12 diff -c -r1.12 funcapi.c *** src/backend/utils/fmgr/funcapi.c 29 Nov 2003 19:52:01 -0000 1.12 --- src/backend/utils/fmgr/funcapi.c 18 Dec 2003 01:51:37 -0000 *************** *** 17,23 **** #include "catalog/pg_type.h" #include "utils/syscache.h" - /* * init_MultiFuncCall * Create an empty FuncCallContext data structure --- 17,22 ---- *************** *** 58,63 **** --- 57,64 ---- retval->user_fctx = NULL; retval->attinmeta = NULL; retval->multi_call_memory_ctx = fcinfo->flinfo->fn_mcxt; + retval->flinfo = fcinfo->flinfo; + retval->shutdown_reg = false; /* * save the pointer for cross-call use *************** *** 106,115 **** * Clean up after init_MultiFuncCall */ void ! end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx) { /* unbind from fcinfo */ ! fcinfo->flinfo->fn_extra = NULL; /* * Caller is responsible to free up memory for individual struct --- 107,119 ---- * Clean up after init_MultiFuncCall */ void ! end_MultiFuncCall(Datum arg) { + FuncCallContext *funcctx = (FuncCallContext *) DatumGetPointer(arg); + FmgrInfo *flinfo = funcctx->flinfo; + /* unbind from fcinfo */ ! flinfo->fn_extra = NULL; /* * Caller is responsible to free up memory for individual struct Index: src/include/funcapi.h =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/include/funcapi.h,v retrieving revision 1.10 diff -c -r1.10 funcapi.h *** src/include/funcapi.h 29 Nov 2003 22:40:53 -0000 1.10 --- src/include/funcapi.h 18 Dec 2003 20:17:07 -0000 *************** *** 111,116 **** --- 111,126 ---- */ MemoryContext multi_call_memory_ctx; + /* + * pointer to FmgrInfo needed for shutdown + */ + FmgrInfo *flinfo; + + /* + * true if registered shutdown callback + */ + bool shutdown_reg; + } FuncCallContext; /*---------- *************** *** 203,209 **** /* from funcapi.c */ extern FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS); extern FuncCallContext *per_MultiFuncCall(PG_FUNCTION_ARGS); ! extern void end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx); #define SRF_IS_FIRSTCALL() (fcinfo->flinfo->fn_extra == NULL) --- 213,219 ---- /* from funcapi.c */ extern FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS); extern FuncCallContext *per_MultiFuncCall(PG_FUNCTION_ARGS); ! extern void end_MultiFuncCall(Datum arg); #define SRF_IS_FIRSTCALL() (fcinfo->flinfo->fn_extra == NULL) *************** *** 217,231 **** (_funcctx)->call_cntr++; \ rsi = (ReturnSetInfo *) fcinfo->resultinfo; \ rsi->isDone = ExprMultipleResult; \ PG_RETURN_DATUM(_result); \ } while (0) #define SRF_RETURN_DONE(_funcctx) \ do { \ ReturnSetInfo *rsi; \ ! end_MultiFuncCall(fcinfo, _funcctx); \ rsi = (ReturnSetInfo *) fcinfo->resultinfo; \ rsi->isDone = ExprEndResult; \ PG_RETURN_NULL(); \ } while (0) --- 227,252 ---- (_funcctx)->call_cntr++; \ rsi = (ReturnSetInfo *) fcinfo->resultinfo; \ rsi->isDone = ExprMultipleResult; \ + if (!_funcctx->shutdown_reg) \ + { \ + RegisterExprContextCallback(rsi->econtext, \ + end_MultiFuncCall, \ + PointerGetDatum(_funcctx)); \ + _funcctx->shutdown_reg = true; \ + } \ PG_RETURN_DATUM(_result); \ } while (0) #define SRF_RETURN_DONE(_funcctx) \ do { \ ReturnSetInfo *rsi; \ ! end_MultiFuncCall(PointerGetDatum(_funcctx)); \ rsi = (ReturnSetInfo *) fcinfo->resultinfo; \ rsi->isDone = ExprEndResult; \ + UnregisterExprContextCallback(rsi->econtext, \ + end_MultiFuncCall, \ + PointerGetDatum(_funcctx)); \ + _funcctx->shutdown_reg = false; \ PG_RETURN_NULL(); \ } while (0)
Joe Conway <mail@joeconway.com> writes: > Any comment on the patch? It seems like a bad idea to change the contents of the SRF_XXX() macros for 7.4.1; if we do that, any existing already-compiled-for-7.4 user SRFs will be broken, and there's no easy way to catch the problem. We don't normally require people to recompile user-defined functions for dot releases anyway. I would suggest leaving end_MultiFuncCall() with its existing API, and adding a separate shutdown callback function that is registered during init_MultiFuncCall and deregistered by end_MultiFuncCall. (This should be workable without API change since init_MultiFuncCall can get to the econtext via fcinfo->resultinfo.) You may not even need to add any fields to FuncCallContext --- consider passing the fcinfo pointer to the callback, rather than passing the FuncCallContext pointer. regards, tom lane
> You may not even need to add any fields to FuncCallContext --- consider > passing the fcinfo pointer to the callback, rather than passing the > FuncCallContext pointer. Dept. of second thoughts: better pass the flinfo pointer, instead. fcinfo might point to temporary space on the stack. regards, tom lane
Tom Lane wrote: >>You may not even need to add any fields to FuncCallContext --- consider >>passing the fcinfo pointer to the callback, rather than passing the >>FuncCallContext pointer. > > Dept. of second thoughts: better pass the flinfo pointer, instead. > fcinfo might point to temporary space on the stack. OK -- this one is a good bit simpler. Any more comments? Joe Index: src/backend/utils/fmgr/funcapi.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/fmgr/funcapi.c,v retrieving revision 1.12 diff -c -r1.12 funcapi.c *** src/backend/utils/fmgr/funcapi.c 29 Nov 2003 19:52:01 -0000 1.12 --- src/backend/utils/fmgr/funcapi.c 19 Dec 2003 00:01:46 -0000 *************** *** 17,22 **** --- 17,23 ---- #include "catalog/pg_type.h" #include "utils/syscache.h" + static void shutdown_MultiFuncCall(Datum arg); /* * init_MultiFuncCall *************** *** 41,47 **** { /* * First call ! * * Allocate suitably long-lived space and zero it */ retval = (FuncCallContext *) --- 42,51 ---- { /* * First call ! */ ! ReturnSetInfo *rsi = (ReturnSetInfo *) fcinfo->resultinfo; ! ! /* * Allocate suitably long-lived space and zero it */ retval = (FuncCallContext *) *************** *** 63,68 **** --- 67,80 ---- * save the pointer for cross-call use */ fcinfo->flinfo->fn_extra = retval; + + /* + * Ensure we will get shut down cleanly if the exprcontext is not + * run to completion. + */ + RegisterExprContextCallback(rsi->econtext, + shutdown_MultiFuncCall, + PointerGetDatum(fcinfo->flinfo)); } else { *************** *** 108,115 **** void end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx) { ! /* unbind from fcinfo */ ! fcinfo->flinfo->fn_extra = NULL; /* * Caller is responsible to free up memory for individual struct --- 120,148 ---- void end_MultiFuncCall(PG_FUNCTION_ARGS, FuncCallContext *funcctx) { ! ReturnSetInfo *rsi = (ReturnSetInfo *) fcinfo->resultinfo; ! ! /* Deregister the shutdown callback */ ! UnregisterExprContextCallback(rsi->econtext, ! shutdown_MultiFuncCall, ! PointerGetDatum(fcinfo->flinfo)); ! ! /* But use it to do the real work */ ! shutdown_MultiFuncCall(PointerGetDatum(fcinfo->flinfo)); ! } ! ! /* ! * shutdown_MultiFuncCall ! * Shutdown function to clean up after init_MultiFuncCall ! */ ! static void ! shutdown_MultiFuncCall(Datum arg) ! { ! FmgrInfo *flinfo = (FmgrInfo *) DatumGetPointer(arg); ! FuncCallContext *funcctx = (FuncCallContext *) flinfo->fn_extra; ! ! /* unbind from flinfo */ ! flinfo->fn_extra = NULL; /* * Caller is responsible to free up memory for individual struct
Joe Conway <mail@joeconway.com> writes: > OK -- this one is a good bit simpler. Any more comments? Looks good to me. Please apply to 7.4 and HEAD. regards, tom lane