Thread: Function-manager redesign: second draft (long)
This is a followup to a message I wrote in June about reworking the fmgr interface. I've had a couple of good ideas (I think ;-)) since then, but there are some parts of the proposal that still need work before implementation can begin.

I could particularly use some feedback from Jan and anyone else who's worked with function-call handlers: does this design eliminate the kluges that you've had to use in the past? If not, what else is needed?

			regards, tom lane


Proposal for function-manager redesign
--------------------------------------

We know that the existing mechanism for calling Postgres functions needs to be redesigned. It has portability problems because it makes assumptions about parameter passing that violate ANSI C; it fails to handle NULL arguments and results cleanly; and "function handlers" that support a class of functions (such as fmgr_pl) can only be done via a really ugly, non-reentrant kluge. (Global variable set during every function call, forsooth.) Here is a proposal for fixing these problems.

In the past, the major objections to redoing the function-manager interface have been (a) it'll be quite tedious to implement, since every built-in function and everyplace that calls such functions will need to be touched; (b) such wide-ranging changes will be difficult to make in parallel with other development work; (c) it will break existing user-written loadable modules that define "C language" functions. While I have no solution to the "tedium" aspect, I believe I see an answer to the other problems: by use of function handlers, we can support both old and new interfaces in parallel for both callers and callees, at some small efficiency cost for the old styles. That way, most of the changes can be done on an incremental file-by-file basis --- we won't need a "big bang" where everything changes at once. Support for callees written in the old style can be left in place indefinitely, to provide backward compatibility for user-written C functions.


The new function-manager interface
----------------------------------

The core of the new design is revised data structures for representing the result of a function lookup and for representing the parameters passed to a specific function invocation. (We want to keep function lookup separate, since many parts of the system apply the same function over and over; the lookup overhead should be paid once per query, not once per tuple.)

When a function is looked up in pg_proc, the result is represented as

    typedef struct
    {
        PGFunction  fn_addr;   /* pointer to function or handler to be called */
        Oid         fn_oid;    /* OID of function (NOT of handler, if any) */
        int         fn_nargs;  /* 0..MAXFMGRARGS, or -1 if variable arg count */
        void       *fn_extra;  /* extra space for use by handler */
    } FunctionLookupInfoData;

    typedef FunctionLookupInfoData* FunctionLookupInfo;

For an ordinary built-in function, fn_addr is just the address of the C routine that implements the function. Otherwise it is the address of a handler for the class of functions that includes the target function. The handler can use the function OID and perhaps also the fn_extra slot to find the specific code to execute. (fn_oid = InvalidOid can be used to denote a not-yet-initialized FunctionLookupInfoData struct. fn_extra will always be NULL when a FunctionLookupInfoData is first filled by the function lookup code, but a function handler could set it to avoid making repeated lookups of its own when the same FunctionLookupInfoData is used repeatedly during a query.)

fn_nargs is the number of arguments expected by the function. FunctionLookupInfo replaces the present FmgrInfo structure (but I'm giving it a new name so that the old struct definition can continue to exist during the transition phase).
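For illustration only, the lookup code might fill in this struct roughly as follows (a sketch, not part of the proposal itself; the helpers get_proc_nargs(), proc_is_builtin(), builtin_function_addr() and handler_for_language() are invented names):

    /* Hypothetical sketch: filling a FunctionLookupInfoData during lookup. */
    void
    function_lookup(Oid funcoid, FunctionLookupInfo finfo)
    {
        finfo->fn_oid   = funcoid;                 /* OID of the target function, never of a handler */
        finfo->fn_nargs = get_proc_nargs(funcoid);
        finfo->fn_extra = NULL;                    /* a handler may cache its own data here later */

        if (proc_is_builtin(funcoid))
            finfo->fn_addr = builtin_function_addr(funcoid);   /* will be called directly */
        else
            finfo->fn_addr = handler_for_language(funcoid);    /* will be called via its handler */
    }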
During a call of a function, the following data structure is created and passed to the function:

    typedef struct
    {
        FunctionLookupInfo flinfo;    /* ptr to lookup info used for this call */
        bool    isnull;               /* input/output flag, see below */
        int     nargs;                /* # arguments actually passed */
        Datum   arg[MAXFMGRARGS];     /* Arguments passed to function */
        bool    argnull[MAXFMGRARGS]; /* T if arg[i] is actually NULL */
    } FunctionCallInfoData;

    typedef FunctionCallInfoData* FunctionCallInfo;

Note that all the arguments passed to a function (as well as its result value) will now uniformly be of type Datum. As discussed below, callers and callees should apply the standard Datum-to-and-from-whatever macros to convert to the actual argument types of a particular function. The value in arg[i] is unspecified when argnull[i] is true.

It is generally the responsibility of the caller to ensure that the number of arguments passed matches what the callee is expecting: except for callees that take a variable number of arguments, the callee will typically ignore the nargs field and just grab values from arg[].

The meaning of the struct elements should be pretty obvious with the exception of isnull. isnull must be set by the caller to the logical OR of the argnull[i] flags --- ie, isnull is true if any argument is NULL. (Of course, isnull is false if nargs == 0.) On return from the function, isnull is the null flag for the function result: if it is true the function's result is NULL, regardless of the actual function return value. Overlapping the input and output flags in this way provides a simple, convenient, fast implementation for the most common case of a "strict" function (whose result is NULL if any input is NULL):

    if (finfo->isnull)
        return (Datum) 0;    /* specific value doesn't matter */
    ... else do normal calculation ignoring argnull[] ...

Non-strict functions can easily be implemented; they just need to check the individual argnull[] flags and set the appropriate isnull value before returning.

FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter conventions. Callees, whether they be individual functions or function handlers, shall always have this signature:

    Datum function (FunctionCallInfo finfo);

which is represented by the typedef

    typedef Datum (*PGFunction) (FunctionCallInfo finfo);

The function is responsible for setting finfo->isnull appropriately as well as returning a result represented as a Datum. Note that since all callees will now have exactly the same signature, and will be called through a function pointer declared with exactly that signature, we should have no portability or optimization problems.

When the function's result type is pass-by-reference, the result value must always be stored in freshly-palloc'd space (it can't be a constant or a copy of an input pointer). This rule will eventually allow automatic reclamation of storage space during expression evaluation.
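To illustrate the non-strict case, a function returning its first non-NULL int4 argument might look like this under the proposed conventions (just a sketch; int4_first_nonnull is a made-up name, not an existing catalog function):

    Datum
    int4_first_nonnull(FunctionCallInfo finfo)
    {
        if (!finfo->argnull[0])
        {
            finfo->isnull = false;   /* must be reset: it arrived as the OR of argnull[] */
            return finfo->arg[0];
        }
        if (!finfo->argnull[1])
        {
            finfo->isnull = false;
            return finfo->arg[1];
        }
        finfo->isnull = true;        /* both inputs NULL, so the result is NULL */
        return (Datum) 0;            /* return value is ignored when isnull is true */
    }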
Function coding conventions
---------------------------

As an example, int4 addition goes from old-style

    int32
    int4pl(int32 arg1, int32 arg2)
    {
        return arg1 + arg2;
    }

to new-style

    Datum
    int4pl(FunctionCallInfo finfo)
    {
        if (finfo->isnull)
            return (Datum) 0;    /* value doesn't really matter ... */
        /* we can ignore flinfo, nargs and argnull */
        return Int32GetDatum(DatumGetInt32(finfo->arg[0]) +
                             DatumGetInt32(finfo->arg[1]));
    }

This is, of course, much uglier than the old-style code, but we can improve matters with some well-chosen macros for the boilerplate parts. What we actually end up writing might look something like

    Datum
    int4pl(PG_FUNCTION_ARGS)
    {
        PG_STRICT_FUNCTION;    /* encapsulates null check */
        {
            PG_ARG1_INT32;
            PG_ARG2_INT32;
            PG_RESULT_INT32( arg1 + arg2 );
        }
    }

where the macros expand to things like "int32 arg1 = DatumGetInt32(finfo->arg[0])" and "return Int32GetDatum( x )". I don't yet have a detailed proposal for convenience macros for function authors, but I think it'd be well worth while to define some.

For the standard pass-by-reference types (int8, float4, float8) these macros should also hide the indirection and space allocation involved, so that the function's code is not explicitly aware that the types are pass-by-ref. This will allow future conversion of these types to pass-by-value on machines where it's feasible to do that. (For example, on an Alpha it's pretty silly to make int8 be pass-by-ref, since Datum is going to be 64 bits anyway.)


Call-site coding conventions
----------------------------

There are many places in the system that call either a specific function (for example, the parser invokes "textin" by name in places) or a particular group of functions that have a common argument list (for example, the optimizer invokes selectivity estimation functions with a fixed argument list). These places will need to change, but we should try to avoid making them significantly uglier than before.

Places that invoke an arbitrary function with an arbitrary argument list can simply be changed to fill a FunctionCallInfoData structure directly; that'll be no worse and possibly cleaner than what they do now.

When invoking a specific built-in function by name, we have generally just written something like

    result = textin ( ... args ... )

which will not work after textin() is converted to the new call style. I suggest that code like this be converted to use "helper" functions that will create and fill in a FunctionCallInfoData struct. For example, if textin is being called with one argument, it'd look something like

    result = DirectFunctionCall1(textin, PointerGetDatum(argument));

These helper routines will have declarations like

    Datum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);

Note it will be the caller's responsibility to convert to and from Datum; appropriate conversion macros should be used. The DirectFunctionCallN routines will not bother to fill in finfo->flinfo (indeed cannot, since they have no idea about an OID for the target function); they will just set it NULL. This is unlikely to bother any built-in function that could be called this way. Note also that this style of coding cannot check for a NULL result (it couldn't before, either!). We could reasonably make the helper routines elog an error if they see that the function returns a NULL.

(Note: direct calls like this will have to be changed at the same time that the called routines are changed to the new style. But that will still be a lot less of a constraint than a "big bang" conversion.)
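For concreteness, DirectFunctionCall1() might be implemented roughly like this (a sketch under the conventions above, not a final implementation; whether to elog on a NULL result is the open question just mentioned):

    Datum
    DirectFunctionCall1(PGFunction func, Datum arg1)
    {
        FunctionCallInfoData fcinfo;
        Datum                result;

        fcinfo.flinfo     = NULL;    /* no lookup info available for direct calls */
        fcinfo.nargs      = 1;
        fcinfo.arg[0]     = arg1;
        fcinfo.argnull[0] = false;
        fcinfo.isnull     = false;   /* OR of argnull[], per the convention */

        result = (*func) (&fcinfo);

        if (fcinfo.isnull)
            elog(ERROR, "DirectFunctionCall1: function returned NULL");

        return result;
    }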
When invoking a function that has a known argument signature, we have usually written either

    result = fmgr(targetfuncOid, ... args ... );

or

    result = fmgr_ptr(FmgrInfo *finfo, ... args ... );

depending on whether an FmgrInfo lookup has been done yet or not. This kind of code can be recast using helper routines, in the same style as above:

    result = OidFunctionCall1(funcOid, PointerGetDatum(argument));

    result = FunctionCall2(funcCallInfo, PointerGetDatum(argument),
                           Int32GetDatum(argument));

Again, this style of coding does not recognize the possibility of a null result. We could provide variant helper routines that allow a null return rather than raising an error, which could be called in a style like

    if (FunctionCall1IsNull(&result, funcCallInfo, PointerGetDatum(argument)))
    {
        ... cope with null result ...
    }
    else
    {
        ... OK, use 'result' here ...
    }

But I'm unsure that there are enough places in the system that need this to justify the extra set of helpers. If there are only a few places that need a non-error response to a null result, they could just be changed to fill and examine a FunctionCallInfoData structure directly.

As with the callee-side situation, I am strongly inclined to add argument conversion macros that hide the pass-by-reference nature of int8, float4, and float8, with an eye to making those types relatively painless to convert to pass-by-value.

The existing helper functions fmgr(), fmgr_c(), etc will be left in place until all uses of them are gone. Of course their internals will have to change in the first step of implementation, but they can continue to support the same external appearance.


Notes about function handlers
-----------------------------

Handlers for classes of functions should find life much easier and cleaner in this design. The OID of the called function is directly reachable from the passed parameters; we don't need the global variable fmgr_pl_finfo anymore. Also, by modifying finfo->flinfo->fn_extra, the handler can cache lookup info to avoid repeat lookups when the same function is invoked many times. (fn_extra can only be used as a hint, since callers are not required to re-use a FunctionLookupInfo struct. But in performance-critical paths they normally will do so.)

I observe that at least one other global variable, CurrentTriggerData, is being used as part of the call convention for some function handlers. That's just as grotty as fmgr_pl_finfo, so I'd like to get rid of it. Any comments on the cleanest way to do so? Are there any other things needed by the call handlers for PL/pgsql and other languages?

During the conversion process, support for old-style builtin functions and old-style user-written C functions will be provided by appropriate function handlers. For example, the handler for old-style builtins will look roughly like fmgr_c() does now.
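A rough sketch of the fn_extra caching idea, purely for illustration --- plsample_call_handler, PLSampleFunction, plsample_compile and plsample_exec are invented names standing in for whatever a real PL handler would use:

    Datum
    plsample_call_handler(FunctionCallInfo finfo)
    {
        PLSampleFunction *func = (PLSampleFunction *) finfo->flinfo->fn_extra;

        if (func == NULL)
        {
            /* first call through this FunctionLookupInfo: do the expensive
               per-function setup once, keyed by the target function's OID */
            func = plsample_compile(finfo->flinfo->fn_oid);
            finfo->flinfo->fn_extra = (void *) func;
        }

        /* run the interpreted function; it sets finfo->isnull itself */
        return plsample_exec(func, finfo);
    }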
System table updates
--------------------

In the initial phase, pg_language type 11 ("builtin") will be renamed to "old_builtin", and a new language type named "builtin" will be created with a new OID. Then pg_proc entries will be changed from language code 11 to the new code piecemeal, as the associated routines are rewritten. (This will imply several rounds of forced initdbs as the contents of pg_proc change. It would be a good idea to add a "catalog contents version number" to the database version info checked at startup before we begin this process.)

The existing pg_language entry for "C" functions will continue to describe user functions coded in the old style, and we will need to add a new language name for user functions coded in the new style. (Any suggestions for what the new name should be?) We should deprecate old-style functions because of their portability problems, but the support for them will only be one small function handler routine, so we can leave them in place for as long as necessary.

The expected calling convention for PL call handlers will need to change all-at-once, but fortunately there are not very many of them to fix.
> This is a followup to a message I wrote in June about reworking the fmgr interface. I've had a couple of good ideas (I think ;-)) since then, but there are some parts of the proposal that still need work before implementation can begin.
>
> I could particularly use some feedback from Jan and anyone else who's worked with function-call handlers: does this design eliminate the kluges that you've had to use in the past? If not, what else is needed?

Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > This is a followup to a message I wrote in June about reworking the fmgr interface. I've had a couple of good ideas (I think ;-)) since then, but there are some parts of the proposal that still need work before implementation can begin.
> >
> > I could particularly use some feedback from Jan and anyone else who's worked with function-call handlers: does this design eliminate the kluges that you've had to use in the past? If not, what else is needed?
>
> Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?

Backward compatibility is a common source of problems, and I don't like it generally. In the case of the fmgr it is quite difficult, and I have thought about it too already.

I like the current interface for its simpleness from the user function developer's point of view. And converting ALL internal ones plus most of the contrib directory ones to something new is really a huge project.

All function calls through the fmgr use the FmgrInfo structure, but there are a lot of calls to internal functions like textout() etc. too. Changing their interface would IMHO introduce many problems. And there are only a few internal functions where a new fmgr interface really is required due to incomplete NULL handling or the like.

Therefore I would prefer an interface extension that doesn't require changes to existing functions. What about adding a proifversion to pg_proc, telling the fmgr which call interface the function uses? This is then held in the FmgrInfo struct too, so the fmgr can call a function using either the old or the new interface.

First fmgr_info() is extended to put the interface version into the FmgrInfo too. Then fmgr_faddr() is renamed to fmgr_faddr_v1() and it has to check that only functions using the old interface are called through it (there aren't as many calls to it as you might think). After that you have all the time in the world to implement another interface and add a switch into fmgr() and sisters handling the different versions.

My thoughts for the new interface:

    o   Each function argument must have its own separate NULL flag.

    o   The function's result must have another NULL flag too.

    o   Argument names and default values for omitted ones aren't IMHO something to go into the interface itself. The function is always called with all arguments positional; the parser must provide this list.

    o   The new interface must at least be designed for a later implementation of tuple set returns. I think this must be implemented as a temp table collecting all return tuples, since for a procedural language it might be impossible to implement a real RETURN AND RESUME (like it is already for PL/Tcl and it would be for PL/Perl). Therefore another STRUCT kind of relation must be added too, providing the tupdesc for the returned set. This temp table is filled by calling the function the first time a tuple is needed, and then it is simply another RTE.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
> System table updates
> --------------------
>
> In the initial phase, pg_language type 11 ("builtin") will be renamed to "old_builtin", and a new language type named "builtin" will be created with a new OID. Then pg_proc entries will be changed from language code 11 to the new code piecemeal, as the associated routines are rewritten. (This will imply several rounds of forced initdbs as the contents of pg_proc change. It would be a good idea to add a "catalog contents version number" to the database version info checked at startup before we begin this process.)
>
> The existing pg_language entry for "C" functions will continue to describe user functions coded in the old style, and we will need to add a new language name for user functions coded in the new style. (Any suggestions for what the new name should be?) We should deprecate old-style functions because of their portability problems, but the support for them will only be one small function handler routine, so we can leave them in place for as long as necessary.
>
> The expected calling convention for PL call handlers will need to change all-at-once, but fortunately there are not very many of them to fix.

This approach nearly matches all my thoughts about the redesign of the fmgr. In the system table section I miss named arguments. I think we need a new system table

    pg_proargs (
        Oid     pargprooid,
        int2    pargno,
        name    pargname,
        bool    pargdflnull,
        text    pargdefault
    );

plus another flag in pg_proc that tells if this function prototype information is available. The parser then has to handle function calls like

    ... myfunc(userid = 123, username = 'hugo');

and build a complete function argument list that has all the arguments in the correct order and defaults for omitted arguments filled in as const nodes. This new prototype information then must also be used in the PL handlers to choose the given names for arguments.

In addition, we could add an argument type at this time (INPUT, OUTPUT and INOUT) but only support INPUT ones for now from the SQL level. PL functions calling other functions could eventually use these argument types in the future.

Also I miss the interface for tuple set returns. I know that this requires much more in other sections than only the fmgr, but we need to cover it now or we'll not be able to do it without another change to the fmgr interface at the time we want to support real stored procedures. As said in another mail, I think it must be done via some temp table, since most interpreter languages will not be able to do RETURN AND RESUME in any other way - not even PL/pgSQL will be easy, and it would need a total internal redesign of the bytecode interpreter since it otherwise needs to recover the internal call stack, maybe across recursive calls.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?

I think we could drop it after a decent interval, but I don't see any reason to be in a hurry. I do think that we'll get complaints if 7.0 doesn't have any backward compatibility for existing user functions.

			regards, tom lane
[ responding to both of Jan's messages in one ]

wieck@debis.com (Jan Wieck) writes:
> I like the current interface for its simpleness from the user function developer's point of view.

There is that; even with a good set of convenience macros there will be more to learn about writing user functions. OTOH, the way it is now is not exactly transparent --- in particular, NULL handling is so easy to forget about/get wrong. We can make that much easier. We can also simplify the interface noticeably for standard types like float4/float8.

BTW: I am not in nearly as big a hurry as Bruce is to rip out support for the current user-function interface. I want to get rid of old-style builtin functions before 7.0 because of the portability issue (see below). But if a particular user is using old-style user functions and isn't having portability problems on his machine, there's no need to force him to convert, it seems to me.

> And converting ALL internal ones plus most of the contrib directory ones to something new is really a huge project.

It is. I was hoping to get some help ;-) ... but I will do it myself if I have to.

> ... And there are only a few internal functions where a new fmgr interface really is required due to incomplete NULL handling or the like.

If your goal is *only* to deal with the NULL issue, or *only* to get things working on a specific platform like Alpha, then yes we could patch in a few dozen places and not undertake a complete changeover. But I believe that we really need to fix this right, because it will keep coming back to haunt us until we do. I will not be happy as long as we have ports that have to compile "-O0" because of fmgr brain-damage. I fear there are going to be more and more such ports as people install smarter compilers that assume they are working with ANSI-compliant source code. We're giving up performance system-wide because of fmgr.

> Therefore I would prefer an interface extension that doesn't require changes to existing functions. What about adding a proifversion to pg_proc, telling the fmgr which call interface the function uses?

I was going to let the prolang column tell that, by having different language codes for old vs. new builtin function and old vs. new dynamic-linked C function. We could add a version column instead, but that seems like unnecessary complication.

> First fmgr_info() is extended to put the interface version into the FmgrInfo too. Then fmgr_faddr() is renamed to fmgr_faddr_v1() and it has to check that only functions using the old interface are called through it (there aren't as many calls to it as you might think).

I think it should be possible to make fmgr_faddr call both old and new functions --- I haven't actually written code yet, but I think I can do it. You're right that that's important in order to spread out the repair work instead of having a "big bang".

> This approach nearly matches all my thoughts about the redesign of the fmgr. In the system table section I miss named arguments.

As you said in your earlier message, that is a parser-level feature that has nothing to do with what happens at the fmgr level. I don't want to worry about it for this revision.

> Also I miss the interface for tuple set returns. I know that this requires much more in other sections than only the fmgr, but we need to cover it now or we'll not be able to do it without another change to the fmgr interface at the time we want to support real stored procedures.

OK, I'm willing to worry about that.
But what, exactly, needs to be provided in the fmgr interface?

			regards, tom lane
> [ responding to both of Jan's messages in one ]
>
> wieck@debis.com (Jan Wieck) writes:
> > I like the current interface for its simpleness from the user function developer's point of view.
>
> There is that; even with a good set of convenience macros there will be more to learn about writing user functions. OTOH, the way it is now is not exactly transparent --- in particular, NULL handling is so easy to forget about/get wrong. We can make that much easier. We can also simplify the interface noticeably for standard types like float4/float8.
>
> BTW: I am not in nearly as big a hurry as Bruce is to rip out support for the current user-function interface. I want to get rid of old-style builtin functions before 7.0 because of the portability issue (see below). But if a particular user is using old-style user functions and isn't having portability problems on his machine, there's no need to force him to convert, it seems to me.

Personally, I could live with dropping the entire old interface. That's not the problem. But at least Bruce and his book need to know the final programming conventions if we ought to change it at all, so it can be covered in his manuscript when it is sent down to the paper.

> I was going to let the prolang column tell that, by having different language codes for old vs. new builtin function and old vs. new dynamic-linked C function. We could add a version column instead, but that seems like unnecessary complication.

Right - the language is all that's needed to tell.

> > This approach nearly matches all my thoughts about the redesign of the fmgr. In the system table section I miss named arguments.
>
> As you said in your earlier message, that is a parser-level feature that has nothing to do with what happens at the fmgr level. I don't want to worry about it for this revision.

Right too. I just hoped to expand the scope of this change so there would only be ONE change to the PL handlers covering both the new interface with proper NULL handling and the enhanced function prototypes.

> > Also I miss the interface for tuple set returns. I know that this requires much more in other sections than only the fmgr, but we need to cover it now or we'll not be able to do it without another change to the fmgr interface at the time we want to support real stored procedures.
>
> OK, I'm willing to worry about that. But what, exactly, needs to be provided in the fmgr interface?

First we need another relation type in pg_class. It's like a table or view, but none of the NORMAL SQL statements can be used with it (e.g. INSERT, SELECT, ...). It just describes a row structure.

Then, a function returning a SET of any row type (this time, regular relations or views too) can be used in the rangetable as

    SELECT A.col1, B.col2 FROM mysetfunc() A, anothertab B WHERE A.col1 = B.col1;

Of course, it requires some new fields in the rangetable entry. Anyway, at the beginning of execution for such a query, the executor initializes those RTE's by creating a temp table with the schema of the specified structure or relation. Then it calls the user function, passing in some handle to the temp table, and the user function fills in the tuples. Now the rest of query execution works as if it were a regular table. After execution, the temp table is dropped.

Isn't that all required for true stored procedures?

Jan
-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?
>
> I think we could drop it after a decent interval, but I don't see any reason to be in a hurry. I do think that we'll get complaints if 7.0 doesn't have any backward compatibility for existing user functions.

True.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > [ responding to both of Jan's messages in one ]
> >
> > wieck@debis.com (Jan Wieck) writes:
> > > I like the current interface for its simpleness from the user function developer's point of view.
> >
> > There is that; even with a good set of convenience macros there will be more to learn about writing user functions. OTOH, the way it is now is not exactly transparent --- in particular, NULL handling is so easy to forget about/get wrong. We can make that much easier. We can also simplify the interface noticeably for standard types like float4/float8.
> >
> > BTW: I am not in nearly as big a hurry as Bruce is to rip out support for the current user-function interface. I want to get rid of old-style builtin functions before 7.0 because of the portability issue (see below). But if a particular user is using old-style user functions and isn't having portability problems on his machine, there's no need to force him to convert, it seems to me.
>
> Personally, I could live with dropping the entire old interface. That's not the problem. But at least Bruce and his book need to know the final programming conventions if we ought to change it at all, so it can be covered in his manuscript when it is sent down to the paper.

No. My coverage of that is going to be more conceptual than actual programming. Do whatever you think is best.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
wieck@debis.com (Jan Wieck) writes:
>>>> Also I miss the interface for tuple set returns. I know that this requires much more in other sections than only the fmgr, but we need to cover it now or we'll not be able to do it without another change to the fmgr interface at the time we want to support real stored procedures.
>>
>> OK, I'm willing to worry about that. But what, exactly, needs to be provided in the fmgr interface?

> First we need another relation type in pg_class. It's like a table or view, but none of the NORMAL SQL statements can be used with it (e.g. INSERT, SELECT, ...). It just describes a row structure.

Why bother? Just create a table --- if you don't want to put any rows in it, you don't have to.

> Of course, it requires some new fields in the rangetable entry. Anyway, at the beginning of execution for such a query, the executor initializes those RTE's by creating a temp table with the schema of the specified structure or relation. Then it calls the user function, passing in some handle to the temp table, and the user function fills in the tuples. Now the rest of query execution works as if it were a regular table. After execution, the temp table is dropped.

OK, but this doesn't answer the immediate problem of what needs to appear in the fmgr interface.

I have been thinking about this some, and I think maybe our best bet is not to commit to *exactly* what needs to pass through fmgr, but rather to put in a couple of generic hooks. I can see two hooks that are needed: one is for passing "context" information, such as information about the current trigger event when a function is called from the trigger manager. (This'd replace the present CurrentTriggerData global, which I hope you'll agree shouldn't be a global...) The other is for passing and/or returning information about the function result --- maybe returning a tuple descriptor, maybe passing in the name of a temp table to put a result set in, maybe other stuff.

So, I am thinking that we should add two fields like this to the FunctionCallInfo struct:

    Node   *resultinfo;    /* pass or return extra info about result */
    Node   *context;       /* pass info about context of call */

We would restrict the usage of these fields only to the extent of saying that they must point to some kind of Node --- that lets callers and callees check the node tag to make sure that they understand what they are looking at. Different node types can be used to handle different situations. For an "ordinary" function call expecting a scalar return value, both fields will be set to NULL. Other conventions for their use will be developed as time goes on.
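For illustration, the extended struct and a trigger-called function might then look roughly like this (a sketch only --- the field order, the TriggerData node type and the IsA() test are assumptions, not settled parts of the proposal):

    typedef struct
    {
        FunctionLookupInfo flinfo;  /* lookup info used for this call */
        Node       *context;        /* e.g. trigger event info, or NULL */
        Node       *resultinfo;     /* e.g. where to return a result set, or NULL */
        bool        isnull;
        int         nargs;
        Datum       arg[MAXFMGRARGS];
        bool        argnull[MAXFMGRARGS];
    } FunctionCallInfoData;

    Datum
    my_trigger_func(FunctionCallInfo finfo)
    {
        if (finfo->context == NULL || !IsA(finfo->context, TriggerData))
            elog(ERROR, "my_trigger_func: not called by trigger manager");

        /* use ((TriggerData *) finfo->context)->... here instead of the
           old CurrentTriggerData global */

        finfo->isnull = true;       /* this example just returns NULL */
        return (Datum) 0;
    }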
Does this look good to you, or am I off the track entirely?

			regards, tom lane


> Node   *resultinfo;    /* pass or return extra info about result */
> Node   *context;       /* pass info about context of call */
>
> Does this look good to you, or am I off the track entirely?

Looks perfect. I appreciate getting rid of globals whenever possible.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Tom Lane wrote:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?
>
> I think we could drop it after a decent interval, but I don't see any reason to be in a hurry. I do think that we'll get complaints if 7.0 doesn't have any backward compatibility for existing user functions.

Right. A major release is what it is. And porting applications to a new major release is a conversion, not an upgrade. Therefore a major release should drop as much backward compatibility code for minor releases as possible. Thus, we should think about getting rid of the broken design for functions returning tuple sets in 7.0.

As far as I understand the books I have, there are a couple of different types of functions/procedures out there, and not every database implements all types, nor do they all name the same type the same way, so that something called a function in one database is a stored procedure in another. Anyway, the different types are:

    1.  Functions returning a scalar value taking only input-arguments.

    2.  Functions returning a scalar value taking input-, output- and in/out-arguments.

    3.  Functions returning nothing taking only input-arguments.

    4.  Functions returning nothing taking input-, output- and in/out-arguments.

    5.  Functions returning a set of result rows taking only input-arguments.

    6.  Functions returning a set of result rows taking input-, output- and in/out-arguments.

I don't think that we have to implement everything, and since we don't have host variables, output- and in/out-arguments would make sense only for calls from procedural languages. OTOH they would cause much trouble, so they are one detail to leave out for PostgreSQL. Three cases left.

Type number 1 we have already. And it is advanced, because the arguments can be either single values or single rows. And type number 3 is easy, because invoking something that returns a dummy that is thrown away is absolutely no work.

So the only thing that's really left is number 5. The funny detail is that those functions or procedures can't be used inside regular SELECT queries. Instead a CALL FUNCTION or EXECUTE PROCEDURE statement is used from the client application or inside a PL block. CALL FUNCTION then returns a tuple set as a SELECT does. The result in our world therefore has a tuple descriptor and, depending on the invoker, is sent to the client or stored in an SPI tuple table.

So we do not need to call functions returning sets through the normal function manager. It could completely deny calls to set functions, and the interface for them can be a totally different one. I have something in mind that could work without temp tables, but it requires a redesign for PL/pgSQL and causes some limitations for PL/Tcl. Let's leave that for a post-7.0 release.

I correct my previous statements and vote to deny calls to set functions through the default function manager in 7.0.

And there is another detail I found while browsing through the books. Functions can be defined as [NOT] NULL CALL (IBM DB2). Functions defined as NOT NULL CALL will be called only if all their arguments aren't NULL. So we can prevent much NULL handling inside the functions if we simply define that a function that is NOT NULL CALL will always return NULL if any of its input arguments is NULL. This case can then be handled at the function manager level without calling the function itself. Nearly all our builtin functions behave that way but have all the tests inside.
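At the function manager level that could be as simple as this sketch (assuming the new FunctionCallInfo conventions from Tom's proposal and some per-function strictness flag; the name proisstrict is only a guess at what the pg_proc column might be called):

    Datum
    fmgr_call_checked(FunctionCallInfo finfo, bool proisstrict)
    {
        /* NOT NULL CALL: isnull arrives as the OR of argnull[], so a true
           value here means some argument is NULL and the result is NULL
           without ever calling the function */
        if (proisstrict && finfo->isnull)
            return (Datum) 0;

        return (*finfo->flinfo->fn_addr) (finfo);
    }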
Another detail I'm missing now is a new, really defined interface for type input/output functions. The fact that they are declared as taking one opaque argument (yepp, "opaque" should be something different, as already discussed) but in fact get more information from the attribute is ugly.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Jan Wieck wrote:
> ...
>
> 5. Functions returning a set of result rows taking only input-arguments.
> ...
> So the only thing that's really left is number 5. The funny detail is that those functions or procedures can't be used inside regular SELECT queries. Instead a CALL FUNCTION or EXECUTE PROCEDURE statement is used from the client application or inside a PL block. CALL FUNCTION then returns a tuple set as a SELECT does. The result in our world therefore has a tuple descriptor and, depending on the invoker, is sent to the client or stored in an SPI tuple table.
>
> So we do not need to call functions returning sets through the normal function manager. It could completely deny calls to set functions, and the interface for them can be a totally different one. I have something in mind that could work without temp tables, but it requires a redesign for PL/pgSQL and causes some limitations for PL/Tcl. Let's leave that for a post-7.0 release.
>
> I correct my previous statements and vote to deny calls to set functions through the default function manager in 7.0.

It would be very nice if we could use the tuple-set-returning functions in place of tables/views:

    SELECT * FROM LOGGED_IN_USERS_INFO_PROC;

or at least define views on them:

    CREATE VIEW LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;

We would not need to call them in place of functions that return either a single value or a tuple.

On the topic of the 2x3=6 kinds of functions you mentioned, I think we could use yet another type of function - one returning a tuple/row, as is often the case in Python and possibly other languages that do automatic tuple packing/unpacking. It could be used in cases like this:

    INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;

or

    DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;

(The last example is not ANSI and does not work currently.)

OTOH, these examples would just be redundant cases of your 5th case. OTOOH, all the functions returning less than a set of rows are redundant cases of the functions that do ;)

-----------
Hannu
Jan Wieck wrote:
>
> Another detail I'm missing now is a new, really defined interface for type input/output functions. The fact that they are declared as taking one opaque argument (yepp, "opaque" should be something different, as already discussed) but in fact get more information from the attribute is ugly.

Can we currently return a list of the same type? I guess we can, as lists (or arrays) are fundamental types in PostgreSQL, but I'm not sure.

I would like to define aggregate functions list() and set(). Could I define them just once and specify that they return an array of their input type? Half of that is currently done for count() - i.e. it can take any type of argument, but I guess the return-array-of-input-type is more complicated.

Also (probably off topic), how hard would it be to add another type of aggregate function that operates on pairs of values? I would like to have FOR_MIN and FOR_MAX (and possibly MIN_MIN and MAX_MAX) functions that return _another_ field from a table for a minimal value in one field.

-------------
Hannu
Hannu Krosing wrote:
> Jan Wieck wrote:
> >
> > I correct my previous statements and vote to deny calls to set functions through the default function manager in 7.0.
>
> It would be very nice if we could use the tuple-set-returning functions in place of tables/views:
>
>     SELECT * FROM LOGGED_IN_USERS_INFO_PROC;

Exactly that's something I have wanted for a long time now: sticking another querytree, one that returns a tuple set, into a rangetable entry. This other querytree could be either a SELECT as in

    SELECT A.x, A.y, B.z FROM table1 A, (SELECT ...) B WHERE A.x = B.x;

or a function returning a set as in

    SELECT A.x, A.y, B.z FROM table1 A, setfunc('arg') B WHERE A.x = B.x;

Finally, the form

    CALL setfunc('arg');

would be equivalent to a

    SELECT * FROM setfunc('arg');

but closer to the IBM DB2 calling syntax.

The first one is required to get rid of some problems in the rule system, especially views with aggregate columns that need their own GROUP BY clause. The other ones are what we need to implement stored procedures.

> or at least define views on them:
>
>     CREATE VIEW LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;

Wrong syntax, since the statement after AS must be a SELECT. But a

    CREATE VIEW v AS SELECT * FROM setfunc();

would do the trick.

> We would not need to call them in place of functions that return either a single value or a tuple.
>
> On the topic of the 2x3=6 kinds of functions you mentioned, I think we could use yet another type of function - one returning a tuple/row, as is often the case in Python and possibly other languages that do automatic tuple packing/unpacking.
>
> It could be used in cases like this:
>
>     INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;

Let's clearly distinguish between scalar, row and set return values. A scalar return value is one single datum. A row return value is exactly one tuple of 1...n datums, and a set return value is a collection of 0...n rows.

What we have now (at least what works properly) are only scalar return values from functions. And I don't see the point of a row return, so I think we don't need them.

> or
>
>     DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;
>
> (The last example is not ANSI and does not work currently.)
>
> OTOH, these examples would just be redundant cases of your 5th case.
>
> OTOOH, all the functions returning less than a set of rows are redundant cases of the functions that do ;)

But please don't forget that it isn't enough to write down the syntax and specify the behaviour with some English words. We must define the behaviour in C too, and in that language it's a little more than a redundant case of something, because we don't have that something.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Jan Wieck wrote:
> Hannu Krosing wrote:
> > Jan Wieck wrote:
>
> What we have now (at least what works properly) are only scalar return values from functions. And I don't see the point of a row return, so I think we don't need them.

That's what I meant by the OTOH below.

> > (The last example is not ANSI and does not work currently.)
> >
> > OTOH, these examples would just be redundant cases of your 5th case.
> >
> > OTOOH, all the functions returning less than a set of rows are redundant cases of the functions that do ;)
>
> But please don't forget that it isn't enough to write down the syntax and specify the behaviour with some English words. We must define the behaviour in C too, and in that language it's a little more than a redundant case of something, because we don't have that something.

Yes, that's the hard part.

---------
Hannu
> Tom Lane wrote:
> > Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > > Sounds good. My only question is whether people need backward compatibility, and whether we can remove the compatibility part of the interface and small overhead after 7.1 or later?
> >
> > I think we could drop it after a decent interval, but I don't see any reason to be in a hurry. I do think that we'll get complaints if 7.0 doesn't have any backward compatibility for existing user functions.
>
> Right. A major release is what it is. And porting applications to a new major release is a conversion, not an upgrade. Therefore a major release should drop as much backward compatibility code for minor releases as possible.

If this was simple stuff, we could keep compatibility with little problem. But with complex stuff like this, keeping backward compatibility sometimes makes things more confusing. People can code things two ways, and that makes it confusing which way to follow.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
I think that the proposals for the revision of the function interface are all an improvement on what is there, and I hope to try to find time to help implement whatever is decided. Here are some more thoughts in relation to the question of set-valued and tuple-valued (or complex type?) arguments.

Another place that user-defined functions are used in the PostgreSQL backend is in association with access methods - both as boolean operators for search predicates, and as auxiliary functions for the access methods. Allowing set-valued and tuple-valued arguments and return values for functions and operators in this setting can be valuable.

For instance, suppose I have a table t that stores geometric objects in some space, and I have a spatial index such as R*-tree, or even a GIST index. Given a set of points pts I want to do a query of the form

    SELECT * FROM t WHERE t.object <intersects> pts;

Under these circumstances it would be really nice to be able to pass a set of objects (as an SPI tuple table for instance) into the index.

Currently, the way I do this (with a custom access method) is to create a temp table, put the key set into the temp table, and pass the name of the temp table to the access method in the search key. The access method then does an SPI select on the temp table and stores the returned items into the private scan state for use during the scan.

While I realize that implementing this example requires much more than a change to the function interface, I hope that it illustrates that it is perhaps a good idea to keep as much flexibility in the function interface as possible.

Bernie Franpitt
Bernie Franpitt wrote:
>     SELECT * FROM t WHERE t.object <intersects> pts;
>
> Under these circumstances it would be really nice to be able to pass a set of objects (as an SPI tuple table for instance) into the index.
>
> Currently, the way I do this (with a custom access method) is to create a temp table, put the key set into the temp table, and pass the name of the temp table to the access method in the search key. The access method then does an SPI select on the temp table and stores the returned items into the private scan state for use during the scan.
>
> While I realize that implementing this example requires much more than a change to the function interface, I hope that it illustrates that it is perhaps a good idea to keep as much flexibility in the function interface as possible.

Uhhh - it is a good idea, but passing tuple sets as arguments to functions, that will cause headaches, man.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
On Oct 30, Jan Wieck mentioned:

> Right. A major release is what it is. And porting applications to a new major release is a conversion, not an upgrade. Therefore a major release should drop as much backward compatibility code for minor releases as possible.

Certainly true. But that would also mean that we'd have to keep maintaining the 6.* series as well. At least bug-fixing and releasing one or two more 6.5.x versions. Up until now the usual answer to a bug was "upgrade to latest version". But if you break compatibility you can't do that any more. (Compare to Linux 2.0 vs 2.2.) I'm just wondering what the plans are in that regard.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
Peter Eisentraut wrote:
> On Oct 30, Jan Wieck mentioned:
>
> > Right. A major release is what it is. And porting applications to a new major release is a conversion, not an upgrade. Therefore a major release should drop as much backward compatibility code for minor releases as possible.
>
> Certainly true. But that would also mean that we'd have to keep maintaining the 6.* series as well. At least bug-fixing and releasing one or two more 6.5.x versions. Up until now the usual answer to a bug was "upgrade to latest version". But if you break compatibility you can't do that any more. (Compare to Linux 2.0 vs 2.2.) I'm just wondering what the plans are in that regard.

Some time ago, I already pointed out that we have to support one or two older releases for some time, and not only because we might drop some compatibility code. Each release usually introduces one or another new keyword, making existing applications possibly fail with the new release. And no amount of compatibility code would help in that case! It's a deadlock trap: an application that cannot easily be ported to a newer release because of incompatibilities in the query language cannot use the last release it is compatible with because of a bug.

There is a new aspect in this discussion since then. The new corporation PostgreSQL Inc. offers commercial support for our database (look at www.pgsql.com). If they offer support, they must support older releases as well, so they need to backpatch already. Wouldn't it be a good idea if their return to our project were bugfix releases of older versions (created by backpatching release branches)? In the case of a customer's accident, they have to do it anyway. And doing it for critical bugs during idle time could avoid accidents, so it's good customer service.

Marc, what do you think about such an agreement?

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes:
> There is a new aspect in this discussion since then. The new corporation PostgreSQL Inc. offers commercial support for our database (look at www.pgsql.com). If they offer support, they must support older releases as well, so they need to backpatch already.

Yes, but who's the "them" here? If PostgreSQL Inc. has any warm bodies other than the existing group of developers, I sure haven't heard from them...

I agree 100% with Jan's basic point: we must provide a degree of backwards compatibility from release to release. In some cases that might create enough pain to be worth debating, but in this particular case it seems like the choice is a no-brainer. We just leave in the transition fmgr code that we're going to write anyway. I don't understand why it even got to be a topic of discussion.

			regards, tom lane