Thread: Function-manager redesign: second draft (long)

Function-manager redesign: second draft (long)

From
Tom Lane
Date:
This is a followup to a message I wrote in June about reworking the fmgr
interface.  I've had a couple of good ideas (I think ;-)) since then,
but there are some parts of the proposal that still need work before
implementation can begin.

I could particularly use some feedback from Jan and anyone else who's
worked with function-call handlers: does this design eliminate the
kluges that you've had to use in the past?  If not, what else is needed?
        regards, tom lane


Proposal for function-manager redesign
--------------------------------------

We know that the existing mechanism for calling Postgres functions needs
to be redesigned.  It has portability problems because it makes
assumptions about parameter passing that violate ANSI C; it fails to
handle NULL arguments and results cleanly; and "function handlers" that
support a class of functions (such as fmgr_pl) can only be done via a
really ugly, non-reentrant kluge.  (Global variable set during every
function call, forsooth.)  Here is a proposal for fixing these problems.

In the past, the major objections to redoing the function-manager
interface have been (a) it'll be quite tedious to implement, since every
built-in function and everyplace that calls such functions will need to
be touched; (b) such wide-ranging changes will be difficult to make in
parallel with other development work; (c) it will break existing
user-written loadable modules that define "C language" functions.  While
I have no solution to the "tedium" aspect, I believe I see an answer to
the other problems: by use of function handlers, we can support both old
and new interfaces in parallel for both callers and callees, at some
small efficiency cost for the old styles.  That way, most of the changes
can be done on an incremental file-by-file basis --- we won't need a
"big bang" where everything changes at once.  Support for callees
written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions.


The new function-manager interface
----------------------------------

The core of the new design is revised data structures for representing
the result of a function lookup and for representing the parameters
passed to a specific function invocation.  (We want to keep function
lookup separate, since many parts of the system apply the same function
over and over; the lookup overhead should be paid once per query, not
once per tuple.)


When a function is looked up in pg_proc, the result is represented as

typedef struct
{   PGFunction  fn_addr;    /* pointer to function or handler to be called */   Oid         fn_oid;     /* OID of
function(NOT of handler, if any) */   int         fn_nargs;   /* 0..MAXFMGRARGS, or -1 if variable arg count */   void
    *fn_extra;   /* extra space for use by handler */
 
} FunctionLookupInfoData;
typedef FunctionLookupInfoData* FunctionLookupInfo;

For an ordinary built-in function, fn_addr is just the address of the C
routine that implements the function.  Otherwise it is the address of a
handler for the class of functions that includes the target function.
The handler can use the function OID and perhaps also the fn_extra slot
to find the specific code to execute.  (fn_oid = InvalidOid can be used
to denote a not-yet-initialized FunctionLookupInfoData struct.  fn_extra
will always be NULL when a FunctionLookupInfoData is first filled by the
function lookup code, but a function handler could set it to avoid
making repeated lookups of its own when the same FunctionLookupInfoData
is used repeatedly during a query.)  fn_nargs is the number of arguments
expected by the function.

FunctionLookupInfo replaces the present FmgrInfo structure (but I'm
giving it a new name so that the old struct definition can continue
to exist during the transition phase).


During a call of a function, the following data structure is created
and passed to the function:

typedef struct
{   FunctionLookupInfo flinfo;  /* ptr to lookup info used for this call */   bool        isnull;         /*
input/outputflag, see below */   int         nargs;          /* # arguments actually passed */   Datum
arg[MAXFMGRARGS]; /* Arguments passed to function */   bool        argnull[MAXFMGRARGS];  /* T if arg[i] is actually
NULL*/
 
} FunctionCallInfoData;
typedef FunctionCallInfoData* FunctionCallInfo;

Note that all the arguments passed to a function (as well as its result
value) will now uniformly be of type Datum.  As discussed below, callers
and callees should apply the standard Datum-to-and-from-whatever macros
to convert to the actual argument types of a particular function.  The
value in arg[i] is unspecified when argnull[i] is true.

It is generally the responsibility of the caller to ensure that the
number of arguments passed matches what the callee is expecting: except
for callees that take a variable number of arguments, the callee will
typically ignore the nargs field and just grab values from arg[].

The meaning of the struct elements should be pretty obvious with the
exception of isnull.  isnull must be set by the caller to the logical OR
of the argnull[i] flags --- ie, isnull is true if any argument is NULL.
(Of course, isnull is false if nargs == 0.)  On return from the
function, isnull is the null flag for the function result: if it is true
the function's result is NULL, regardless of the actual function return
value.  Overlapping the input and output flags in this way provides a
simple, convenient, fast implementation for the most common case of a
"strict" function (whose result is NULL if any input is NULL):
if (finfo->isnull)    return (Datum) 0;   /* specific value doesn't matter */
... else do normal calculation ignoring argnull[] ...

Non-strict functions can easily be implemented; they just need to check
the individual argnull[] flags and set the appropriate isnull value
before returning.

FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
conventions.


Callees, whether they be individual functions or function handlers,
shall always have this signature:

Datum function (FunctionCallInfo finfo);

which is represented by the typedef

typedef Datum (*PGFunction) (FunctionCallInfo finfo);

The function is responsible for setting finfo->isnull appropriately
as well as returning a result represented as a Datum.  Note that since
all callees will now have exactly the same signature, and will be called
through a function pointer declared with exactly that signature, we
should have no portability or optimization problems.

When the function's result type is pass-by-reference, the result value
must always be stored in freshly-palloc'd space (it can't be a constant
or a copy of an input pointer).  This rule will eventually allow
automatic reclamation of storage space during expression evaluation.


Function coding conventions
---------------------------

As an example, int4 addition goes from old-style

int32
int4pl(int32 arg1, int32 arg2)
{   return arg1 + arg2;
}

to new-style

Datum
int4pl(FunctionCallInfo finfo)
{   if (finfo->isnull)       return (Datum) 0;        /* value doesn't really matter ... */   /* we can ignore flinfo,
nargsand argnull */
 
   return Int32GetDatum(DatumGetInt32(finfo->arg[0]) +                        DatumGetInt32(finfo->arg[1]));
}

This is, of course, much uglier than the old-style code, but we can
improve matters with some well-chosen macros for the boilerplate parts.
What we actually end up writing might look something like

Datum
int4pl(PG_FUNCTION_ARGS)
{   PG_STRICT_FUNCTION;            /* encapsulates null check */   {       PG_ARG1_INT32;       PG_ARG2_INT32;
       PG_RESULT_INT32( arg1 + arg2 );   }
}

where the macros expand to things like
"int32 arg1 = DatumGetInt32(finfo->arg[0])"
and "return Int32GetDatum( x )".  I don't yet have a detailed proposal
for convenience macros for function authors, but I think it'd be well
worth while to define some.

For the standard pass-by-reference types (int8, float4, float8) these
macros should also hide the indirection and space allocation involved,
so that the function's code is not explicitly aware that the types are
pass-by-ref.  This will allow future conversion of these types to
pass-by-value on machines where it's feasible to do that.  (For example,
on an Alpha it's pretty silly to make int8 be pass-by-ref, since Datum
is going to be 64 bits anyway.)


Call-site coding conventions
----------------------------

There are many places in the system that call either a specific function
(for example, the parser invokes "textin" by name in places) or a
particular group of functions that have a common argument list (for
example, the optimizer invokes selectivity estimation functions with
a fixed argument list).  These places will need to change, but we should
try to avoid making them significantly uglier than before.

Places that invoke an arbitrary function with an arbitrary argument list
can simply be changed to fill a FunctionCallInfoData structure directly;
that'll be no worse and possibly cleaner than what they do now.

When invoking a specific built-in function by name, we have generally
just written something likeresult = textin ( ... args ... )
which will not work after textin() is converted to the new call style.
I suggest that code like this be converted to use "helper" functions
that will create and fill in a FunctionCallInfoData struct.  For
example, if textin is being called with one argument, it'd look
something likeresult = DirectFunctionCall1(textin, PointerGetDatum(argument));
These helper routines will have declarations likeDatum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);
Note it will be the caller's responsibility to convert to and from
Datum; appropriate conversion macros should be used.

The DirectFunctionCallN routines will not bother to fill in
finfo->flinfo (indeed cannot, since they have no idea about an OID for
the target function); they will just set it NULL.  This is unlikely to
bother any built-in function that could be called this way.  Note also
that this style of coding cannot check for a NULL result (it couldn't
before, either!).  We could reasonably make the helper routines elog an
error if they see that the function returns a NULL.

(Note: direct calls like this will have to be changed at the same time
that the called routines are changed to the new style.  But that will
still be a lot less of a constraint than a "big bang" conversion.)

When invoking a function that has a known argument signature, we have
usually written eitherresult = fmgr(targetfuncOid, ... args ... );
orresult = fmgr_ptr(FmgrInfo *finfo, ... args ... );
depending on whether an FmgrInfo lookup has been done yet or not.
This kind of code can be recast using helper routines, in the same
style as above:result = OidFunctionCall1(funcOid, PointerGetDatum(argument));result = FunctionCall2(funcCallInfo,
               PointerGetDatum(argument),                       Int32GetDatum(argument));
 

Again, this style of coding does not recognize the possibility of a
null result.  We could provide variant helper routines that allow
a null return rather than raising an error, which could be called in
a style likeif (FunctionCall1IsNull(&result, funcCallInfo,                        PointerGetDatum(argument))){    ...
copewith null result ...}else{    ... OK, use 'result' here ...}
 
But I'm unsure that there are enough places in the system that need this
to justify the extra set of helpers.  If there are only a few places
that need a non-error response to a null result, they could just be
changed to fill and examine a FunctionCallInfoData structure directly.

As with the callee-side situation, I am strongly inclined to add
argument conversion macros that hide the pass-by-reference nature of
int8, float4, and float8, with an eye to making those types relatively
painless to convert to pass-by-value.

The existing helper functions fmgr(), fmgr_c(), etc will be left in
place until all uses of them are gone.  Of course their internals will
have to change in the first step of implementation, but they can
continue to support the same external appearance.


Notes about function handlers
-----------------------------

Handlers for classes of functions should find life much easier and
cleaner in this design.  The OID of the called function is directly
reachable from the passed parameters; we don't need the global variable
fmgr_pl_finfo anymore.  Also, by modifying finfo->flinfo->fn_extra,
the handler can cache lookup info to avoid repeat lookups when the same
function is invoked many times.  (fn_extra can only be used as a hint,
since callers are not required to re-use a FunctionLookupInfo struct.
But in performance-critical paths they normally will do so.)

I observe that at least one other global variable, CurrentTriggerData,
is being used as part of the call convention for some function handlers.
That's just as grotty as fmgr_pl_finfo, so I'd like to get rid of it.
Any comments on the cleanest way to do so?

Are there any other things needed by the call handlers for PL/pgsql and
other languages?

During the conversion process, support for old-style builtin functions
and old-style user-written C functions will be provided by appropriate
function handlers.  For example, the handler for old-style builtins
will look roughly like fmgr_c() does now.


System table updates
--------------------

In the initial phase, pg_language type 11 ("builtin") will be renamed
to "old_builtin", and a new language type named "builtin" will be
created with a new OID.  Then pg_proc entries will be changed from
language code 11 to the new code piecemeal, as the associated routines
are rewritten.  (This will imply several rounds of forced initdbs as
the contents of pg_proc change.  It would be a good idea to add a
"catalog contents version number" to the database version info checked
at startup before we begin this process.)

The existing pg_language entry for "C" functions will continue to
describe user functions coded in the old style, and we will need to add
a new language name for user functions coded in the new style.  (Any
suggestions for what the new name should be?)  We should deprecate
old-style functions because of their portability problems, but the
support for them will only be one small function handler routine,
so we can leave them in place for as long as necessary.

The expected calling convention for PL call handlers will need to change
all-at-once, but fortunately there are not very many of them to fix.


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Bruce Momjian
Date:
> This is a followup to a message I wrote in June about reworking the fmgr
> interface.  I've had a couple of good ideas (I think ;-)) since then,
> but there are some parts of the proposal that still need work before
> implementation can begin.
> 
> I could particularly use some feedback from Jan and anyone else who's
> worked with function-call handlers: does this design eliminate the
> kluges that you've had to use in the past?  If not, what else is needed?

Sounds good.  My only question is whether people need backward
compatibility, and whether we can remove the compatiblity part of the
interface and small overhead after 7.1 or later?

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
>
> > This is a followup to a message I wrote in June about reworking the fmgr
> > interface.  I've had a couple of good ideas (I think ;-)) since then,
> > but there are some parts of the proposal that still need work before
> > implementation can begin.
> >
> > I could particularly use some feedback from Jan and anyone else who's
> > worked with function-call handlers: does this design eliminate the
> > kluges that you've had to use in the past?  If not, what else is needed?
>
> Sounds good.  My only question is whether people need backward
> compatibility, and whether we can remove the compatiblity part of the
> interface and small overhead after 7.1 or later?

    Backward  compatibility is a common source of problems, and I
    don't like it generally. In the case of the fmgr it is  quite
    a  little  difficult,  and  I thought about it too already. I
    like the current interface for it's simpleness from the  user
    function   developers  point  of  view.  And  converting  ALL
    internal  plus  most  of  the  contrib  directories  ones  to
    something new is really a huge project.

    All   function  calls  through  the  fmgr  use  the  FmgrInfo
    structure, but there are alot of calls to internal  functions
    like  textout() etc. too. Changing their interface would IMHO
    introduce many problems. And there are only  a  few  internal
    functions  where  a new fmgr interface really is required due
    to incomplete NULL handling or the like.

    Therefore I would prefer an interface extension, that doesn't
    require  changes  to  existing functions. What about adding a
    proifversion  to  pg_proc,  telling  the  fmgr   which   call
    interface  the  function  uses?  This  is  then  held  in the
    FmgrInfo struct too so the fmgr can call a function using the
    old and new interface.

    First  fmgr_info()  is  extended to put the interface version
    into the  FmgrInfo  too.  Then  fmgr_faddr()  is  renamed  to
    fmgr_faddr_v1() and it has to check that only functions using
    the old interface are called through it  (there  aren't  that
    many  calls  to  it as you might think).  After that you have
    all the time in the world to implement another interface  and
    add  a  switch into fmgr() and sisters handling the different
    versions.

    My thoughts for the new interface:

    o   Each function argument must have it's separate NULL flag.

    o   The functions result must have another NULL flag too.

    o   Argument names and default values for omitted ones aren't
        IMHO something to  go  into  the  interface  itself.  The
        function is allways called with all arguments positional,
        the parser must provide this list.

    o   The new interface must at least be designed for  a  later
        implementation of tuple set returns. I think this must be
        implemented as a temp table, collecting all return tuples
        since for a procedural language it might be impossible to
        implement a real RETURN AND RESUME (like  it  is  already
        for  PL/Tcl  and  it  would  be  for  PL/Perl). Therefore
        another STRUCT  kind  of  relation  must  be  added  too,
        providing  the  tupdesc  for the returned set.  This temp
        table, filled by calling the function at the first time a
        tuple is needed and then it is simply another RTE.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
> System table updates
> --------------------
>
> In the initial phase, pg_language type 11 ("builtin") will be renamed
> to "old_builtin", and a new language type named "builtin" will be
> created with a new OID.  Then pg_proc entries will be changed from
> language code 11 to the new code piecemeal, as the associated routines
> are rewritten.  (This will imply several rounds of forced initdbs as
> the contents of pg_proc change.  It would be a good idea to add a
> "catalog contents version number" to the database version info checked
> at startup before we begin this process.)
>
> The existing pg_language entry for "C" functions will continue to
> describe user functions coded in the old style, and we will need to add
> a new language name for user functions coded in the new style.  (Any
> suggestions for what the new name should be?)  We should deprecate
> old-style functions because of their portability problems, but the
> support for them will only be one small function handler routine,
> so we can leave them in place for as long as necessary.
>
> The expected calling convention for PL call handlers will need to change
> all-at-once, but fortunately there are not very many of them to fix.

    This  approach  nearly  matches  all  my  thoughts  about the
    redesign of the fmgr. In the  system  table  section  I  miss
    named arguments.

    I think we need a new system table

    pg_proargs (
      Oid         pargprooid,
      int2        pargno,
      name        pargname,
      bool        pargdflnull,
      text        pargdefault
    );

    plus  another  flag  in  pg_proc  that tells if this function
    prototype information is available.

    The parser then has to handle function calls like

      ... myfunc(userid = 123, username = 'hugo');

    and build a complete function argument list that has all  the
    arguments  in  the  correct  order  and  defaults for omitted
    arguments filled  in  as  const  nodes.  This  new  prototype
    information  than  must  also  be  used in the PL handlers to
    choose the given names for arguments.

    In addition, we could add  an  argument  type  at  this  time
    (INPUT, OUTPUT and INOUT) but only support INPUT ones for now
    from the SQL level.  PL  functions  calling  other  functions
    could eventually use these argument types in the future.

    Also  I miss the interface for tuple set returns. I know that
    this requires much more in other sections than only the fmgr,
    but  we  need  to  cover it now or we'll not be able to do it
    without another change to the fmgr interface at the  time  we
    want  to  support real stored procedures.  As said in another
    mail, I think it must be done via some temp table since  most
    interpreter  languages  will  not  be  able  to do RETURN AND
    RESUME in any other way - not even PL/pgSQL will be easy  and
    would   need  a  total  internal  redesign  of  the  bytecode
    interpreter since it otherwise needs to recover the  internal
    call stack maybe across recursive calls.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Sounds good.  My only question is whether people need backward
> compatibility, and whether we can remove the compatiblity part of the
> interface and small overhead after 7.1 or later?

I think we could drop it after a decent interval, but I don't see any
reason to be in a hurry.  I do think that we'll get complaints if 7.0
doesn't have any backward compatibility for existing user functions.
        regards, tom lane


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Tom Lane
Date:
[ responding to both of Jan's messages in one ]

wieck@debis.com (Jan Wieck) writes:
>     I like the current interface for it's simpleness from the  user
>     function   developers  point  of  view.

There is that; even with a good set of convenience macros there will be
more to learn about writing user functions.  OTOH, the way it is now
is not exactly transparent --- in particular, NULL handling is so easy
to forget about/get wrong.  We can make that much easier.  We can also
simplify the interface noticeably for standard types like float4/float8.

BTW: I am not in nearly as big a hurry as Bruce is to rip out support
for the current user-function interface.  I want to get rid of old-style
builtin functions before 7.0 because of the portability issue (see
below).  But if a particular user is using old-style user functions
and isn't having portability problems on his machine, there's no need
to force him to convert, it seems to me.

>     And  converting  ALL
>     internal  plus  most  of  the  contrib  directories  ones  to
>     something new is really a huge project.

It is.  I was hoping to get some help ;-) ... but I will do it myself
if I have to.

>     ... And there are only  a  few  internal
>     functions  where  a new fmgr interface really is required due
>     to incomplete NULL handling or the like.

If your goal is *only* to deal with the NULL issue, or *only* to get
things working on a specific platform like Alpha, then yes we could
patch in a few dozen places and not undertake a complete changeover.
But I believe that we really need to fix this right, because it will
keep coming back to haunt us until we do.  I will not be happy as
long as we have ports that have to compile "-O0" because of fmgr
brain-damage.  I fear there are going to be more and more such ports
as people install smarter compilers that assume they are working with
ANSI-compliant source code.  We're giving up performance system-wide
because of fmgr.

>     Therefore I would prefer an interface extension, that doesn't
>     require  changes  to  existing functions. What about adding a
>     proifversion  to  pg_proc,  telling  the  fmgr   which   call
>     interface  the  function  uses?

I was going to let the prolang column tell that, by having different
language codes for old vs. new builtin function and old vs. new
dynamic-linked C function.  We could add a version column instead,
but that seems like unnecessary complication.

>     First  fmgr_info()  is  extended to put the interface version
>     into the  FmgrInfo  too.  Then  fmgr_faddr()  is  renamed  to
>     fmgr_faddr_v1() and it has to check that only functions using
>     the old interface are called through it  (there  aren't  that
>     many  calls  to  it as you might think).

I think it should be possible to make fmgr_faddr call both old and
new functions --- I haven't actually written code yet, but I think
I can do it.  You're right that that's important in order to spread
out the repair work instead of having a "big bang".

>     This  approach  nearly  matches  all  my  thoughts  about the
>     redesign of the fmgr. In the  system  table  section  I  miss
>     named arguments.

As you said in your earlier message, that is a parser-level feature
that has nothing to do with what happens at the fmgr level.  I don't
want to worry about it for this revision.

>     Also  I miss the interface for tuple set returns. I know that
>     this requires much more in other sections than only the fmgr,
>     but  we  need  to  cover it now or we'll not be able to do it
>     without another change to the fmgr interface at the  time  we
>     want  to  support real stored procedures.

OK, I'm willing to worry about that.  But what, exactly, needs to
be provided in the fmgr interface?
        regards, tom lane


Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
>
> [ responding to both of Jan's messages in one ]
>
> wieck@debis.com (Jan Wieck) writes:
> >     I like the current interface for it's simpleness from the  user
> >     function   developers  point  of  view.
>
> There is that; even with a good set of convenience macros there will be
> more to learn about writing user functions.  OTOH, the way it is now
> is not exactly transparent --- in particular, NULL handling is so easy
> to forget about/get wrong.  We can make that much easier.  We can also
> simplify the interface noticeably for standard types like float4/float8.
>
> BTW: I am not in nearly as big a hurry as Bruce is to rip out support
> for the current user-function interface.  I want to get rid of old-style
> builtin functions before 7.0 because of the portability issue (see
> below).  But if a particular user is using old-style user functions
> and isn't having portability problems on his machine, there's no need
> to force him to convert, it seems to me.

    Personally,  I  could  live  with  dropping  the  entire  old
    interface. That's not the problem. But at least Bruce and his
    book  need  to  know  the final programming conventions if we
    ought to change it at all,  so  it  can  be  covered  in  his
    manuscript when it is sent down to the paper.

> I was going to let the prolang column tell that, by having different
> language codes for old vs. new builtin function and old vs. new
> dynamic-linked C function.  We could add a version column instead,
> but that seems like unnecessary complication.

    Right - language is all needed to tell.

> >     This  approach  nearly  matches  all  my  thoughts  about the
> >     redesign of the fmgr. In the  system  table  section  I  miss
> >     named arguments.
>
> As you said in your earlier message, that is a parser-level feature
> that has nothing to do with what happens at the fmgr level.  I don't
> want to worry about it for this revision.

    Right too. I just hoped to expand the scope of this change so
    there would only ONE change to the PL handlers covering both,
    new interface with proper NULL handling and enhanced function
    prototypes.

>
> >     Also  I miss the interface for tuple set returns. I know that
> >     this requires much more in other sections than only the fmgr,
> >     but  we  need  to  cover it now or we'll not be able to do it
> >     without another change to the fmgr interface at the  time  we
> >     want  to  support real stored procedures.
>
> OK, I'm willing to worry about that.  But what, exactly, needs to
> be provided in the fmgr interface?

    First we need another relation type in pg_class. It's like  a
    table  or  view, but none of the NORMAL SQL statements can be
    used with it (e.g.  INSERT, SELECT, ...). It just describes a
    row structure.

    Then,  a function returning a SET of any row type (this time,
    regular relations or views too) can be used in the rangetable
    as

        SELECT A.col1, B.col2 FROM mysetfunc() A, anothertab B
            WHERE A.col1 = B.col1;

    Of  course,  it  requires  some  new fields in the rangetable
    entry.  Anyway, at the beginning  of  execution  for  such  a
    query,  the  executor  initializes  those RTE's by creating a
    temptable with the  schema  of  the  specified  structure  or
    relation.  Then  it  calls the user function, passing in some
    handle to the temp table and the user function fills  in  the
    tuples. Now the rest of query execution is if it is a regular
    table. After execution, the temp table is dropped.

    Isn't that all required for true stored procedures?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Bruce Momjian
Date:
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good.  My only question is whether people need backward
> > compatibility, and whether we can remove the compatiblity part of the
> > interface and small overhead after 7.1 or later?
> 
> I think we could drop it after a decent interval, but I don't see any
> reason to be in a hurry.  I do think that we'll get complaints if 7.0
> doesn't have any backward compatibility for existing user functions.

True.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Bruce Momjian
Date:
> >
> > [ responding to both of Jan's messages in one ]
> >
> > wieck@debis.com (Jan Wieck) writes:
> > >     I like the current interface for it's simpleness from the  user
> > >     function   developers  point  of  view.
> >
> > There is that; even with a good set of convenience macros there will be
> > more to learn about writing user functions.  OTOH, the way it is now
> > is not exactly transparent --- in particular, NULL handling is so easy
> > to forget about/get wrong.  We can make that much easier.  We can also
> > simplify the interface noticeably for standard types like float4/float8.
> >
> > BTW: I am not in nearly as big a hurry as Bruce is to rip out support
> > for the current user-function interface.  I want to get rid of old-style
> > builtin functions before 7.0 because of the portability issue (see
> > below).  But if a particular user is using old-style user functions
> > and isn't having portability problems on his machine, there's no need
> > to force him to convert, it seems to me.
> 
>     Personally,  I  could  live  with  dropping  the  entire  old
>     interface. That's not the problem. But at least Bruce and his
>     book  need  to  know  the final programming conventions if we
>     ought to change it at all,  so  it  can  be  covered  in  his
>     manuscript when it is sent down to the paper.

No.  My coverage of that is going to be more conceptual than actual
programming.  Do whatever you think is best.



--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>>>> Also  I miss the interface for tuple set returns. I know that
>>>> this requires much more in other sections than only the fmgr,
>>>> but  we  need  to  cover it now or we'll not be able to do it
>>>> without another change to the fmgr interface at the  time  we
>>>> want  to  support real stored procedures.
>> 
>> OK, I'm willing to worry about that.  But what, exactly, needs to
>> be provided in the fmgr interface?

>     First we need another relation type in pg_class. It's like  a
>     table  or  view, but none of the NORMAL SQL statements can be
>     used with it (e.g.  INSERT, SELECT, ...). It just describes a
>     row structure.

Why bother?  Just create a table --- if you don't want to put any
rows in it, you don't have to.

>     Of  course,  it  requires  some  new fields in the rangetable
>     entry.  Anyway, at the beginning  of  execution  for  such  a
>     query,  the  executor  initializes  those RTE's by creating a
>     temptable with the  schema  of  the  specified  structure  or
>     relation.  Then  it  calls the user function, passing in some
>     handle to the temp table and the user function fills  in  the
>     tuples. Now the rest of query execution is if it is a regular
>     table. After execution, the temp table is dropped.

OK, but this doesn't answer the immediate problem of what needs to
appear in the fmgr interface.

I have been thinking about this some, and I think maybe our best bet is
not to commit to *exactly* what needs to pass through fmgr, but rather
to put in a couple of generic hooks.  I can see two hooks that are
needed: one is for passing "context" information, such as information
about the current trigger event when a function is called from the
trigger manager.  (This'd replace the present CurrentTriggerData global,
which I hope you'll agree shouldn't be a global...)  The other is for
passing and/or returning information about the function result --- maybe
returning a tuple descriptor, maybe passing in the name of a temp table
to put a result set in, maybe other stuff.  So, I am thinking that we
should add two fields like this to the FunctionCallInfo struct:
Node    *resultinfo;   /* pass or return extra info about result */Node    *context;      /* pass info about context of
call*/
 

We would restrict the usage of these fields only to the extent of saying
that they must point to some kind of Node --- that lets callers and
callees check the node tag to make sure that they understand what they
are looking at.  Different node types can be used to handle different
situations.  For an "ordinary" function call expecting a scalar return
value, both fields will be set to NULL.  Other conventions for their use
will be developed as time goes on.

Does this look good to you, or am I off the track entirely?
        regards, tom lane


Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
>
>     Node    *resultinfo;   /* pass or return extra info about result */
>     Node    *context;      /* pass info about context of call */
>
> Does this look good to you, or am I off the track entirely?
>
>             regards, tom lane
>

    Looks perfect. I appreciate to get rid of globals whenever
    possible.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Sounds good.  My only question is whether people need backward
> > compatibility, and whether we can remove the compatiblity part of the
> > interface and small overhead after 7.1 or later?
>
> I think we could drop it after a decent interval, but I don't see any
> reason to be in a hurry.  I do think that we'll get complaints if 7.0
> doesn't have any backward compatibility for existing user functions.

    Right.   A   major   release  is  what  it  is.  And  porting
    applications to a new major release too, it is a  conversion,
    not an upgrade. Therefore a major release should drop as much
    backward compatibility code for minor releases as possible.

    Thus, we should think about getting rid of the broken  design
    for  functions  returning  tuple  sets  in  7.0.  As far as I
    understand the books I have, there are a couple of  different
    types  of  functions/procedures  out,  and not every database
    implements all types, nor do they all name one and  the  same
    type  equally  so  that  something  called  function  in  one
    database is  a  stored  procedure  in  another.  Anyway,  the
    different types are:

    1.  Functions  returning  a  scalar  value taking only input-
        arguments.

    2.  Functions returning a scalar value taking input-, output-
        and in/out-arguments.

    3.  Functions  returning nothing taking only input-arguments.

    4.  Functions returning nothing taking  input-,  output-  and
        in/out-arguments.

    5.  Functions  returning  a  set  of  result rows taking only
        input-arguments.

    6.  Functions returning a set of result rows  taking  input-,
        output- and in/out-arguments.

    I don't think that we have to implement everything, and since
    we don't have host variables,  output-  and  in/out-arguments
    would  make  sense  only for calls from procedural languages.
    OTOH they would cause much trouble so they are one detail  to
    let out for PostgreSQL.

    Three  cases  left. Type number 1. we have already. And it is
    advanced, because the arguments can be either single  values,
    or single rows.

    And  type  number 3. is easy, because invoking something that
    returns a dummy that is thrown away is absolutely no work.

    So the only thing that's really left is number 5.  The  funny
    detail  is,  that those functions or procedures can't be used
    inside regular SELECT queries. Instead  a  CALL  FUNCTION  or
    EXECUTE   PROCEDURE   statement   is  used  from  the  client
    application or inside a PL block. CALL FUNCTION then  returns
    a  tuple  set  as  a  SELECT  does.  The  result in our world
    therefore has a tuple descriptor and depending on the invoker
    is sent to the client or stored in an SPI tuple table.

    So  we  do  not need to call functions returning sets through
    the normal function manager. It could competely deny calls to
    set  functions,  and  the  interface  for them can be a total
    different one. I have  something  in  mind  that  could  work
    without  temp tables, but it requires a redesign for PL/pgSQL
    and causes some limitations for PL/Tcl. Let's leave that  for
    a past 7.0 release.

    I  correct  my  previous statements and vote to deny calls to
    set functions through the default function manager in 7.0.

    And there is another detail I found  while  browsing  through
    the  books.  Functions can be defined as [NOT] NULL CALL (IBM
    DB2).  Functions defined as NOT NULL CALL will be called only
    if  all  their  arguments aren't NULL. So we can prevent much
    NULL handling inside the functions if we simply define that a
    function  that  is  NOT NULL CALL will allways return NULL if
    any of it's input arguments is NULL. This case  can  then  be
    handled  at  the  function  manager level without calling the
    function itself. Nearly all our builtin functions behave that
    way but have all the tests inside.

    Another  detail  I'm  missing  now  is  a new, really defined
    interface for type input/output functions. The fact that they
    are  defined  taking  one  opaque  (yepp, should be something
    different as already discussed) argument but in fact get more
    information from the attribute is ugly.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Hannu Krosing
Date:
Jan Wieck wrote:
> 
...
> 
>     5.  Functions  returning  a  set  of  result rows taking only
>         input-arguments.
... 
>     So the only thing that's really left is number 5.  The  funny
>     detail  is,  that those functions or procedures can't be used
>     inside regular SELECT queries. Instead  a  CALL  FUNCTION  or
>     EXECUTE   PROCEDURE   statement   is  used  from  the  client
>     application or inside a PL block. CALL FUNCTION then  returns
>     a  tuple  set  as  a  SELECT  does.  The  result in our world
>     therefore has a tuple descriptor and depending on the invoker
>     is sent to the client or stored in an SPI tuple table.
> 
>     So  we  do  not need to call functions returning sets through
>     the normal function manager. It could competely deny calls to
>     set  functions,  and  the  interface  for them can be a total
>     different one. I have  something  in  mind  that  could  work
>     without  temp tables, but it requires a redesign for PL/pgSQL
>     and causes some limitations for PL/Tcl. Let's leave that  for
>     a past 7.0 release.
> 
>     I  correct  my  previous statements and vote to deny calls to
>     set functions through the default function manager in 7.0.
> 

It would be very nice if we could use the tuple-set-returning 
functions in place of tables/views,

SELECT * FROM LOGGED_IN_USERS_INFO_PROC;

or at least define views on them: 

CREATE VIEV LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;

We would not need to call them in place of functions that return
either single-value or tuple.

On the topic of 2x3=6 kinds of functions you mentioned I think we 
could use jet another type of functions -  the one returning a 
tuple/row as is ofteh case in python and possibly other languages 
that do automatic tuple packing/unpacking. 

It could be used in cases like this:

INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;

or 

DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;

(The last example is not ansi and does not work currently),

OTOH, these exaples would jus be redundant cases for your 5th case.

OTOOH, all the functions returning less than a set of rows are 
redundadnt cases of the functions that do ;)


-----------
Hannu


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Hannu Krosing
Date:
Jan Wieck wrote:
> 
>     Another  detail  I'm  missing  now  is  a new, really defined
>     interface for type input/output functions. The fact that they
>     are  defined  taking  one  opaque  (yepp, should be something
>     different as already discussed) argument but in fact get more
>     information from the attribute is ugly.

Can we currently return a list of the same type ?

I guess we can, as lists (or arrays) are fundamentl types in 
PostgreSQL, but I'm not sure.

I would like to define aggregate functions list() and set()

Could I define then just once and specify that they return an array 
of their input type ?

Half of that is currently done for count() - i.e. it can take any 
type of argument, but I guess the return-array-of-input-type is more 
complicated.




Also (probably off topic) how hard would it be to add another type 
of aggregate funtions tha operate on pairs of values ?

I would like to have FOR_MIN and FOR_MAX (and possibly MIN_MIN and
MAX_MAX) functions that return _another_ field from a table for a 
minimal value in one field.

-------------
Hannu


Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
Hannu Krosing wrote:
>
> Jan Wieck wrote:
> >
> >     I  correct  my  previous statements and vote to deny calls to
> >     set functions through the default function manager in 7.0.
> >
>
> It would be very nice if we could use the tuple-set-returning
> functions in place of tables/views,
>
> SELECT * FROM LOGGED_IN_USERS_INFO_PROC;

    Exactly  that's  something  I  want  for  long  now. Sticking
    another  querytree,  that  returns  a  tuple  set,   into   a
    rangetable  entry.  This  other  querytree  could be either a
    SELECT as in

        SELECT A.x, A.y, B.z FROM table1 A, (SELECT ...) B
            WHERE A.x = B.x;

    or a function returning a set as in

        SELECT A.x, A.y, B.z FROM table1 A, setfunc('arg') B
            WHERE A.x = B.x;

    Finally, the form

        CALL setfunc('arg');

    would be equivalent to a

        SELECT * FROM setfunc('arg');

    but closer to the IBM DB2 calling syntax.  The first  one  is
    required  to  get  rid  of  some problems in the rule system,
    especially views with aggregate columns that need  their  own
    GROUP BY clause. The other ones are what we need to implement
    stored procedures.

>
> or at least define views on them:
>
> CREATE VIEV LOGGED_IN_USERS AS CALL FUNCTION LOGGED_IN_USERS_INFO_PROC;

    Wrong syntax since the statement after AS must be  a  SELECT.
    But a

        CREATE VIEW v AS SELECT * FROM setfunc();

    would do the trick.

>
> We would not need to call them in place of functions that return
> either single-value or tuple.
>
> On the topic of 2x3=6 kinds of functions you mentioned I think we
> could use jet another type of functions -  the one returning a
> tuple/row as is ofteh case in python and possibly other languages
> that do automatic tuple packing/unpacking.
>
> It could be used in cases like this:
>
> INSERT INTO MY_TABLE CALL FUNCTION MY_ROW_VALUE;

    Let's  clearly distinguish between scalar, row and set return
    values.  A scalar return value is one  single  datum.  A  row
    return  value  is exactly one tuple of 1...n datums and a set
    return value is a collection of 0...n rows.

    What we have now (at least  what  works  properly)  are  only
    scalar  return  values  from  functions.  And I don't see the
    point of a row return, so I think we don't need them.

>
> or
>
> DELETE FROM MY_TABLE WHERE * = CALL FUNCTION MY_ROW_VALUE;
>
> (The last example is not ansi and does not work currently),
>
> OTOH, these exaples would jus be redundant cases for your 5th case.
>
> OTOOH, all the functions returning less than a set of rows are
> redundadnt cases of the functions that do ;)

    But please don't forget that it isn't enough  to  write  down
    the syntax and specify the behaviour with some english words.
    We must define the behaviour in C too, and in  that  language
    it's  a  little  more  than  a  redundant  case of something,
    because we don't have that something.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Hannu Krosing
Date:
Jan Wieck wrote:
> 
> Hannu Krosing wrote:
> >
> > Jan Wieck wrote:
>     What we have now (at least  what  works  properly)  are  only
>     scalar  return  values  from  functions.  And I don't see the
>     point of a row return, so I think we don't need them.

That's what I mant by the OTOH below

> >
> > (The last example is not ansi and does not work currently),
> >
> > OTOH, these exaples would jus be redundant cases for your 5th case.
> >
> > OTOOH, all the functions returning less than a set of rows are
> > redundadnt cases of the functions that do ;)
> 
>     But please don't forget that it isn't enough  to  write  down
>     the syntax and specify the behaviour with some english words.
>     We must define the behaviour in C too, and in  that  language
>     it's  a  little  more  than  a  redundant  case of something,
>     because we don't have that something.

Yes, that's the hard part.

---------
Hannu


Re: [HACKERS] Function-manager redesign: second draft (long)

From
Bruce Momjian
Date:
> Tom Lane wrote:
> 
> > Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > > Sounds good.  My only question is whether people need backward
> > > compatibility, and whether we can remove the compatiblity part of the
> > > interface and small overhead after 7.1 or later?
> >
> > I think we could drop it after a decent interval, but I don't see any
> > reason to be in a hurry.  I do think that we'll get complaints if 7.0
> > doesn't have any backward compatibility for existing user functions.
> 
>     Right.   A   major   release  is  what  it  is.  And  porting
>     applications to a new major release too, it is a  conversion,
>     not an upgrade. Therefore a major release should drop as much
>     backward compatibility code for minor releases as possible.

If this was simple stuff, we could keep compatibility with little
problem.  But, with complex stuff like this, keeping backward
compatibility sometimes make things more confusing.  They can code
things two ways, and that makes people get confused as to which one to
follow.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Function-manager redesign: second draft (long)

From
frankpit@pop.dn.net
Date:
I think that the proposals for the revision of the function interface
are all an improvement on what is there, and I hope to try to find time
to help implement whatever is decided.  Here are some more thoughts in
relation to the question of set valued and tuple valued (or complex
type?) arguments.

Another place that user defined functions are used in the PostgreSQL
backend is in association with access methods.  Both as boolean
operators for search predicates, and as auxiliary functions for the
access methods.  Allowing set valued and tuple valued arguments and
return values for functions and operators in this setting can be
valuable.  

For instance, suppose I have a table t that stores geometric objects in
some space, and I have a spatial index such as R*-tree, or even a GIST
index. Given a set of points pts I want to do a query of the form

SELECT * FORM t WHERE t.object <intersects> pts;

Under these circumstances it would be really nice to be able to pass a
set of objects (as an SPI tuple table for instance) into the index.

Currently, the way I do this (with a custom access method) is to create
a temp table, put the key set into the temp table, and pass the name of
the temp table to the access method in the search key.  The access
method then does an SPI select on the temp table and stores the returned
items into the private scan state for use during the scan.  

While I realize that implementing this example requires much more than a
change to the function interface, I hope that it illustrates that it is
perhaps a good idea to keep as much flexibility in the function
interface as possible. 

Bernie Franpitt


Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
Bernie Franpitt wrote:

> SELECT * FORM t WHERE t.object <intersects> pts;
>
> Under these circumstances it would be really nice to be able to pass a
> set of objects (as an SPI tuple table for instance) into the index.
>
> Currently, the way I do this (with a custom access method) is to create
> a temp table, put the key set into the temp table, and pass the name of
> the temp table to the access method in the search key.  The access
> method then does an SPI select on the temp table and stores the returned
> items into the private scan state for use during the scan.
>
> While I realize that implementing this example requires much more than a
> change to the function interface, I hope that it illustrates that it is
> perhaps a good idea to keep as much flexibility in the function
> interface as possible.

    Uhhh - it is a good idea, but passing tuple sets as arguments
    to functions, that will cause headaches, man.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Peter Eisentraut
Date:
On Oct 30, Jan Wieck mentioned:

>     Right.   A   major   release  is  what  it  is.  And  porting
>     applications to a new major release too, it is a  conversion,
>     not an upgrade. Therefore a major release should drop as much
>     backward compatibility code for minor releases as possible.

Certainly true. But that would also mean that we'd have to keep
maintaining the 6.* series as well. At least bug-fixing and releasing one
or two more 6.5.x versions. Up until now the usual answer to a bug was
"upgrade to latest version". But if you break compatibility you can't do
that any more. (Compare to Linux 2.0 vs 2.2) I'm just wondering what the
plans are in that regard.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] Function-manager redesign: second draft (long)

From
wieck@debis.com (Jan Wieck)
Date:
Peter Eisentraut wrote:

> On Oct 30, Jan Wieck mentioned:
>
> >     Right.   A   major   release  is  what  it  is.  And  porting
> >     applications to a new major release too, it is a  conversion,
> >     not an upgrade. Therefore a major release should drop as much
> >     backward compatibility code for minor releases as possible.
>
> Certainly true. But that would also mean that we'd have to keep
> maintaining the 6.* series as well. At least bug-fixing and releasing one
> or two more 6.5.x versions. Up until now the usual answer to a bug was
> "upgrade to latest version". But if you break compatibility you can't do
> that any more. (Compare to Linux 2.0 vs 2.2) I'm just wondering what the
> plans are in that regard.

    Sometimes  ago, I already pointed out that we have to support
    one or too older releases for some time. Not only because  we
    might  drop  some  compatibility  code.  Each release usually
    declares one  or  the  other  new  keyword,  making  existing
    applications  probably  fail  with  the  new  release. And no
    amount of compatibility code would help in that case! It's  a
    deadlock trap, an application that cannot be easily ported to
    a  newer  release  because  of   incompatibilities   in   the
    querylaguage  cannot use the last release it is compatible to
    because of a bug.

    There is a new aspect in this discussion since then. The  new
    corporation PostgreSQL Inc. offers commercial support for our
    database (look at www.pgsql.com). If they offer support, they
    must  support  older  releases  as  well,  so  they  need  to
    backpatch already.

    Wouldn't it be a good idea if their return into  our  project
    are   bugfix   releases   of   older   versions  (created  by
    backpatching release branches)?  In the case of  a  customers
    accident,  they  have  to  do  it  anyway.  And  doing it for
    critical bugs during idle time could avoid accidents, so it's
    good customer service.

    Marc, what do you think about a such an agreement?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Function-manager redesign: second draft (long)

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     There is a new aspect in this discussion since then. The  new
>     corporation PostgreSQL Inc. offers commercial support for our
>     database (look at www.pgsql.com). If they offer support, they
>     must  support  older  releases  as  well,  so  they  need  to
>     backpatch already.

Yes, but who's the "them" here?  If PostgreSQL Inc. has any warm
bodies other than the existing group of developers, I sure haven't
heard from them...

I agree 100% with Jan's basic point: we must provide a degree of
backwards compatibility from release to release.  In some cases
that might create enough pain to be worth debating, but in this
particular case it seems like the choice is a no-brainer.  We just
leave in the transition fmgr code that we're going to write anyway.
I don't understand why it even got to be a topic of discussion.
        regards, tom lane