Thread: FmgrInfo allocation patterns (and PL handling as staged programming)

FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

06 April, 20:18:31

Hi hackers,

The way the core code allocates FmgrInfo structures has a pleasing property
(at least in the parts of the code I have read and the cases I've tested)
that the docs don't seem to emphasize.

To wit, given a query like

SELECT hello(n), hello(x)
FROM (VALUES
  (1::int4, 1.0::float4),
  (2,       2.0        ),
  (3,       3.0        )) AS t(n,x);

the core code allocates one FmgrInfo for each of the two uses.
If the PL handler has a useful way to specialize the polymorphic-typed
function to int4 in the one case and float4 in the other, each can be
stashed in its respective flinfo->fn_extra and the specialized versions
used as expected for the duration of the query.

Likewise, the trigger manager prepares for a query by allocating
one FmgrInfo for each distinct trigger involved, so if there is a single
trigger function referenced by multiple triggers, a version specialized
to each trigger can be stashed on its respective flinfo and used for
all firings of that trigger the query may generate.

This lends itself to a nice staged-programming view of a PL handler's job:

  prepare:              RegProcedure -> Template (cache by regproc)
  specialize:  (Template, call site) -> Routine  (cache on fn_extra)
  apply:           (Routine, fcinfo) -> result

(where I consider some members both of flinfo, such as fn_expr, and of
fcinfo, such as nargs, context, resultinfo, to be logically properties
of a notional "call site").

I wonder, though, if there might be code in the wild, or even in corners
of the core I haven't looked in, where FmgrInfo structs aren't being used
that way, and could get reused for successive calls of one routine but
with, say, different nargs or argument types. That would seem odd but
I don't see that the documentation ever came right out and said not to.

If that seems like a non-imaginary risk, I wonder if FmgrInfo could sprout
something like a safe-to-specialize bit, initialized to false for old code
that doesn't know about it, but set true in places where FmgrInfo structs
are definitely being managed as described above?

I suspect that would result in the vast majority of FmgrInfo structs
ever really encountered having the bit set.

Regards,
-Chap



A work-in-progress PL dispatcher based on the above can be seen at:


https://tada.github.io/pljava/preview1.7/pljava-api/apidocs/org.postgresql.pljava/org/postgresql/pljava/PLJavaBasedLanguage.html

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

06 April, 20:33:47

Chapman Flack <jcflack@acm.org> writes:
> To wit, given a query like

> SELECT hello(n), hello(x)
> FROM (VALUES
>   (1::int4, 1.0::float4),
>   (2,       2.0        ),
>   (3,       3.0        )) AS t(n,x);

> the core code allocates one FmgrInfo for each of the two uses.

Yeah, there's no attempt to merge FmgrInfos across call sites
within a query.  It's typical to use fn_extra to point to dynamic
state for a call, so that any such merging could break things.

> I wonder, though, if there might be code in the wild, or even in corners
> of the core I haven't looked in, where FmgrInfo structs aren't being used
> that way, and could get reused for successive calls of one routine but
> with, say, different nargs or argument types. That would seem odd but
> I don't see that the documentation ever came right out and said not to.

The only case I'm aware of that might require some thought is that the
relcache caches FmgrInfo structs for the opclass support functions for
each column of an index.  That seems like it's close enough to being
just as specialized as a query callsite, but maybe not?

A downside of relying entirely on fn_extra is that you don't get to
amortize the specialization work across queries, even though it's
probably pretty repetitive.  You might be interested in this recent
commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0dca5d68d7bebf2c1036fd84875533afef6df992

which formalizes some caching rules that plpgsql has used for
a long time, and extends the rules to support SQL-language
functions (which need to specialize on output rowtype as well
as what plpgsql has traditionally considered).  Maybe you'd be
interested in using funccache.

            regards, tom lane

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

06 April, 20:59:52

I wrote:
> Chapman Flack <jcflack@acm.org> writes:
>> I wonder, though, if there might be code in the wild, or even in corners
>> of the core I haven't looked in, where FmgrInfo structs aren't being used
>> that way, and could get reused for successive calls of one routine but
>> with, say, different nargs or argument types. That would seem odd but
>> I don't see that the documentation ever came right out and said not to.

> The only case I'm aware of that might require some thought is that the
> relcache caches FmgrInfo structs for the opclass support functions for
> each column of an index.  That seems like it's close enough to being
> just as specialized as a query callsite, but maybe not?

Actually, there is a case where you have to be careful if you support
polymorphic arguments: the element type of an anyarray argument can
change on-the-fly from one call to the next in the same query.  I
think this is only possible when you're fed pg_stats.most_common_vals
or one of its sibling columns, but that's enough to be a problem.
That's why all of our array-munging functions that use fn_extra to
cache type-dependent state are careful to check the element type
against the cache every single time.

            regards, tom lane

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

06 April, 21:16:38

On 04/06/25 13:59, Tom Lane wrote:
> polymorphic arguments: the element type of an anyarray argument can
> change on-the-fly from one call to the next in the same query.

Yuck! I had not guessed that.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

06 April, 22:39:43

On 04/06/25 13:33, Tom Lane wrote:
> Maybe you'd be interested in using funccache.

O funccache, where were you a year or two ago?

Can a spread-out variadic "any" arg list ever vary
in length or type on the fly at a single call site?
I notice that funccache only hashes the first nargs
argument types.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

06 April, 22:47:46

Chapman Flack <jcflack@acm.org> writes:
> Can a spread-out variadic "any" arg list ever vary
> in length or type on the fly at a single call site?

Don't think so.

> I notice that funccache only hashes the first nargs
> argument types.

Yeah; the languages it serves today don't support VARIADIC ANY,
so it's not really a concern.  But if you'd like to consider doing so,
we could talk about what's needful.  I don't think funccache is set in
stone (yet).

            regards, tom lane

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

07 April, 02:27:22

On 04/06/25 15:47, Tom Lane wrote:
> Chapman Flack <jcflack@acm.org> writes:
>> Can a spread-out variadic "any" arg list ever vary
>> in length or type on the fly at a single call site?
> 
> Don't think so.

Only slightly tangentially: what things count as "objects that depend
on the transform" for purposes of DROP TRANSFORM ... CASCADE / RESTRICT ?

I just now dropped (in 17.4) a transform named in an existing routine's
declaration, with nary a peep.

I'm not, for now, complaining, as my objective was to make sure my
dispatcher noticed that, when dispatching to the routine, and if DROP
TRANSFORM had refused, I'd have had to work harder.

Right now the dispatcher is only checking before validate or prepare though.
If transforms can really just vanish I'd better check even when dispatching
to a cached version.

I suppose a cached routine could continue merrily along using
the vanished transform's functions, as long as they haven't vanished.
But once the transform is dropped there's nothing saying you can't
drop the functions, and that could get weird.

Or, is somehow registering a routine's dependency on a transform
something that its PL handler is responsible for doing? I could sort
of see that, given that it's the PL that has to make whatever actual
transformation happens, happen. But relying on the PL to record the
dependency would seem brittle, what with validators not running
for sure in all circumstances.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

07 April, 03:01:51

Chapman Flack <jcflack@acm.org> writes:
> Only slightly tangentially: what things count as "objects that depend
> on the transform" for purposes of DROP TRANSFORM ... CASCADE / RESTRICT ?
> I just now dropped (in 17.4) a transform named in an existing routine's
> declaration, with nary a peep.

Hmm, really?  ProcedureCreate() certainly looks like it creates
necessary dependencies, and when I tried it just now, I got the
expected failure:

regression=# create extension hstore_plperl;
CREATE EXTENSION
regression=# create function foo(f1 hstore) returns hstore language plperl
as 'return f1' transform for type hstore;
CREATE FUNCTION
regression=# drop extension hstore_plperl;
ERROR:  cannot drop extension hstore_plperl because other objects depend on it
DETAIL:  function foo(hstore) depends on transform for hstore language plperl
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

But wait a minute ... if I recreate the function with no
TRANSFORM clause, I *still* can't drop the transform:

regression=# drop function foo(f1 hstore);
DROP FUNCTION
regression=# create function foo(f1 hstore) returns hstore language plperl
as 'return f1';
CREATE FUNCTION
regression=# drop extension hstore_plperl;
ERROR:  cannot drop extension hstore_plperl because other objects depend on it
DETAIL:  function foo(hstore) depends on transform for hstore language plperl
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Looking more closely at ProcedureCreate(), it makes a dependency
if a transform *exists* for the argument or result type, whether
a TRANSFORM clause is present or not.  Surely this is completely
bogus?  We should be depending on the OIDs mentioned in protrftypes,
not more nor less.

I'm not quite sure if that somehow explains your results, but
it does look like a bug to me.

            regards, tom lane

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

07 April, 03:41:04

On 04/06/25 20:01, Tom Lane wrote:
> Looking more closely at ProcedureCreate(), it makes a dependency
> if a transform *exists* for the argument or result type, whether
> a TRANSFORM clause is present or not.  Surely this is completely
> bogus?  We should be depending on the OIDs mentioned in protrftypes,

I think that's it. I tested by creating a function like

create function foo() returns text transform for type circle

that is, with the transform type not appearing as an argument or
return type.

As far as I know, that's still a cromulent usage, as I could be saying
I want the function to apply that transform to circles it uses or retrieves
in queries it makes.

But if the dependency is being created based on argument/return types
and not on protrftypes, that will slip right through the cracks.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

07 April, 05:37:41

Chapman Flack <jcflack@acm.org> writes:
> On 04/06/25 20:01, Tom Lane wrote:
>> Looking more closely at ProcedureCreate(), it makes a dependency
>> if a transform *exists* for the argument or result type, whether
>> a TRANSFORM clause is present or not.  Surely this is completely
>> bogus?  We should be depending on the OIDs mentioned in protrftypes,

> I think that's it. I tested by creating a function like
> create function foo() returns text transform for type circle
> that is, with the transform type not appearing as an argument or
> return type.
> As far as I know, that's still a cromulent usage, as I could be saying
> I want the function to apply that transform to circles it uses or retrieves
> in queries it makes.

Oh, interesting interpretation.  The in-core PLs only consider the
transform list in relation to parameters and results, but I suppose
that's just because they have no use for internal conversions ...
and even then, you could argue that that's wrong and they should
apply the specified conversions when invoking, say, SQL queries
via SPI.  Something to think about another day.

Here's a draft patch to fix the bogus dependencies.  As given this'd
only be okay for HEAD, since I doubt we can get away with changing
ProcedureCreate()'s signature in stable branches.  Given the lack of
prior complaints, I'm content to fix this only in HEAD --- but maybe
somebody will think differently?  In the back branches we could make
ProcedureCreate() deconstruct the OID array it's given and then repeat
the transform lookups, but ugh.  I guess a "ProcedureCreateExt"
alternate entry point would do the trick too.

BTW, I feel a little uncomfortable with the idea that we're adding
dependencies on objects that are explicitly mentioned nowhere in the
pg_proc entry.  I suppose it's okay, and the preceding code did that
too, but still.

            regards, tom lane

diff --git a/contrib/bool_plperl/expected/bool_plperl.out b/contrib/bool_plperl/expected/bool_plperl.out
index 187df8db96f..183dc07b3fb 100644
--- a/contrib/bool_plperl/expected/bool_plperl.out
+++ b/contrib/bool_plperl/expected/bool_plperl.out
@@ -104,9 +104,9 @@ SELECT spi_test();

 DROP EXTENSION plperl CASCADE;
 NOTICE:  drop cascades to 6 other objects
-DETAIL:  drop cascades to function spi_test()
-drop cascades to extension bool_plperl
+DETAIL:  drop cascades to extension bool_plperl
 drop cascades to function perl2int(integer)
 drop cascades to function perl2text(text)
 drop cascades to function perl2undef()
 drop cascades to function bool2perl(boolean,boolean,boolean)
+drop cascades to function spi_test()
diff --git a/contrib/bool_plperl/expected/bool_plperlu.out b/contrib/bool_plperl/expected/bool_plperlu.out
index 8337d337e99..1496bbafac8 100644
--- a/contrib/bool_plperl/expected/bool_plperlu.out
+++ b/contrib/bool_plperl/expected/bool_plperlu.out
@@ -104,9 +104,9 @@ SELECT spi_test();

 DROP EXTENSION plperlu CASCADE;
 NOTICE:  drop cascades to 6 other objects
-DETAIL:  drop cascades to function spi_test()
-drop cascades to extension bool_plperlu
+DETAIL:  drop cascades to extension bool_plperlu
 drop cascades to function perl2int(integer)
 drop cascades to function perl2text(text)
 drop cascades to function perl2undef()
 drop cascades to function bool2perl(boolean,boolean,boolean)
+drop cascades to function spi_test()
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index bcf4050f5b1..a05f8a87c1f 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -637,6 +637,7 @@ AggregateCreate(const char *aggName,
                              parameterNames,    /* parameterNames */
                              parameterDefaults, /* parameterDefaults */
                              PointerGetDatum(NULL), /* trftypes */
+                             NIL,    /* trfoids */
                              PointerGetDatum(NULL), /* proconfig */
                              InvalidOid,    /* no prosupport */
                              1, /* procost */
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 880b597fb3a..5fdcf24d5f8 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -26,7 +26,6 @@
 #include "catalog/pg_proc.h"
 #include "catalog/pg_transform.h"
 #include "catalog/pg_type.h"
-#include "commands/defrem.h"
 #include "executor/functions.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
@@ -61,6 +60,35 @@ static bool match_prosrc_to_literal(const char *prosrc, const char *literal,
 /* ----------------------------------------------------------------
  *        ProcedureCreate
  *
+ *    procedureName: string name of routine (proname)
+ *    procNamespace: OID of namespace (pronamespace)
+ *    replace: true to allow replacement of an existing pg_proc entry
+ *    returnsSet: returns set? (proretset)
+ *    returnType: OID of result type (prorettype)
+ *    proowner: OID of owner role (proowner)
+ *    languageObjectId: OID of function language (prolang)
+ *    languageValidator: OID of validator function to apply, if any
+ *    prosrc: string form of function definition (prosrc)
+ *    probin: string form of binary reference, or NULL (probin)
+ *    prosqlbody: Node tree of pre-parsed SQL body, or NULL (prosqlbody)
+ *    prokind: function/aggregate/procedure/etc code (prokind)
+ *    security_definer: security definer? (prosecdef)
+ *    isLeakProof: leak proof? (proleakproof)
+ *    isStrict: strict? (proisstrict)
+ *    volatility: volatility code (provolatile)
+ *    parallel: parallel safety code (proparallel)
+ *    parameterTypes: input parameter types, as an oidvector (proargtypes)
+ *    allParameterTypes: all parameter types, as an OID array (proallargtypes)
+ *    parameterModes: parameter modes, as a "char" array (proargmodes)
+ *    parameterNames: parameter names, as a text array (proargnames)
+ *    parameterDefaults: defaults, as a List of Node trees (proargdefaults)
+ *    trftypes: transformable type OIDs, as an OID array (protrftypes)
+ *    trfoids: List of transform OIDs that routine should depend on
+ *    proconfig: GUC set clauses, as a text array (proconfig)
+ *    prosupport: OID of support function, if any (prosupport)
+ *    procost: cost factor (procost)
+ *    prorows: estimated output rows for a SRF (prorows)
+ *
  * Note: allParameterTypes, parameterModes, parameterNames, trftypes, and proconfig
  * are either arrays of the proper types or NULL.  We declare them Datum,
  * not "ArrayType *", to avoid importing array.h into pg_proc.h.
@@ -90,6 +118,7 @@ ProcedureCreate(const char *procedureName,
                 Datum parameterNames,
                 List *parameterDefaults,
                 Datum trftypes,
+                List *trfoids,
                 Datum proconfig,
                 Oid prosupport,
                 float4 procost,
@@ -115,7 +144,6 @@ ProcedureCreate(const char *procedureName,
                 referenced;
     char       *detailmsg;
     int            i;
-    Oid            trfid;
     ObjectAddresses *addrs;

     /*
@@ -609,25 +637,18 @@ ProcedureCreate(const char *procedureName,
     ObjectAddressSet(referenced, TypeRelationId, returnType);
     add_exact_object_address(&referenced, addrs);

-    /* dependency on transform used by return type, if any */
-    if ((trfid = get_transform_oid(returnType, languageObjectId, true)))
-    {
-        ObjectAddressSet(referenced, TransformRelationId, trfid);
-        add_exact_object_address(&referenced, addrs);
-    }
-
     /* dependency on parameter types */
     for (i = 0; i < allParamCount; i++)
     {
         ObjectAddressSet(referenced, TypeRelationId, allParams[i]);
         add_exact_object_address(&referenced, addrs);
+    }

-        /* dependency on transform used by parameter type, if any */
-        if ((trfid = get_transform_oid(allParams[i], languageObjectId, true)))
-        {
-            ObjectAddressSet(referenced, TransformRelationId, trfid);
-            add_exact_object_address(&referenced, addrs);
-        }
+    /* dependency on transforms, if any */
+    foreach_oid(transformid, trfoids)
+    {
+        ObjectAddressSet(referenced, TransformRelationId, transformid);
+        add_exact_object_address(&referenced, addrs);
     }

     /* dependency on support function, if any */
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index b9fd7683abb..0335e982b31 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -1046,6 +1046,7 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
     List       *parameterDefaults;
     Oid            variadicArgType;
     List       *trftypes_list = NIL;
+    List       *trfoids_list = NIL;
     ArrayType  *trftypes;
     Oid            requiredResultType;
     bool        isWindowFunc,
@@ -1157,11 +1158,12 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
             Oid            typeid = typenameTypeId(NULL,
                                                 lfirst_node(TypeName, lc));
             Oid            elt = get_base_element_type(typeid);
+            Oid            transformid;

             typeid = elt ? elt : typeid;
-
-            get_transform_oid(typeid, languageOid, false);
+            transformid = get_transform_oid(typeid, languageOid, false);
             trftypes_list = lappend_oid(trftypes_list, typeid);
+            trfoids_list = lappend_oid(trfoids_list, transformid);
         }
     }

@@ -1292,6 +1294,7 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt)
                            PointerGetDatum(parameterNames),
                            parameterDefaults,
                            PointerGetDatum(trftypes),
+                           trfoids_list,
                            PointerGetDatum(proconfig),
                            prosupport,
                            procost,
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3cb3ca1cca1..45ae7472ab5 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1810,6 +1810,7 @@ makeRangeConstructors(const char *name, Oid namespace,
                                  PointerGetDatum(NULL), /* parameterNames */
                                  NIL,    /* parameterDefaults */
                                  PointerGetDatum(NULL), /* trftypes */
+                                 NIL,    /* trfoids */
                                  PointerGetDatum(NULL), /* proconfig */
                                  InvalidOid,    /* prosupport */
                                  1.0,    /* procost */
@@ -1875,6 +1876,7 @@ makeMultirangeConstructors(const char *name, Oid namespace,
                              PointerGetDatum(NULL), /* parameterNames */
                              NIL,    /* parameterDefaults */
                              PointerGetDatum(NULL), /* trftypes */
+                             NIL,    /* trfoids */
                              PointerGetDatum(NULL), /* proconfig */
                              InvalidOid,    /* prosupport */
                              1.0,    /* procost */
@@ -1919,6 +1921,7 @@ makeMultirangeConstructors(const char *name, Oid namespace,
                              PointerGetDatum(NULL), /* parameterNames */
                              NIL,    /* parameterDefaults */
                              PointerGetDatum(NULL), /* trftypes */
+                             NIL,    /* trfoids */
                              PointerGetDatum(NULL), /* proconfig */
                              InvalidOid,    /* prosupport */
                              1.0,    /* procost */
@@ -1957,6 +1960,7 @@ makeMultirangeConstructors(const char *name, Oid namespace,
                              PointerGetDatum(NULL), /* parameterNames */
                              NIL,    /* parameterDefaults */
                              PointerGetDatum(NULL), /* trftypes */
+                             NIL,    /* trfoids */
                              PointerGetDatum(NULL), /* proconfig */
                              InvalidOid,    /* prosupport */
                              1.0,    /* procost */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 45f593fc958..d7353e7a088 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -211,6 +211,7 @@ extern ObjectAddress ProcedureCreate(const char *procedureName,
                                      Datum parameterNames,
                                      List *parameterDefaults,
                                      Datum trftypes,
+                                     List *trfoids,
                                      Datum proconfig,
                                      Oid prosupport,
                                      float4 procost,

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

07 April, 20:26:37

On 04/06/25 13:59, Tom Lane wrote:
> polymorphic arguments: the element type of an anyarray argument can
> change on-the-fly from one call to the next in the same query.  I
> think this is only possible when you're fed pg_stats.most_common_vals
> or one of its sibling columns, but that's enough to be a problem.
> That's why all of our array-munging functions that use fn_extra to
> cache type-dependent state are careful to check the element type
> against the cache every single time.

This was really bumming me out. I thought "what on earth does that do
to the rest of your surrounding query, say if you have anyelement types
in the args or return value also?".

So the answer to that part is easy: if a routine's types include both
anyarray and anyelement (like, say, unnest), it just can't be applied
to one of those statistics columns. An attempt to use it that way in
a query is rejected early, before any attempt to call the routine:

ERROR:  cannot determine element type of "anyarray" argument
STATEMENT:  select unnest(stavalues1) from pg_statistic limit 1;

Now, if you have a routine that uses anyarray but no corresponding
element pseudotype, you are allowed to apply that to a statistics
column. It can even have anyarray as a return/output type also.
When it's applied to a statistics column, here's what my dispatcher sees:

# select foo(stavalues1) from pg_statistic limit 1;

RegProcedure[1255,16861,0]public.foo essentialChecks: checkBody true
RegProcedure[1255,16861,0]public.foo prepare():
inputsTemplate   : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray]
unresolvedInputs : {0}
outputsTemplate  : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray]
unresolvedOutputs: {0}

RegProcedure[1255,16861,0]public.foo Template.specialize():
precomputed id   : 39d9314d
inputsDescriptor : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray]
inputsAreSpread  : false
stableInputs     : {}
outputsDescriptor: [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray]

So the polymorphic type resolution applied at specialize() time
"succeeded", raised no error, and left the pseudotype anyarray
in both the inputs and outputs supposedly-"resolved" results.

That seems to make for a tidy way of recognizing the situation. If
you go to specialize and find anyarray among your supposedly-resolved
call-site types, well, you know you're in Wonderland and can adjust
behavior accordingly.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

07 April, 20:34:49

Chapman Flack <jcflack@acm.org> writes:
> This was really bumming me out. I thought "what on earth does that do
> to the rest of your surrounding query, say if you have anyelement types
> in the args or return value also?".

> So the answer to that part is easy: if a routine's types include both
> anyarray and anyelement (like, say, unnest), it just can't be applied
> to one of those statistics columns. An attempt to use it that way in
> a query is rejected early, before any attempt to call the routine:

> ERROR:  cannot determine element type of "anyarray" argument
> STATEMENT:  select unnest(stavalues1) from pg_statistic limit 1;

Right.  AFAIK the oddity doesn't "leak out" of functions taking
anyarray.

> That seems to make for a tidy way of recognizing the situation. If
> you go to specialize and find anyarray among your supposedly-resolved
> call-site types, well, you know you're in Wonderland and can adjust
> behavior accordingly.

Cool.

            regards, tom lane

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Chapman Flack

Date:

15 April, 03:32:28

On 04/06/25 22:37, Tom Lane wrote:
> Here's a draft patch to fix the bogus dependencies.  As given this'd
> only be okay for HEAD, since I doubt we can get away with changing
> ProcedureCreate()'s signature in stable branches ... In the back branches
> we could make ProcedureCreate() deconstruct the OID array it's given and
> then repeat the transform lookups, but ugh. ...
> 
> BTW, I feel a little uncomfortable with the idea that we're adding
> dependencies on objects that are explicitly mentioned nowhere in the
> pg_proc entry.

Pursuing that idea a bit further, was there a compelling original reason
the column in pg_proc had to be protrftypes and not just protransforms?

The transforms have been looked up at ProcedureCreate time.

Any PL that supports transforms has to repeat the transform lookups
if there are protrftypes present.

Any PL that doesn't support transforms needs only to check for presence
and say "sorry, I don't support those", which it could do just as easily
with a list of transforms as with a list of types.

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Pavel Stehule

Date:

15 April, 08:05:15

út 15. 4. 2025 v 2:32 odesílatel Chapman Flack <jcflack@acm.org> napsal:

On 04/06/25 22:37, Tom Lane wrote:
> Here's a draft patch to fix the bogus dependencies. As given this'd
> only be okay for HEAD, since I doubt we can get away with changing
> ProcedureCreate()'s signature in stable branches ... In the back branches
> we could make ProcedureCreate() deconstruct the OID array it's given and
> then repeat the transform lookups, but ugh. ...
>
> BTW, I feel a little uncomfortable with the idea that we're adding
> dependencies on objects that are explicitly mentioned nowhere in the
> pg_proc entry.

Pursuing that idea a bit further, was there a compelling original reason
the column in pg_proc had to be protrftypes and not just protransforms?

The transforms have been looked up at ProcedureCreate time.

Any PL that supports transforms has to repeat the transform lookups
if there are protrftypes present.

Any PL that doesn't support transforms needs only to check for presence
and say "sorry, I don't support those", which it could do just as easily
with a list of transforms as with a list of types.

There was a long discussion, and I think the main reason for this design is a possibility to use an old code without change although the transformations are installed. And secondly there was a possibility to use a transformation when it was installed, and use it although it was installed after the function was created. For example PL/pgSQL code has no dependency on types or casts that are internally used. And I can write code in PL/Python that can work with or without transformation, and can be unfriendly to force recreate functions to use a transformation. Today we have more tools for processing dependencies in code, but 13 years ago the possibility to work with PL code and transformations independently (how much was possible) made sense. Today it is not a problem to build a new production server and migrate data there, and there are less reasons to maintain old servers. There was more expectation about others PL (than PL/pgSQL) and about transformations (a lot of work where PLPerl was used before can be done now by SQL/XML or SQL/JSON or by FDW).

https://www.postgresql.org/message-id/flat/1339713732.11971.79.camel%40vanquo.pezone.net

Regards

Pavel

Regards,
-Chap

Re: FmgrInfo allocation patterns (and PL handling as staged programming)

From

Tom Lane

Date:

15 April, 17:52:09

Chapman Flack <jcflack@acm.org> writes:
> On 04/06/25 22:37, Tom Lane wrote:
>> BTW, I feel a little uncomfortable with the idea that we're adding
>> dependencies on objects that are explicitly mentioned nowhere in the
>> pg_proc entry.

> Pursuing that idea a bit further, was there a compelling original reason
> the column in pg_proc had to be protrftypes and not just protransforms?

The problem from a PL's standpoint is "given this input or output
of type FOO, should I transform it, and if so using what?".  So
the starting point has to be a type not a transform.  The lookups
are implemented by get_transform_tosql and get_transform_fromsql,
and if you look at those you find that the protrftypes data is used
as a filter before attempting a pg_transform lookup.  If the pg_proc
entry contained transform OIDs we'd have to filter after the lookup,
which is pretty inefficient, especially if you expect that the normal
case is that there's not an applicable transform.

So I agree with storing protrftypes.  Maybe we should also have stored
a parallel array of transform OIDs, but it'd be just to make the
dependency more obvious, and maybe it's not worth the storage space.

            regards, tom lane

transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]

From

Chapman Flack

Date:

16 April, 05:10:59

On 04/15/25 10:52, Tom Lane wrote:
> The problem from a PL's standpoint is "given this input or output
> of type FOO, should I transform it, and if so using what?".  So
> the starting point has to be a type not a transform. ... protrftypes data
> is used as a filter before attempting a pg_transform lookup. If the pg_proc
> entry contained transform OIDs we'd have to filter after the lookup,

I don't think I follow. If a function is declared in LANGUAGE BAR with
TRANSFORM FOR TYPE FOO, then either:

1. There is no transform declared for (trftype foo, trflang bar).
   CREATE FUNCTION fails, 42704: transform for type foo language "bar"
   does not exist.

2. CREATE FUNCTION succeeds and there is a transform
   (trftype foo, trflang bar, trffromsql f1, trftosql f2).

So the choice of what to put in pg_proc is between the oid of type foo,
or the oid of the transform (foo, bar, f1, f2).

If the function is going to encounter type foo at all, it cannot avoid
looking up the transform, either by the transform oid or by (type, lang).

The only case in which that lookup could be avoided would be where
a function declared with TRANSFORM FOR TYPE FOO actually sees no input
or output arguments or internally-SQL-encountered values of type foo in
a given call or at a given call site.

That seems to me like an edge case, so I really am questioning this:

> which is pretty inefficient, especially if you expect that the normal
> case is that there's not an applicable transform.

I *don't* expect that. I expect that if someone took the time to declare
the function with TRANSFORM FOR TYPE FOO and CreateFunction confirmed
that transform existed, the function is expected to encounter values
of type foo and apply that intended transform to them.

Perhaps the efficiency argument is really "say a function has
a list of 100 arguments and only one is of type foo, how many cycles
are wasted in get_transform_tosql and get_transform_fromsql applied
to all those other types?"

At present, they return quickly when the passed typid isn't found
in the passed list of trftypes.

If pg_proc had protransforms instead, that would add a step zero: looking
up the declared transforms to make an in-memory list of (typid, tosqloid,
fromsqloid). After that, get_transform_{tosql,fromsql} would be applied
and return quickly when the passed typid isn't in that list. When it is
in the list, they'd return just as quickly the transform function oid
directly from the list. For a routine with no transforms declared,
step zero completes in the time it takes to see protransforms empty and
return an empty list.

Now the question becomes: how many cycles does step zero spend in excess
of those that must be spent for the function to have its intended behavior?

My answer would be "zero, except in the vanishingly perverse case of
a function declared with transforms for types it never sees."

Am I mistaken?

On 04/15/25 01:05, Pavel Stehule wrote:
> There was a long discussion, and I think the main reason for this design
> is  a possibility to use an old code without change although the
> transformations are installed. And secondly there was a possibility to use
> a transformation when it was installed, and use it although it was
> installed after the function was created. ... I can write code
> in PL/Python that can work with or without transformation
> ...
>
https://www.postgresql.org/message-id/flat/1339713732.11971.79.camel%40vanquo.pezone.net

Thank you for the link to that old thread. I can see now that the first
version of the patch in mid-2012 had CREATE and DROP TRANSFORM but did
not add CREATE FUNCTION syntax for which transforms to use; in that
version, it did entail the idea of transforms being selected for use
just by existing. But the problem of that changing the behavior of existing
functions was already recognized by the fifth message in the thread, where
Peter aptly used the words "worst nightmare".

By November of 2013 there were already suggestions about explicit
CREATE FUNCTION syntax for what transforms to apply, And by January 2014
Peter had found the ISO SQL <transform group specification> that does
exactly that, and by April 2015 the patch was adopted in that form.

I'll guess that the 'bogus' dependency creation, based only on argument
and return types, that Tom fixed last week in b73e6d7, was a vestige from
the earliest "apply whatever transforms exist" version of the patch that
didn't get changed when the explicit function-creation syntax was added.

At any rate, we are now firmly in the world of "apply exactly the
transforms explicitly requested at function creation", and that mirrors
ISO SQL and seems the sanest world to me.

While I don't doubt that a PL/Python function could be written that would
work with or without a transform, that sounds to me like the kind of feat
undertaken to show it can be done. My normal expectation would be for ~ 100%
of real-world functions to faceplant if their data suddenly started
arriving in completely different datatypes. And we have, wisely, chosen
a design that rules that out.

Regards,
-Chap

Re: transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]

From

Tom Lane

Date:

16 April, 06:39:19

Chapman Flack <jcflack@acm.org> writes:
> On 04/15/25 10:52, Tom Lane wrote:
>> The problem from a PL's standpoint is "given this input or output
>> of type FOO, should I transform it, and if so using what?".  So
>> the starting point has to be a type not a transform.

> Perhaps the efficiency argument is really "say a function has
> a list of 100 arguments and only one is of type foo, how many cycles
> are wasted in get_transform_tosql and get_transform_fromsql applied
> to all those other types?"

That, and also "how many cycles are wasted in get_transform_tosql and
get_transform_fromsql when there isn't any TRANSFORM clause at all?"

> If pg_proc had protransforms instead, that would add a step zero: looking
> up the declared transforms to make an in-memory list of (typid, tosqloid,
> fromsqloid). After that, get_transform_{tosql,fromsql} would be applied
> and return quickly when the passed typid isn't in that list.

I don't doubt there are other ways that this could be built.  But
you've not really made the case that another way is better (let alone
enough better to be worth breaking backward compatibility for).

I think the "normal" case is that there isn't any TRANSFORM clause.
As long as that case falls through quickly, it's hard to credit that
there's any meaningful performance difference between different
arrangements here.

My own beef with the whole setup is that you can't specify *which*
arguments or results you want transformed.  I don't believe that
this should have been keyed off data types to begin with.  But
that design was the SQL committee's choice so we're stuck ...

            regards, tom lane

Re: transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]

From

Chapman Flack

Date:

16 April, 18:42:27

On 04/15/25 23:39, Tom Lane wrote:
> My own beef with the whole setup is that you can't specify *which*
> arguments or results you want transformed.  I don't believe that
> this should have been keyed off data types to begin with.  But
> that design was the SQL committee's choice so we're stuck ...

Yes, per-argument declarations would be more flexible ... but more verbose.
They were discussed in the long email thread Pavel linked, and I suppose
the SQL committee may have had its own discussions about the same tradeoff.

One area where we do depart from the standard is we don't have named
transform groups. I had wondered about that before, and supposed that
maybe our feature was designed from an older edition that didn't have
them yet. But they also got mentioned in that email thread, so they were
definitely in the standard already.

I do think our lack of named groups is unfortunate. It may be that Perl
and Python each have exactly one standard type that's exactly right for
an hstore, and Python exactly one representation you would ever want to
use for an ltree, but I don't think that generalizes. There are PostgreSQL
types you might map, with varying fidelity, into half a dozen different
Java standard library types, for example.

The standard also has GUC-like SET DEFAULT TRANSFORM GROUP and
SET TRANSFORM GROUP FOR TYPE session-wide settings that could help
reduce verbosity. (It doesn't quite come out and say that these session
values apply at function-creation time to types for which the CREATE
FUNCTION specifies no transform explicitly, but it does say that default
transforms will be supplied for those, with the defaults being
implementation-defined. So defaulting to session-wide settings would
not be wrong. The defaults do, of course, become part of the function
declaration at the time it's created, so the function doesn't fall over
later if someone has used SET DEFAULT TRANSFORM GROUP.)

Those features, naturally, would again require recording actual
transformation oids in pg_proc and not just types.

Maybe for per-argument flexibility we could have the transform spec
in CREATE FUNCTION accept FOR PARAMETER FOO as well as FOR TYPE BAR,
and then lobby the SQL committee not to adopt some conflicting variation
of it.

Regards,
-Chap

Re: transforms

From

Chapman Flack

Date:

19 April, 21:28:09

Also noticing about transforms:

1. protrftypes can have duplicates.

   In part, this is because CreateFunction does nothing to stop you saying
   redundant things like TRANSFORM FOR TYPE circle, FOR TYPE circle.

   But it is also because:

2. CreateFunction hands every type seen in TRANSFORM FOR TYPE
   to get_base_element_type, so if you write something like
   TRANSFORM FOR TYPE circle, FOR TYPE circle[], FOR TYPE arrcircle
   (where arrcircle is a domain over circle[]), protrftypes simply
   ends up {circle,circle,circle}.

3. get_base_element_type leaves alone a domain whose base type is not
   an array, so if bigcircle is a domain over circle, then
   TRANSFORM FOR TYPE bigcircle, FOR TYPE bigcircle[] will add
   {bigcircle,bigcircle} to protrftypes.

4. CreateTransform makes no use of get_base_element_type, but does reject
   any type that is a domain.

   Therefore:

5. You can create a transform for an array type, but can never refer to it
   in TRANSFORM FOR TYPE (you end up referring to the element type).

6. You can refer to a domain in TRANSFORM FOR TYPE, but that transform
   can never be created.

These don't seem to be what the rules ought to be, but I don't have
a strong intuition for what the rules ought to be instead.

Regards,
-Chap

Re: transforms

From

Tom Lane

Date:

19 April, 22:03:20

Chapman Flack <jcflack@acm.org> writes:
>    Therefore:
> 5. You can create a transform for an array type, but can never refer to it
>    in TRANSFORM FOR TYPE (you end up referring to the element type).
> 6. You can refer to a domain in TRANSFORM FOR TYPE, but that transform
>    can never be created.
> These don't seem to be what the rules ought to be, but I don't have
> a strong intuition for what the rules ought to be instead.

Yeah, I noticed that inconsistency too, but hadn't gotten around to
researching what the spec says about it.  I don't especially like it
because it puts the onus on PL implementations to dig for the
appropriate transform to apply to an argument or result, and I don't
think they're likely to do that consistently.  It looks to me like
plperl and plpython are already inconsistent: plpython seems to be
digging down into arrays and domains, but plperl does not AFAICS.
So what are the odds that outside PLs do it correctly (for whatever
you think "correctly" is)?  It doesn't help any that we document
none of this.

If I had my druthers I would flush every one of these special rules,
and just say that an argument/result gets transformed if there's a
transform matching its declared type, full stop.  You want to
transform arrays or domains, you make a transform to match.
This would be a lot simpler to understand and potentially more
flexible.  But I don't know if it falls foul of the wording of
the spec, and I concede that in some cases it'd require users
to do more work creating those additional transforms.

            regards, tom lane

Re: transforms

From

Tom Lane

Date:

19 April, 22:20:02

BTW, there is another way in which the current rules for transform
application are not self-consistent.  Postgres has three kinds of
container types: arrays, composites, and ranges.  But the transform
code special-cases only arrays.  I'd have thought that if the idea is
to specify transforms only for base types, then TRANSFORM FOR TYPE
ought to recurse into composites and list their field types in
protrftypes.  It doesn't, but if you look at plpython it appears
that

(1) you can create a transform for a composite type and it will be
applied;
(2) if you don't, the code recurses to the fields and will apply
per-field-type transforms then.

plperl will effectively do (1) but not (2), if I'm reading it right.

Ranges are a bit odd in that their semantics are not just a collection
of element/field types, so it perhaps makes sense that we don't
attempt to apply base-type transforms to their contents.  But it's
sure inconsistent that arrays and composites aren't handled more
nearly alike here.

            regards, tom lane

Re: transforms

From

Chapman Flack

Date:

20 April, 00:35:44

On 04/19/25 15:03, Tom Lane wrote:
> So what are the odds that outside PLs do it correctly (for whatever
> you think "correctly" is)?  It doesn't help any that we document
> none of this.

My sense of Déjà vu turns out to be because I had a bunch of proposed
new documentation and example code for plsample in a commitfest a few
years back[0] but had to set it aside at the time. And it's becoming clear
there's even more to be nailed down than I realized at the time anyway.

> If I had my druthers I would flush every one of these special rules,
> and just say that an argument/result gets transformed if there's a
> transform matching its declared type, full stop.  You want to
> transform arrays or domains, you make a transform to match.
> This would be a lot simpler to understand and potentially more
> flexible.  But I don't know if it falls foul of the wording of
> the spec, and I concede that in some cases it'd require users
> to do more work creating those additional transforms.

I don't see right offhand that it would fall afoul of the spec.

I had also proposed in that earlier CF to change some of what our doc
says about the spec. We currently say "There is a CREATE TRANSFORM command
in the SQL standard, but it is for adapting data types to client languages"
and I don't think it's as different as that suggests. 4.9.6 clearly says
the from-sql function is invoked whenever a value "is passed to a host-
language program *or an external routine*" and the to-sql function is
invoked whenever a value "is supplied by a host-language program *or an
external routine*" (asterisks mine).

The biggest difference is our use of type internal to allow transforming
to arbitrary PL data types: the ISO to- and from-sql functions have to
interface with the external routine using predefined SQL types.

ISO does have a notion of user-defined types being subtypes or supertypes
of each other, and sections 9.31 and 9.33 on determining the applicable
from-sql or to-sql function do involve examining a type's supertypes to
find one that has a transform matching the wanted transform group name.
I don't see anything, though, about peering into array or range or domain
types.

Perhaps we could regard a domain as a subtype of its base type, but
I do not see for sure that the spec does.

I think that a transform-must-match-declared-type rule would not be
at odds with the spec.

Regards,
-Chap

[0] https://commitfest.postgresql.org/patch/3554/

Re: transforms

From

Tom Lane

Date:

20 April, 02:12:58

Oh, just in case this wasn't complicated enough already: what
to do with polymorphic arguments/results?

You could argue that

CREATE FUNCTION foo(anyelement) RETURNS anyelement
TRANSFORM FOR TYPE int
AS ...;

should mean that if the actual argument type is int, then the
mentioned transform should be applied to the input and result;
but if it's some other type then just do the normal conversions.

You could perhaps also argue that that's a bad idea.  I'm not sure.
It does make for very dynamic matching of transforms to arguments,
which feels dubious to me, but I can't put my finger on a bad
consequence.

We have not faced this issue in the in-core PLs because neither
plperl nor plpython allow polymorphic arguments/results.  (I'm
surprised that no one has yet wanted to fix that.)  But I will
bet that there are other PLs that do allow that, and if they
also implement transforms then there's probably precedent
out there already.

            regards, tom lane

Re: transforms

From

Chapman Flack

Date:

20 April, 02:32:16

On 04/19/25 19:12, Tom Lane wrote:
> You could argue that
> CREATE FUNCTION foo(anyelement) RETURNS anyelement
> TRANSFORM FOR TYPE int
> AS ...;
> should mean that if the actual argument type is int, then the
> mentioned transform should be applied to the input and result;
> but if it's some other type then just do the normal conversions.
> 
> You could perhaps also argue that that's a bad idea.  I'm not sure.

My position would be that it's not a bad idea. It would be consistent
and unastonishing.

A PL that uses a staging approach has probably precomputed and cached
a good deal of information, including what transforms are requested and
what parameter positions are polymorphic if any, and only needs to check
those when specializing, and funccache can preserve those results too.

Regards,
-Chap

Re: transforms

From

Chapman Flack

Date:

20 April, 02:50:47

On 04/19/25 19:32, Chapman Flack wrote:
> On 04/19/25 19:12, Tom Lane wrote:
>> You could argue that
>> CREATE FUNCTION foo(anyelement) RETURNS anyelement
>> TRANSFORM FOR TYPE int
>> AS ...;
>> should mean that if the actual argument type is int, then the
>> mentioned transform should be applied to the input and result;

Also if foo() RETURNS TRIGGER and tg_trigtuple/tg_newtuple have
int components.

Regards,
-Chap