Thread: FmgrInfo allocation patterns (and PL handling as staged programming)
Hi hackers, The way the core code allocates FmgrInfo structures has a pleasing property (at least in the parts of the code I have read and the cases I've tested) that the docs don't seem to emphasize. To wit, given a query like SELECT hello(n), hello(x) FROM (VALUES (1::int4, 1.0::float4), (2, 2.0 ), (3, 3.0 )) AS t(n,x); the core code allocates one FmgrInfo for each of the two uses. If the PL handler has a useful way to specialize the polymorphic-typed function to int4 in the one case and float4 in the other, each can be stashed in its respective flinfo->fn_extra and the specialized versions used as expected for the duration of the query. Likewise, the trigger manager prepares for a query by allocating one FmgrInfo for each distinct trigger involved, so if there is a single trigger function referenced by multiple triggers, a version specialized to each trigger can be stashed on its respective flinfo and used for all firings of that trigger the query may generate. This lends itself to a nice staged-programming view of a PL handler's job: prepare: RegProcedure -> Template (cache by regproc) specialize: (Template, call site) -> Routine (cache on fn_extra) apply: (Routine, fcinfo) -> result (where I consider some members both of flinfo, such as fn_expr, and of fcinfo, such as nargs, context, resultinfo, to be logically properties of a notional "call site"). I wonder, though, if there might be code in the wild, or even in corners of the core I haven't looked in, where FmgrInfo structs aren't being used that way, and could get reused for successive calls of one routine but with, say, different nargs or argument types. That would seem odd but I don't see that the documentation ever came right out and said not to. If that seems like a non-imaginary risk, I wonder if FmgrInfo could sprout something like a safe-to-specialize bit, initialized to false for old code that doesn't know about it, but set true in places where FmgrInfo structs are definitely being managed as described above? I suspect that would result in the vast majority of FmgrInfo structs ever really encountered having the bit set. Regards, -Chap A work-in-progress PL dispatcher based on the above can be seen at: https://tada.github.io/pljava/preview1.7/pljava-api/apidocs/org.postgresql.pljava/org/postgresql/pljava/PLJavaBasedLanguage.html
Chapman Flack <jcflack@acm.org> writes: > To wit, given a query like > SELECT hello(n), hello(x) > FROM (VALUES > (1::int4, 1.0::float4), > (2, 2.0 ), > (3, 3.0 )) AS t(n,x); > the core code allocates one FmgrInfo for each of the two uses. Yeah, there's no attempt to merge FmgrInfos across call sites within a query. It's typical to use fn_extra to point to dynamic state for a call, so that any such merging could break things. > I wonder, though, if there might be code in the wild, or even in corners > of the core I haven't looked in, where FmgrInfo structs aren't being used > that way, and could get reused for successive calls of one routine but > with, say, different nargs or argument types. That would seem odd but > I don't see that the documentation ever came right out and said not to. The only case I'm aware of that might require some thought is that the relcache caches FmgrInfo structs for the opclass support functions for each column of an index. That seems like it's close enough to being just as specialized as a query callsite, but maybe not? A downside of relying entirely on fn_extra is that you don't get to amortize the specialization work across queries, even though it's probably pretty repetitive. You might be interested in this recent commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0dca5d68d7bebf2c1036fd84875533afef6df992 which formalizes some caching rules that plpgsql has used for a long time, and extends the rules to support SQL-language functions (which need to specialize on output rowtype as well as what plpgsql has traditionally considered). Maybe you'd be interested in using funccache. regards, tom lane
I wrote: > Chapman Flack <jcflack@acm.org> writes: >> I wonder, though, if there might be code in the wild, or even in corners >> of the core I haven't looked in, where FmgrInfo structs aren't being used >> that way, and could get reused for successive calls of one routine but >> with, say, different nargs or argument types. That would seem odd but >> I don't see that the documentation ever came right out and said not to. > The only case I'm aware of that might require some thought is that the > relcache caches FmgrInfo structs for the opclass support functions for > each column of an index. That seems like it's close enough to being > just as specialized as a query callsite, but maybe not? Actually, there is a case where you have to be careful if you support polymorphic arguments: the element type of an anyarray argument can change on-the-fly from one call to the next in the same query. I think this is only possible when you're fed pg_stats.most_common_vals or one of its sibling columns, but that's enough to be a problem. That's why all of our array-munging functions that use fn_extra to cache type-dependent state are careful to check the element type against the cache every single time. regards, tom lane
On 04/06/25 13:59, Tom Lane wrote: > polymorphic arguments: the element type of an anyarray argument can > change on-the-fly from one call to the next in the same query. Yuck! I had not guessed that. Regards, -Chap
On 04/06/25 13:33, Tom Lane wrote: > Maybe you'd be interested in using funccache. O funccache, where were you a year or two ago? Can a spread-out variadic "any" arg list ever vary in length or type on the fly at a single call site? I notice that funccache only hashes the first nargs argument types. Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > Can a spread-out variadic "any" arg list ever vary > in length or type on the fly at a single call site? Don't think so. > I notice that funccache only hashes the first nargs > argument types. Yeah; the languages it serves today don't support VARIADIC ANY, so it's not really a concern. But if you'd like to consider doing so, we could talk about what's needful. I don't think funccache is set in stone (yet). regards, tom lane
On 04/06/25 15:47, Tom Lane wrote: > Chapman Flack <jcflack@acm.org> writes: >> Can a spread-out variadic "any" arg list ever vary >> in length or type on the fly at a single call site? > > Don't think so. Only slightly tangentially: what things count as "objects that depend on the transform" for purposes of DROP TRANSFORM ... CASCADE / RESTRICT ? I just now dropped (in 17.4) a transform named in an existing routine's declaration, with nary a peep. I'm not, for now, complaining, as my objective was to make sure my dispatcher noticed that, when dispatching to the routine, and if DROP TRANSFORM had refused, I'd have had to work harder. Right now the dispatcher is only checking before validate or prepare though. If transforms can really just vanish I'd better check even when dispatching to a cached version. I suppose a cached routine could continue merrily along using the vanished transform's functions, as long as they haven't vanished. But once the transform is dropped there's nothing saying you can't drop the functions, and that could get weird. Or, is somehow registering a routine's dependency on a transform something that its PL handler is responsible for doing? I could sort of see that, given that it's the PL that has to make whatever actual transformation happens, happen. But relying on the PL to record the dependency would seem brittle, what with validators not running for sure in all circumstances. Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > Only slightly tangentially: what things count as "objects that depend > on the transform" for purposes of DROP TRANSFORM ... CASCADE / RESTRICT ? > I just now dropped (in 17.4) a transform named in an existing routine's > declaration, with nary a peep. Hmm, really? ProcedureCreate() certainly looks like it creates necessary dependencies, and when I tried it just now, I got the expected failure: regression=# create extension hstore_plperl; CREATE EXTENSION regression=# create function foo(f1 hstore) returns hstore language plperl as 'return f1' transform for type hstore; CREATE FUNCTION regression=# drop extension hstore_plperl; ERROR: cannot drop extension hstore_plperl because other objects depend on it DETAIL: function foo(hstore) depends on transform for hstore language plperl HINT: Use DROP ... CASCADE to drop the dependent objects too. But wait a minute ... if I recreate the function with no TRANSFORM clause, I *still* can't drop the transform: regression=# drop function foo(f1 hstore); DROP FUNCTION regression=# create function foo(f1 hstore) returns hstore language plperl as 'return f1'; CREATE FUNCTION regression=# drop extension hstore_plperl; ERROR: cannot drop extension hstore_plperl because other objects depend on it DETAIL: function foo(hstore) depends on transform for hstore language plperl HINT: Use DROP ... CASCADE to drop the dependent objects too. Looking more closely at ProcedureCreate(), it makes a dependency if a transform *exists* for the argument or result type, whether a TRANSFORM clause is present or not. Surely this is completely bogus? We should be depending on the OIDs mentioned in protrftypes, not more nor less. I'm not quite sure if that somehow explains your results, but it does look like a bug to me. regards, tom lane
On 04/06/25 20:01, Tom Lane wrote: > Looking more closely at ProcedureCreate(), it makes a dependency > if a transform *exists* for the argument or result type, whether > a TRANSFORM clause is present or not. Surely this is completely > bogus? We should be depending on the OIDs mentioned in protrftypes, I think that's it. I tested by creating a function like create function foo() returns text transform for type circle that is, with the transform type not appearing as an argument or return type. As far as I know, that's still a cromulent usage, as I could be saying I want the function to apply that transform to circles it uses or retrieves in queries it makes. But if the dependency is being created based on argument/return types and not on protrftypes, that will slip right through the cracks. Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > On 04/06/25 20:01, Tom Lane wrote: >> Looking more closely at ProcedureCreate(), it makes a dependency >> if a transform *exists* for the argument or result type, whether >> a TRANSFORM clause is present or not. Surely this is completely >> bogus? We should be depending on the OIDs mentioned in protrftypes, > I think that's it. I tested by creating a function like > create function foo() returns text transform for type circle > that is, with the transform type not appearing as an argument or > return type. > As far as I know, that's still a cromulent usage, as I could be saying > I want the function to apply that transform to circles it uses or retrieves > in queries it makes. Oh, interesting interpretation. The in-core PLs only consider the transform list in relation to parameters and results, but I suppose that's just because they have no use for internal conversions ... and even then, you could argue that that's wrong and they should apply the specified conversions when invoking, say, SQL queries via SPI. Something to think about another day. Here's a draft patch to fix the bogus dependencies. As given this'd only be okay for HEAD, since I doubt we can get away with changing ProcedureCreate()'s signature in stable branches. Given the lack of prior complaints, I'm content to fix this only in HEAD --- but maybe somebody will think differently? In the back branches we could make ProcedureCreate() deconstruct the OID array it's given and then repeat the transform lookups, but ugh. I guess a "ProcedureCreateExt" alternate entry point would do the trick too. BTW, I feel a little uncomfortable with the idea that we're adding dependencies on objects that are explicitly mentioned nowhere in the pg_proc entry. I suppose it's okay, and the preceding code did that too, but still. regards, tom lane diff --git a/contrib/bool_plperl/expected/bool_plperl.out b/contrib/bool_plperl/expected/bool_plperl.out index 187df8db96f..183dc07b3fb 100644 --- a/contrib/bool_plperl/expected/bool_plperl.out +++ b/contrib/bool_plperl/expected/bool_plperl.out @@ -104,9 +104,9 @@ SELECT spi_test(); DROP EXTENSION plperl CASCADE; NOTICE: drop cascades to 6 other objects -DETAIL: drop cascades to function spi_test() -drop cascades to extension bool_plperl +DETAIL: drop cascades to extension bool_plperl drop cascades to function perl2int(integer) drop cascades to function perl2text(text) drop cascades to function perl2undef() drop cascades to function bool2perl(boolean,boolean,boolean) +drop cascades to function spi_test() diff --git a/contrib/bool_plperl/expected/bool_plperlu.out b/contrib/bool_plperl/expected/bool_plperlu.out index 8337d337e99..1496bbafac8 100644 --- a/contrib/bool_plperl/expected/bool_plperlu.out +++ b/contrib/bool_plperl/expected/bool_plperlu.out @@ -104,9 +104,9 @@ SELECT spi_test(); DROP EXTENSION plperlu CASCADE; NOTICE: drop cascades to 6 other objects -DETAIL: drop cascades to function spi_test() -drop cascades to extension bool_plperlu +DETAIL: drop cascades to extension bool_plperlu drop cascades to function perl2int(integer) drop cascades to function perl2text(text) drop cascades to function perl2undef() drop cascades to function bool2perl(boolean,boolean,boolean) +drop cascades to function spi_test() diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c index bcf4050f5b1..a05f8a87c1f 100644 --- a/src/backend/catalog/pg_aggregate.c +++ b/src/backend/catalog/pg_aggregate.c @@ -637,6 +637,7 @@ AggregateCreate(const char *aggName, parameterNames, /* parameterNames */ parameterDefaults, /* parameterDefaults */ PointerGetDatum(NULL), /* trftypes */ + NIL, /* trfoids */ PointerGetDatum(NULL), /* proconfig */ InvalidOid, /* no prosupport */ 1, /* procost */ diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c index 880b597fb3a..5fdcf24d5f8 100644 --- a/src/backend/catalog/pg_proc.c +++ b/src/backend/catalog/pg_proc.c @@ -26,7 +26,6 @@ #include "catalog/pg_proc.h" #include "catalog/pg_transform.h" #include "catalog/pg_type.h" -#include "commands/defrem.h" #include "executor/functions.h" #include "funcapi.h" #include "mb/pg_wchar.h" @@ -61,6 +60,35 @@ static bool match_prosrc_to_literal(const char *prosrc, const char *literal, /* ---------------------------------------------------------------- * ProcedureCreate * + * procedureName: string name of routine (proname) + * procNamespace: OID of namespace (pronamespace) + * replace: true to allow replacement of an existing pg_proc entry + * returnsSet: returns set? (proretset) + * returnType: OID of result type (prorettype) + * proowner: OID of owner role (proowner) + * languageObjectId: OID of function language (prolang) + * languageValidator: OID of validator function to apply, if any + * prosrc: string form of function definition (prosrc) + * probin: string form of binary reference, or NULL (probin) + * prosqlbody: Node tree of pre-parsed SQL body, or NULL (prosqlbody) + * prokind: function/aggregate/procedure/etc code (prokind) + * security_definer: security definer? (prosecdef) + * isLeakProof: leak proof? (proleakproof) + * isStrict: strict? (proisstrict) + * volatility: volatility code (provolatile) + * parallel: parallel safety code (proparallel) + * parameterTypes: input parameter types, as an oidvector (proargtypes) + * allParameterTypes: all parameter types, as an OID array (proallargtypes) + * parameterModes: parameter modes, as a "char" array (proargmodes) + * parameterNames: parameter names, as a text array (proargnames) + * parameterDefaults: defaults, as a List of Node trees (proargdefaults) + * trftypes: transformable type OIDs, as an OID array (protrftypes) + * trfoids: List of transform OIDs that routine should depend on + * proconfig: GUC set clauses, as a text array (proconfig) + * prosupport: OID of support function, if any (prosupport) + * procost: cost factor (procost) + * prorows: estimated output rows for a SRF (prorows) + * * Note: allParameterTypes, parameterModes, parameterNames, trftypes, and proconfig * are either arrays of the proper types or NULL. We declare them Datum, * not "ArrayType *", to avoid importing array.h into pg_proc.h. @@ -90,6 +118,7 @@ ProcedureCreate(const char *procedureName, Datum parameterNames, List *parameterDefaults, Datum trftypes, + List *trfoids, Datum proconfig, Oid prosupport, float4 procost, @@ -115,7 +144,6 @@ ProcedureCreate(const char *procedureName, referenced; char *detailmsg; int i; - Oid trfid; ObjectAddresses *addrs; /* @@ -609,25 +637,18 @@ ProcedureCreate(const char *procedureName, ObjectAddressSet(referenced, TypeRelationId, returnType); add_exact_object_address(&referenced, addrs); - /* dependency on transform used by return type, if any */ - if ((trfid = get_transform_oid(returnType, languageObjectId, true))) - { - ObjectAddressSet(referenced, TransformRelationId, trfid); - add_exact_object_address(&referenced, addrs); - } - /* dependency on parameter types */ for (i = 0; i < allParamCount; i++) { ObjectAddressSet(referenced, TypeRelationId, allParams[i]); add_exact_object_address(&referenced, addrs); + } - /* dependency on transform used by parameter type, if any */ - if ((trfid = get_transform_oid(allParams[i], languageObjectId, true))) - { - ObjectAddressSet(referenced, TransformRelationId, trfid); - add_exact_object_address(&referenced, addrs); - } + /* dependency on transforms, if any */ + foreach_oid(transformid, trfoids) + { + ObjectAddressSet(referenced, TransformRelationId, transformid); + add_exact_object_address(&referenced, addrs); } /* dependency on support function, if any */ diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c index b9fd7683abb..0335e982b31 100644 --- a/src/backend/commands/functioncmds.c +++ b/src/backend/commands/functioncmds.c @@ -1046,6 +1046,7 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt) List *parameterDefaults; Oid variadicArgType; List *trftypes_list = NIL; + List *trfoids_list = NIL; ArrayType *trftypes; Oid requiredResultType; bool isWindowFunc, @@ -1157,11 +1158,12 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt) Oid typeid = typenameTypeId(NULL, lfirst_node(TypeName, lc)); Oid elt = get_base_element_type(typeid); + Oid transformid; typeid = elt ? elt : typeid; - - get_transform_oid(typeid, languageOid, false); + transformid = get_transform_oid(typeid, languageOid, false); trftypes_list = lappend_oid(trftypes_list, typeid); + trfoids_list = lappend_oid(trfoids_list, transformid); } } @@ -1292,6 +1294,7 @@ CreateFunction(ParseState *pstate, CreateFunctionStmt *stmt) PointerGetDatum(parameterNames), parameterDefaults, PointerGetDatum(trftypes), + trfoids_list, PointerGetDatum(proconfig), prosupport, procost, diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 3cb3ca1cca1..45ae7472ab5 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -1810,6 +1810,7 @@ makeRangeConstructors(const char *name, Oid namespace, PointerGetDatum(NULL), /* parameterNames */ NIL, /* parameterDefaults */ PointerGetDatum(NULL), /* trftypes */ + NIL, /* trfoids */ PointerGetDatum(NULL), /* proconfig */ InvalidOid, /* prosupport */ 1.0, /* procost */ @@ -1875,6 +1876,7 @@ makeMultirangeConstructors(const char *name, Oid namespace, PointerGetDatum(NULL), /* parameterNames */ NIL, /* parameterDefaults */ PointerGetDatum(NULL), /* trftypes */ + NIL, /* trfoids */ PointerGetDatum(NULL), /* proconfig */ InvalidOid, /* prosupport */ 1.0, /* procost */ @@ -1919,6 +1921,7 @@ makeMultirangeConstructors(const char *name, Oid namespace, PointerGetDatum(NULL), /* parameterNames */ NIL, /* parameterDefaults */ PointerGetDatum(NULL), /* trftypes */ + NIL, /* trfoids */ PointerGetDatum(NULL), /* proconfig */ InvalidOid, /* prosupport */ 1.0, /* procost */ @@ -1957,6 +1960,7 @@ makeMultirangeConstructors(const char *name, Oid namespace, PointerGetDatum(NULL), /* parameterNames */ NIL, /* parameterDefaults */ PointerGetDatum(NULL), /* trftypes */ + NIL, /* trfoids */ PointerGetDatum(NULL), /* proconfig */ InvalidOid, /* prosupport */ 1.0, /* procost */ diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 45f593fc958..d7353e7a088 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -211,6 +211,7 @@ extern ObjectAddress ProcedureCreate(const char *procedureName, Datum parameterNames, List *parameterDefaults, Datum trftypes, + List *trfoids, Datum proconfig, Oid prosupport, float4 procost,
On 04/06/25 13:59, Tom Lane wrote: > polymorphic arguments: the element type of an anyarray argument can > change on-the-fly from one call to the next in the same query. I > think this is only possible when you're fed pg_stats.most_common_vals > or one of its sibling columns, but that's enough to be a problem. > That's why all of our array-munging functions that use fn_extra to > cache type-dependent state are careful to check the element type > against the cache every single time. This was really bumming me out. I thought "what on earth does that do to the rest of your surrounding query, say if you have anyelement types in the args or return value also?". So the answer to that part is easy: if a routine's types include both anyarray and anyelement (like, say, unnest), it just can't be applied to one of those statistics columns. An attempt to use it that way in a query is rejected early, before any attempt to call the routine: ERROR: cannot determine element type of "anyarray" argument STATEMENT: select unnest(stavalues1) from pg_statistic limit 1; Now, if you have a routine that uses anyarray but no corresponding element pseudotype, you are allowed to apply that to a statistics column. It can even have anyarray as a return/output type also. When it's applied to a statistics column, here's what my dispatcher sees: # select foo(stavalues1) from pg_statistic limit 1; RegProcedure[1255,16861,0]public.foo essentialChecks: checkBody true RegProcedure[1255,16861,0]public.foo prepare(): inputsTemplate : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray] unresolvedInputs : {0} outputsTemplate : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray] unresolvedOutputs: {0} RegProcedure[1255,16861,0]public.foo Template.specialize(): precomputed id : 39d9314d inputsDescriptor : [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray] inputsAreSpread : false stableInputs : {} outputsDescriptor: [RegType.Unresolved[1247,2277,0]pg_catalog.anyarray] So the polymorphic type resolution applied at specialize() time "succeeded", raised no error, and left the pseudotype anyarray in both the inputs and outputs supposedly-"resolved" results. That seems to make for a tidy way of recognizing the situation. If you go to specialize and find anyarray among your supposedly-resolved call-site types, well, you know you're in Wonderland and can adjust behavior accordingly. Regards, -Chap
Chapman Flack <jcflack@acm.org> writes: > This was really bumming me out. I thought "what on earth does that do > to the rest of your surrounding query, say if you have anyelement types > in the args or return value also?". > So the answer to that part is easy: if a routine's types include both > anyarray and anyelement (like, say, unnest), it just can't be applied > to one of those statistics columns. An attempt to use it that way in > a query is rejected early, before any attempt to call the routine: > ERROR: cannot determine element type of "anyarray" argument > STATEMENT: select unnest(stavalues1) from pg_statistic limit 1; Right. AFAIK the oddity doesn't "leak out" of functions taking anyarray. > That seems to make for a tidy way of recognizing the situation. If > you go to specialize and find anyarray among your supposedly-resolved > call-site types, well, you know you're in Wonderland and can adjust > behavior accordingly. Cool. regards, tom lane
On 04/06/25 22:37, Tom Lane wrote: > Here's a draft patch to fix the bogus dependencies. As given this'd > only be okay for HEAD, since I doubt we can get away with changing > ProcedureCreate()'s signature in stable branches ... In the back branches > we could make ProcedureCreate() deconstruct the OID array it's given and > then repeat the transform lookups, but ugh. ... > > BTW, I feel a little uncomfortable with the idea that we're adding > dependencies on objects that are explicitly mentioned nowhere in the > pg_proc entry. Pursuing that idea a bit further, was there a compelling original reason the column in pg_proc had to be protrftypes and not just protransforms? The transforms have been looked up at ProcedureCreate time. Any PL that supports transforms has to repeat the transform lookups if there are protrftypes present. Any PL that doesn't support transforms needs only to check for presence and say "sorry, I don't support those", which it could do just as easily with a list of transforms as with a list of types. Regards, -Chap
Hi
út 15. 4. 2025 v 2:32 odesílatel Chapman Flack <jcflack@acm.org> napsal:
On 04/06/25 22:37, Tom Lane wrote:
> Here's a draft patch to fix the bogus dependencies. As given this'd
> only be okay for HEAD, since I doubt we can get away with changing
> ProcedureCreate()'s signature in stable branches ... In the back branches
> we could make ProcedureCreate() deconstruct the OID array it's given and
> then repeat the transform lookups, but ugh. ...
>
> BTW, I feel a little uncomfortable with the idea that we're adding
> dependencies on objects that are explicitly mentioned nowhere in the
> pg_proc entry.
Pursuing that idea a bit further, was there a compelling original reason
the column in pg_proc had to be protrftypes and not just protransforms?
The transforms have been looked up at ProcedureCreate time.
Any PL that supports transforms has to repeat the transform lookups
if there are protrftypes present.
Any PL that doesn't support transforms needs only to check for presence
and say "sorry, I don't support those", which it could do just as easily
with a list of transforms as with a list of types.
There was a long discussion, and I think the main reason for this design is a possibility to use an old code without change although the transformations are installed. And secondly there was a possibility to use a transformation when it was installed, and use it although it was installed after the function was created. For example PL/pgSQL code has no dependency on types or casts that are internally used. And I can write code in PL/Python that can work with or without transformation, and can be unfriendly to force recreate functions to use a transformation. Today we have more tools for processing dependencies in code, but 13 years ago the possibility to work with PL code and transformations independently (how much was possible) made sense. Today it is not a problem to build a new production server and migrate data there, and there are less reasons to maintain old servers. There was more expectation about others PL (than PL/pgSQL) and about transformations (a lot of work where PLPerl was used before can be done now by SQL/XML or SQL/JSON or by FDW).
Regards
Pavel
Regards,
-Chap
Chapman Flack <jcflack@acm.org> writes: > On 04/06/25 22:37, Tom Lane wrote: >> BTW, I feel a little uncomfortable with the idea that we're adding >> dependencies on objects that are explicitly mentioned nowhere in the >> pg_proc entry. > Pursuing that idea a bit further, was there a compelling original reason > the column in pg_proc had to be protrftypes and not just protransforms? The problem from a PL's standpoint is "given this input or output of type FOO, should I transform it, and if so using what?". So the starting point has to be a type not a transform. The lookups are implemented by get_transform_tosql and get_transform_fromsql, and if you look at those you find that the protrftypes data is used as a filter before attempting a pg_transform lookup. If the pg_proc entry contained transform OIDs we'd have to filter after the lookup, which is pretty inefficient, especially if you expect that the normal case is that there's not an applicable transform. So I agree with storing protrftypes. Maybe we should also have stored a parallel array of transform OIDs, but it'd be just to make the dependency more obvious, and maybe it's not worth the storage space. regards, tom lane
transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]
From
Chapman Flack
Date:
On 04/15/25 10:52, Tom Lane wrote: > The problem from a PL's standpoint is "given this input or output > of type FOO, should I transform it, and if so using what?". So > the starting point has to be a type not a transform. ... protrftypes data > is used as a filter before attempting a pg_transform lookup. If the pg_proc > entry contained transform OIDs we'd have to filter after the lookup, I don't think I follow. If a function is declared in LANGUAGE BAR with TRANSFORM FOR TYPE FOO, then either: 1. There is no transform declared for (trftype foo, trflang bar). CREATE FUNCTION fails, 42704: transform for type foo language "bar" does not exist. 2. CREATE FUNCTION succeeds and there is a transform (trftype foo, trflang bar, trffromsql f1, trftosql f2). So the choice of what to put in pg_proc is between the oid of type foo, or the oid of the transform (foo, bar, f1, f2). If the function is going to encounter type foo at all, it cannot avoid looking up the transform, either by the transform oid or by (type, lang). The only case in which that lookup could be avoided would be where a function declared with TRANSFORM FOR TYPE FOO actually sees no input or output arguments or internally-SQL-encountered values of type foo in a given call or at a given call site. That seems to me like an edge case, so I really am questioning this: > which is pretty inefficient, especially if you expect that the normal > case is that there's not an applicable transform. I *don't* expect that. I expect that if someone took the time to declare the function with TRANSFORM FOR TYPE FOO and CreateFunction confirmed that transform existed, the function is expected to encounter values of type foo and apply that intended transform to them. Perhaps the efficiency argument is really "say a function has a list of 100 arguments and only one is of type foo, how many cycles are wasted in get_transform_tosql and get_transform_fromsql applied to all those other types?" At present, they return quickly when the passed typid isn't found in the passed list of trftypes. If pg_proc had protransforms instead, that would add a step zero: looking up the declared transforms to make an in-memory list of (typid, tosqloid, fromsqloid). After that, get_transform_{tosql,fromsql} would be applied and return quickly when the passed typid isn't in that list. When it is in the list, they'd return just as quickly the transform function oid directly from the list. For a routine with no transforms declared, step zero completes in the time it takes to see protransforms empty and return an empty list. Now the question becomes: how many cycles does step zero spend in excess of those that must be spent for the function to have its intended behavior? My answer would be "zero, except in the vanishingly perverse case of a function declared with transforms for types it never sees." Am I mistaken? On 04/15/25 01:05, Pavel Stehule wrote: > There was a long discussion, and I think the main reason for this design > is a possibility to use an old code without change although the > transformations are installed. And secondly there was a possibility to use > a transformation when it was installed, and use it although it was > installed after the function was created. ... I can write code > in PL/Python that can work with or without transformation > ... > https://www.postgresql.org/message-id/flat/1339713732.11971.79.camel%40vanquo.pezone.net Thank you for the link to that old thread. I can see now that the first version of the patch in mid-2012 had CREATE and DROP TRANSFORM but did not add CREATE FUNCTION syntax for which transforms to use; in that version, it did entail the idea of transforms being selected for use just by existing. But the problem of that changing the behavior of existing functions was already recognized by the fifth message in the thread, where Peter aptly used the words "worst nightmare". By November of 2013 there were already suggestions about explicit CREATE FUNCTION syntax for what transforms to apply, And by January 2014 Peter had found the ISO SQL <transform group specification> that does exactly that, and by April 2015 the patch was adopted in that form. I'll guess that the 'bogus' dependency creation, based only on argument and return types, that Tom fixed last week in b73e6d7, was a vestige from the earliest "apply whatever transforms exist" version of the patch that didn't get changed when the explicit function-creation syntax was added. At any rate, we are now firmly in the world of "apply exactly the transforms explicitly requested at function creation", and that mirrors ISO SQL and seems the sanest world to me. While I don't doubt that a PL/Python function could be written that would work with or without a transform, that sounds to me like the kind of feat undertaken to show it can be done. My normal expectation would be for ~ 100% of real-world functions to faceplant if their data suddenly started arriving in completely different datatypes. And we have, wisely, chosen a design that rules that out. Regards, -Chap
Re: transforms [was Re: FmgrInfo allocation patterns (and PL handling as staged programming)]
From
Tom Lane
Date:
Chapman Flack <jcflack@acm.org> writes: > On 04/15/25 10:52, Tom Lane wrote: >> The problem from a PL's standpoint is "given this input or output >> of type FOO, should I transform it, and if so using what?". So >> the starting point has to be a type not a transform. > Perhaps the efficiency argument is really "say a function has > a list of 100 arguments and only one is of type foo, how many cycles > are wasted in get_transform_tosql and get_transform_fromsql applied > to all those other types?" That, and also "how many cycles are wasted in get_transform_tosql and get_transform_fromsql when there isn't any TRANSFORM clause at all?" > If pg_proc had protransforms instead, that would add a step zero: looking > up the declared transforms to make an in-memory list of (typid, tosqloid, > fromsqloid). After that, get_transform_{tosql,fromsql} would be applied > and return quickly when the passed typid isn't in that list. I don't doubt there are other ways that this could be built. But you've not really made the case that another way is better (let alone enough better to be worth breaking backward compatibility for). I think the "normal" case is that there isn't any TRANSFORM clause. As long as that case falls through quickly, it's hard to credit that there's any meaningful performance difference between different arrangements here. My own beef with the whole setup is that you can't specify *which* arguments or results you want transformed. I don't believe that this should have been keyed off data types to begin with. But that design was the SQL committee's choice so we're stuck ... regards, tom lane