Thread: Why does to_json take "anyelement" rather than "any"?
to_json is declared as taking "anyelement" as input, which means you can't pass it something of unknown type: postgres=# SELECT to_json('foo'); ERROR: could not determine polymorphic type because input has type unknown But this works fine with the very similar json_build_array function: postgres=# SELECT json_build_array('foo'); json_build_array ------------------ ["foo"] (1 row) The difference is that json_build_array takes type "any" as input, while to_json takes "anyelement" as input. Is there some reason to_json couldn't be switched to take "any" as input? Hacking this together seems to mostly just work: postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE 'internal' AS 'to_json'; postgres=# SELECT my_to_json('foo'); my_to_json ------------ "foo" (1 row) Is there something I'm missing? Nikhil
On Tue, Nov 3, 2020 at 11:54 AM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
to_json is declared as taking "anyelement" as input, which means
you can't pass it something of unknown type:
postgres=# SELECT to_json('foo');
ERROR: could not determine polymorphic type because input has type unknown
But this works fine with the very similar json_build_array function:
postgres=# SELECT json_build_array('foo');
json_build_array
------------------
["foo"]
(1 row)
The difference is that json_build_array takes type "any" as input, while
to_json takes "anyelement" as input.
Is there some reason to_json couldn't be switched to take "any" as input?
Hacking this together seems to mostly just work:
postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE 'internal' AS 'to_json';
postgres=# SELECT my_to_json('foo');
my_to_json
------------
"foo"
(1 row)
Is there something I'm missing?
Nikhil
Hm, good question. I am also curious as to why this happens. `json_build_array` ends up casting unknowns to text (from reading the code), which seems like a reasonable (although not completely tight) assumption. Not sure why `to_json` can't just do the same. You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can.
On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <mkhobalatte@grubhub.com> wrote:
You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can.
My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown. The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup. My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).
You only get to use "any" for a C-language function but that is indeed the case here.
David J.
On 11/5/20 7:38 PM, David G. Johnston wrote: > My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown. The "anyelement"polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rulesof polymorphism apply when performing a lookup. My uninformed conclusion is that since to_json only defines a singleparameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works"(though I'd test it on a wide-range of built-in types first if I was actually going to use the hack). > > You only get to use "any" for a C-language function but that is indeed the case here. That exactly matches my understanding as well. I'll put together a patch. Nikhil
Nikhil Benesch <nikhil.benesch@gmail.com> writes: > On 11/5/20 7:38 PM, David G. Johnston wrote: >> My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown. The "anyelement"polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rulesof polymorphism apply when performing a lookup. My uninformed conclusion is that since to_json only defines a singleparameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works"(though I'd test it on a wide-range of built-in types first if I was actually going to use the hack). >> >> You only get to use "any" for a C-language function but that is indeed the case here. > That exactly matches my understanding as well. I'll put together a patch. "any" is a dinosaur IMO. It's definitely lower-level than anyelement; for example the function has to be prepared to deal with raw "unknown" literals. So I feel like the proposed solution here is a bit of a hack. What I'm wondering about as I think about this is why we don't allow unknown literals to be resolved as text when matching to anyelement. Maybe that was intentional, or maybe just overly conservative; or maybe there is a good reason for it. I don't recall, but it would be worth excavating in the list archives to see if it was discussed when the polymorphic types were being designed. A relevant data point is that we *do* allow the case with the more recently added "anycompatible" polymorphics: regression=# create function foo(anycompatible) returns anycompatible as 'select $1' language sql; CREATE FUNCTION regression=# select foo('bar'); foo ----- bar (1 row) regression=# select pg_typeof(foo('bar')); pg_typeof ----------- text (1 row) So even if we decide that changing the rules for "anyelement" is too scary, I think switching to_json to anycompatible would be preferable to switching it to "any". regards, tom lane
On 11/5/20 8:58 PM, Tom Lane wrote: > "any" is a dinosaur IMO. It's definitely lower-level than anyelement; > for example the function has to be prepared to deal with raw "unknown" > literals. So I feel like the proposed solution here is a bit of a hack. I see what you are saying, but since the code for to_jsonb is shared with the code for jsonb_build_array and jsonb_build_object, which already handle the pitfalls of "any", the patch seems to be literally this simple: diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c01da4bf01..11adf748c9 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8466,7 +8466,7 @@ prosrc => 'json_object_two_arg' }, { oid => '3176', descr => 'map input to json', proname => 'to_json', provolatile => 's', prorettype => 'json', - proargtypes => 'anyelement', prosrc => 'to_json' }, + proargtypes => 'any', prosrc => 'to_json' }, { oid => '3261', descr => 'remove object fields with null values from json', proname => 'json_strip_nulls', prorettype => 'json', proargtypes => 'json', prosrc => 'json_strip_nulls' }, @@ -9289,7 +9289,7 @@ proargtypes => '_text _text', prosrc => 'jsonb_object_two_arg' }, { oid => '3787', descr => 'map input to jsonb', proname => 'to_jsonb', provolatile => 's', prorettype => 'jsonb', - proargtypes => 'anyelement', prosrc => 'to_jsonb' }, + proargtypes => 'any', prosrc => 'to_jsonb' }, { oid => '3265', descr => 'jsonb aggregate transition function', proname => 'jsonb_agg_transfn', proisstrict => 'f', provolatile => 's', prorettype => 'internal', proargtypes => 'internal anyelement', I think my argument is that regardless of which of {any,anyelement,anycompatible} is best, it seems like to_jsonb, jsonb_build_array, and jsonb_build_object should all use the same type. > What I'm wondering about as I think about this is why we don't allow > unknown literals to be resolved as text when matching to anyelement. > Maybe that was intentional, or maybe just overly conservative; or maybe > there is a good reason for it. I don't recall, but it would be worth > excavating in the list archives to see if it was discussed when the > polymorphic types were being designed. I excavated two separate threads from 2003 from when you and Joe Conway were designing SQL99 array support and the initial polymorphic types: https://www.postgresql.org/message-id/3E701869.4020301%40joeconway.com https://www.postgresql.org/message-id/1272.1048633920%40sss.pgh.pa.us I didn't see anything obvious about unknown coercions, though I certainly could have overlooked something. For what it's worth, the error message "could not determine polymorphic type because input has type unknown" has existed, with slightly different wording, since the very first commit of the feature: https://github.com/postgres/postgres/commit/730840c9b (parse_coerce.c, L840) > A relevant data point is that we *do* allow the case with the more > recently added "anycompatible" polymorphics: > > <...snipped examples...> > > So even if we decide that changing the rules for "anyelement" is > too scary, I think switching to_json to anycompatible would be > preferable to switching it to "any". Oh these new polymorphic types are interesting. I hadn't seen these. So, I don't feel particularly qualified to determine how to proceed. These are the options that I would be excited about: 1. Switch to_jsonb to take "any", as in the above patch. 2. Convert all of to_jsonb, jsonb_build_array, and jsonb_build_object to use the new "anycompatible" type. 3. Switch to_jsonb to take "anyelement", but change "anyelement" and friends so that "unknown" arguments are coereced to text. Would someone care to offer guidance on which path to choose? Nikhil
pá 6. 11. 2020 v 1:39 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <mkhobalatte@grubhub.com> wrote:You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can.My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown. The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup. My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).You only get to use "any" for a C-language function but that is indeed the case here.
Type "anyelement" can force the function's result type directly. But there cannot be function that returns UNKNOWN.
Type "any" just accept any argument without any impact on result type. Unfortunately, inside a function is necessary to do much more work related to casting types, and the execution can be slower.
I checked the source code of to_json and this function can use "any" without any change.
Regards
Pavel
David J.