Thread: Why does to_json take "anyelement" rather than "any"?

Why does to_json take "anyelement" rather than "any"?

From
Nikhil Benesch
Date:
to_json is declared as taking "anyelement" as input, which means
you can't pass it something of unknown type:

     postgres=# SELECT to_json('foo');
     ERROR:  could not determine polymorphic type because input has type unknown

But this works fine with the very similar json_build_array function:

     postgres=# SELECT json_build_array('foo');
      json_build_array
     ------------------
      ["foo"]
     (1 row)

The difference is that json_build_array takes type "any" as input, while
to_json takes "anyelement" as input.

Is there some reason to_json couldn't be switched to take "any" as input?
Hacking this together seems to mostly just work:

     postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE 'internal' AS 'to_json';
     postgres=# SELECT my_to_json('foo');
      my_to_json
     ------------
      "foo"
     (1 row)

Is there something I'm missing?

Nikhil



Re: Why does to_json take "anyelement" rather than "any"?

From
Mohamed Wael Khobalatte
Date:


On Tue, Nov 3, 2020 at 11:54 AM Nikhil Benesch <nikhil.benesch@gmail.com> wrote:
to_json is declared as taking "anyelement" as input, which means
you can't pass it something of unknown type:

     postgres=# SELECT to_json('foo');
     ERROR:  could not determine polymorphic type because input has type unknown

But this works fine with the very similar json_build_array function:

     postgres=# SELECT json_build_array('foo');
      json_build_array
     ------------------
      ["foo"]
     (1 row)

The difference is that json_build_array takes type "any" as input, while
to_json takes "anyelement" as input.

Is there some reason to_json couldn't be switched to take "any" as input?
Hacking this together seems to mostly just work:

     postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE 'internal' AS 'to_json';
     postgres=# SELECT my_to_json('foo');
      my_to_json
     ------------
      "foo"
     (1 row)

Is there something I'm missing?

Nikhil


Hm, good question. I am also curious as to why this happens. `json_build_array` ends up casting unknowns to text (from reading the code), which seems like a reasonable (although not completely tight) assumption. Not sure why `to_json` can't just do the same. You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can. 

Re: Why does to_json take "anyelement" rather than "any"?

From
"David G. Johnston"
Date:
On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <mkhobalatte@grubhub.com> wrote:
 You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can. 

My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the case here.

David J.

Re: Why does to_json take "anyelement" rather than "any"?

From
Nikhil Benesch
Date:
On 11/5/20 7:38 PM, David G. Johnston wrote:
> My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The
"anyelement"polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the
rulesof polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a
singleparameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just
works"(though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).
 
> 
> You only get to use "any" for a C-language function but that is indeed the case here.

That exactly matches my understanding as well. I'll put together a patch.

Nikhil



Re: Why does to_json take "anyelement" rather than "any"?

From
Tom Lane
Date:
Nikhil Benesch <nikhil.benesch@gmail.com> writes:
> On 11/5/20 7:38 PM, David G. Johnston wrote:
>> My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The
"anyelement"polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the
rulesof polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a
singleparameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just
works"(though I'd test it on a wide-range of built-in types first if I was actually going to use the hack). 
>>
>> You only get to use "any" for a C-language function but that is indeed the case here.

> That exactly matches my understanding as well. I'll put together a patch.

"any" is a dinosaur IMO.  It's definitely lower-level than anyelement;
for example the function has to be prepared to deal with raw "unknown"
literals.  So I feel like the proposed solution here is a bit of a hack.

What I'm wondering about as I think about this is why we don't allow
unknown literals to be resolved as text when matching to anyelement.
Maybe that was intentional, or maybe just overly conservative; or maybe
there is a good reason for it.  I don't recall, but it would be worth
excavating in the list archives to see if it was discussed when the
polymorphic types were being designed.

A relevant data point is that we *do* allow the case with the more
recently added "anycompatible" polymorphics:

regression=# create function foo(anycompatible) returns anycompatible as 'select $1' language sql;
CREATE FUNCTION
regression=# select foo('bar');
 foo
-----
 bar
(1 row)

regression=# select pg_typeof(foo('bar'));
 pg_typeof
-----------
 text
(1 row)

So even if we decide that changing the rules for "anyelement" is
too scary, I think switching to_json to anycompatible would be
preferable to switching it to "any".

            regards, tom lane



Re: Why does to_json take "anyelement" rather than "any"?

From
Nikhil Benesch
Date:
On 11/5/20 8:58 PM, Tom Lane wrote:
> "any" is a dinosaur IMO.  It's definitely lower-level than anyelement;
> for example the function has to be prepared to deal with raw "unknown"
> literals.  So I feel like the proposed solution here is a bit of a hack.

I see what you are saying, but since the code for to_jsonb is shared with
the code for jsonb_build_array and jsonb_build_object, which already
handle the pitfalls of "any", the patch seems to be literally this
simple:

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c01da4bf01..11adf748c9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8466,7 +8466,7 @@
    prosrc => 'json_object_two_arg' },
  { oid => '3176', descr => 'map input to json',
    proname => 'to_json', provolatile => 's', prorettype => 'json',
-  proargtypes => 'anyelement', prosrc => 'to_json' },
+  proargtypes => 'any', prosrc => 'to_json' },
  { oid => '3261', descr => 'remove object fields with null values from json',
    proname => 'json_strip_nulls', prorettype => 'json', proargtypes => 'json',
    prosrc => 'json_strip_nulls' },
@@ -9289,7 +9289,7 @@
    proargtypes => '_text _text', prosrc => 'jsonb_object_two_arg' },
  { oid => '3787', descr => 'map input to jsonb',
    proname => 'to_jsonb', provolatile => 's', prorettype => 'jsonb',
-  proargtypes => 'anyelement', prosrc => 'to_jsonb' },
+  proargtypes => 'any', prosrc => 'to_jsonb' },
  { oid => '3265', descr => 'jsonb aggregate transition function',
    proname => 'jsonb_agg_transfn', proisstrict => 'f', provolatile => 's',
    prorettype => 'internal', proargtypes => 'internal anyelement',

I think my argument is that regardless of which of
{any,anyelement,anycompatible} is best, it seems like to_jsonb,
jsonb_build_array, and jsonb_build_object should all use the same
type.

> What I'm wondering about as I think about this is why we don't allow
> unknown literals to be resolved as text when matching to anyelement.
> Maybe that was intentional, or maybe just overly conservative; or maybe
> there is a good reason for it.  I don't recall, but it would be worth
> excavating in the list archives to see if it was discussed when the
> polymorphic types were being designed.

I excavated two separate threads from 2003 from when you and Joe Conway
were designing SQL99 array support and the initial polymorphic types:

https://www.postgresql.org/message-id/3E701869.4020301%40joeconway.com
https://www.postgresql.org/message-id/1272.1048633920%40sss.pgh.pa.us

I didn't see anything obvious about unknown coercions, though I
certainly could have overlooked something. For what it's worth, the
error message "could not determine polymorphic type because input has
type unknown" has existed, with slightly different wording, since
the very first commit of the feature:

https://github.com/postgres/postgres/commit/730840c9b (parse_coerce.c, L840)

> A relevant data point is that we *do* allow the case with the more
> recently added "anycompatible" polymorphics:
>
>     <...snipped examples...>
>
> So even if we decide that changing the rules for "anyelement" is
> too scary, I think switching to_json to anycompatible would be
> preferable to switching it to "any".

Oh these new polymorphic types are interesting. I hadn't seen these.

So, I don't feel particularly qualified to determine how to proceed.
These are the options that I would be excited about:

   1. Switch to_jsonb to take "any", as in the above patch.
   2. Convert all of to_jsonb, jsonb_build_array, and jsonb_build_object
      to use the new "anycompatible" type.
   3. Switch to_jsonb to take "anyelement", but change "anyelement" and
      friends so that "unknown" arguments are coereced to text.

Would someone care to offer guidance on which path to choose?

Nikhil



Re: Why does to_json take "anyelement" rather than "any"?

From
Pavel Stehule
Date:


pá 6. 11. 2020 v 1:39 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <mkhobalatte@grubhub.com> wrote:
 You can always cast to text yourself, of course, but I am not familiar with the type hierarchy enough to tell why `to_json` can't deduce that as text whereas the other function can. 

My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the case here.

Type "anyelement"  can force the function's result type directly. But there cannot be function that returns UNKNOWN. 

Type "any" just accept any argument without any impact on result type. Unfortunately, inside a function is necessary to do much more work related to casting types, and the execution can be slower. 

I checked the source code of to_json and this function can use "any" without any change. 

Regards

Pavel



David J.