Thread: PATCH: Implement value_to_json for single-datum conversion

PATCH: Implement value_to_json for single-datum conversion

From
Craig Ringer
Date:
Hi all

Whenever I try to work with the new json types I trip over the lack of a
function to escape text to json. The attached patch against master
provides one by exposing the existing datum_to_json function to SQL.
I've used the name value_to_json, but I'm not sure it's necessarily the
right choice of name.

Please consider this for the 9.2 branch as well as to HEAD, as IMO it's
very important for basic usability of the json functionality. It applies
to 9.2 fine and passes "make check". I know it's late in the game, but
it's also a very small change and it's very hard to build up JSON data
structures other than simple rows or arrays without at minimum a way of
escaping `text' to json strings.

This feels basic enough that I'm wondering if there's a reason it wasn't
included from the start, but I don't see any comments in json.c talking
about anything like this, nor did I find any -hackers discussion about
it. I suspect it's just an oversight.

As value_to_json directly wraps datum_to_json it actually accepts record
and array types too. I didn't see any reason to prevent that and force
the user to instead use row_to_json or array_to_json for those cases. If
you don't want to accept this, I can provide a wrapper for escape_json
that only accepts a text argument instead, but I think *some* way to
escape text to JSON is vital to have in 9.2.

A docs patch will follow shortly if you're happy that this patch is
reasonable.

--
Craig Ringer

Re: PATCH: Implement value_to_json for single-datum conversion

From
Craig Ringer
Date:
Whoops. It actually looks like the posted patch muffed up opr_sanity 
checks. I'm totally new to pg_proc.h wrangling so I'm not sure why yet, 
looking.

Sorry, not sure how I missed that. I'll follow up shortly.

--
Craig Ringer



Re: PATCH: Implement value_to_json for single-datum conversion

From
Craig Ringer
Date:
OK, opr_sanity was failing because I added the value_to_json(text) alias
to ensure that:

   value_to_json('some_literal')

worked, following the same approach as quote_literal(anyelement) and
quote_literal(text). That should be reasonable, right? The comments on
the affected check in opr_sanity say that it's not necessarily wrong so
long as the called function is prepared to handle the different
arguments its self - which it is, since it's already accepting anyelement.

The test comment reads:

   Note that the expected output of this part of the test will
   need to be modified whenever new pairs of types are made
binary-equivalent,
   or when new polymorphic built-in functions are added

so that seems reasonable.


postgres=# \df quote_literal
                               List of functions
    Schema   |     Name      | Result data type | Argument data types |
Type
------------+---------------+------------------+---------------------+--------
  pg_catalog | quote_literal | text             | anyelement | normal
  pg_catalog | quote_literal | text             | text | normal
(2 rows)

postgres=# \df value_to_json
                               List of functions
    Schema   |     Name      | Result data type | Argument data types |
Type
------------+---------------+------------------+---------------------+--------
  pg_catalog | value_to_json | json             | anyelement | normal
  pg_catalog | value_to_json | json             | text | normal
(2 rows)

Revised patch that tweaks the expected result of opr_sanity attached.

--
Craig Ringer

Attachment

Re: PATCH: Implement value_to_json for single-datum conversion

From
Tom Lane
Date:
Craig Ringer <ringerc@ringerc.id.au> writes:
> OK, opr_sanity was failing because I added the value_to_json(text) alias 
> to ensure that:

>    value_to_json('some_literal')

> worked, following the same approach as quote_literal(anyelement) and 
> quote_literal(text). That should be reasonable, right?

No, it isn't.  What you're proposing is to let opr_sanity think that
text and anyelement are interchangeable to C functions, which is so
far from reality as to be ludicrous.  That would be seriously damaging
to its ability to detect errors.

But more to the point, your analogy to quote_literal is faulty anyway.
If you looked at that, what you'd find is that only quote_literal(text)
is a C function.  The other one is a SQL wrapper around a coercion to
text followed by the C function.  I rather imagine that the definition
as you have it would crash on, say, value_to_json(42).
        regards, tom lane



Re: PATCH: Implement value_to_json for single-datum conversion

From
Tom Lane
Date:
Craig Ringer <ringerc@ringerc.id.au> writes:
> Whenever I try to work with the new json types I trip over the lack of a 
> function to escape text to json. The attached patch against master 
> provides one by exposing the existing datum_to_json function to SQL.
> ...
> This feels basic enough that I'm wondering if there's a reason it wasn't 
> included from the start,

There actually was a previous thread about this:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
Note in particular Andrew's comment:
Second, RFC 4627 is absolutely clear: a valid JSON value canonly be an object or an array, so this thing about
convertingarbitrarydatum values to JSON is a fantasy. If anything, weshould adjust the JSON input routines to disallow
anythingelse,rather than start to output what is not valid JSON.
 

It's possible he's misread the spec, but I think we ought to tread very
carefully before adding "obvious" conversions we might regret later.

> Please consider this for the 9.2 branch as well as to HEAD, as IMO it's 
> very important for basic usability of the json functionality. It applies 
> to 9.2 fine and passes "make check". I know it's late in the game,

It's several months too late for feature additions to 9.2, especially
ones that would require an initdb to install.  We can look at this for
9.3, but I'm still worried about the spec-compliance question.
        regards, tom lane



Re: PATCH: Implement value_to_json for single-datum conversion

From
Craig Ringer
Date:
On 08/13/2012 12:48 PM, Tom Lane wrote:
> There actually was a previous thread about this:
> http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
> Note in particular Andrew's comment:
>
>     Second, RFC 4627 is absolutely clear: a valid JSON value can
>     only be an object or an array, so this thing about converting
>     arbitrary datum values to JSON is a fantasy. If anything, we
>     should adjust the JSON input routines to disallow anything else,
>     rather than start to output what is not valid JSON.

Thanks for taking a look. That makes sense. I guess these are similar 
issues to those the XML type faces, where working with fragments is a 
problem. The spec requires a single root element, but you don't always 
have that when you're *building* XML, hence the addition of `IS DOCUMENT'.

I was hoping to find a low-impact way to allow SQL-level construction of 
more complex JSON objects with correct text escaping, but it sounds like 
this isn't the right route. I don't currently see any way to achieve the 
kind of on-the-fly building you can do with XML's xmlelement(), 
xmlconcat(), xmlforest() etc; nor equivalent to hstore's 
hstore(text[],text[]), and I was hoping to improve that.

I have a half-finished JSON object constructor 
json_object_from_arrays(text[], json[]) in the same style as 
hstore(text[],text[]) . It won't work without the notion of json-typed 
scalars, though, as the values of keys could then only be arrays or 
objects, which isn't very useful. I can't usefully accept `anyarray' as 
a values argument since arrays are of homogeneous type. Accepting text[] 
would be a bug-magnet even if there was some kind of `text 
json_escape(text)' function.

Would it be reasonable to add a separate json_element type, one that's 
binary-equivalent to `json' but not constrained by the requirement to be 
an array or object/dict? Or a `jsobject' ?


As for the value_to_json crashing, works for me:

postgres=# SELECT value_to_json(42); value_to_json
--------------- 42
(1 row)

... since datum_to_json is happy to accept anything you throw at it 
using output function lookups, and value_to_json its self doesn't care 
about the argument type at all. That was all in the regression tests.

Purely so I understand what the correct handling of the anyelement+text 
overload would've been: In light of your comments on opr_sanity would 
the right approach be to add a second C function like text_to_json that 
only accepts 'text' to avoid confusing the sanity check? So the SQL 
"value_to_json(anyelement)" would point to the C "value_to_json" and the 
SQL "value_to_json(text)" would point to the C "text_to_json" ?

Anyway, clearly the value_to_json approach is out.

--
Craig Ringer



Re: PATCH: Implement value_to_json for single-datum conversion

From
Tom Lane
Date:
Craig Ringer <ringerc@ringerc.id.au> writes:
> As for the value_to_json crashing, works for me:

> postgres=# SELECT value_to_json(42);
>   value_to_json
> ---------------
>   42
> (1 row)

Oh, right, because there actually is support for anyelement in the
underlying C function.  There is not in the quote_literal case.

> Purely so I understand what the correct handling of the anyelement+text 
> overload would've been: In light of your comments on opr_sanity would 
> the right approach be to add a second C function like text_to_json that 
> only accepts 'text' to avoid confusing the sanity check?

Actually, given the above, what did you need value_to_json(text) for at
all?  Wouldn't value_to_json(anyelement) have covered it?

But yeah, the general approach to suppressing complaints from that
opr_sanity test is to make more C entry points.  The point of it,
in some sense, is that if you want to make an assumption that two
types are binary-equivalent then it's better to have that assumption
in C code than embedded in the pg_proc entries.  The cases that we
let pass via the "expected" outputs are only ones where binary
equivalence seems pretty well assured, like text vs varchar.
        regards, tom lane



Re: PATCH: Implement value_to_json for single-datum conversion

From
Craig Ringer
Date:
On 08/13/2012 01:55 PM, Tom Lane wrote:
> Actually, given the above, what did you need value_to_json(text) for at
> all?  Wouldn't value_to_json(anyelement) have covered it?

Usability. Without the version accepting text an explicit cast to text 
is required to disambiguate a literal argument like 
value_to_json('something') .

> But yeah, the general approach to suppressing complaints from that
> opr_sanity test is to make more C entry points.  The point of it,
> in some sense, is that if you want to make an assumption that two
> types are binary-equivalent then it's better to have that assumption
> in C code than embedded in the pg_proc entries.  The cases that we
> let pass via the "expected" outputs are only ones where binary
> equivalence seems pretty well assured, like text vs varchar.

Thanks. I appreciate the explanation, and sorry for the newbie error.


On the JSON stuff, I can see it's not as simple as adding a simple 
escape function. For my needs during the 9.2 timeframe I'll bundle up an 
extension with the functionality I need and deal with the need to port 
to whatever 9.3 includes. Hopefully a "json_value" or "javascript_value" 
or similar can be introduced for 9.3, given comments like:
  http://archives.postgresql.org/pgsql-hackers/2012-05/msg00030.php
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00065.php

.. expressing not only the need for json scalars, but the fact that 
they're already commonplace in pretty much everything else.

Given this:
  http://archives.postgresql.org/pgsql-hackers/2012-05/msg00040.php

it does seem that `json' should be a whole document not a fragment, but 
IMO a way to work with individual JSON values is going to be *necessary* 
to get the most out of the json support - and to stop people who use 
JSON in the real world complaining that Pg's JSON support is broken 
because it follows the standard not real-world practice.

Personally the lack of json scalars has prevented me from using JSON 
support in two different places already, though it's proving very useful 
in many others.

--
Craig Ringer