Thread: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
Hi hackers

After playing around with the array_to_json() and row_to_json() functions a bit, I have a question - why do we even have two variants of *_to_json()?

Collapsing array_to_json() and row_to_json() into just to_json()
----------------------------------------------------------------

As the functionality is not yet released, maybe we could still rethink the interface and have just one function which can deal with all types:

    to_json(any) returns json

The current two versions have to call the "any" variant internally anyhow, to convert individual field values, so why not just expose the full functionality as a single to_json() function?

You can almost get the "any" functionality now by wrapping the value in an array and afterwards stripping the outermost [] from the result. Is there any good reason not to expose it directly?

CREATE OR REPLACE FUNCTION to_json(obj anyelement)
RETURNS json AS $$
BEGIN
    RETURN substring(array_to_json(array[obj])::text FROM '^\[(.*)\]$');
END;
$$ LANGUAGE plpgsql;

hannu=# select to_json('1'::text);
 to_json
---------
 "1"
(1 row)

hannu=# select to_json(test) from test limit 2;
                             to_json
-------------------------------------------------------------------
 {"id":9,"data":"testdata","tstamp":"2012-05-01 09:44:50.175189"}
 {"id":10,"data":"testdata","tstamp":"2012-05-01 09:45:50.260276"}
(2 rows)

Maybe we can remove the *_to_json() functions completely :)
---------------------------------------------------------

As a separate note, could we go even further and fold all this functionality into a universal cast, so that attaching ::json to any object will automagically work?
Removing a current limitation of PL/pgSQL
-----------------------------------------

The above plpgsql definition of to_json() does not currently work with anonymous records defined inline, or arrays of such records, but only because PL/pgSQL functions currently don't accept these types:

hannu=# select to_json(s) from (select 1::int as i, 't'::text as t union select 2,'x')s;
ERROR:  PL/pgSQL functions cannot accept type record
CONTEXT:  compilation of PL/pgSQL function "to_json" near line 1

I think PL/pgSQL could now start accepting such records, as the wrinkles which made it hard to (recursively) get the needed info for anonymous records were ironed out when developing the *_to_json() functions, so all of it just works. IIRC some of this needed improving data available in core, and was not just extra surgery done directly inside the *_to_json() functions.

hannu=# select row_to_json(s) from (select 1::int as i, 't'::text as t union select 2,'x')s;
   row_to_json
-----------------
 {"i":1,"t":"t"}
 {"i":2,"t":"x"}
(2 rows)

and even

hannu=# select row_to_json(s) from (select 1::int as i, (select z from (select 2::int as j, 'x'::text as x)z) as t union select 2,null)s;
         row_to_json
-----------------------------
 {"i":1,"t":{"j":2,"x":"x"}}
 {"i":2,"t":null}
(2 rows)

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
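Hannu's wrap-in-an-array trick translates directly outside SQL as well. A minimal Python sketch of the same idea (the function name is mine, purely illustrative):

```python
import json

def to_json_via_array(value):
    # Serialize a one-element array, then strip the outer "[" and "]",
    # mirroring substring(array_to_json(array[obj])::text FROM '^\[(.*)\]$').
    text = json.dumps([value])
    return text[1:-1]

print(to_json_via_array("1"))                            # "1"
print(to_json_via_array({"id": 9, "data": "testdata"}))
```

Because the wrapped value is always a one-element array, the stripped result is exactly the serialization of the inner value, whatever its type.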
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Merlin Moncure
Date:
On Tue, May 1, 2012 at 7:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> Hi hackers
>
> After playing around with array_to_json() and row_to_json() functions a
> bit it I have a question - why do we even have 2 variants *_to_json()
>
> Collapsing array_to_json() and row_to_json() into just to_json()

I asked the same question. It was noted that the xml functions aren't overloaded like that and that it's cleaner to introduce datum specific behaviors if you don't overload. I don't really agree with that or any of the naming styles that are in the form inputtype_func(), but I think most people are on the other side of the argument.

merlin
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 08:18 -0500, Merlin Moncure wrote:
> On Tue, May 1, 2012 at 7:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> > Hi hackers
> >
> > After playing around with array_to_json() and row_to_json() functions a
> > bit it I have a question - why do we even have 2 variants *_to_json()
> >
> > Collapsing array_to_json() and row_to_json() into just to_json()
>
> I asked the same question. It was noted that the xml functions aren't
> overloaded like that and that it's cleaner to introduce datum specific
> behaviors if you don't overload.

XML, being an "enterprise" thing, is a large and complex beast. Javascript - and by extension json - comes from the other end, being lightweight and elegant at its core.

Also, the *_to_xml functions present still don't match what is there for json; they don't even overlap!

Thus I see no reason why deciding how the to_json() functions (or a cast to json) should work needs to be based on how xml works. We currently don't have any of the "database_to_json()" or "querystring_to_json()" functions, and we don't need these either.

I'd be much more happy with just having a working cast to json from all types, not a myriad of functions for all possible types - int4_to_json(), text_to_json(), bool_to_json(), record_to_json(), array_to_json(), pg_user_to_json(), etc. etc. etc.

What we currently have exposed to userspace are two arbitrarily chosen "complex type" functions:

array_to_json() for converting arrays of ANY element type to json, including arrays consisting of records which may again contain arrays and records,

and row_to_json() for converting "rows", again potentially consisting of ANY TYPE, including arrays of any type and any complex type.

It even handles the row() type :)

hannu=# select row_to_json(row(1,2,3));
      row_to_json
------------------------
 {"f1":1,"f2":2,"f3":3}
(1 row)

What we currently lack is direct conversion for simple types, though it is easily achieved by converting to a single-element array and then stripping the outer [] from the result.

It would be really nice to also have the casts from json to any type, including records, though. And perhaps one function for converting schema elements to some json representation, so that a json_dump could easily be constructed :)

We really do not need footguns similar to database_to_xml() or schema_to_xml(), which just consume all memory in the server on any real database.

> I don't really agree with that or any of the naming styles that are in
> the form inputtype_func() but I think most people are on the other
> side of the argument.

I think that most people have not given this any thought yet, so they simply lack any reasoned opinion ;)

> merlin

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Joey Adams
Date:
On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> Hi hackers
>
> After playing around with array_to_json() and row_to_json() functions a
> bit it I have a question - why do we even have 2 variants *_to_json()

Here's the discussion where that decision was made:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php

To quote:

>>> why not call all these functions 'to_json' and overload them?
>>
>> I don't honestly feel that advances clarity much. And we might want to
>> overload each at some stage with options that are specific to the datum
>> type. We have various foo_to_xml() functions now.
>
> -1
>
> older proposal is more consistent with xml functions

The most compelling argument I see here is the one about options specific to the datum type.

Two other reasons I can think of:

* If someone tries to google for how to convert an array to JSON, having a function named 'array_to_json' will make that easier.

* If the JSON type does not yet support, say, converting from a number, it will be apparent from the names and types of the functions, rather than being a hidden surprise. On the other hand, array_to_json and composite_to_json already convert ANY values to JSON, so this doesn't matter, anyway.

On Tue, May 1, 2012 at 11:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> What we currently lack is direct conversion for simple types, though
> they are easily achieved by converting to a single-element array and
> then stripping outer [] from the result

I agree that this function ought to be exposed. Note that such a function (perhaps called datum_to_json) is indeed the same as the proposed to_json function, which tries to convert a value of any type to JSON.

> It would be really nice to also have the casts from json to any type,
> including records though.

What the casts currently do (primarily) is convert between the TEXT and JSON types.
So if you have JSON-encoded TEXT, use a cast to convert it to the JSON type (this will perform validation, ensuring that no invalid JSON gets in). Any escape/unescape operations need to be explicit.

-Joey
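Joey's description of the text-to-json cast (validate on the way in, keep the textual representation) can be sketched in Python. `cast_text_to_json` is a hypothetical name for illustration, not a PostgreSQL function:

```python
import json

def cast_text_to_json(text):
    # Rough analogue of the text::json cast described above: keep the
    # original text as the stored representation, but only after it has
    # parsed successfully, so no invalid JSON gets in.
    json.loads(text)  # raises ValueError on invalid input
    return text

print(cast_text_to_json('{"a": 1}'))   # accepted unchanged
try:
    cast_text_to_json('{a: 1}')        # unquoted key: rejected
except ValueError:
    print("rejected")
```

The point is that the "cast" does no transformation at all; it is purely a validation gate between two representations of the same bytes.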
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Merlin Moncure
Date:
On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
> On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> Hi hackers
>>
>> After playing around with array_to_json() and row_to_json() functions a
>> bit it I have a question - why do we even have 2 variants *_to_json()
>
> Here's the discussion where that decision was made:
>
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
>
> To quote:
>
>>>> why not call all these functions 'to_json' and overload them?
>>>
>>> I don't honestly feel that advances clarity much. And we might want to
>>> overload each at some stage with options that are specific to the datum
>>> type. We have various foo_to_xml() functions now.
>>
>> -1
>>
>> older proposal is more consistent with xml functions
>
> The most compelling argument I see here is the one about options
> specific to the datum type.

I don't find that to be particularly compelling at all. to_timestamp for example supports multiple argument versions depending on the input type.

> * If the JSON type does not yet support, say, converting from a
> number, it will be apparent from the names and types of the functions,
> rather than being a hidden surprise. On the other hand, array_to_json
> and composite_to_json already convert ANY values to JSON, so this
> doesn't matter, anyway.

I don't see how not having to_json(type) is any less surprising than type_to_json(). To add: Are we going to have json_length()? Why shouldn't length operate directly on the json type, since it has a length? Or are we going to force an implicit cast to text?

An elementary point of generic programming through SQL is that you are supposed to keep *what you are trying to do* decoupled from *what you're doing it on*. It allows for very natural and terse programming. The array, xml, and now the json apis essentially violate this principle. The array api I find particularly galling since you end up having to retype 'array' N times in a single expression.

merlin
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 11:49 -0400, Joey Adams wrote:
> On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> > Hi hackers
> >
> > After playing around with array_to_json() and row_to_json() functions a
> > bit it I have a question - why do we even have 2 variants *_to_json()
>
> Here's the discussion where that decision was made:
>
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php

Thanks, will read it!

> To quote:
>
> >>> why not call all these functions 'to_json' and overload them?
> >>
> >> I don't honestly feel that advances clarity much. And we might want
> >> to overload each at some stage with options that are specific to the
> >> datum type. We have various foo_to_xml() functions now.
> >
> > -1
> >
> > older proposal is more consistent with xml functions
>
> The most compelling argument I see here is the one about options
> specific to the datum type.
>
> Two other reasons I can think of:
>
> * If someone tries to google for how to convert an array to JSON,
> having a function named 'array_to_json' will make that easier.

Well, if you want to know how to convert an integer to string, you don't use an integer_to_text() function, you just use a working cast. And here it is an outright lie:

hannu=# select 1::json;
ERROR:  cannot cast type integer to json
LINE 1: select 1::json;

The error should be "won't cast type integer to json" :) It very well _can_ convert it, as it does so without a problem when such an integer is inside an array or a record type.

> * If the JSON type does not yet support, say, converting from a
> number, it will be apparent from the names and types of the functions,
> rather than being a hidden surprise. On the other hand, array_to_json
> and composite_to_json already convert ANY values to JSON, so this
> doesn't matter, anyway.

By this logic all non-working casts are "hidden surprises".

> On Tue, May 1, 2012 at 11:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> > What we currently lack is direct conversion for simple types, though
> > they are easily achieved by converting to a single-element array and
> > then stripping outer [] from the result
>
> I agree that this function ought to be exposed. Note that such a
> function (perhaps called datum_to_json) is indeed the same as the
> proposed to_json function, which tries to convert a value of any type
> to JSON.

Hmm, I just found an answer to my question on how to add to_json() capability to extension types. If we had to_json as a cast, it would probably be straightforward for extensions like hstore to provide their own to_json casts - especially now that the json type is in core - and we could get a working hstore --> json conversion just by running the CREATE EXTENSION hstore; command.

> > It would be really nice to also have the casts from json to any type,
> > including records though.
>
> What the casts currently do (primarily) is convert between the TEXT
> and JSON types. So if you have JSON-encoded TEXT, use a cast to
> convert it to the JSON type (this will perform validation, ensuring
> that no invalid JSON gets in). Any escape/unescape operations need to
> be explicit.
>
> -Joey

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
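The extensible-cast idea Hannu raises (each type, including extension types like hstore, registering its own conversion to json) has a rough analogue in Python's `functools.singledispatch`. This is an analogy only, not PostgreSQL's cast machinery, and the names are mine:

```python
import datetime
import json
from functools import singledispatch

@singledispatch
def to_json(value):
    # Default path: whatever the stock serializer already handles.
    return json.dumps(value)

@to_json.register(datetime.date)
def _(value):
    # The "extension cast": dates are not a JSON type, so the extension
    # picks a convention (here an ISO 8601 string) explicitly.
    return json.dumps(value.isoformat())

print(to_json({"a": 1}))
print(to_json(datetime.date(2012, 5, 1)))   # "2012-05-01"
```

Registering a new type extends the generic `to_json` without touching its core, which is exactly the property a cast-based design would give extensions like hstore.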
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Andrew Dunstan
Date:
On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
>> On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> Hi hackers
>>>
>>> After playing around with array_to_json() and row_to_json() functions a
>>> bit it I have a question - why do we even have 2 variants *_to_json()
>>
>> Here's the discussion where that decision was made:
>>
>> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
>>
>> To quote:
>>
>>>>> why not call all these functions 'to_json' and overload them?
>>>>
>>>> I don't honestly feel that advances clarity much. And we might want to overload each at some stage with options that are specific to the datum type. We have various foo_to_xml() functions now.
>>>
>>> -1
>>>
>>> older proposal is more consistent with xml functions
>>
>> The most compelling argument I see here is the one about options
>> specific to the datum type.
>
> I don't find that to be particularly compelling at all. to_timestamp
> for example supports multiple argument versions depending on the input
> type.
>
>> * If the JSON type does not yet support, say, converting from a
>> number, it will be apparent from the names and types of the functions,
>> rather than being a hidden surprise. On the other hand, array_to_json
>> and composite_to_json already convert ANY values to JSON, so this
>> doesn't matter, anyway.

I am away from base on a consulting assignment all this week, so my connectivity and time are severely limited, and I don't have time to respond in depth.

Let me just point out two things. First, we are approaching a beta release. The time for changing this is long since gone, IMNSHO.

Second, RFC 4627 is absolutely clear: a valid JSON value can only be an object or an array, so this thing about converting arbitrary datum values to JSON is a fantasy. If anything, we should adjust the JSON input routines to disallow anything else, rather than start to output what is not valid JSON.

cheers

andrew
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Merlin Moncure
Date:
On Tue, May 1, 2012 at 11:22 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>> On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3.14159@gmail.com>
>> wrote:
>> > On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu@2ndquadrant.com>
>> > wrote:
>> >> Hi hackers
>> >>
>> >> After playing around with array_to_json() and row_to_json() functions a
>> >> bit it I have a question - why do we even have 2 variants *_to_json()
>> >
>> > Here's the discussion where that decision was made:
>> >
>> > http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
>> >
>> > To quote:
>> >
>> >>>> why not call all these functions 'to_json' and overload them?
>> >>>
>> >>> I don't honestly feel that advances clarity much. And we might want to
>> >>> overload each at some stage with options that are specific to the datum
>> >>> type. We have various foo_to_xml() functions now.
>> >>
>> >> -1
>> >>
>> >> older proposal is more consistent with xml functions
>> >
>> > The most compelling argument I see here is the one about options
>> > specific to the datum type.
>>
>> I don't find that to be particularly compelling at all. to_timestamp
>> for example supports multiple argument versions depending on the input
>> type.
>>
>> > * If the JSON type does not yet support, say, converting from a
>> > number, it will be apparent from the names and types of the functions,
>> > rather than being a hidden surprise. On the other hand, array_to_json
>> > and composite_to_json already convert ANY values to JSON, so this
>> > doesn't matter, anyway.
>
> I am away from base on a consulting assignment all this week, so my
> connectivity and time are severely limited, and I don't have time to respond
> in depth.
>
> Let me just point out two things. First, we are approaching a beta release.
> The time for changing this is long since gone, IMNSHO.

sure. pedantic philosophical arguments aside, I'm already using the api heavily and would prefer not to see it changed :-).

merlin
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Let me just point out two things. First, we are approaching a beta release.
> The time for changing this is long since gone, IMNSHO.

This is our last chance to get it right, so that argument doesn't seem to me to carry a lot of weight ...

> Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> object or an array, so this thing about converting arbitrary datum values
> to JSON is a fantasy. If anything, we should adjust the JSON input routines
> to disallow anything else, rather than start to output what is not valid
> JSON.

... but this one does.

			regards, tom lane
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Joey Adams
Date:
On Tue, May 1, 2012 at 12:22 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> object or an array, so this thing about converting arbitrary datum values to
> JSON is a fantasy. If anything, we should adjust the JSON input routines to
> disallow anything else, rather than start to output what is not valid JSON.

No, the RFC says (emphasis mine):

    A JSON *text* is a serialized object or array.

If we let the JSON type correspond to a *value* instead, this restriction does not apply, and the JSON type has a useful recursive definition.

For example, this function would not be possible if we applied the "object or array" restriction:

    unnest(json) returns setof json

Note that a similar distinction appears with the XML type: "document" versus "content".

-Joey
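Joey's text-versus-value distinction is easy to see with any permissive parser. Python's `json` module, for instance, accepts bare scalars, and iterating over a parsed array yields exactly the per-element values his hypothetical unnest(json) would return:

```python
import json

# A parser for JSON *values* accepts scalars, not just the
# object-or-array "JSON text" of RFC 4627:
print(json.loads('1'))      # 1
print(json.loads('"a"'))
print(json.loads('null'))   # None

# The recursive definition: every element of a JSON array is itself a
# JSON value, which is what unnest(json) would emit row by row.
for element in json.loads('[1, "a", {"b": 2}]'):
    print(json.dumps(element))
```

Under the strict "JSON text" reading, the loop body would produce values (`1`, `"a"`) that the type itself could not represent, which is Joey's argument in miniature.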
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Andrew Dunstan
Date:
On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
> On Tue, May 1, 2012 at 12:22 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> > object or an array, so this thing about converting arbitrary datum values to
> > JSON is a fantasy. If anything, we should adjust the JSON input routines to
> > disallow anything else, rather than start to output what is not valid JSON.
>
> No, the RFC says (emphasis mine):
>
>     A JSON *text* is a serialized object or array.
>
> If we let the JSON type correspond to a *value* instead, this
> restriction does not apply, and the JSON type has a useful recursive
> definition.
>
> For example, this function would not be possible if we applied the
> "object or array" restriction:
>
>     unnest(json) returns setof json
>
> Note that a similar distinction appears with the XML type: "document"
> versus "content".

I think you're playing with words. But in any case, the RFC says this regarding generators:

5.  Generators

    A JSON generator produces JSON text.  The resulting text MUST
    strictly conform to the JSON grammar.

Our functions do seem to be JSON generators. So even if we accept things that aren't JSON texts in our parser (possibly permitted by section 4 of the RFC) we should not be generating them.

cheers

andrew
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 12:54 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Let me just point out two things. First, we are approaching a beta release.
> > The time for changing this is long since gone, IMNSHO.
>
> This is our last chance to get it right, so that argument doesn't seem
> to me to carry a lot of weight ...
>
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> > object or an array,

No, according to the RFC, a valid "JSON value" can be: an object, an array, a number, a string, or one of false, null, true.

From the RFC:
-------------------------
1.  Introduction

   JavaScript Object Notation (JSON) is a text format for the
   serialization of structured data.  It is derived from the object
   literals of JavaScript, as defined in the ECMAScript Programming
   Language Standard, Third Edition [ECMA].

   JSON can represent four primitive types (strings, numbers, booleans,
   and null) and two structured types (objects and arrays).
...
2.1.  Values

   A JSON value MUST be an object, array, number, or string, or one of
   the following three literal names:

      false null true
-------------------------

By having our JSON type mean a "JSON value" instead of a "JSON text" (which indeed is required to be an array or object) we could make it easy for all extension types to provide casts to "JSON value" and thus automatically plug them into PostgreSQL's built-in JSON support.

I would very much like the *_to_json() functions to first try a cast to json when converting values, so that for example after the following cast definition they would do the right thing for hstore.

CREATE FUNCTION hstore_to_json(IN hvalue hstore, OUT jvalue json) AS $$
    return '{%s}' % hvalue.replace('"=>"','":"')
$$ LANGUAGE plpythonu;

CREATE CAST (hstore AS json) WITH FUNCTION hstore_to_json(hstore) AS IMPLICIT;

hannu=# select *, datadict::json from test_hstore;
 id |            datadict             |            datadict
----+---------------------------------+---------------------------------
  1 | "baz"=>"whatever", "foo"=>"bar" | {"baz":"whatever","foo":"bar"}
  2 | "bar"=>"the same", "foo"=>"bar" | {"bar":"the same", "foo":"bar"}
(2 rows)

Currently it seems to be hardwired to do datum --> text conversions:

hannu=# select row_to_json(test_hstore) from test_hstore;
                          row_to_json
---------------------------------------------------------------
 {"id":1,"datadict":"\"baz\"=>\"whatever\", \"foo\"=>\"bar\""}
 {"id":2,"datadict":"\"bar\"=>\"the same\", \"foo\"=>\"bar\""}
(2 rows)

I'd like it to try datum --> json first and yield:

hannu=# select row_to_json(test_hstore) from test_hstore;
                     row_to_json
-----------------------------------------------------
 {"id":1,"datadict":{"baz":"whatever", "foo":"bar"}}
 {"id":2,"datadict":{"bar":"the same", "foo":"bar"}}
(2 rows)

This exact case could be made to work even with "JSON text" as the meaning of our json type, but some other types may not be so lucky. For example, imagine a tri-valued boolean with textual values "yes", "no", and "don't know". The logical mapping to json would be true, false, null, but we can't easily provide a triboolean --> json cast for this if we require a json value to be "JSON text" and don't accept "JSON values".

> > so this thing about converting arbitrary datum values
> > to JSON is a fantasy.

It should be possible to cast them to a "JSON value", but not always a "JSON text", which indeed has to be an array or object.

> > If anything, we should adjust the JSON input routines
> > to disallow anything else, rather than start to output what is not valid
> > JSON.

Nah, I'd like us to accept what other JSON parsers usually accept, especially the original one described in http://www.json.org/fatfree.html which cited one way to parse json as being

    responseData = eval('(' + responseText + ')');

:)

But then I also like their statement when comparing JSON to XML:

    JSON has no validator. Being well-formed and valid is not the same as
    being correct and relevant. Ultimately, every application is
    responsible for validating its inputs. This cannot be delegated.

> ... but this one does.

It does, _if_ we accept that the json type is for "JSON text" and not "JSON value", in which case we might also need a json_value type for extensible casting to and from json.

> regards, tom lane

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
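For what it's worth, the quick replace('"=>"', '":"') cast in the plpythonu example above breaks as soon as a value itself contains "=>" or an escaped quote. A slightly sturdier sketch of the same hstore-to-json conversion in plain Python (still simplified: hstore NULL values and some escaping corner cases are not handled):

```python
import json
import re

# Match the quoted "key"=>"value" pairs of hstore's text output,
# allowing backslash escapes inside the quotes.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')

def _unescape(s):
    return s.replace('\\"', '"').replace('\\\\', '\\')

def hstore_to_json(hvalue):
    # Parse the pairs instead of doing a blind string replace, so that
    # '=>' appearing inside a value does not corrupt the output.
    return json.dumps({_unescape(k): _unescape(v)
                       for k, v in PAIR.findall(hvalue)})

print(hstore_to_json('"baz"=>"whatever", "foo"=>"bar"'))
print(hstore_to_json('"a"=>"x=>y"'))   # value containing "=>"
```

The output is built by a real JSON serializer, so keys and values are re-escaped correctly rather than pasted into a format string.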
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 11:11 -0700, Andrew Dunstan wrote:
> On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
...
> > No, the RFC says (emphasis mine):
> >
> >     A JSON *text* is a serialized object or array.
> >
> > If we let the JSON type correspond to a *value* instead, this
> > restriction does not apply, and the JSON type has a useful recursive
> > definition.
> >
> > For example, this function would not be possible if we applied the
> > "object or array" restriction:
> >
> >     unnest(json) returns setof json
> >
> > Note that a similar distinction appears with the XML type: "document"
> > versus "content".
>
> I think you're playing with words. But in any case, the RFC says this
> regarding generators:
>
> 5.  Generators
>
>     A JSON generator produces JSON text.  The resulting text MUST
>     strictly conform to the JSON grammar.

I know it is a weak argument, but at least Python, Ruby and the Javascript in both Firefox and Chrome do "generate" JSON values, so our users might kind of expect us to do the same:

Python
---------
hannu@hvost:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps(1)
'1'

Ruby
-------
hannu@hvost:~$ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> JSON.generate(1)
=> "1"

Mozilla Javascript
------------------
>>> JSON.stringify(1)
"1"

Chrome Javascript
------------------
> JSON.stringify(1)
"1"

> Our functions do seem to be JSON generators. So even if we accept
> things that aren't JSON texts in our parser (possibly permitted
> by section 4 of the RFC) we should not be generating them.

We could have a function json_generator(json) returns text for generating pure JSON texts ;)

The argument for interpreting our json type as "JSON value" instead of "JSON text" would be much bigger flexibility regarding extension type support for json (and not only extension types, but also the types explicitly unsupported by json, like Date) via defining casts to and from json.

From http://en.wikipedia.org/wiki/JSON
-------------------------------------------------------------------------
Unsupported native data types
-----------------------------
JavaScript syntax defines several native data types not included in the
JSON standard:[7] Date, Error, Math, Regular Expression, and Function.
These JavaScript data types must be represented as some other data
format, with the programs on both ends agreeing on how to convert
between types. As of 2011, there are some de facto standards for e.g.
converting between Date and String, but none universally recognized.[8][9]
Other languages may have a different set of native types that must be
serialized carefully to deal with this type of conversion.
-------------------------------------------------------------------------

If we allowed user-defined casts for things like Date, then our users could decide which de facto standard to support in each specific case. But if we did interpret our json type strictly as "JSON text", not "JSON value", this would need another type for json_value.

And it is entirely possible that somebody does want to do what Merlin described recently, that is, get a rowset of "json" values from the client and wrap them in '[' and ']' on the way out; it would be a shame to restrict his json array elements to be just objects and arrays and not the other legal json values.

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 09:22 -0700, Andrew Dunstan wrote: > > > On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure@gmail.com> > wrote: > On Tue, May 1, 2012 at 10:49 AM, Joey Adams > <joeyadams3.14159@gmail.com> wrote: > > On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing > <hannu@2ndquadrant.com> wrote: > >> Hi hackers > >> > >> After playing around with array_to_json() and row_to_json() > functions a > >> bit it I have a question - why do we even have 2 variants > *_to_json() > > > > Here's the discussion where that decision was made: > > > > > http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php > > > > To quote: > > > >>>> why not call all these functions 'to_json' and overload > them? > >>> > >>> I don't honestly feel that advances clarity much. And we > might want to overload each at some stage with options that > are specific to the datum type. We have various foo_to_xml() > functions now. > >> > >> -1 > >> > >> older proposal is more consistent with xml functions > > > > The most compelling argument I see here is the one about > options > > specific to the datum type. > > > I don't find that to be particularly compelling at all. > to_timestamp > for example supports multiple argument versions depending on > the input > type. > > > * If the JSON type does not yet support, say, converting > from a > > number, it will be apparent from the names and types of the > functions, > > rather than being a hidden surprise. On the other hand, > array_to_json > > and composite_to_json already convert ANY values to JSON, so > this > > doesn't matter, anyway. > > > > I am away from base on a consulting assignment all this week, so my > connectivity and time are severely limited, and I don't have time to > respond in depth. > > Let me just point out two things. First, we are approaching a beta > release. The time for changing this is long since gone, IMNSHO. 
First, let me start with stating that I am really happy (and a little amazed and envious ;) ) with what the current to_json functions are capable of. It is already way better than what the current query_to_xml could do (at least in 9.1; it may have improved since).

hannu=# select row_to_json(z) from (select 1::int as a, (select s from (select 2::int as x, 2::text as b)s))z;
         row_to_json
-----------------------------
 {"a":1,"s":{"x":2,"b":"2"}}
(1 row)

hannu=# select query_to_xml('select 1::int as a, (select s from (select 2::int as x, 2::text as b)s)',true,true,'');
                        query_to_xml
-------------------------------------------------------------
 <row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
   <a>1</a>                                                 +
   <_x003F_column_x003F_>(2,2)</_x003F_column_x003F_>       +
 </row>                                                     +
                                                            +
(1 row)

The reason I am whining now is that with minor adjustments in implementation it could all be made much more powerful (try a cast to ::json for values before falling back to the cast to ::text) and much more elegant thanks to PostgreSQL's built-in casting.

If we allowed json to hold any "JSON value" and tried ::json when generating json for compound types then we would be able to claim that PostgreSQL supports JSON everywhere, defaulting to representing officially unsupported types as strings, but allowing users to convert these to their preferred conventions.

I'd also prefer to have default conversions already included for some of our sexier types, like intervals (just a two element array) and hstore (an object) etc.

Suddenly we would be the best match database for Web development and all things Ajax, and we would also have a widely used, built-in and adjustable interchange format to the outer world.

> Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> an object or an array, so this thing about converting arbitrary datum
> values to JSON is a fantasy.
Probably a typo on your part - a valid "JSON _text_" is an object or array; a valid "JSON value" can also be a number, string, true, false or null.

What I am arguing for is interpreting our json type as representing a "JSON value", not "JSON text". This would enable users to adjust and extend the generation of json values by defining casts for their specific types - most notably Date* types, but also things like hstore, which has a natural JSON representation as an "object" (a list of key:value pairs for non-JS users, a.k.a. a dictionary, hash, etc.)

> If anything, we should adjust the JSON input routines to disallow
> anything else, rather than start to output what is not valid JSON.

I tested python, ruby and javascript in firefox and chrome; all their JSON generators generate 1 for the standalone integer 1 and "a" for the standalone string a, and none refused to convert either to JSON.

--
-------
Hannu Krosing
PostgreSQL Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
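[Editorial note: the behaviour Hannu reports for other JSON generators is easy to verify with Python's standard library - a minimal sketch:]

```python
import json

# Common JSON generators happily serialize standalone scalars
# ("JSON values"), even though an RFC 4627 "JSON text" must be
# an object or an array at the top level.
print(json.dumps(1))      # 1
print(json.dumps("a"))    # "a"
print(json.dumps(True))   # true
print(json.dumps(None))   # null
print(json.dumps([1]))    # [1]  (this one is also a valid JSON text)
```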
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Hannu Krosing
> Sent: Tuesday, May 01, 2012 5:29 PM
>
> The reason I am whining now is that with minor adjustments in
> implementation it could all be made much more powerful (try cast to ::json
> for values before cast to ::text) and much more elegant thanks to
> PostgreSQL's built in casting.
>
> If we allowed json to hold any "JSON value" and tried ::json when generating
> json for compound types then we would be able to claim that PostgreSQL
> supports JSON everywhere, defaulting to representing officially unsupported
> types as strings, but allowing users to convert these to their preferred
> conventions.

I get that a JSON Text is always also a JSON Value but the reverse is not true. Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded value is a possible JSON Text - the most important property for purposes of data interchange.

> I'd also prefer to have default conversions already included for some of our
> sexier types, like intervals (just a two element array) and hstore (an object)
> etc.

Interval is not simply 2 values but also denotes whether the particular value is inclusive or exclusive; you would have to use an object unless you transmit in a text format and let the target perform the necessary interpretation of the string.

> Suddenly we would be the best match database for Web development and
> all things Ajax and also have a widely used built in and adjustable interchange
> format to outer world.
>
>> Second, RFC 4627 is absolutely clear: a valid JSON value can only be
>> an object or an array, so this thing about converting arbitrary datum
>> values to JSON is a fantasy.
> Probably a typo on your part - valid "JSON _text_" is object or array, valid
> "JSON value" can also be number, text, true, false and null
>
> What I am arguing for is interpreting our json type as representing a "JSON
> value" not "JSON text", this would enable users to adjust and extend the
> generation of json values via defining casts for their specific types - most
> notably Date* types but also things like hstore, which has a natural JSON
> representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
> dictionary, hash, etc.)

Aside from the fact it is likely too late to change the interpretation, I would argue against doing so in any case.

Currently, the idea is to get your result all lined up and ready to go and then ship it off to the caller as valid JSON so that the caller does not have to do so itself. Answering the question "what would this value look like if it was part of a json output?" is good; however, production use is likely to mostly care about the entire json interchange construct (i.e., JSON Text).

So: json -> json_text

A JSON Value always has a textual representation, but if we were to have an actual type it would make sense to encode it such that (strings, objects and arrays) are delimited while (numbers, false, true, and null) are not.

Type Name: json_value

Output Representations (all output surrounded by double-quotes since all are string-like):
  String: "'VALUE'" (single-quote delimiter)
  Object: "{...}"
  Array: "[]"
  Number: "0.00"
  Other: "false", "true", "null"

JSON is fundamentally an interchange format (especially from a database's perspective). JSON Values only really have meaning if they are attached explicitly to a JSON Text structure; if you wanted to store one independently you should convert it into a native representation first. The few exceptions to this would be sufficiently handled via plain text with meta-data indicating that the stored value is structured in directly JSON compatible syntax.
In short, the default context for JSON in PostgreSQL should be JSON Text (not JSON Value), and thus the unadorned "json" should reflect this default (which it does).

>> If anything, we should adjust the JSON input routines to disallow
>> anything else, rather than start to output what is not valid JSON.
>
> I tested python, ruby and javascript in firefox and chrome, all their JSON
> generators generate 1 for standalone integer 1 and "a" for standalone string
> a, and none refused to convert either to JSON.

Assume that we keep the meaning of json to be JSON Text; what would you suggest occurs if someone attempts a datum -> json cast? Given that we are working in a strongly-typed environment the meaning of JSON cannot be changed, and so either the cast has to output valid JSON Text or it has to fail. My personal take is to have it fail, since any arbitrary decision to cast to JSON Text is going to make someone unhappy, and supposedly they can modify their query so that the result generates whatever format they desire.

I haven't followed the JSON development in 9.2 too closely, but exposing whatever conversion mechanism is currently used to generate JSON makes sense from an ease-of-development standpoint. But even then, during development passing around true JSON Text is not a big deal and then no "JSON_Value" API has to be exposed; thus it can be freely refined, along with related behavior - e.g., append_to_json(value json_value, location text) - in 9.3.

So, in short, all of your ideas are still valid but use "json_value" for the data type. But even then my guess is that you would rarely use json_value as a column type whereas you would frequently use json (JSON Text) for one. json_value would be a support type to facilitate working with json in a procedural-like way.

David J.
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
>> No, the RFC says (emphasis mine):
>>
>>     A JSON *text* is a serialized object or array.
>>
>> If we let the JSON type correspond to a *value* instead, this
>> restriction does not apply, and the JSON type has a useful recursive
>> definition.

> I think you're playing with words. But in any case, the RFC says this
> regarding generators:
>     5. Generators
>     A JSON generator produces JSON text. The resulting text MUST
>     strictly conform to the JSON grammar.

I read over the RFC, and I think the only reason why they restricted JSON texts to represent just a subset of JSON values is this cute little hack in section 3 (Encoding):

    Since the first two characters of a JSON text will always be ASCII
    characters [RFC0020], it is possible to determine whether an octet
    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
    at the pattern of nulls in the first four octets.

        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8

They need a guaranteed 2 ASCII characters to make that work, and they won't necessarily get that many with a bare string literal.

Since for our purposes there is not, and never will be, any need to figure out whether a JSON input string is encoded in UTF16 or UTF32, I find myself agreeing with the camp that says we might as well consider that our JSON type corresponds to JSON values not JSON texts. I also notice that json_in() seems to believe that already.

However, that doesn't mean I'm sold on the idea of getting rid of array_to_json and row_to_json in favor of a universal "to_json()" function. In particular, both of those have optional "pretty_bool" arguments that don't fit nicely at all in a generic conversion function. The meaning of that flag is very closely tied to the input being an array or record respectively.
I'm inclined to leave these functions as they are, and consider adding a universal "to_json(anyelement)" (with no options) later. Because it would not have options, it would not be meant to cover cases where there's value in formatting or conversion options; so it wouldn't render the existing functions entirely obsolete, nor would it mean there would be no need for other specialized conversion functions.

            regards, tom lane
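[Editorial note: the null-byte pattern Tom quotes from RFC 4627 section 3 can be sketched as a small detector; this is a hypothetical illustration, not code from any PostgreSQL source:]

```python
def detect_json_encoding(octets: bytes) -> str:
    """Guess the encoding of a JSON text from its first four octets,
    using the null-byte patterns of RFC 4627 section 3. This only
    works because a JSON *text* must begin with two ASCII characters."""
    b = octets[:4]
    if len(b) < 4:
        return "UTF-8"  # too short for the trick; assume UTF-8
    nulls = tuple(x == 0 for x in b)
    if nulls == (True, True, True, False):
        return "UTF-32BE"
    if nulls == (True, False, True, False):
        return "UTF-16BE"
    if nulls == (False, True, True, True):
        return "UTF-32LE"
    if nulls == (False, True, False, True):
        return "UTF-16LE"
    return "UTF-8"

# A bare scalar like "1" in UTF-16LE is only two octets (31 00), which
# is why the RFC needs a guaranteed two leading ASCII characters
# ('[' or '{' plus one more) for the detection to be reliable.
```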
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 18:35 -0400, David Johnston wrote:
>> -----Original Message-----
>> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Hannu Krosing
>> Sent: Tuesday, May 01, 2012 5:29 PM
>>
>> The reason I am whining now is that with minor adjustments in
>> implementation it could all be made much more powerful (try cast to ::json
>> for values before cast to ::text) and much more elegant thanks to
>> PostgreSQL's built in casting.
>>
>> If we allowed json to hold any "JSON value" and tried ::json when generating
>> json for compound types then we would be able to claim that PostgreSQL
>> supports JSON everywhere, defaulting to representing officially unsupported
>> types as strings, but allowing users to convert these to their preferred
>> conventions.
>
> I get that a JSON Text is always also a JSON Value but the reverse is not true.
> Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded
> value is a possible JSON Text - the most important property for purposes of
> data interchange.

Nope, the most important property for purposes of data interchange is that we produce something that the client expects and can understand without too much extra work on the client side.

The way to "guarantee" JSON Text is to encode objects that produce it. I see nothing wrong with returning either a complex JSON object or simply null if the object could not be found.

>> I'd also prefer to have default conversions already included for some of our
>> sexier types, like intervals (just a two element array) and hstore (an object)
>> etc.
>
> Interval is not simply 2 values but also denotes whether the particular value
> is inclusive or exclusive; you would have to use an object unless you transmit
> in a text format and let the target perform the necessary interpretation of
> the string.
If you need that info to be passed to _your_ caller, you just define a new cast for your interval-to-json which returns "object" notation. It was meant as a sample of what could be included by having generic json values and using json casts.

>> Suddenly we would be the best match database for Web development and
>> all things Ajax and also have a widely used built in and adjustable interchange
>> format to outer world.
>>
>>> Second, RFC 4627 is absolutely clear: a valid JSON value can only be
>>> an object or an array, so this thing about converting arbitrary datum
>>> values to JSON is a fantasy.
>>
>> Probably a typo on your part - valid "JSON _text_" is object or array, valid
>> "JSON value" can also be number, text, true, false and null
>>
>> What I am arguing for is interpreting our json type as representing a "JSON
>> value" not "JSON text", this would enable users to adjust and extend the
>> generation of json values via defining casts for their specific types - most
>> notably Date* types but also things like hstore, which has a natural JSON
>> representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
>> dictionary, hash, etc.)
>
> Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case.
>
> Currently, the idea is to get your result all lined up and ready to go and
> then ship it off to the caller as valid JSON so that the caller does not
> have to do so itself.

"Valid JSON" is only a small part of the equation, as I quoted before from "JSON: The Fat-Free Alternative to XML" at http://www.json.org/fatfree.html

    "JSON has no validator. Being well-formed and valid is not the same
    as being correct and relevant. Ultimately, every application is
    responsible for validating its inputs."
If we produce correct JSON text for things that are convertible to JSON text, then we should be free to produce JSON values for simple values, like everybody else (that is Javascript, python, ruby, ...). I don't think it is PostgreSQL's business to start educating people about the "correct" way to do JSON serialisation when everybody else does it the generic way.

> Answering the question "what would this value look like if it was part of
> a json output?" is good; however, production use is likely to mostly care
> about the entire json interchange construct (i.e., JSON Text)

The "what would it look like" part is important for values that are not covered by the standard and are thus encoded as text. These need to follow conventions outside the JSON spec proper, and thus may need to be adjusted by the developer. Doing it via ::json casts would be the cleanest and simplest way to deal with it.

> So: json -> json_text;
>
> A JSON Value always has a textual representation but if we were to have an
> actual type it would make sense to encode it such that (strings, objects
> and arrays) are delimited while (numbers, false, true, and null) are not.

And so it is; what are you trying to say here?

> Type Name: json_value
>
> Output Representations (all output surrounded by double-quotes since all are string-like) -
> String: "'VALUE'" (single-quote delimiter)

Strings are surrounded by "", not ''.

> Object: "{...}"
> Array: "[]"
> Number: "0.00"

"0.00" is a string according to the JSON spec. Also, objects and arrays don't have surrounding "".

> Other: "false", "true", "null"

That is not what the standard says - "false" is not the same as false without quotes: the first is the string 'false', the second one is the boolean untrue.

> JSON is fundamentally an interchange format (especially from a database's perspective).
> JSON Values only really have meaning if they are attached explicitly to a JSON Text
> structure, if you wanted to store one independently you should convert it into a
> native representation first. The few exceptions to this would be sufficiently
> handled via plain text with meta-data indicating that the stored value is structured
> in directly JSON compatible syntax.

This is something that each developer could define for his specific app by writing an appropriate ::json cast function for the datatype, if such casts were honoured when doing the output conversion. There is no one standard for most "outside JSON's scope" datatypes.

> In short, the default context for JSON in
> PostgreSQL should be JSON Text (not JSON Value) and thus the unadorned "json" should
> reflect this default (which it does).

I don't think PostgreSQL has hierarchical types, so that json_text would also be json_value, but not the other way around.

I'm perfectly happy with us being flexible enough to allow people to also generate non-standard JSON - as some of them most likely are now - if it makes the whole system cleaner and easier to use. I don't see why PostgreSQL's JSON should be more restrictive than most others'.

>>> If anything, we should adjust the JSON input routines to disallow
>>> anything else, rather than start to output what is not valid JSON.
>>
>> I tested python, ruby and javascript in firefox and chrome, all their JSON
>> generators generate 1 for standalone integer 1 and "a" for standalone string
>> a, and none refused to convert either to JSON.
>
> Assume that we keep the meaning of json to be JSON Text; what would you
> suggest occurs if someone attempts a datum -> json cast?

I would not assume such a thing ;)

> Given that we are working in a strongly-typed environment the meaning of
> JSON cannot be changed and so either the cast has to output valid JSON Text
> or it has to fail.
Most people don't work in a strongly-typed environment, and thus would work around such a restriction if they need a simple JSON value at the other end of the interchange.

They would either do like I did and use array_to_json(array[datum]) and then strip off the [] before shipping the JSON - in the most likely case that the receiver side _wants_ to get a JSON value and has been getting a JSON value from other partners - or they may decode it as an array and then use the only element.

> My personal take is to have it fail since any arbitrary decision to cast
> to JSON Text is going to make someone unhappy and supposedly they can
> modify their query so that the result generates whatever format they desire.

Do you actually have such an experience, or is it just a wild guess? Would the one being "unhappy" be some developer who is actually using JSON, or somebody who has only an academic interest in RFCs?

> I haven't followed the JSON development in 9.2 too closely but exposing
> whatever conversion mechanism is currently used to generate JSON makes
> sense from an ease-of-development standpoint. But even then, during
> development passing around true JSON Text is not a big deal and then
> no "JSON_Value" API has to be exposed; thus it can be freely refined,

The use of casts to json would be a very natural way to make all PostgreSQL types exportable to correct JSON, and it would give control over the exact conversion details to the developer.

Currently we blindly use for the json value whatever the text format of the field happens to be, with no choice other than to either do the whole conversion ourselves or accept that some datums have unsuitable formats and need some ugly and possibly error-prone conversions on the client. That way madness lies - I'm pretty sure I don't want to do any of that. I want my database to do the conversion for standard types and give me a choice to change the conversion for _only_ the types I need. Using type-specific casts to json gives me exactly this.
> along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3
>
> So, in short, all of your ideas are still valid but use "json_value"
> for the data type. But even then my guess is that you would rarely
> use json_value as a column type whereas you would frequently use json
> (JSON Text) for one.
> json_value would be a support type to facilitate working with json in
> a procedural-like way.

If you are really paranoid about somebody returning a json value out of PostgreSQL, why not just use a filter function which would fail if the argument is not a JSON text (an array or object):

define fail_if_not_json_text(json) returns json

This gives developers the choice to still return JSON Values to clients if they need to.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
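[Editorial note: the array_to_json(array[datum]) workaround Hannu mentions has the same shape in any generator. A sketch in Python, with a hypothetical helper name, of wrapping a bare value in a one-element array and stripping the brackets:]

```python
import json

def value_to_json(value) -> str:
    """Serialize a bare value by wrapping it in a one-element array
    and stripping the surrounding brackets -- the same trick as
    array_to_json(array[datum]) plus removing the [] in SQL."""
    wrapped = json.dumps([value])
    return wrapped[1:-1]  # drop the leading '[' and trailing ']'

print(value_to_json("a"))       # "a"
print(value_to_json(1))         # 1
print(value_to_json({"id": 9})) # {"id": 9}
```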
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
David Johnston
Date:
On May 1, 2012, at 20:41, Hannu Krosing <hannu@2ndQuadrant.com> wrote:
> Most people don't work in strongly-typed environment, and thus would
> work around such restriction if they need a simple JSON value at the
> other end of the interchange.
>
>> My personal take it is have it fail since any arbitrary decision to cast
>> to JSON Text is going to make someone unhappy and supposedly they can
>> modify their query so that the result generates whatever format they desire.
>
> Do you actually have such an experience or is it just a wild guess ?

So, even given the semantic differences between an object and a scalar, I am better understanding where interpreting JSON as JSON Value makes sense. However, if I convert a record or array to JSON I expect to get a JSON Text even if there is only a single column or value in the input.

I guess my take is that record -> JSON Text while anything else is JSON Value. Whether maintaining the special case for record is worthwhile I really do not know, but the semantic difference does exist; and record output is a significant aspect of PostgreSQL output.

I get the ease-of-use aspect but also recognize that sometimes being slightly harder to use is worthwhile if you eliminate ambiguities or limit the possibility to make mistakes.

FWIW my background on this topic is more theoretical than experiential, though I am a web-application developer by trade and do use some JSON in that capacity.

David J.
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 21:22 -0400, David Johnston wrote:
> On May 1, 2012, at 20:41, Hannu Krosing <hannu@2ndQuadrant.com> wrote:
>> Most people don't work in strongly-typed environment, and thus would
>> work around such restriction if they need a simple JSON value at the
>> other end of the interchange.
>>
>>> My personal take it is have it fail since any arbitrary decision to cast
>>> to JSON Text

For arrays and records the JSON text and JSON value are exactly the same; it is just that the json representations of simple types are officially not JSON texts.

>>> is going to make someone unhappy and supposedly they can
>>> modify their query so that the result generates whatever format they desire.
>>
>> Do you actually have such an experience or is it just a wild guess ?
>
> So even given the semantic differences between an object and a scalar
> I am better understanding where interpreting JSON as JSON Value makes
> sense. However, if I convert a record or array to JSON I expect to get
> a JSON Text even if there is only a single column or value in the input.

Of course you will, and you will get a JSON Text even for an empty object or array. array[1] and 1 and {'one':1} are all different and will stay such.

> I guess my take is that record -> JSON text while anything else is JSON
> value. Whether maintaining the special case for record is worthwhile I
> really do not know but the semantic difference does exist; and record
> output is a significant aspect of PostgreSQL output.

I have never suggested that we special-case a 1-element record or list and start returning only the contained value for these.

> I get the ease-of-use aspect but also recognize that sometimes being slightly
> harder to use is worthwhile if you eliminate ambiguities or limit the
> possibility to make mistakes.

There are no ambiguities in what is returned for record or array.
But not being able to return JSON values via a cast to json for some types, or not using such casts at all, will make extending the json support for user types much, much harder. And nonstandard.

Using a simple cast to json is a very PostgreSQL-ish way to give json support to any type.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Tue, 2012-05-01 at 19:11 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3.14159@gmail.com> wrote:
>>> No, the RFC says (emphasis mine):
>>>
>>>     A JSON *text* is a serialized object or array.
>>>
>>> If we let the JSON type correspond to a *value* instead, this
>>> restriction does not apply, and the JSON type has a useful recursive
>>> definition.
>
>> I think you're playing with words. But in any case, the RFC says this
>> regarding generators:
>>     5. Generators
>>     A JSON generator produces JSON text. The resulting text MUST
>>     strictly conform to the JSON grammar.
>
> I read over the RFC, and I think the only reason why they restricted
> JSON texts to represent just a subset of JSON values is this cute
> little hack in section 3 (Encoding):
>
>     Since the first two characters of a JSON text will always be ASCII
>     characters [RFC0020], it is possible to determine whether an octet
>     stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>     at the pattern of nulls in the first four octets.
>
>         00 00 00 xx  UTF-32BE
>         00 xx 00 xx  UTF-16BE
>         xx 00 00 00  UTF-32LE
>         xx 00 xx 00  UTF-16LE
>         xx xx xx xx  UTF-8
>
> They need a guaranteed 2 ASCII characters to make that work, and
> they won't necessarily get that many with a bare string literal.
>
> Since for our purposes there is not, and never will be, any need to
> figure out whether a JSON input string is encoded in UTF16 or UTF32,
> I find myself agreeing with the camp that says we might as well consider
> that our JSON type corresponds to JSON values not JSON texts. I also
> notice that json_in() seems to believe that already.
>
> However, that doesn't mean I'm sold on the idea of getting rid of
> array_to_json and row_to_json in favor of a universal "to_json()"
> function. In particular, both of those have optional "pretty_bool"
> arguments that don't fit nicely at all in a generic conversion
> function.
> The meaning of that flag is very closely tied to the
> input being an array or record respectively.

The flag probably should not be tied to a specific type; as JSON is recursive, I think the current one-top-level-element-per-row is quite a limited form of pretty-printing.

I have a table with a field whose type is an array of the type of another table, and what I currently get with pretty=true is

hannu=# select row_to_json(test3, true) from test3;
-[ RECORD 1 ]---------------------------------------------------------
row_to_json | {"id":1,
            |  "data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],
            |  "tstamp":"2012-04-16 14:40:15.795947"}

What I would like to get is what python's pprint does for the same json:

>>> pprint(row)
{'id': 1,
 'data3': [{'data2': {'data': '0.262814193032682',
                      'id': 1,
                      'tstamp': '2012-04-05 13:21:03.235204'},
            'id': 1,
            'tstamp': '2012-04-05 13:25:03.644497'},
           {'data2': {'data': '0.157406373415142',
                      'id': 2,
                      'tstamp': '2012-04-05 13:21:05.2033'},
            'id': 2,
            'tstamp': '2012-04-05 13:25:03.644497'}],
 'tstamp': '2012-04-16 14:40:15.795947'}

If we have a pretty flag, why not make it work all the way down the structure?

> I'm inclined to leave these functions as they are, and consider
> adding a universal "to_json(anyelement)" (with no options) later.
To achieve recursive pretty-printing, the better way is to have a universal to_json(anyelement) with a pretty-printing option:

to_json(datum anyelement, indent int)

with the behavior that if indent is NULL or a negative integer no pretty-printing is done, if it is 0 printing starts at the left margin, and if it is a positive integer then this number of spaces is added to the left for each row (except the first one) of the json representation.

And it would be overridable for specific types, so that hstore could provide its own to_json(datum hstore, indent int) which would do the correct pretty-printing for the hstore-as-json-object representation.

> Because it would not have options, it would not be meant to cover
> cases where there's value in formatting or conversion options;
> so it wouldn't render the existing functions entirely obsolete,
> nor would it mean there would be no need for other specialized
> conversion functions.

I don't object to the row_to_json() and array_to_json() functions being there as a convenience and as the two "official" functions guaranteed to return "JSON text".

> regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
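[Editorial note: the recursive indentation Hannu asks for is what stock JSON serializers already do. A sketch with made-up data shaped like the test3 example:]

```python
import json

# Made-up nested row, roughly the shape of the test3 example above.
row = {"id": 1,
       "data3": [{"id": 1, "data2": {"id": 1, "data": "0.26"}},
                 {"id": 2, "data2": {"id": 2, "data": "0.15"}}],
       "tstamp": "2012-04-16 14:40:15.795947"}

# json.dumps applies indent *recursively*, all the way down the
# structure -- unlike row_to_json(..., true), which only breaks
# lines at the top level.
pretty = json.dumps(row, indent=2)
print(pretty)
```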
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
> On Wed, May 2, 2012 at 2:29 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> I don't object to row_to_json() and array_to_json() functions being
>> there as a convenience and as the two "official" functions guaranteed
>> to return "JSON text".
>
> So given that do we do anything about this now, or wait till 9.3?

Sorry for missing this mail, I followed this only on list. I hope it is ok to CC this back to the list.

I'd like the json support in 9.2 updated as follows:

Generic to_json(...) returning a "JSON value"
=============================================

We should have a generic to_json(...) both for ease of use and for easy extensibility, as explained below.

to_json(...) should work for all types, returning a "JSON value" similar to what the current json_in does, but for all types, not just cstring.

We could keep row_to_json() and array_to_json() as the official json-text returning functions.

Configurable and extensible to_json()
======================================

When working on structured types, the first try for getting an element-as-json should always be running to_json(element), and only if this fails fall back to the current "use text representation" code.

This enables two important things:

1) configurable json-coding of values not explicitly supported by the standard

You can read about an attempt to standardise the json date format here: http://weblogs.asp.net/bleroy/archive/2008/01/18/dates-and-json.aspx . By allowing developers just to define their own to_json(date) function, we give them the power to decide which one to use. And if we honour search_path when looking up the to_json() functions, then they can even choose to have different conventions for different applications.

2) flexibility in adding support for extension types, like representing hstore as an object/dict, by just providing the to_json(hstore, ...)
functions in hstore extension Pretty-printing =============== If we were to support prettyprinting of anything more complex than single level structs (record or array), then we need to pass "ident" into the to_json() function my recommendation would be to have the signature to_json(datum any, ident int) with ident = NULL meaning no prettyprint , ident =0 meaninf top level, or "starting at left margin" and anything else meaning the amount of spaces needed to be added to the beginning of all rows exept the first one, for example the query hannu=# select array_to_json(array(select test from test limit 2),true); array_to_json ---------------------------------------------------------------------[{"id":9,"data":"testdata","tstamp":"2012-05-01 09:44:50.175189"},+ {"id":10,"data":"testdata","tstamp":"2012-05-01 09:45:50.260276"}] (1 row) could return this: [{'data': 'testdata', 'id': 9, 'tstamp': '2012-05-01 09:44:50.175189'},{'data': 'testdata', 'id': 10, 'tstamp': '2012-05-0109:45:50.260276'}] if it would call to_json(row, 1) for getting each row prettyprinted with ident 1 Getting a record _from_ json() ============================== JSON support would be much more useful if we supported the function of converting the other way as well, that is from json to record The best model easy to support seems to be what Merlin suggested, that is the populate_record(record,hstore) function and corresponding #= operator (see: http://www.postgresql.org/docs/9.1/static/hstore.html) The complete plan for 9.2 once more =================================== What is needed to nicely round up a simple and powerful json type is 1. the "json" TYPE an agreement tha json type represents a "JSON value" (this is what json_in currently does) 2. json output 2.1 doing the right thing with internal values to_json() function for converting to this "JSON value" for any type. 
The default implementation for 'non-json' types returns their PostgreSQL textual representation in double quotes (date -> "2012-05-01 09:45:50.260276"). Structured types use to_json() for getting values internally, so that by defining to_json(hstore) you can automatically get hstore represented in javascript object or dictionary representation.

hannu=# select row_to_json(r) from (select 1::int id, '"foo"=>1, "bar"=>"baz"'::hstore)r;

should not return this:

                     row_to_json
------------------------------------------------------
 {"id":1,"hstore":"\"bar\"=>\"baz\", \"foo\"=>\"1\""}
(1 row)

but this:

                     row_to_json
------------------------------------------------------
 {"id":1,"hstore":{"bar": "baz", "foo": "1"}}
(1 row)

2.2 Getting the pretty-printing right for structured types: to_json(any, indent) functions for recursive pretty-printing. We might also need another argument giving the "pagewidth" we want to pretty-print to.

3. json input for records and arrays

3.1 json row to record: a function to convert from json to record, so that we can use json as an input format as well, modeled after populate_record(record, hstore) from http://www.postgresql.org/docs/9.1/static/hstore.html. One json_to_row should be enough.

3.2 json array to json rows: if we also want to accept input arrays, say sending all invoice rows as a json array, we could add an unnest_json(json_array_of_rows json) returns table(json_row json) function and then do the inputting as

insert into invoice_row
select json_to_row(null::invoice_row, json_row)
  from (select unnest_json(json_array_of_rows) as json_row)s;

> cheers
>
> andrew

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
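For illustration only, the dispatch rule proposed above (try a type-specific to_json() first, fall back to the quoted text representation) can be sketched with Python's json module. This is an analogy, not PostgreSQL code, and all names here are hypothetical:

```python
import datetime
import json

# Registry of type-specific encoders, standing in for user-defined
# to_json(sometype) functions looked up by argument type.
_registry = {}

def register(typ, fn):
    """Register a type-specific encoder, like defining to_json(date) in SQL."""
    _registry[typ] = fn

def to_json(value, indent=None):
    """Encode value as a JSON value; indent=None means no pretty-printing."""
    fn = _registry.get(type(value))
    if fn is not None:
        return fn(value)
    if isinstance(value, (dict, list, bool, int, float)) or value is None:
        return json.dumps(value, indent=indent)
    # Fallback: the textual representation wrapped in double quotes,
    # mirroring what the 9.2 row_to_json/array_to_json do for unknown types.
    return json.dumps(str(value))

# A user picks one of the date conventions debated in the linked article,
# here ISO 8601, simply by registering an encoder for that type.
register(datetime.date, lambda d: json.dumps(d.isoformat()))

print(to_json(datetime.date(2012, 5, 1)))    # "2012-05-01"
print(to_json({"id": 9, "data": "testdata"}))
```

The point of the sketch is the lookup order: a registered encoder wins, the generic JSON types encode natively, and everything else degrades to a quoted string rather than an error.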
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Hannu Krosing <hannu@2ndQuadrant.com> writes:
> On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>> So given that do we do anything about this now, or wait till 9.3?

> I'd like the json support in 9.2 updated as follows

I think it's too late to be entertaining proposals for such changes in 9.2. If we had concluded that the existing functions were actively wrong or a bad idea, then of course we'd need to do something; but they are not, so we can just as well consider additions in the 9.3 cycle rather than now. I am not convinced that this proposal is fully baked yet, anyway; not to mention that right now we need to have our heads down on resolving the remaining open issues, not designing, implementing, and reviewing a pile of brand new code for json.

> By allowing developers just to define their own to_json(date) function
> we give them the power to decide which one to use. And if we honour
> search_path when looking up the to_json() functions, then they can even
> choose to have different conventions for different applications.

This is not going to work anywhere near as nicely as you think. If somebody tries to define multiple to_json() functions that override a generic to_json(anyelement) one, he will start getting "function is not unique" parse failures. The parser will only successfully decide which function to call when the input data type exactly matches one of the specialized functions, which means you might as well not have the generic one at all.

			regards, tom lane
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Andrew Dunstan
Date:
On 05/04/2012 09:52 AM, Tom Lane wrote:
> Hannu Krosing<hannu@2ndQuadrant.com> writes:
>> On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>>> So given that do we do anything about this now, or wait till 9.3?
>> I'd like the json support in 9.2 updated as follows
> I think it's too late to be entertaining proposals for such changes in
> 9.2. If we had concluded that the existing functions were actively
> wrong or a bad idea, then of course we'd need to do something; but they
> are not, so we can just as well consider additions in the 9.3 cycle
> rather than now. I am not convinced that this proposal is fully baked
> yet, anyway; not to mention that right now we need to have our heads
> down on resolving the remaining open issues, not designing,
> implementing, and reviewing a pile of brand new code for json.

Yeah, that was my feeling. We usually take a release or two to get things right, fill in what's missing, etc., and I don't think this will be any different.

>> By allowing developers just to define their own to_json(date) function
>> we give them the power to decide which one to use. And if we honour
>> search_path when looking up the to_json() functions, then they can even
>> choose to have different conventions for different applications.
> This is not going to work anywhere near as nicely as you think. If
> somebody tries to define multiple to_json() functions that override a
> generic to_json(anyelement) one, he will start getting "function is not
> unique" parse failures. The parser will only successfully decide which
> function to call when the input data type exactly matches one of the
> specialized functions, which means you might as well not have the
> generic one at all.

Yeah, what I've been thinking about in conjunction with similar problems is some sort of type registry, so that we could code for non-builtin types in certain cases. Maybe we should add that to the developers' meeting agenda.

cheers

andrew
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Yeah, what I've been thinking about in conjunction with similar problems
> is some sort of type registry, so that we could code for non-builtin
> types in certain cases. Maybe we should add that to the developers'
> meeting agenda.

Maybe. I don't want to see a json-specific hack for this, but some sort of generic way to add type knowledge could be useful, if we could figure out what we want.

			regards, tom lane
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Robert Haas
Date:
On Fri, May 4, 2012 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Yeah, what I've been thinking about in conjunction with similar problems
>> is some sort of type registry, so that we could code for non-builtin
>> types in certain cases. Maybe we should add that to the developers'
>> meeting agenda.
>
> Maybe. I don't want to see a json-specific hack for this, but some sort
> of generic way to add type knowledge could be useful, if we could figure
> out what we want.

For this particular case, I think you just need some place to store a pg_type -> pg_proc mapping. I'm not exactly sure how to make that not a JSON-specific hack, since I certainly don't think we'd want to add a new catalog just for that.

In general, I think it would be very useful to have some way of identifying particular types - and versions of types - independently of a particular installation - e.g. by assigning each type a UUID that never changes and a version number that we bump when we change something about that type. That seems like it would be very useful for schema comparison tools, or for logical replication, where you want to know whether two types are "the same type" even though they are in different clusters. pg_upgrade has had past needs in this area as well. However, I'm not sure that'd help solve this particular problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Fri, 2012-05-04 at 13:43 -0400, Robert Haas wrote:
> On Fri, May 4, 2012 at 12:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Yeah, what I've been thinking about in conjunction with similar problems
>>> is some sort of type registry, so that we could code for non-builtin
>>> types in certain cases. Maybe we should add that to the developers'
>>> meeting agenda.
>>
>> Maybe. I don't want to see a json-specific hack for this, but some sort
>> of generic way to add type knowledge could be useful, if we could figure
>> out what we want.
>
> For this particular case, I think you just need some place to store a
> pg_type -> pg_proc mapping. I'm not exactly sure how to make that not
> a JSON-specific hack, since I certainly don't think we'd want to add a
> new catalog just for that.

This was my initial proposal: to have casts to ::json for all types. I backed out from this in favor of a generic to_json(datum, indent) in order to support pretty-printing.
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@2ndQuadrant.com> writes:
>> On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>>> So given that do we do anything about this now, or wait till 9.3?
>
>> I'd like the json support in 9.2 updated as follows
>
> I think it's too late to be entertaining proposals for such changes in
> 9.2. If we had concluded that the existing functions were actively
> wrong or a bad idea,

I think that hard-coding the "postgresql text" representation as our json representation, without a possibility for the user to easily fix it short of rewriting all the xxx_to_json() functions, is borderline "actively wrong".

Can we at least have the xxx_to_json() functions try a cast to json first and fall back to text if the cast fails? This would address my worst problem; all the rest can be easily defined in user functions.

> then of course we'd need to do something; but they
> are not, so we can just as well consider additions in the 9.3 cycle
> rather than now. I am not convinced that this proposal is fully baked
> yet, anyway; not to mention that right now we need to have our heads
> down on resolving the remaining open issues, not designing,
> implementing, and reviewing a pile of brand new code for json.
>
>> By allowing developers just to define their own to_json(date) function
>> we give them the power to decide which one to use. And if we honour
>> search_path when looking up the to_json() functions, then they can even
>> choose to have different conventions for different applications.
>
> This is not going to work anywhere near as nicely as you think. If
> somebody tries to define multiple to_json() functions that override a
> generic to_json(anyelement) one, he will start getting "function is not
> unique" parse failures. The parser will only successfully decide which
> function to call when the input data type exactly matches one of the
> specialized functions, which means you might as well not have the
> generic one at all.
>
> 			regards, tom lane
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Robert Haas
Date:
On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu@krosing.net> wrote:
> I think that hard-coding "postgresql text" representation as our json
> representation without a possibility for the user to easily fix it
> without rewriting all the xxx_to_json() functions is borderline
> "actively wrong".
>
> Can we at least have the xxx_to_json() functions try cast to json first
> and fall back to text if the cast fails.

I think the idea that you can involve the casting machinery in this is misguided. sometextval::json has got to mean that sometextval is expected to be in the form of a syntactically correct JSON value - and NOT that we wrap it in a JSON string. We can have constructors for JSON, but they've got to be separate from the casting machinery.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
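The parse-versus-construct split being argued here maps onto the two halves of, say, Python's json module. This is an analogy only, not a statement about PostgreSQL semantics: loads validates and parses existing JSON text (the cast reading), while dumps constructs a JSON string from a value (the constructor reading):

```python
import json

# "Cast to json" in Robert's sense: treat the string as JSON text
# and validate/parse it.
parsed = json.loads('{"f1": 4}')
assert parsed == {"f1": 4}

# "Constructor": take the same characters as a plain string value
# and wrap them, producing a JSON string literal.
wrapped = json.dumps('{"f1": 4}')
assert wrapped == '"{\\"f1\\": 4}"'

# A non-JSON string passes construction but fails parsing,
# analogous to json_in rejecting it: Token "a" is invalid.
try:
    json.loads('a')
except json.JSONDecodeError:
    pass
```

The same input string yields a JSON object under one reading and a quoted scalar under the other, which is exactly the ambiguity the thread keeps circling.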
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu@krosing.net> wrote:
>> Can we at least have the xxx_to_json() functions try cast to json first
>> and fall back to text if the cast fails.

> I think the idea that you can involve the casting machinery in this is
> misguided.

It is possible that that can be made to work, but it's a research project, not something to be crammed into 9.2 at the last possible minute. In any case, I really dislike the idea that array_to_json and row_to_json would contain two entirely different behaviors. Leave the extensibility ideas for a future to_json() function.

			regards, tom lane
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string. We can have constructors for
> JSON, but they've got to be separate from the casting machinery.

On the contrary: the string representation of textual value a is "a".

Casting should _not_ mean a syntax check; casting is by definition a conversion. If we cast text to int, we return a value of type int; if we cast int to numeric(5,2), we return a value of type numeric(5,2). Why should casting to json work differently?
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Hannu Krosing
> Sent: Friday, May 04, 2012 4:40 PM
> To: Robert Haas
> Cc: Tom Lane; Andrew Dunstan; PostgreSQL-development; Merlin Moncure
> Subject: Re: [HACKERS] JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
>
> On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
>> I think the idea that you can involve the casting machinery in this is
>> misguided. sometextval::json has got to mean that sometextval is
>> expected to be in the form of a syntactically correct JSON value - and
>> NOT that we wrap it in a JSON string. We can have constructors for
>> JSON, but they've got to be separate from the casting machinery.
>
> on the contrary - the string representation of textual value a is "a"
>
> casting should _not_ mean syntax check, casting is by definition a conversion.
> if we cast text to int, we return a value of type int; if we cast int to
> numeric(5,2) we return a value of type numeric(5,2)
>
> why should casting to json work differently?

What is the distinction between what you are thinking regarding JSON and this example?

SELECT '1a'::integer;
SQL Error: ERROR: invalid input syntax for integer: "1a"
LINE 1: SELECT '1a'::integer

As a user, if I cast something to something else I want the result to be of the correct type and deterministic; otherwise throw me some kind of "invalid input format" exception (or syntax exception). Casting vs. constructors is really a meaningless distinction to a lay person. When I cast, I do so by constructing a new value using my existing value as input. When I use an explicit CAST I am unable to supply additional parameters to configure the casting, whereas a constructor function gives me that possibility. But a constructor function without any additional parameters is not semantically different from a cast.

I guess the concern to address is something like: SELECT '{key: value}'::json OR SELECT '[1.25]'::json; Do you interpret these as already being valid JSON and thus output object/array constructs (i.e., JSON Text), or do you treat them as string literals and output scalars (i.e., JSON Value)? Even if you feel these are artificial constructs, the concept holds that there may be ambiguous data that can be interpreted in multiple ways (this applies even to function forms, though in the function form you could specify which one you want to output using a separate DEFAULTed parameter).

I can see the primary use-case for JSON Value casting as being queries of the following forms (since the record and array forms go through the record/array_to_json functions):

SELECT COUNT(*)::json FROM table [WHERE ...];
SELECT single_col::json FROM table WHERE id = ?;

Where the single provided value can be sent directly back to the web caller's JavaScript and used as-is because it is valid JSON.
Though, honestly, both SELECT to_json(single_col) and SELECT to_json(COUNT(*)) are equally usable, so any distinction between them is a purely technical issue to me.

Am I correct in assuming the following expected behavior (forgive the blatantly wrong syntax, but you should get the point)?

RAISE NOTICE '%', SELECT 'A'::text => A
RAISE NOTICE '%', SELECT 'A'::json => "A"

David J
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string.

PostgreSQL CAST is different from casting a C pointer: casts actually perform a conversion when defined WITH FUNCTION or WITH INOUT, and they pass the value through unchanged when defined WITHOUT FUNCTION.

Casts _do_not_ perform syntax checks; they assume the source to be of the type cast from, and they return a result of the target type. I think you are confusing input/output formatting with CAST here.

This confusion about cast-as-syntax-check probably qualifies as something being "actively wrong" with the current implementation, though it does not seem to be something actively defined (there are no casts defined for the json type) but rather an artifact of how postgresql input works.
CAST is something that should convert one type to another, in this case a textual type to its "json value" representation and back:

'sometext'::text::json --> '"sometext"'

and

'"sometext"'::json::text --> 'sometext'

The suggested "syntax check only" should be done only by the type I/O functions json_in(cstring) and json_recv(internal), not by casts. Casts should do "casting", which in PostgreSQL means type conversion.

> We can have constructors for
> JSON, but they've got to be separate from the casting machinery.

Currently we do have constructors: json_in(cstring) and json_recv(internal). These are the ones that should, and do, check for correct syntax.

Some more confusing examples for pondering on casts vs. I/O functions
---------------------------------------------------------------------

Some of the difficulties of understanding and explaining what a json type should be are exemplified in the following:

hannu=# create table jtest(plaintext text, jsontext json);
CREATE TABLE
hannu=# insert into jtest values('"A"','"A"');
hannu=# insert into jtest values('true','true');
hannu=# insert into jtest values('null','null');
hannu=# insert into jtest values(null,null);
hannu=# select row_to_json(jtest) from jtest;
              row_to_json
--------------------------------------
 {"plaintext":"\"A\"","jsontext":"A"}
 {"plaintext":"true","jsontext":true}
 {"plaintext":"null","jsontext":null}
 {"plaintext":null,"jsontext":null}
(4 rows)

hannu=# insert into jtest values('a','a');
ERROR:  invalid input syntax for type json
LINE 1: insert into jtest values('a','a');
DETAIL:  line 1: Token "a" is invalid.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
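As an aside, the round-trip semantics Hannu argues for are what a dumps/loads pair gives you. Sketched in Python as an analogy (not PostgreSQL behavior), including the jtest values from the session above:

```python
import json

# text -> json wraps the text as a JSON string value...
assert json.dumps('sometext') == '"sometext"'
# ...and json -> text recovers the original, a lossless conversion.
assert json.loads('"sometext"') == 'sometext'

# Round trip holds for arbitrary text, including text that merely
# looks like JSON, matching the jtest rows above.
for s in ['A', 'true', 'null', '"A"']:
    assert json.loads(json.dumps(s)) == s

# Whereas JSON *text* 'true' and 'null' parse to real JSON values,
# not strings, which is why plaintext and jsontext columns differ.
assert json.loads('true') is True
assert json.loads('null') is None
```

Under this reading the conversion never fails, while the I/O-function reading (json_in) rejects anything that is not already JSON text.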
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Fri, 2012-05-04 at 16:12 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu@krosing.net> wrote:
>>> Can we at least have the xxx_to_json() functions try cast to json first
>>> and fall back to text if the cast fails.
>
>> I think the idea that you can involve the casting machinery in this is
>> misguided.
>
> It is possible that that can be made to work, but it's a research
> project, not something to be crammed into 9.2 at the last possible
> minute. In any case, I really dislike the idea that array_to_json
> and row_to_json would contain two entirely different behaviors.
> Leave the extensibility ideas for a future to_json() function.

I did not mean that array_to_json and row_to_json would behave differently from the generic to_json, just that they would be thin wrappers around the to_json function which check that the arguments are of the correct types for casting to "JSON text". They need to recurse to each other and the generic to_json anyway.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Peter Eisentraut
Date:
On fre, 2012-05-04 at 13:43 -0400, Robert Haas wrote:
> For this particular case, I think you just need some place to store a
> pg_type -> pg_proc mapping. I'm not exactly sure how to make that not
> a JSON-specific hack, since I certainly don't think we'd want to add a
> new catalog just for that.

I think you're thinking of pg_cast.
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Peter Eisentraut
Date:
On fre, 2012-05-04 at 12:30 -0400, Andrew Dunstan wrote:
> Yeah, what I've been thinking about in conjunction with similar
> problems is some sort of type registry, so that we could code for
> non-builtin types in certain cases.

It certainly seems to come up a lot, but I'm not sure whether the two main use cases, mapping types into languages and mapping types into JSON (or XML, as an alternative), would have the same solution. A third use case that could be included as well is changing the main text or binary format of a type (e.g., what datestyle does).

So instead of having just a hard-coded set of typinput/typoutput and typrecv/typsend, you'd have a catalog of (context, inputfunc, outputfunc), and the context would be a language, something like json or xml, or a user-settable value. This could become quite complicated, but it sure could solve a lot of issues in one go.
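The (context, inputfunc, outputfunc) catalog idea above can be sketched as follows. Everything here is a hypothetical illustration in Python, not a proposed PostgreSQL API; 'date', 'text', and 'json' stand in for catalog keys:

```python
import datetime
import json

# Catalog keyed by (type name, context): each entry carries the
# input and output functions for that type in that context, instead
# of one hard-coded typinput/typoutput pair per type.
catalog = {
    ('date', 'text'): (datetime.date.fromisoformat, str),
    ('date', 'json'): (
        lambda s: datetime.date.fromisoformat(json.loads(s)),
        lambda d: json.dumps(d.isoformat()),
    ),
}

def type_output(typename, context, value):
    """Render value using the output function for this context."""
    _, outputfunc = catalog[(typename, context)]
    return outputfunc(value)

def type_input(typename, context, text):
    """Parse text using the input function for this context."""
    inputfunc, _ = catalog[(typename, context)]
    return inputfunc(text)

d = datetime.date(2012, 5, 1)
print(type_output('date', 'text', d))   # 2012-05-01
print(type_output('date', 'json', d))   # "2012-05-01"
```

Adding a new context (a language mapping, an alternative datestyle) then means adding catalog rows, not changing the type itself.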
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Peter Eisentraut
Date:
On fre, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
>> Can we at least have the xxx_to_json() functions try cast to json first
>> and fall back to text if the cast fails.
>
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string.

I think it's only wrong if you try casting first and fall back to text. Otherwise it could work, if the set of all json casts is defined consistently.
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Hannu Krosing
Date:
On Sat, 2012-05-05 at 12:16 +0300, Peter Eisentraut wrote:
> I think it's only wrong if you try casting first and fall back to text.
> Otherwise it could work, if the set of all json casts is defined
> consistently.

Currently the default "cast" for non-number, non-bool, not-already-json, non-null values is to wrap the text representation in double quotes. So casting first and then falling back to _quoted_ text is wrong only for those types which have a very ugly text representation :)

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
From
Tom Lane
Date:
Hannu Krosing <hannu@2ndQuadrant.com> writes:
> CAST is something that should convert one type to another, in this case
> a textual type to its "json value" representation and back.

> 'sometext'::text::json --> '"sometext"'
> and
> '"sometext"'::json::text --> 'sometext'

Well, that's a pretty interesting example, because if you expect that to work like that, then what should happen with this?

'{"f1":4,"f2":2}'::text::json
'{"f1":4,"f2":2}'::json::text

Does the first of these produce a JSON object, or a quoted string? At the moment you get the former, but it's pretty hard to argue that it shouldn't produce a quoted string if transparent conversion is the expectation. In the second case, do you end up with a JSON text (which is what happens at the moment) or does it just slap some quotes around the value as a string?

I'm not convinced that you've made a principled argument as to what should happen when. In general, I think casts should only be used for conversions where there is just one unsurprising choice of behavior, since the cast syntax by definition doesn't provide any room for options. It's not clear to me that JSON conversions are so obvious as to meet that standard. If you start throwing random user-defined conversions into the mix, it's even less obvious that there's only one unsurprising choice.

In the particular case of casts to and from text, we've essentially set a project policy that those should behave equivalently to the type's I/O conversion functions whenever possible. So I think the existing behavior of those operations is correct and what you propose above is wrong. There is certainly scope for a conversion function that takes any random text string and produces a JSON quoted string from it, but the cast operator is not the place for that.

			regards, tom lane