Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ? - Mailing list pgsql-hackers
| From | Hannu Krosing | 
|---|---|
| Subject | Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ? | 
| Date | |
| Msg-id | 1335919274.3106.343.camel@hvost Whole thread Raw | 
| In response to | Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ? ("David Johnston" <polobo@yahoo.com>) | 
| Responses | Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ? | 
| List | pgsql-hackers | 
On Tue, 2012-05-01 at 18:35 -0400, David Johnston wrote: > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > > owner@postgresql.org] On Behalf Of Hannu Krosing > > Sent: Tuesday, May 01, 2012 5:29 PM > > > > The reason I am whining now is that with minor adjustments in > > implementation it could all be made much more powerful (try cast to ::json > > for values before cast to ::text) and much more elegant thanks to > > PostgreSQL's built in casting. > > > > If we allowed json to hold any "JSON value" and tried ::json when generating > > json for compound types than we would be able to claim that PostgreSQL > > supports JSON everywhere, defaulting to representing officially unsupported > > types as strings, but allowing users to convert these to their preferred > > conventions. > > I get that a JSON Text is always also a JSON Value but the reverse is not true. > Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded > value is a possible JSON Text - the most important property for purposes of > data interchange. Nope, the most important property for purposes of data interchange is that we produce something that the client expects and can understand without too much extra work on client side. The way to "guarantee" JSON Text is to encode objects that produce it. I see nothing wrong with returning either a complex JSON object of simply null if the object could not be found. > > I'd also prefer to have default conversions already included for some of our > > sexier types, like intervals (just a two element array) and hstore (an object) > > etc. > > Interval is not simply 2 values but also denotes whether the particular value > is inclusive or exclusive; you would have to use an object unless you transmit > in a text format and let the target perform the necessary interpretation of > the string. if you need that info to be passed to _your_ caller you just define a ned cast for youtr interval-to-json which returns "object" notation. It was meant as a sample of what could be included by having generic json values and using json casts. > > Suddenly we would be the best match database for Web development and > > all things Ajax and also have a widely used built in and adjustable interchange > > format to outer world. > > > > > Second, RFC 4627 is absolutely clear: a valid JSON value can only be > > > an object or an array, so this thing about converting arbitrary datum > > > values to JSON is a fantasy. > > > > Probably a typo on your part - valid "JSON _text_" is object or array, valid > > "JSON value" can also be number, text, true, false and null > > > > What I am arguing for is interpreting our json type as representing a "JSON > > value" not "JSON text", this would enable users to adjust and extend the > > generation of json values via defining casts for their specific types - most > > notably Date* types but also things like hstore, which has a natural JSON > > representation as "object" (a list of key:value pairs for non-js users, a.k.a. a > > dictionary, hash, etc.) > > Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case. > > Currently, the idea is to get your result all lined up and ready to go and > then ship it off to the caller as valid JSON so that the caller does not > have to do so itself. "Valid JSON" is only a small part of the equation, as I quoted before from "JSON: The Fat-Free Alternative to XML" at http://www.json.org/fatfree.html "JSON has no validator. Being well-formed and valid is not the same as being correct and relevant. Ultimately, every application is responsible for validating its inputs." If we produce correct JSON text for things that are converible to JSON text then we should be free to produce JSON values for simple value, like everybody else (that is Javascript , python, ruby, ...) I don't think it is postgreSQL's business to start educating people about "correct" way to do JSON serialisation when everybody else does it the generic way. > Answering the question "what would this value look like if it was part of > a json output?" is good; however, production use is likely to mostly care > about the entire json interchange construct (i.e., JSON Text) the "what would it look like" part is important for values that are not covered by standard and are thus encoded as text. These need to follow conventions outside the JSON spec proper, and thus may need to be adjusted by the developer. Doing it via ::json casts would be the cleanest and simplest way to deal with it. > So: json -> json_text; > > A JSON Value always has a textual representation but if we were to have an > actual type it would make sense to encode it such that (strings, objects > and arrays) are delimited while (numbers, false, true, and null) are not. And so it is, what are you trying to say here ? > Type Name: json_value > > Output Representations (all output surrounded by double-quotes since all are string-like) - > String: "'VALUE'" (single-quote delimiter) strings are surrounded by "" not '' > Object: "{...}" > Array: "[]" > Number: "0.00" "0.00" is a sting according to JSON spec also, objects and arrays don't have surrounding "" > Other: "false", "true", "null" that is not what the standard says - "false" is not the same as false without quotes - the first is string 'false', the second one is boolean untrue > JSON is fundamentally an interchange format (especially from a database's perspective). > JSON Values only really have meaning if they are attached explicitly to a JSON Text > structure, if you wanted to store one independently you should convert it into a > native representation first. The few exceptions to this would be sufficiently > handled via plain text with meta-data indicating that the stored value is structured > in directly JSON compatible syntax. This is something that each developer could define for his specific app by writing an appropriate ::json cast function for the datatype, if such casts were honoured while doing the eoutput conversion. There is no one standard for mosts "outside jsons scope" datatypes, > In short, the default context for JSON in > PostgreSQL should JSON Text (not JSON Value) and thus the unadorned "json" should > reflect this default (which it does). I don't think postgreSQL has hierarchical types, so that json_text would also be json_value, but not the other way. I'm perfectly happy with us being flexible enough to allow people to also generate non-standard JSON - as some of them most likely are now - if it makes the whole system cleaner and easier to use. I don't see why PostgreSQL's JSON should be more restrictive than most others. > > > If anything, we should adjust the JSON input routines to disallow > > > anything else, rather than start to output what is not valid JSON. > > > > I tested python, ruby and javascript in firefox and chrome, all their JSON > > generators generate 1 for standalone integer 1 and "a" for standalone string > > a , and none refused to convert either to JSON. > > > > Assume that we keep the meaning of json to be JSON Text; what would you > suggest occurs if someone attempts a datum -> json cast? I would not assume such thing ;) > Given that we are working in a strongly-typed environment the meaning of > JSON cannot be changed and so either the cast has to output valid JSON Text > or it has to fail. Most people don't work in strongly-typed environment, and thus would work around such restriction if they need a simple JSON value at the other end of the interchange. They would either do like I did and use array_to_json(array[datum]) and then strip off the [] before shipping the JSON in the most likely case that receiver side _wants_ to get JSON value and has been getting a JSON value from other partners, or it may decode it as array and then use the only element. > My personal take it is have it fail since any arbitrary decision to cast > to JSON Text is going to make someone unhappy and supposedly they can > modify their query so that the result generates whatever format they desire. Do you actually have such an experience or is it just a wild guess ? Would the one being "unhappy" be some developer who is actually using JSON, or somebody whoi has only academic interest in RFCs ? > I haven't followed the JSON development in 9.2 too closely but exposing > whatever conversion mechanism is currently used to generate JSON makes > sense from a ease-of-development standpoint. But even then, during > development passing around true JSON Text is not a big deal and then > no "JSON_Value" API has to be exposed; thus it can be freely refined, The use of casts to json would be a very natural way to make all postgreSQL type exportable to correct JSON and would give the control over the exact conversion details to the developer. Currently we blindly use for json value whatever the text format of the field happens to be with no choice other than either do the whole conversion ourselves or then accept that some datums have unsuitable formats and need to do some ugly and possibly error-prone conversions on the client. that way madness lies - I'm pretty sure I don't want to do any of that. I want my database do the conversion for standard tyoes and give me a choice to change the conversion for _only_ the types I need, Using type specific casts to json give me exactly this. > along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3 > > So, in short, all of your ideas are still valid but use "json_value" > for the data type. But, even them my guess is that you would rarely > use json_value as a column type whereas you would frequently use json > (JSON Text) for one. > json_value would be a support type to facilitate working with json in > a procedural-like way. If you are really paranoid about somebody returning json value out of postgresql why not just use a filter function which would fail if the argument is not an array or text define fail_if_not_json_text(json) returns json this gives the developers the choice to still return JSON Values to clients if they need. -- ------- Hannu Krosing PostgreSQL Unlimited Scalability and Performance Consultant 2ndQuadrant Nordic PG Admin Book: http://www.2ndQuadrant.com/books/
pgsql-hackers by date: