Thread: data to json enhancements
Drawing together various discussions both here and elsewhere (e.g. the PostgresOpen hallway track) I propose to work on the following: 1. make datum_to_json() honor a type's cast to json if it exists. The fallback is to use the type's string representation, as now. 2. add a cast hstore -> json (any others needed for core / contrib types ?) 3. add a to_json(anyelement) function 4. add a new aggregate function json_agg(anyrecord) -> json to simplify and make more effecient turning a resultset into json. Comments welcome. I also propose to work on some accessor functions to pull data out of json (as opposed to producing json from non-json). I will email separately about that when I have firmed up a proposed API a bit more. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Drawing together various discussions both here and elsewhere (e.g. the > PostgresOpen hallway track) I propose to work on the following: > 1. make datum_to_json() honor a type's cast to json if it exists. The > fallback is to use the type's string representation, as now. > 2. add a cast hstore -> json (any others needed for core / contrib types ?) > 3. add a to_json(anyelement) function > 4. add a new aggregate function json_agg(anyrecord) -> json to simplify > and make more effecient turning a resultset into json. > Comments welcome. ISTM the notion of to_json(anyelement) was already heavily discussed and had spec-compliance issues ... in fact, weren't you one of the people complaining? What exactly does #3 mean that is different from the previous thread? Also, on reflection I'm not sure about commandeering cast-to-json for this --- aren't we really casting to "json member" or something like that? The distinction between a container and its contents seems important here. With a container type as source, it might be important to do something different if we're coercing it to a complete JSON value versus something that will be just one member. I'm handwaving here because I don't feel like going back to re-read the RFC, but it seems like something that should be considered carefully before we lock down an assumption that there can never be a difference. regards, tom lane
On 09/26/2012 01:46 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Drawing together various discussions both here and elsewhere (e.g. the >> PostgresOpen hallway track) I propose to work on the following: >> 1. make datum_to_json() honor a type's cast to json if it exists. The >> fallback is to use the type's string representation, as now. >> 2. add a cast hstore -> json (any others needed for core / contrib types ?) >> 3. add a to_json(anyelement) function >> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify >> and make more effecient turning a resultset into json. >> Comments welcome. > ISTM the notion of to_json(anyelement) was already heavily discussed and > had spec-compliance issues ... in fact, weren't you one of the people > complaining? What exactly does #3 mean that is different from the > previous thread? I thought I got shouted down on that issue. The main reason we didn't include it was that it was getting rather late for those changes, IIRC - we only just got in any json stuff at all in under the wire. And in fact you can have a json value now that's not an array or object: andrew=# select json '1' as num, json '"foo"' as txt; num | txt -----+------- 1 | "foo" > > Also, on reflection I'm not sure about commandeering cast-to-json for > this --- aren't we really casting to "json member" or something like > that? The distinction between a container and its contents seems > important here. With a container type as source, it might be important > to do something different if we're coercing it to a complete JSON > value versus something that will be just one member. I'm handwaving > here because I don't feel like going back to re-read the RFC, but > it seems like something that should be considered carefully before > we lock down an assumption that there can never be a difference. I think in view of the above this would be moot, no? cheers andrew
On 09/26/2012 06:46 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Drawing together various discussions both here and elsewhere (e.g. the >> PostgresOpen hallway track) I propose to work on the following: >> 1. make datum_to_json() honor a type's cast to json if it exists. The >> fallback is to use the type's string representation, as now. >> 2. add a cast hstore -> json (any others needed for core / contrib types ?) >> 3. add a to_json(anyelement) function >> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify >> and make more effecient turning a resultset into json. >> Comments welcome. > ISTM the notion of to_json(anyelement) was already heavily discussed and > had spec-compliance issues ... in fact, weren't you one of the people > complaining? What exactly does #3 mean that is different from the > previous thread? > > Also, on reflection I'm not sure about commandeering cast-to-json for > this --- aren't we really casting to "json member" or something like > that? The distinction between a container and its contents seems > important here. With a container type as source, it might be important > to do something different if we're coercing it to a complete JSON > value versus something that will be just one member. I'm handwaving > here because I don't feel like going back to re-read the RFC, but > it seems like something that should be considered carefully before > we lock down an assumption that there can never be a difference. "json value" is a superset of "json object", so no special handling should be required here - they nest cleanly. (you can check http://www.json.org/ for definition.) I agree that one of the standards did say that "JSON generators" should produce only JSON-serialised arrays and dictionaries and not "JSON values" - that is one of (literal null, true or false or a json array, dictionary, number or double-quoted string) But if we would do that we would really be the _only_ one who would place this restriction on their to_json function. As I already reported in the previous discussion, neither python, ruby or neither of the two javascripts I tested (mozilla & chrome) place this restriction on their json serialisation functions. Maybe the "JSON generator" in the standard means something else, like a stand-alone program or server. The only mention/definition for Generator in the RFC is "5. Generators A JSON generator produces JSON text. The resulting text MUST strictly conform to the JSON grammar." To be formally standards compliant I propose that we officially define our json type to store "a json value" and as json object is really just a subset of json value we can provide (or just advise the use of) a domain json_object, which has CHECK for the first non-whitespace char being { or [ . > regards, tom lane > >
On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Also, on reflection I'm not sure about commandeering cast-to-json for > this --- aren't we really casting to "json member" or something like > that? The distinction between a container and its contents seems > important here. With a container type as source, it might be important > to do something different if we're coercing it to a complete JSON > value versus something that will be just one member. I'm handwaving > here because I don't feel like going back to re-read the RFC, but > it seems like something that should be considered carefully before > we lock down an assumption that there can never be a difference. I feel like there are two different behaviors that someone might want here, and a cast cannot mean both. 1. Please reinterpret the existing value that I have already got as a JSON object. For example, you might have a text field in which you have been storing JSON values. Once you upgrade to 9.2, you might want to reinterpret the existing contents of the field - which are already valid JSON - as JSON objects. 2. Please convert the value that I have into a JSON object according to a type-specific rule. For example, you might have a text field in which you store arbitrary strings. But perhaps you need to store structured data there, so once you upgrade to 9.2 you might want to wrap up your strings inside JSON strings. Now there is some subtle ambiguity here because in some cases the behavior can be exactly the same in both cases. For example most numeric values will get the same treatment either way, but NaN cannot.If you do mynumeric::json, interpretation #1 will failfor NaN but interpretation #2 will probably produce something like "NaN". Similarly if the type is boolean, we could likely get away with producing true and false for either interpretation. If the type is hstore, then #1 is going to fail, but #2 is going to convert "1"=>"2" to {"1":"2"}. So in general it might seem that #2 is the better interpretation, because it gives many casts a sensible interpretation that is otherwise lacking. But, what about text? It seems to me that users will count on the fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON array containing the first three numbers) and NOT "[1,2,3]" (a JSON string containing 7 characters). And that is emphatically interpretation #1. I think it would be an extremely bad idea to decide that casts should have interpretation #2 for all data types except things that are kind of like text, which should instead behave like #1. And if we standardize completely on interpretation #2, then I think that '[1,2,3]'::json will end up meaning something different from '[1,2,3]'::text::json, because the former will (IIUC) go through the type-input function and end up creating a JSON array, whereas the latter will go through the cast function and end up creating a JSON string. It would also mean that in more complex queries, you could get substantially different behavior in simple cases where the parser maintains literals as unknown vs. more complex cases where it decides that they must be text for lack of a full type inference system. Maybe I am being too pedantic about this and there is a way to make it all work nicely, but it sure feels like using the casting machinery here is blending together two different concepts that are only sometimes the same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Perhaps something to consider for a future extension of the standard.
On Thu, Sep 27, 2012 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, on reflection I'm not sure about commandeering cast-to-json for >> this --- aren't we really casting to "json member" or something like >> that? The distinction between a container and its contents seems >> important here. With a container type as source, it might be important >> to do something different if we're coercing it to a complete JSON >> value versus something that will be just one member. I'm handwaving >> here because I don't feel like going back to re-read the RFC, but >> it seems like something that should be considered carefully before >> we lock down an assumption that there can never be a difference. > > I feel like there are two different behaviors that someone might want > here, and a cast cannot mean both. > > 1. Please reinterpret the existing value that I have already got as a > JSON object. For example, you might have a text field in which you > have been storing JSON values. Once you upgrade to 9.2, you might > want to reinterpret the existing contents of the field - which are > already valid JSON - as JSON objects. > > 2. Please convert the value that I have into a JSON object according > to a type-specific rule. For example, you might have a text field in > which you store arbitrary strings. But perhaps you need to store > structured data there, so once you upgrade to 9.2 you might want to > wrap up your strings inside JSON strings. > > Now there is some subtle ambiguity here because in some cases the > behavior can be exactly the same in both cases. For example most > numeric values will get the same treatment either way, but NaN cannot. > If you do mynumeric::json, interpretation #1 will fail for NaN but > interpretation #2 will probably produce something like "NaN". > Similarly if the type is boolean, we could likely get away with > producing true and false for either interpretation. If the type is > hstore, then #1 is going to fail, but #2 is going to convert "1"=>"2" > to {"1":"2"}. So in general it might seem that #2 is the better > interpretation, because it gives many casts a sensible interpretation > that is otherwise lacking. > > But, what about text? It seems to me that users will count on the > fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON > array containing the first three numbers) and NOT "[1,2,3]" (a JSON > string containing 7 characters). And that is emphatically > interpretation #1. Hm. Well, that's a really good point although I kinda disagree with your assumption: I think it's much cleaner to have: select '[1,2,3]'::int[]::json produce a json array. All types but text (record[] etc) would seem to use the type structure to define how the json gets laid out. 'text::json' is an exception, because there is an implied parse, which I'm starting to unfortunately think is the wrong behavior if you want to be able to make json datums out of sql datums: how do you create a vanilla json text datum? merlin
On 09/27/2012 09:22 AM, Robert Haas wrote: > On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Also, on reflection I'm not sure about commandeering cast-to-json for >> this --- aren't we really casting to "json member" or something like >> that? The distinction between a container and its contents seems >> important here. With a container type as source, it might be important >> to do something different if we're coercing it to a complete JSON >> value versus something that will be just one member. I'm handwaving >> here because I don't feel like going back to re-read the RFC, but >> it seems like something that should be considered carefully before >> we lock down an assumption that there can never be a difference. > I feel like there are two different behaviors that someone might want > here, and a cast cannot mean both. [snip] > > Maybe I am being too pedantic about this and there is a way to make it > all work nicely, but it sure feels like using the casting machinery > here is blending together two different concepts that are only > sometimes the same. OK. I think that's a very good point. I guess I was kinda swept away by this being suggested by a couple of influential people. [pause for another bout of vigorous self-criticism] So how about this suggestion: we'll look for a visible function named "as_json" or some such which has one parameter of the given type and returns json, and if it's present use it instead of the standard text representation. As an optimization we'll skip that lookup for builtin types, since there won't be one. Of course, we'll have one for hstore. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 09/27/2012 09:22 AM, Robert Haas wrote: >> Maybe I am being too pedantic about this and there is a way to make it >> all work nicely, but it sure feels like using the casting machinery >> here is blending together two different concepts that are only >> sometimes the same. > OK. I think that's a very good point. I guess I was kinda swept away by > this being suggested by a couple of influential people. Well, that doesn't make it wrong, it just means there's more work needed. I'm not that thrilled with magic assumptions about function names either; schema search path issues, for example, will make that dangerous. We've gone to considerable lengths to avoid embedding assumptions about operator names, and assumptions about function names aren't any better. There are at least three ways we could use the cast machinery for this: (1) Reject Robert's assumption that we have to support both interpretations for every cast situation. For instance, it doesn't seem that unreasonable to me to insist that you have to cast to text and then to json if you want the literal-reinterpretation behavior. The main problem then is figuring out a convenient way to provide interpretation #2 for text itself. (2) Add another hidden argument to cast functions, or perhaps repurpose one of the ones that exist now. This is likely to come out rather ugly because of the need to shoehorn it into an API that's already suffered a couple of rounds of after-the-fact additions, but it's certainly possible in principle. The main thing I'd want is to not define it in a JSON-only fashion --- so the first thing is to be able to explain the distinction we're trying to make in a type-independent way. (3) Invent an auxiliary type along the lines of "json_value" and say that you create a cast from foo to json_value when you want one interpretation, or directly to json if you want the other. Then things like record_to_json would look for the appropriate type of cast. This is a bit ugly because the auxiliary type has no reason to live other than to separate the two kinds of cast, but it avoids creating any new JSON-specific mechanisms in the type system. There might be some other ideas I'm not thinking of. regards, tom lane
On Thu, Sep 27, 2012 at 10:09 AM, Andrew Dunstan <andrew@dunslane.net> wrote: > So how about this suggestion: we'll look for a visible function named > "as_json" or some such which has one parameter of the given type and returns > json, and if it's present use it instead of the standard text > representation. As an optimization we'll skip that lookup for builtin types, > since there won't be one. Of course, we'll have one for hstore. I think that general approach has some promise, although there are some security issues to worry about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hm...
IMO,
'[1,2,3]'::json
'[1,2,3]'::text::json
'[1,2,3]'::int[]::json
are the same thing... (though I am not sure '[1,2,3]'::int[] is valid in postgres...)
in js var o = JSON.parse(result_of_any_cast_above) should produce array of 3 integer
'[1,2,3]' is different then'"[1,2,3]"'
If there is the need to some text value as '[1,2,3]' be treated as JSON text value, then it would be: quote_literal('[1,2,3]')::json
Kind Regards,
Misa
||2012/9/27 Merlin Moncure <mmoncure@gmail.com>
Hm. Well, that's a really good point although I kinda disagree withOn Thu, Sep 27, 2012 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 26, 2012 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, on reflection I'm not sure about commandeering cast-to-json for
>> this --- aren't we really casting to "json member" or something like
>> that? The distinction between a container and its contents seems
>> important here. With a container type as source, it might be important
>> to do something different if we're coercing it to a complete JSON
>> value versus something that will be just one member. I'm handwaving
>> here because I don't feel like going back to re-read the RFC, but
>> it seems like something that should be considered carefully before
>> we lock down an assumption that there can never be a difference.
>
> I feel like there are two different behaviors that someone might want
> here, and a cast cannot mean both.
>
> 1. Please reinterpret the existing value that I have already got as a
> JSON object. For example, you might have a text field in which you
> have been storing JSON values. Once you upgrade to 9.2, you might
> want to reinterpret the existing contents of the field - which are
> already valid JSON - as JSON objects.
>
> 2. Please convert the value that I have into a JSON object according
> to a type-specific rule. For example, you might have a text field in
> which you store arbitrary strings. But perhaps you need to store
> structured data there, so once you upgrade to 9.2 you might want to
> wrap up your strings inside JSON strings.
>
> Now there is some subtle ambiguity here because in some cases the
> behavior can be exactly the same in both cases. For example most
> numeric values will get the same treatment either way, but NaN cannot.
> If you do mynumeric::json, interpretation #1 will fail for NaN but
> interpretation #2 will probably produce something like "NaN".
> Similarly if the type is boolean, we could likely get away with
> producing true and false for either interpretation. If the type is
> hstore, then #1 is going to fail, but #2 is going to convert "1"=>"2"
> to {"1":"2"}. So in general it might seem that #2 is the better
> interpretation, because it gives many casts a sensible interpretation
> that is otherwise lacking.
>
> But, what about text? It seems to me that users will count on the
> fact that '[1,2,3]'::text::json is going to produce [1,2,3] (a JSON
> array containing the first three numbers) and NOT "[1,2,3]" (a JSON
> string containing 7 characters). And that is emphatically
> interpretation #1.
your assumption: I think it's much cleaner to have:
select '[1,2,3]'::int[]::json
produce a json array.
All types but text (record[] etc) would seem to use the type structure
to define how the json gets laid out. 'text::json' is an exception,
because there is an implied parse, which I'm starting to unfortunately
think is the wrong behavior if you want to be able to make json datums
out of sql datums: how do you create a vanilla json text datum?
merlin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 09/27/2012 10:36 AM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 09/27/2012 09:22 AM, Robert Haas wrote: >>> Maybe I am being too pedantic about this and there is a way to make it >>> all work nicely, but it sure feels like using the casting machinery >>> here is blending together two different concepts that are only >>> sometimes the same. >> OK. I think that's a very good point. I guess I was kinda swept away by >> this being suggested by a couple of influential people. > Well, that doesn't make it wrong, it just means there's more work > needed. I'm not that thrilled with magic assumptions about function > names either; schema search path issues, for example, will make that > dangerous. We've gone to considerable lengths to avoid embedding > assumptions about operator names, and assumptions about function names > aren't any better. > > There are at least three ways we could use the cast machinery for this: > > (1) Reject Robert's assumption that we have to support both > interpretations for every cast situation. For instance, it doesn't > seem that unreasonable to me to insist that you have to cast to text > and then to json if you want the literal-reinterpretation behavior. > The main problem then is figuring out a convenient way to provide > interpretation #2 for text itself. The trouble is, ISTM, that both things seem equally intuitive. You could easily argue that x::text::json means take x as text and treat it as json, or that it means take x as text and produce a valid json value from it by escaping and quoting it. It's particularly ambiguous when x is itself already a text value. If we go this way I suspect we'll violate POLA for a good number of users. > > (2) Add another hidden argument to cast functions, or perhaps repurpose > one of the ones that exist now. This is likely to come out rather ugly > because of the need to shoehorn it into an API that's already suffered > a couple of rounds of after-the-fact additions, but it's certainly > possible in principle. The main thing I'd want is to not define it > in a JSON-only fashion --- so the first thing is to be able to explain > the distinction we're trying to make in a type-independent way. I agree with the "ugly" part of this analysis :-) > > (3) Invent an auxiliary type along the lines of "json_value" and say > that you create a cast from foo to json_value when you want one > interpretation, or directly to json if you want the other. Then > things like record_to_json would look for the appropriate type of cast. > This is a bit ugly because the auxiliary type has no reason to live > other than to separate the two kinds of cast, but it avoids creating > any new JSON-specific mechanisms in the type system. I could accept this. The reason is that very few types are in fact going to need a gadget like this. Yes it's mildly ugly, but really fairly unobtrusive. cheers andrew > > There might be some other ideas I'm not thinking of. Yeah. You've done better than me though :-) cheers andrew
On 09/27/2012 09:18 PM, Andrew Dunstan wrote: > > On 09/27/2012 10:36 AM, Tom Lane wrote: >> Andrew Dunstan <andrew@dunslane.net> writes: >>> On 09/27/2012 09:22 AM, Robert Haas wrote: >>>> Maybe I am being too pedantic about this and there is a way to make it >>>> all work nicely, but it sure feels like using the casting machinery >>>> here is blending together two different concepts that are only >>>> sometimes the same. >>> OK. I think that's a very good point. I guess I was kinda swept away by >>> this being suggested by a couple of influential people. >> Well, that doesn't make it wrong, it just means there's more work >> needed. I'm not that thrilled with magic assumptions about function >> names either; schema search path issues, for example, will make that >> dangerous. We've gone to considerable lengths to avoid embedding >> assumptions about operator names, and assumptions about function names >> aren't any better. >> >> There are at least three ways we could use the cast machinery for this: >> >> (1) Reject Robert's assumption that we have to support both >> interpretations for every cast situation. For instance, it doesn't >> seem that unreasonable to me to insist that you have to cast to text >> and then to json if you want the literal-reinterpretation behavior. Maybe cast not to text but to cstring for getting the text-is-already-json ? That is, reuse the current type io as "literal" casts. This way a cast of '{"a": 1}'::json::text will fail, as this json value really does not represent a text/string value. >> The main problem then is figuring out a convenient way to provide >> interpretation #2 for text itself. > > > The trouble is, ISTM, that both things seem equally intuitive. You > could easily argue that x::text::json means take x as text and treat > it as json, or that it means take x as text and produce a valid json > value from it by escaping and quoting it. It's particularly ambiguous > when x is itself already a text value. If we go this way I suspect > we'll violate POLA for a good number of users. It may be easier to sort this out if we think in terms of symmetry and unambiguity. let's postulate that mytype::json::mytype and json::mytype::json should always reproduce the original result or they should fail. so '[1,2,3]'::text::json::text === '[1,2,3]'::text with intermediate json being '"[1,2,3]"' and '[1,2,3]'::json::text::json fails the json-->text casts as '[1,2,3]'::json does not represent a text value (in a similar way as '[1,2,3]'::json::date fails) on the other hand '[1,2,3]'::json::int[]::json should succeed as there is a direct mapping to int array. .... >> (3) Invent an auxiliary type along the lines of "json_value" and say >> that you create a cast from foo to json_value when you want one >> interpretation, or directly to json if you want the other. Then >> things like record_to_json would look for the appropriate type of cast. >> This is a bit ugly because the auxiliary type has no reason to live >> other than to separate the two kinds of cast, but it avoids creating >> any new JSON-specific mechanisms in the type system. As suggested above, this special type could be on the other side - the type cstring as already used for type io functions the main problem here is, that currently we do interpret ::text::json as it were the type input function. we do proper selective quoting when converting to back json hannu=# create table jt(t text, j json); CREATE TABLE hannu=# insert into jt values ('[1,2]','[3,4]'); INSERT 0 1 hannu=# select row_to_json(jt) from jt; row_to_json ------------------------- {"t":"[1,2]","j":[3,4]} (1 row) but we do automatic casting through cstring and json type input func when converting to json. hannu=# select t::json, j::json from jt; t | j -------+------- [1,2] | [3,4] (1 row) This should probably be cleaned up. > > I could accept this. The reason is that very few types are in fact > going to need a gadget like this. Yes it's mildly ugly, but really > fairly unobtrusive. > > cheers > > andrew > >> >> There might be some other ideas I'm not thinking of. > > > Yeah. You've done better than me though :-) > > cheers > > andrew >
On 09/27/2012 06:58 PM, Hannu Krosing wrote: > On 09/27/2012 09:18 PM, Andrew Dunstan wrote: >> >> On 09/27/2012 10:36 AM, Tom Lane wrote: >>> Andrew Dunstan <andrew@dunslane.net> writes: >>>> On 09/27/2012 09:22 AM, Robert Haas wrote: >>>>> Maybe I am being too pedantic about this and there is a way to >>>>> make it >>>>> all work nicely, but it sure feels like using the casting machinery >>>>> here is blending together two different concepts that are only >>>>> sometimes the same. >>>> OK. I think that's a very good point. I guess I was kinda swept >>>> away by >>>> this being suggested by a couple of influential people. >>> Well, that doesn't make it wrong, it just means there's more work >>> needed. I'm not that thrilled with magic assumptions about function >>> names either; schema search path issues, for example, will make that >>> dangerous. We've gone to considerable lengths to avoid embedding >>> assumptions about operator names, and assumptions about function names >>> aren't any better. >>> >>> There are at least three ways we could use the cast machinery for this: >>> >>> (1) Reject Robert's assumption that we have to support both >>> interpretations for every cast situation. For instance, it doesn't >>> seem that unreasonable to me to insist that you have to cast to text >>> and then to json if you want the literal-reinterpretation behavior. > Maybe cast not to text but to cstring for getting the > text-is-already-json ? > > That is, reuse the current type io as "literal" casts. > > This way a cast of '{"a": 1}'::json::text will fail, as this json > value really does not > represent a text/string value. > >>> The main problem then is figuring out a convenient way to provide >>> interpretation #2 for text itself. >> >> >> The trouble is, ISTM, that both things seem equally intuitive. You >> could easily argue that x::text::json means take x as text and treat >> it as json, or that it means take x as text and produce a valid json >> value from it by escaping and quoting it. It's particularly ambiguous >> when x is itself already a text value. If we go this way I suspect >> we'll violate POLA for a good number of users. > It may be easier to sort this out if we think in terms of symmetry and > unambiguity. > > let's postulate that mytype::json::mytype and json::mytype::json > should always reproduce the original result or they should fail. Where are all these casts from json going to come from? What is going to dequote and unescape strings, or turn objects into hstores? You're making this much bigger than what I had in mind. The advantage of Tom's option (3) that I liked is that it is very minimal. Any type can provide its own function for conversion to json. If it's there we use it, if it's not we use its standard text representation. Let's stick to the KISS principle. cheers andrew
On 09/28/2012 12:42 AM, Andrew Dunstan wrote: > > On 09/27/2012 06:58 PM, Hannu Krosing wrote: >> On 09/27/2012 09:18 PM, Andrew Dunstan wrote: >>> >>> On 09/27/2012 10:36 AM, Tom Lane wrote: >>>> Andrew Dunstan <andrew@dunslane.net> writes: >>>>> On 09/27/2012 09:22 AM, Robert Haas wrote: >>>>>> Maybe I am being too pedantic about this and there is a way to >>>>>> make it >>>>>> all work nicely, but it sure feels like using the casting machinery >>>>>> here is blending together two different concepts that are only >>>>>> sometimes the same. >>>>> OK. I think that's a very good point. I guess I was kinda swept >>>>> away by >>>>> this being suggested by a couple of influential people. >>>> Well, that doesn't make it wrong, it just means there's more work >>>> needed. I'm not that thrilled with magic assumptions about function >>>> names either; schema search path issues, for example, will make that >>>> dangerous. We've gone to considerable lengths to avoid embedding >>>> assumptions about operator names, and assumptions about function names >>>> aren't any better. >>>> >>>> There are at least three ways we could use the cast machinery for >>>> this: >>>> >>>> (1) Reject Robert's assumption that we have to support both >>>> interpretations for every cast situation. For instance, it doesn't >>>> seem that unreasonable to me to insist that you have to cast to text >>>> and then to json if you want the literal-reinterpretation behavior. >> Maybe cast not to text but to cstring for getting the >> text-is-already-json ? >> >> That is, reuse the current type io as "literal" casts. >> >> This way a cast of '{"a": 1}'::json::text will fail, as this json >> value really does not >> represent a text/string value. >> >>>> The main problem then is figuring out a convenient way to provide >>>> interpretation #2 for text itself. >>> >>> >>> The trouble is, ISTM, that both things seem equally intuitive. You >>> could easily argue that x::text::json means take x as text and treat >>> it as json, or that it means take x as text and produce a valid json >>> value from it by escaping and quoting it. It's particularly >>> ambiguous when x is itself already a text value. If we go this way I >>> suspect we'll violate POLA for a good number of users. >> It may be easier to sort this out if we think in terms of symmetry >> and unambiguity. >> >> let's postulate that mytype::json::mytype and json::mytype::json >> should always reproduce the original result or they should fail. > > > Where are all these casts from json going to come from? What is going > to dequote and unescape strings, or turn objects into hstores? as json is defined to encode only 3 base types - boolean (true/false), number and string - and two composite types - array and "object" - it should not be too hard to provide casts for these and then use existing casts to go on from number and text Something extra should probably be done for number, perhaps we need separate casts for float and decimal/numeric but the rest should be relatively simple. the json null vs SQL NULL poses and interesting problem though ;) > You're making this much bigger than what I had in mind. The advantage > of Tom's option (3) that I liked is that it is very minimal. Any type > can provide its own function for conversion to json. If it's there we > use it, if it's not we use its standard text representation. Let's > stick to the KISS principle. > > cheers > > andrew > > >
On 9/27/12 10:36 AM, Tom Lane wrote: > (1) Reject Robert's assumption that we have to support both > interpretations for every cast situation. For instance, it doesn't > seem that unreasonable to me to insist that you have to cast to text > and then to json if you want the literal-reinterpretation behavior. > The main problem then is figuring out a convenient way to provide > interpretation #2 for text itself. I don't think it's going to work to special-case text like that. For one thing, it would mean wildly different things would happen depending on whether text or varchar is used. > (2) Add another hidden argument to cast functions, or perhaps repurpose > one of the ones that exist now. This is likely to come out rather ugly > because of the need to shoehorn it into an API that's already suffered > a couple of rounds of after-the-fact additions, but it's certainly > possible in principle. Could there be a case where you'd want to be able to serialize a type to JSON like that, but don't actually want a regular cast to exist (even explicit)? > The main thing I'd want is to not define it > in a JSON-only fashion --- so the first thing is to be able to explain > the distinction we're trying to make in a type-independent way. There is XML, which would use the same mechanism. For example, we currently have a cast from text to xml, which interprets the text as an XML document. The other interpretation would escape <, >, etc. > (3) Invent an auxiliary type along the lines of "json_value" and say > that you create a cast from foo to json_value when you want one > interpretation, or directly to json if you want the other. Then > things like record_to_json would look for the appropriate type of cast. > This is a bit ugly because the auxiliary type has no reason to live > other than to separate the two kinds of cast, but it avoids creating > any new JSON-specific mechanisms in the type system. What if some wanted to create a yaml type as an extension. How would they associate yaml and yaml_value?
On 09/28/2012 08:22 AM, Peter Eisentraut wrote: >> (3) Invent an auxiliary type along the lines of "json_value" and say >> that you create a cast from foo to json_value when you want one >> interpretation, or directly to json if you want the other. Then >> things like record_to_json would look for the appropriate type of cast. >> This is a bit ugly because the auxiliary type has no reason to live >> other than to separate the two kinds of cast, but it avoids creating >> any new JSON-specific mechanisms in the type system. > What if some wanted to create a yaml type as an extension. How would > they associate yaml and yaml_value? Well, of course it wouldn't be a problem if the code could know the OID of yaml_value ... :-) cheers andrew
Hi Guys,
I have made some blog about the subject:
Hopefully will help on some kind...
Kind Regards,
Misa
On 09/28/2012 10:34 PM, Misa Simic wrote: > Hi Guys, > > I have made some blog about the subject: > > http://misasimic.blogspot.co.uk/2012/09/postgresql-92-and-json-datatype.html > > Hopefully will help on some kind... > > I think if you want to contribute you should post on the mailing list - otherwise the conversation just becomes way too fragmented. cheers andrew
No probs...
And I did...The thing is, subject is to wide... Post too long... Intention was just to better explain thoughts... I am not a blogger anyway, just new in Postgres community... Trying to say, probably 90% of post would be suficient just for the list, and because of i am new it is hard to me to identify that right 10% peace :)
cheers,
Misa
On Saturday, September 29, 2012, Andrew Dunstan wrote:
On Saturday, September 29, 2012, Andrew Dunstan wrote:
I think if you want to contribute you should post on the mailing list - otherwise the conversation just becomes way too fragmented.
cheers
andrew
On 09/26/2012 06:46 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Drawing together various discussions both here and elsewhere (e.g. the >> PostgresOpen hallway track) I propose to work on the following: >> 1. make datum_to_json() honor a type's cast to json if it exists. The >> fallback is to use the type's string representation, as now. >> 2. add a cast hstore -> json (any others needed for core / contrib types ?) >> 3. add a to_json(anyelement) function >> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify >> and make more effecient turning a resultset into json. >> Comments welcome. > ISTM the notion of to_json(anyelement) was already heavily discussed and > had spec-compliance issues ... in fact, weren't you one of the people > complaining? What exactly does #3 mean that is different from the > previous thread? > > Also, on reflection I'm not sure about commandeering cast-to-json for > this --- aren't we really casting to "json member" or something like > that? The distinction between a container and its contents seems > important here. With a container type as source, it might be important > to do something different if we're coercing it to a complete JSON > value versus something that will be just one member. I'm handwaving > here because I don't feel like going back to re-read the RFC, but > it seems like something that should be considered carefully before > we lock down an assumption that there can never be a difference. > > regards, tom lane Reflecting over the dual possible interpretation of what it does mean to convert between "text" and "json" data types it has dawned to me that the confusion may come mainly from wanting json to be two things at once: 1. - a serialisation of of a subset of javascript objects to a string. 2. - a dynamic type represented by the above serialisation. case 1 ------ If we stick with interpretation 1. then json datatype is really no more than a domain based on "text" type and having a CHECK is_valid_json() constraint. For this interpretation it makes complete sense to interpret any text as already being serialised and no casts (other than casts to a text type) have place here. a few datatypes - like hstore - could have their "to_json_text()" serialiser functions if there is a better serialisation to text than the types defaul one, but other than that the "serialise to text and quote if not null, boolean or numeric type" should be needed. if there is strong aversion to relying on function names for getting the right serialisation function, we could invent a new "cast-like" feature for serialising types so we could define a serialiser for hstore to json using CREATE SERIALISATION (hstore AS json) WITH FUNCTION hstore_as_json(hstore); this probably will not be any safer than just using the name for lookup directly unless we place some restrictions on who is allowed to create the serialisation; case 2 ------ My suggestions on using typecasts for convert-to-json were result of this interpretation of json-as-dynamic-type. Having thought more of this I now think that we probably should leave JSON alone and develop an separate dynamic type here. I have started work on doing this based on ideas from BSON data format, except using postgreSQL datatypes. It will still have to solve similar problems we have had here with JSON, but being both a new type and a binary type there will probably be no expectation of 1-to-1 conversion from to-text. Will post here soon for more discussion on what this ned type does and how it should be used. Hannu
On 09/29/2012 11:47 AM, Hannu Krosing wrote: > On 09/26/2012 06:46 PM, Tom Lane wrote: >> Andrew Dunstan <andrew@dunslane.net> writes: >>> Drawing together various discussions both here and elsewhere (e.g. the >>> PostgresOpen hallway track) I propose to work on the following: >>> 1. make datum_to_json() honor a type's cast to json if it exists. The >>> fallback is to use the type's string representation, as now. >>> 2. add a cast hstore -> json (any others needed for core / contrib >>> types ?) >>> 3. add a to_json(anyelement) function >>> 4. add a new aggregate function json_agg(anyrecord) -> json to simplify >>> and make more effecient turning a resultset into json. >>> Comments welcome. >> ISTM the notion of to_json(anyelement) was already heavily discussed and >> had spec-compliance issues ... in fact, weren't you one of the people >> complaining? What exactly does #3 mean that is different from the >> previous thread? >> >> Also, on reflection I'm not sure about commandeering cast-to-json for >> this --- aren't we really casting to "json member" or something like >> that? The distinction between a container and its contents seems >> important here. With a container type as source, it might be important >> to do something different if we're coercing it to a complete JSON >> value versus something that will be just one member. I'm handwaving >> here because I don't feel like going back to re-read the RFC, but >> it seems like something that should be considered carefully before >> we lock down an assumption that there can never be a difference. >> >> regards, tom lane > Reflecting over the dual possible interpretation of what it does mean > to convert between "text" and "json" data types it has dawned to me > that the confusion may come mainly from wanting json to be two things > at once: > > 1. - a serialisation of of a subset of javascript objects to a string. > > 2. - a dynamic type represented by the above serialisation. > > case 1 > ------ > > If we stick with interpretation 1. then json datatype is really no > more than a domain based on "text" type and having a CHECK > is_valid_json() constraint. > > For this interpretation it makes complete sense to interpret any text > as already being serialised and no casts (other than casts to a text > type) have place here. > > a few datatypes - like hstore - could have their "to_json_text()" > serialiser functions if there is a better serialisation to text than > the types defaul one, but other than that the "serialise to text and > quote if not null, boolean or numeric type" should be needed. > > if there is strong aversion to relying on function names for getting > the right serialisation function, we could invent a new "cast-like" > feature for serialising types so we could define a serialiser for > hstore to json using > > CREATE SERIALISATION (hstore AS json) > WITH FUNCTION hstore_as_json(hstore); > > this probably will not be any safer than just using the name for > lookup directly unless we place some restrictions on who is allowed to > create the serialisation; > > case 2 > ------ > > My suggestions on using typecasts for convert-to-json were result of > this interpretation of json-as-dynamic-type. > > Having thought more of this I now think that we probably should leave > JSON alone and develop an separate dynamic type here. > > I have started work on doing this based on ideas from BSON data > format, except using postgreSQL datatypes. > > It will still have to solve similar problems we have had here with > JSON, but being both a new type and a binary type there will probably > be no expectation of 1-to-1 conversion from to-text. > > Will post here soon for more discussion on what this ned type does and > how it should be used. I am not opposed to making a new type, but I really don't think that means we need to do nothing for the existing data type. The suggested SERIALIZATION mechanism seems to be fairly intrusive and heavy handed, as opposed to the very lightweight mechanism that is Tom's option 3. Personally I don't have a strong feeling about a general to_json function, but it's something other people have asked for. The things I do care about are the json_agg function (to which nobody has objected) and finding a mechanism for reasonably converting structured types, particularly hstore, to json. I still think Tom's suggestion is the best and simplest way to do that. cheers andrew
On 09/29/2012 05:40 PM, Andrew Dunstan wrote: > > > > I am not opposed to making a new type, but I really don't think that > means we need to do nothing for the existing data type. The suggested > SERIALIZATION mechanism seems to be fairly intrusive and heavy handed, > as opposed to the very lightweight mechanism that is Tom's option 3. Agreed this would be the simplest one. I prefer it to be called something like "json_embedded?string" to better convey it's use as it is needed only when converting a postgresql string type to json string type. json_value already has a standard-defined meaning and is a supertype of json (which unfortunately is called "json text". > Personally I don't have a strong feeling about a general to_json > function, but it's something other people have asked for. The things I > do care about are the json_agg function (to which nobody has objected) Not just objected but i am very much for it. +1 from me. > and finding a mechanism for reasonably converting structured types, > particularly hstore, to json. hstore to json is what started this discussion and using to_json(<sometype>) function was one of the proposed solutions for this. Using the same mechanism for enabling users to also have custom serialisations for thins that the standard leaves open - like datetime - is an added bonus. > I still think Tom's suggestion is the best and simplest way to do that. which Toms suggestion you mean here ? The 3. mentioned above was for making possible 2 separate ways to convert (serialise/quote/escape and parse/check-for-valid-json) string to json and afair not about hstore to json. I'm also looking forward for an easy way or two to populate a record from json and extract an array from json. > > cheers > > andrew
On 09/29/2012 05:01 PM, Hannu Krosing wrote: > On 09/29/2012 05:40 PM, Andrew Dunstan wrote: >> >> I still think Tom's suggestion is the best and simplest way to do that. > which Toms suggestion you mean here ? > > The 3. mentioned above was for making possible 2 separate ways to > convert (serialise/quote/escape and parse/check-for-valid-json) string > to json and afair not about hstore to json. Er, what? yes, I meant option 3, and it is a perfect mechanism for doing conversion of an hstore field inside datum_to_json: the code would look for a cast to the new type (whatever we call it) and use that instead of the normal text representation. > > I'm also looking forward for an easy way or two to populate a record > from json and extract an array from json. I am prepared to take this on - at least starting with json to array which I think is the simpler case. I have some ideas about how to do this and have done a bit of experimental work along these lines. Personally I'd be inclined to make it do a conversion to text[] and then cast from that to anything else we needed. cheers andrew
Hannu Krosing wrote: > Reflecting over the dual possible interpretation of what it does mean to > convert between "text" and "json" data types it has dawned to me that > the confusion may come mainly from wanting json to be two things at once: > > 1. - a serialisation of of a subset of javascript objects to a string. > > 2. - a dynamic type represented by the above serialisation. > > case 1 > ------ > > If we stick with interpretation 1. then json datatype is really no more > than a domain based on "text" type and having a CHECK is_valid_json() > constraint. <snip> > case 2 > ------ > > My suggestions on using typecasts for convert-to-json were result of > this interpretation of json-as-dynamic-type. > > Having thought more of this I now think that we probably should leave > JSON alone and develop an separate dynamic type here. <snip> I think it would be best to have 2 main JSON-concerning data types: 1. A proper subset of "text" consisting of every value meeting some is_valid_json() constraint, as a DOMAIN; every value of this type is a "text". 2. A type that is disjoint from "text", that is, no value of this type would compare as equal to any "text" value. It would be considered a collection type, similarly to how an array or tuple or relation is, but having arbitrary depth and that is heterogeneous in the general case. You could say that #1 is to textual source code what #2 is to a parsed syntax tree of that code. Or that #1 is to textual XML what #2 is to an XML DOM. It would be type #2 above that is the primary JSON type, which has all the special operators for working with JSON, while type #1 would be opaque, just a character string, and must be cast as type #2 in order to use any special operators on it. Similarly, all the converting operators between other types and JSON would be with #2 only, and producing #1 must go through #2. So call #1 say JSON_source and #2 say JSON_model, or JSON_text and JSON respectively. That's how I think it should work. -- Darren Duncan
On 09/29/2012 10:29 PM, Andrew Dunstan wrote: > > On 09/29/2012 05:01 PM, Hannu Krosing wrote: >> On 09/29/2012 05:40 PM, Andrew Dunstan wrote: >>> >>> I still think Tom's suggestion is the best and simplest way to do that. >> which Toms suggestion you mean here ? >> >> The 3. mentioned above was for making possible 2 separate ways to >> convert (serialise/quote/escape and parse/check-for-valid-json) >> string to json and afair not about hstore to json. > > Er, what? yes, I meant option 3, and it is a perfect mechanism for > doing conversion of an hstore field inside datum_to_json: the code > would look for a cast to the new type (whatever we call it) and use > that instead of the normal text representation. The only place where the new type is needed is text to json conversion, for all others cast to the main json type should be enough. Or do you mean to still do it as a generic cast to "json_component" type so it would be usable for text types as well ? > >> >> I'm also looking forward for an easy way or two to populate a record >> from json and extract an array from json. > > > I am prepared to take this on - at least starting with json to array > which I think is the simpler case. I have some ideas about how to do > this and have done a bit of experimental work along these lines. > Personally I'd be inclined to make it do a conversion to text[] and > then cast from that to anything else we needed. > > cheers > > andrew > > >
<p class="MsoNormal"><p class="MsoNormal">Datum_to_json<p class="MsoNormal"><br /><p class="MsoNormal"><br /><p class="MsoNormal">SELECT'a=>1'::hstore, '<a>1</a>'::xml, '{"a":1}'<p class="MsoNormal"><br /><p class="MsoNormal">(Pleasenote that last column is unknown – datatype)<p class="MsoNormal"><p class="MsoNormal"><br /><p class="MsoNormal"><br/><p class="MsoNormal">Now, what is the main goal? to get:<p class="MsoNormal"><p class="MsoListParagraphCxSpMiddle">1) <pclass="MsoListParagraphCxSpMiddle"><p class="MsoListParagraphCxSpMiddle">{<p class="MsoListParagraphCxSpMiddle"> "hstore": "\"a\"=>\"1\"",<p class="MsoListParagraphCxSpMiddle"> "xml": "<a>1</a>",<pclass="MsoListParagraphCxSpMiddle"> "?column?": "{\"a\":1}"<p class="MsoListParagraphCxSpMiddle">}<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">or:<pclass="MsoListParagraphCxSpMiddle">2)<p class="MsoListParagraphCxSpMiddle">{<p class="MsoListParagraphCxSpMiddle"><pclass="MsoListParagraphCxSpMiddle"> "hstore": {<p class="MsoListParagraphCxSpMiddle"> "a": "1"<p class="MsoListParagraphCxSpMiddle"> },<p class="MsoListParagraphCxSpMiddle"> "xml": {<p class="MsoListParagraphCxSpMiddle"> "a": "1"<p class="MsoListParagraphCxSpMiddle"> },<p class="MsoListParagraphCxSpMiddle"> "?column?": {<p class="MsoListParagraphCxSpMiddle"> "a": 1<p class="MsoListParagraphCxSpMiddle"> }<p class="MsoListParagraphCxSpMiddle">}<p><pclass="MsoListParagraphCxSpMiddle"><br /><p><p class="MsoListParagraphCxSpMiddle">1) is already possible to get now:<p class="MsoListParagraphCxSpMiddle"> <p class="MsoListParagraphCxSpMiddle">SELECTrow_to_json(t) FROM (SELECT 'a=>1'::hstore, '<a>1</a>'::xml, '{"a":1}')AS t<p class="MsoListParagraphCxSpMiddle"> <p class="MsoListParagraphCxSpMiddle">I don’t know how things work underthe hood (haven’t taken a look on row_to_json source…) But it says to me that there is already Datum_to_json – somewhere…<pclass="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">2) Is not possible atm... butwould be if we have CAST functions for each DataType - I am not sure is it possible to write some generic function whatwill convert any datype to JSON (or to an "intermediate" dynamic datatype) without knowing specific things about concreteDataType (though I dont see big difference will type itself provide _to_json or to_dynamic_type function)...<p class="MsoListParagraphCxSpMiddle"><br/><p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">IsJSON really a type?<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">I think it is... But just needs better handling... We have atm type JSON - though we can'tsay<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">SELECT * FROM foo WHERE json_member("member_name",json_column) = 1;<p class="MsoListParagraphCxSpMiddle">I mean - we can't without plv8... to bemore precise... We have used to use plv8 - but it has became a bit slow how table grows... so we have made some workarounds...with custom functions what improves response for above query... however it is very customised... but maybeit could give some ideas for indexing JSON...<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">basically...ifwe have table:<p class="MsoListParagraphCxSpMiddle">foo (id PK, some columns,json_column)<p class="MsoListParagraphCxSpMiddle"> we have made another table:<p class="MsoListParagraphCxSpMiddle">index_json(idint, member_name ltree, json_value text) table with index (member_name, json_value)<pclass="MsoListParagraphCxSpMiddle">when we instert row in foo we also json_column value i.e. example from desiredresult 2) above transfer to<p class="MsoListParagraphCxSpMiddle">1, 'hstore.a', '"1"'<p class="MsoListParagraphCxSpMiddle">1,'xml.a', '"1"'<p class="MsoListParagraphCxSpMiddle">1, '?column?.a', '1'<p class="MsoListParagraphCxSpMiddle">andnow when we want result for SELECT * FROM foo WHERE json_member("xml.a", json_column)= 1;<p class="MsoListParagraphCxSpMiddle">we actually asks SELECT id FROM index_json WHERE json_member = $json_memberand json_value = $json_value into my_ids<p class="MsoListParagraphCxSpMiddle">and then SELECT * FROM foo WHEREid = ANY(my_ids)<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">Also, to get SELECT* FROM foo as one JSON - atm, I think query is a monster (without plv8), best would be if it is possible to providesome shortcut support...<p class="MsoListParagraphCxSpMiddle"><br /><p class="MsoListParagraphCxSpMiddle">Kind regards,<pclass="MsoListParagraphCxSpMiddle">Misa<p>