Re: JSON and unicode surrogate pairs - Mailing list pgsql-hackers
From | Hannu Krosing |
---|---|
Subject | Re: JSON and unicode surrogate pairs |
Date | |
Msg-id | 51B73621.4030003@2ndQuadrant.com |
In response to | Re: JSON and unicode surrogate pairs (Stefan Drees <stefan@drees.name>) |
List | pgsql-hackers |
On 06/11/2013 04:04 PM, Stefan Drees wrote:
> On 2013-06-11 15:23 CEST, Hannu Krosing wrote:
>> On 06/11/2013 03:08 PM, Stefan Drees wrote:
>>> ...
>>>
>>> What about this:
>>> =# SELECT '{"measure":"seconds", "measure":42}'::json;
>>>                 json
>>> --------------------------------------
>>>  {"measure":42}
>>>
>>> I presume people being used to store metadata in "preceding" json
>>> object members with duplicate names, would want to decide in the
>>> client requesting the data what to do with the metadata information
>>> and at what point to "drop", wouldn't they :-?)
>> Seems like blatant misuse of JSON format :)
>>
>> I assume that as JSON is _serialisation_ format, it should represent a
>> data structure, not processing instructions.
>>
>> I can see no possible JavaScript structure which could produce duplicate
>> key when serialised.
>
> ahem, JSON is a notation that allows toplevel an object or an array.
> If it is an object, this consists of pairs called (name, value).
> Here value can be any object, array, number, string or the literals
> null, false or true.
> The name must be a string. That's it :-) no key **and** also no
> ordering on these "name"s ;-) and as the RFC does not care, where the
> data came from or how it was represented before it became "JSON text"
> (the top-level element of a JSON document) how should the parser know
> ... but delta notation, commenting, or "streaming" needs created many
> applications that deliver multibags and trust on some ordering
> conventions in their data-exchanging relations.
>
>> And I don't think that any standard JSON reader supports this either.
>
> Oh yes. Convention is merely: Keep all ("Streaming") or the last
> (whatever the last may mean, must be carefully ensured in the
> interchange relation).
> All would like these two scenarios, but the RFC as is does not prevent
> an early-out (like INSERT OR IGNORE) :-))

I was kind of assuming that JSON is a (JavaScript) Object Serialization
Notation, that is, there is a unique implied "JavaScript Object" which
can be "Serialized" into any given JSON string.

IOW, that if you serialise an object then this is what JSON should be.

The fact that most JSON to Object readers support multiple keys is just
an implementation artifact and not something that is required by the RFC.

>
>> If you want to store any JavaScript snippets in database use text.
>
> JSON is language agnostic. I use more JSON from python, php than from
> js, but others do so differently ...

Agreed.

Even the fact that you can define any operations on a "JSON" string -
like extracting a value for a key - is actually non-standard :)

Perhaps I should stop thinking of the json type as something that implies
any underlying structure ...

>
>> Or perhaps pl/v8 :)
>>
>
> Do you mean the "V8 Engine Javascript Procedural Language add-on for
> PostgreSQL" (http://code.google.com/p/plv8js/), I guess so.
>
> I did not want to hijack the thread, as this centered more around
> escaping where and what in which context (DB vs. client encoding).
>
> As the freshly created IETF json working group revamps the JSON RFC on
> its way to the standards track, there are currently also discussions
> on what to do with unicode surrogate pairs. See eg. this thread
> http://www.ietf.org/mail-archive/web/json/current/msg00675.html
> starting a summarizing effort.

Wow. The rabbit hole is much deeper than I thought :)

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
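
To make the duplicate-name conventions discussed above concrete ("keep the last", "keep all", and the early-out that keeps the first), here is a minimal sketch using Python's standard `json` module. It only illustrates how one reader behaves and is not a statement about what the PostgreSQL json type does; the helper name `first_wins` is hypothetical and introduced purely for this example.

```python
import json

# The document from the thread: one object with a duplicated name.
doc = '{"measure": "seconds", "measure": 42}'

# Default behaviour of this reader: a later duplicate overwrites the earlier one.
last_wins = json.loads(doc)

# "Keep all": object_pairs_hook receives the (name, value) pairs in document
# order, so passing `list` preserves every pair, duplicates included.
all_pairs = json.loads(doc, object_pairs_hook=list)

def first_wins(pairs):
    """Hypothetical helper: early-out policy, the first value for a name is kept."""
    result = {}
    for name, value in pairs:
        result.setdefault(name, value)  # later duplicates are ignored
    return result

first = json.loads(doc, object_pairs_hook=first_wins)

print(last_wins)
print(all_pairs)
print(first)
```

All three results come from the same JSON text, which is the point made in the thread: the policy for duplicate names lives in the reader (or in an agreement between the exchanging parties), not in the text itself.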