On 10 February 2014 20:11, Hannu Krosing <hannu@krosing.net> wrote:
> The fastest and lowest parsing cost format for "JSON" is tnetstrings
> http://tnetstrings.org/ why not use it as the binary wire format ?
>
> It would be as binary as it gets and still be generally parse-able by
> lots of different platforms, at least by all of these we care about.

If we do go down the binary encoding path in a future release, can I
please suggest *not* using something like tnetstrings? It suffers from
a problem shared by several binary transport formats, particularly
those developed by people whose programming language of choice doesn't
distinguish between byte arrays and strings: all strings are treated
as byte arrays, and it's left to the application to decide on the
character encoding and on which values are binary data and which are
text.

This makes writing a parser in a language which does treat byte arrays
and strings differently very difficult. See e.g. the Java tnetstrings
API [1], which is forced to treat strings as byte arrays until the
programmer asks it to parse the value again, this time treating
everything as a string. The msgpack people, after much wrangling, have
ended up issuing a new version of the protocol which avoids this issue
and which they are strongly encouraging users to switch to; see [2]
for the gory details.

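
To make the problem concrete, here is a rough sketch of a tnetstrings
reader in Python, which (like Java) keeps bytes and text apart. It is
my own illustration, not the reference implementation, it leaves out
floats and error handling, and the name parse_tnetstring is made up
for this mail. The point is the "," branch: the wire format gives the
parser nothing to go on, so all it can safely hand back is bytes.

```python
def parse_tnetstring(data: bytes):
    """Parse one value from the front of `data`; return (value, rest).

    Illustrative sketch only: floats and malformed input are not handled.
    """
    length, _, rest = data.partition(b":")
    n = int(length)
    payload, tag, remainder = rest[:n], rest[n:n + 1], rest[n + 1:]

    if tag == b",":
        # "String" on the wire just means n bytes. There is no marker for
        # text vs binary and no encoding, so bytes is the only safe result.
        return payload, remainder
    if tag == b"#":
        return int(payload), remainder
    if tag == b"!":
        return payload == b"true", remainder
    if tag == b"~":
        return None, remainder
    if tag == b"]":
        items = []
        while payload:
            item, payload = parse_tnetstring(payload)
            items.append(item)
        return items, remainder
    if tag == b"}":
        result = {}
        while payload:
            key, payload = parse_tnetstring(payload)
            value, payload = parse_tnetstring(payload)
            result[key] = value
        return result, remainder
    raise ValueError(f"unsupported type tag: {tag!r}")


# Both of these come back as bytes; only the application can know that
# the first is text and the second is the start of an image.
text_value, _ = parse_tnetstring(b"5:hello,")
blob_value, _ = parse_tnetstring(b"4:\x89PNG,")
```

A caller that wants text has to guess an encoding and decode each value
itself, and a guess of UTF-8 fails outright on the second value above.
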
While we may never store types in our jsonb format other than the
standard JSON data types (though I can foresee people wanting to), I
would strongly recommend picking a format which at least makes it
clear that a value is a string (text, whatever), and preferably makes
it clear what the character encoding is. Or maybe it should just
follow whatever the client encoding is at the time, as long as that
is completely unambiguous to a client.

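
As a sketch of what I mean by "clear that a value is a string", here
is a toy length-prefixed encoding in Python with separate tags for
text and for raw bytes. Everything in it is hypothetical: the "$" tag,
the fixed choice of UTF-8, and the function names are invented for
this mail and are not part of tnetstrings, msgpack, or any proposed
jsonb wire format.

```python
TEXT_TAG = b"$"    # hypothetical tag: payload is UTF-8 encoded text
BYTES_TAG = b","   # payload is opaque binary data


def encode_value(value) -> bytes:
    """Encode a str or bytes value as <length>:<payload><tag>."""
    if isinstance(value, str):
        payload, tag = value.encode("utf-8"), TEXT_TAG
    elif isinstance(value, (bytes, bytearray)):
        payload, tag = bytes(value), BYTES_TAG
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def decode_value(data: bytes):
    """Decode one value; text comes back as str, binary as bytes."""
    length, _, rest = data.partition(b":")
    n = int(length)
    payload, tag = rest[:n], rest[n:n + 1]
    if tag == TEXT_TAG:
        # The tag says this is text and the format fixes the encoding,
        # so the parser can decode without asking the application.
        return payload.decode("utf-8")
    if tag == BYTES_TAG:
        return payload
    raise ValueError(f"unsupported type tag: {tag!r}")


round_tripped_text = decode_value(encode_value("héllo"))
round_tripped_blob = decode_value(encode_value(b"\x89PNG"))
```

With the tag on the wire, a parser in a language that separates the
two types can return the right one in a single pass, with no second
"treat everything as a string" parse.
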
Cheers

Tom

[1] https://github.com/asinger/tnetstringsj
[2] https://github.com/msgpack/msgpack/issues/128