Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP) - Mailing list pgsql-hackers

From Terry Laurenzo
Subject Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date
Msg-id AANLkTiknwbEFW-gFyndRfCXsYoyq8pRm=NKwHecQRyJw@mail.gmail.com
In response to Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
List pgsql-hackers
Good points.  In addition, any binary format needs to support object property traversal without having to do a deep scan of all descendants.  BSON handles this with explicit lengths for document types (objects and arrays) so that entire parts of the tree can be skipped during sibling traversal.
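
To make the skipping idea concrete, here is a minimal sketch in Python. The layout is a toy one invented for this illustration (a one-byte tag, a four-byte little-endian payload length, then the payload); it is not the real BSON encoding, and the names enc_str, enc_obj, find and read_str are hypothetical. The point is only that a stored length lets a lookup jump over a whole subtree without decoding it.

```python
import struct

# Toy layout, assumed for illustration only (this is NOT real BSON):
#   value  = tag (1 byte) + payload length (4 bytes, little-endian) + payload
#   object = payload is a run of: key length (2 bytes) + key bytes + value
TAG_STR, TAG_OBJ = 0x01, 0x02
HEADER = struct.Struct("<BI")   # tag + payload length, 5 bytes
KEYLEN = struct.Struct("<H")    # key length, 2 bytes


def enc_str(s):
    data = s.encode("utf-8")
    return HEADER.pack(TAG_STR, len(data)) + data


def enc_obj(items):
    """items is a list of (key, already-encoded value) pairs."""
    body = b""
    for key, encoded_value in items:
        k = key.encode("utf-8")
        body += KEYLEN.pack(len(k)) + k + encoded_value
    return HEADER.pack(TAG_OBJ, len(body)) + body


def find(buf, offset, wanted):
    """Return the offset of the value stored under `wanted`, or None.

    Siblings that do not match are skipped using their stored length,
    so nested objects are never scanned.
    """
    tag, length = HEADER.unpack_from(buf, offset)
    if tag != TAG_OBJ:
        raise ValueError("not an object")
    pos = offset + HEADER.size
    end = pos + length
    wanted_bytes = wanted.encode("utf-8")
    while pos < end:
        (klen,) = KEYLEN.unpack_from(buf, pos)
        key = buf[pos + KEYLEN.size : pos + KEYLEN.size + klen]
        pos += KEYLEN.size + klen
        if key == wanted_bytes:
            return pos
        _, vlen = HEADER.unpack_from(buf, pos)
        pos += HEADER.size + vlen   # jump over the entire subtree
    return None


def read_str(buf, offset):
    tag, length = HEADER.unpack_from(buf, offset)
    if tag != TAG_STR:
        raise ValueError("not a string")
    start = offset + HEADER.size
    return buf[start : start + length].decode("utf-8")


# "a" holds a nested object; looking up "b" never descends into it.
doc = enc_obj([
    ("a", enc_obj([("x", enc_str("deep")), ("y", enc_str("deeper"))])),
    ("b", enc_str("hit")),
])
value_of_b = read_str(doc, find(doc, 0, "b"))
```

The cost of finding "b" here depends on the number of siblings at the top level, not on how large the object under "a" is.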

It would also be nice to make sure that we store fully parsed strings.  There are lots of escape options that simply do not need to be preserved (C escapes, unicode, octal, hex sequences), and keeping them hinders the ability to do direct comparisons.  BSON also makes a small extra effort to ensure that object property names are encoded in a way that is easily comparable, as these will be the most frequently compared items.
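
A small sketch of why this matters, using Python's standard json module as a stand-in parser. The helper name stored_form is made up for this example. Two spellings of the same string differ as raw text but become byte-identical once the escapes are resolved, so a plain byte comparison is enough afterwards.

```python
import json


def stored_form(raw_json_string):
    """Parse one JSON string literal and return its UTF-8 bytes.

    Escapes are resolved during parsing, so the result no longer
    depends on how the string was spelled in the input.
    """
    value = json.loads(raw_json_string)
    if not isinstance(value, str):
        raise ValueError("expected a JSON string literal")
    return value.encode("utf-8")


# Same key, three different spellings in the input text.
spelled_plain = '"a/b"'
spelled_unicode = r'"\u0061/b"'
spelled_slash = r'"a\/b"'

raw_texts_differ = len({spelled_plain, spelled_unicode, spelled_slash}) == 3
stored_forms = {stored_form(s) for s in (spelled_plain, spelled_unicode, spelled_slash)}
```

After parsing, stored_forms contains a single entry, so property names can be compared with something as cheap as memcmp.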

I'm still going to write up a proposed grammar that takes these items into account - just ran out of time tonight.

Terry

On Wed, Oct 20, 2010 at 12:46 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
On Wed, Oct 20, 2010 at 6:39 AM, Terry Laurenzo <tj@laurenzo.org> wrote:
> The answer may be to have both a jsontext and jsonbinary type as each will
> be optimized for a different case.

I want to choose one format for JSON rather than having two types.
It should be more efficient than the other formats in many cases,
and not too bad in the remaining cases.

I think the discussion started with the observation that
 "what BSON can represent is a subset of what JSON can represent".
So any binary format would be acceptable as long as it has as much
representational power as the text format.

For example, a sequence of <byte-length> <text> could reduce
CPU cycles for reparsing and hold all of the input as-is except
for insignificant whitespace. It is not BSON, but it is a binary format.
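
One possible reading of the <byte-length> <text> idea, sketched in Python. The four-byte little-endian length prefix, the regular expression and the function names are all assumptions made for this sketch, not part of any proposal in the thread. Each token keeps its original spelling (so 2.50 stays 2.50), and only the whitespace between tokens is dropped.

```python
import re
import struct

JSON_WS = " \t\r\n"

# One JSON token: a string literal, a punctuation character, or a bare
# word such as a number, true, false or null. Leading whitespace is skipped.
TOKEN = re.compile(r'[ \t\r\n]*("(?:[^"\\]|\\.)*"|[{}\[\]:,]|[^ \t\r\n{}\[\]:,"]+)')


def tokenize(text):
    text = text.strip(JSON_WS)
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        yield match.group(1)
        pos = match.end()


def pack_tokens(text):
    """Encode the text as a run of <4-byte length><token bytes> records."""
    out = bytearray()
    for token in tokenize(text):
        data = token.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def unpack_tokens(buf):
    """Read the tokens back without looking at individual characters."""
    tokens = []
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        tokens.append(buf[pos : pos + length].decode("utf-8"))
        pos += length
    return tokens


source = '{ "a" : [1, 2.50, "x y"] }'
packed = pack_tokens(source)
restored = "".join(unpack_tokens(packed))
```

Here restored is the input with only the inter-token whitespace removed; the space inside "x y" survives because it is part of a string token. This sketch does not validate the JSON, it only splits it.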

Or, if we want to store numbers in binary form, I think the
format should be the numeric type in postgres. It has high precision,
and we don't need any higher precision than that to compare two
numbers in the end. Even if we use the BSON format, we would need to
extend it to store all numeric values, whose precision is 10^1000.
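
A short illustration of the precision concern, using Python's decimal module as a rough stand-in for an arbitrary-precision type like numeric, and Python's float as the 8-byte IEEE 754 double that BSON uses for floating point. The two input values are made up for the example.

```python
from decimal import Decimal

# Two JSON number literals that differ only in the 20th decimal place.
text_a = "1.00000000000000000001"
text_b = "1.00000000000000000002"

# Stored as doubles, both collapse to the same value, so the
# difference in the input is lost before any comparison happens.
same_as_double = float(text_a) == float(text_b)

# Stored as arbitrary-precision decimals, every digit is kept
# and the two values still compare as different.
a = Decimal(text_a)
b = Decimal(text_b)
ordered_as_decimal = a < b
round_trips = str(a) == text_a
```

With doubles the two inputs are indistinguishable; with a decimal representation they stay distinct and the original text can be reproduced.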

--
Itagaki Takahiro
