Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date
Msg-id AANLkTim7htDPripP15JsQEjLGpyMNuvMJurb0aZOpKpO@mail.gmail.com
In response to Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
List pgsql-hackers
On Tue, Oct 19, 2010 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> The elephant in the room is if the binary encoded form is smaller then
>> it occupies less ram and disk bandwidth to copy it around.
>
> It seems equally likely that a binary-encoded form could be larger
> than the text form (that's often true for our other datatypes).
> Again, this is an argument that would require experimental evidence
> to back it up.

That's exactly what I was thinking when I read Greg's email.  I
designed something vaguely (very vaguely) like this many years ago and
the binary format that I worked so hard to create was enormous
compared to the text format, mostly because I had a lot of small
integers in the data I was serializing, and as it turns out,
representing {0,1,2} in fewer than the 7 bytes its text form takes is
not very easy.  It can
certainly be done if you set out to optimize for precisely those kinds
of cases, but I ended up with something awful like:

<4 byte type = list>    <4 byte list length = 3>
<4 byte type = integer> <4 byte integer = 0>
<4 byte type = integer> <4 byte integer = 1>
<4 byte type = integer> <4 byte integer = 2>

= 32 bytes.  Even if you were a little smarter than I was and used
2-byte integers (with some escape hatch allowing larger numbers to be
represented), it's still more than twice the size of the text
representation.  Even if you use 1-byte integers, it's still bigger.
To get it down to being smaller, you've got to do something like make
the high nibble of each byte a type field and the low nibble the first
4 payload bits.  You can certainly do all of this but you could also
just store it as text and let the TOAST compression algorithm worry
about making it smaller.
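
To make the arithmetic above concrete, here is a rough sketch that
encodes {0,1,2} each way and counts the bytes.  The type tag values
and function names are made up for illustration, the narrower
variants assume every field (not only the integers) shrinks to the
same width, and the nibble-packed version has no escape hatch for
larger numbers, so treat it as a toy rather than a proposed format.

```python
import struct

# Hypothetical type tags, chosen only for this illustration.
TYPE_LIST = 1
TYPE_INT = 2


def encode_fixed(values, fmt):
    """Tag every field, using one fixed-width struct code for all fields."""
    out = struct.pack("<" + fmt * 2, TYPE_LIST, len(values))
    for v in values:
        out += struct.pack("<" + fmt * 2, TYPE_INT, v)
    return out


def encode_nibbles(values):
    """High nibble of each byte is the type tag, low nibble the payload."""
    if len(values) > 15 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("this sketch has no escape hatch for larger values")
    out = bytes([(TYPE_LIST << 4) | len(values)])
    out += bytes((TYPE_INT << 4) | v for v in values)
    return out


values = [0, 1, 2]
text = "{0,1,2}".encode("ascii")

sizes = {
    "text": len(text),
    "4-byte fields": len(encode_fixed(values, "I")),
    "2-byte fields": len(encode_fixed(values, "H")),
    "1-byte fields": len(encode_fixed(values, "B")),
    "nibble-packed": len(encode_nibbles(values)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

Under those assumptions the sizes come out to 7, 32, 16, 8 and 4
bytes, so only the nibble-packed layout beats the text form here.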

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

