Thread: JSON vs. JSONB storage size

JSON vs. JSONB storage size

From
Thomas Kellerer
Date:
I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by Bill Karwin[1]

While most of the indexing part simply doesn't apply to Postgres, I was curious about the statement that the data type
ofa json value inside the json matters as well (Slide 56)
 

Apparently in MySQL storing {"a": 123456789} takes less space than {"a": '123456789'}

So I tested that with Postgres both using json and jsonb - my expectation was, that this would be similar in Postgres
aswell. 
 

However, it turned out that for a json column there was no difference at all (both versions would show up the same with
pg_total_relation_size())

The table size with jsonb was bigger in general, but the one with the "integer" value was even bigger than the one with
the"string" storage. 
 

The following little test script:

    create table json_length_test1 (id serial primary key, d json);
    insert into json_length_test1
    select i, jsonb_build_object('a', 1234567890)
    from generate_series(1,1e6) t(i);

    create table json_length_test2 (id serial primary key, d json);
    insert into json_length_test2
    select i, jsonb_build_object('a', '1234567890')
    from generate_series(1,1e6) t(i);

    create table jsonb_length_test1 (id serial primary key, d jsonb);
    insert into jsonb_length_test1
    select i, jsonb_build_object('a', 1234567890)
    from generate_series(1,1e6) t(i);

    create table jsonb_length_test2 (id serial primary key, d jsonb);
    insert into jsonb_length_test2
    select i, jsonb_build_object('a', '1234567890')
    from generate_series(1,1e6) t(i);

    select 'json', pg_size_pretty(pg_total_relation_size('json_length_test1')) as json_int_size, 
                   pg_size_pretty(pg_total_relation_size('json_length_test2')) as json_text_size
    union all
    select 'jsonb', pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size, 
                    pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size


Returns (Postgres 12, Windows 10)

?column? | json_int_size | json_text_size
---------+---------------+---------------
json     | 71 MB         | 71 MB         
jsonb    | 87 MB         | 79 MB         

I am a bit surprised by this (not because the jsonb sizes are generally bigger, but that the string value takes less
space)

Is this caused by the fact that a string value compresses better internally? 


Thomas

  [1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong



Re: JSON vs. JSONB storage size

From
Dmitry Dolgov
Date:
> On Fri, Oct 11, 2019 at 1:40 PM Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> I am a bit surprised by this (not because the jsonb sizes are generally
> bigger, but that the string value takes less space)
>
> Is this caused by the fact that a string value compresses better internally?

Those jsonb objects are quite small, so it could be that an alignment kicks in,
since as far as I remember, jsonb header and data should be aligned by 4 byte
boundary.



Re: JSON vs. JSONB storage size

From
Adrian Klaver
Date:
On 10/11/19 4:40 AM, Thomas Kellerer wrote:
> I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by Bill Karwin[1]
> 
> While most of the indexing part simply doesn't apply to Postgres, I was curious about the statement that the data
typeof a json value inside the json matters as well (Slide 56)
 
> 
> Apparently in MySQL storing {"a": 123456789} takes less space than {"a": '123456789'}
> 
> So I tested that with Postgres both using json and jsonb - my expectation was, that this would be similar in Postgres
aswell.
 
> 
> However, it turned out that for a json column there was no difference at all (both versions would show up the same
withpg_total_relation_size())
 
> 
> The table size with jsonb was bigger in general, but the one with the "integer" value was even bigger than the one
withthe "string" storage.
 
> 
> The following little test script:
> 
>      create table json_length_test1 (id serial primary key, d json);
>      insert into json_length_test1
>      select i, jsonb_build_object('a', 1234567890)
>      from generate_series(1,1e6) t(i);
> 
>      create table json_length_test2 (id serial primary key, d json);
>      insert into json_length_test2
>      select i, jsonb_build_object('a', '1234567890')
>      from generate_series(1,1e6) t(i);
> 
>      create table jsonb_length_test1 (id serial primary key, d jsonb);
>      insert into jsonb_length_test1
>      select i, jsonb_build_object('a', 1234567890)
>      from generate_series(1,1e6) t(i);
> 
>      create table jsonb_length_test2 (id serial primary key, d jsonb);
>      insert into jsonb_length_test2
>      select i, jsonb_build_object('a', '1234567890')
>      from generate_series(1,1e6) t(i);
> 
>      select 'json', pg_size_pretty(pg_total_relation_size('json_length_test1')) as json_int_size,
>                     pg_size_pretty(pg_total_relation_size('json_length_test2')) as json_text_size
>      union all
>      select 'jsonb', pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size,
>                      pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size
> 
> 
> Returns (Postgres 12, Windows 10)
> 
> ?column? | json_int_size | json_text_size
> ---------+---------------+---------------
> json     | 71 MB         | 71 MB
> jsonb    | 87 MB         | 79 MB
> 
> I am a bit surprised by this (not because the jsonb sizes are generally bigger, but that the string value takes less
space)
> 
> Is this caused by the fact that a string value compresses better internally?

Not sure if it applies here:

https://www.postgresql.org/docs/11/datatype-json.html

"When converting textual JSON input into jsonb, the primitive types 
described by RFC 7159 are effectively mapped onto native PostgreSQL 
types, as shown in Table 8.23. ..."

Table 8.23.
JSON primitive type PostgreSQL type  Notes
...
number                 numeric          NaN and infinity values are disallowed
...
> 
> 
> Thomas
> 
>    [1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: JSON vs. JSONB storage size

From
Andrew Gierth
Date:
>>>>> "Thomas" == Thomas Kellerer <spam_eater@gmx.net> writes:

 Thomas> The table size with jsonb was bigger in general, but the one
 Thomas> with the "integer" value was even bigger than the one with the
 Thomas> "string" storage.

jsonb stores numeric values as "numeric", not as integers or floats, so
the storage needed will depend on the number of decimal digits.

The size results you're seeing are mainly the consequence of the fact
that jsonb stores the whole Numeric datum, varlena header included (and
without packing the header), so there's an extra 4 bytes you might not
have accounted for: 1234567890 is three numeric "digits" (2 bytes each)
plus a 2 byte numeric header (for weight/scale/sign) plus the 4 byte
varlena header, for 12 bytes total, whereas "1234567890" takes only 10
(since the length is encoded in the jsonb value offsets). Furthermore,
there may be up to 3 padding bytes before the numeric value.

I think in your test, the extra 3 bytes is pushing the size of a single
row up to the next multiple of MAXALIGN, so you're getting slightly
fewer rows per page. I don't know what Windows is doing, but on my
system (freebsd amd64) I get 136 rows/page vs. 120 rows/page, which
would make a million rows take 57MB or 65MB. (Your use of
pg_total_relation_size is including the pkey index, which confuses the
results a bit.)

-- 
Andrew (irc:RhodiumToad)