Thread: JSON vs. JSONB storage size
I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by Bill Karwin [1].

While most of the indexing part simply doesn't apply to Postgres, I was curious about the statement that the data type of a value inside the JSON document matters as well (slide 56).

Apparently in MySQL storing {"a": 123456789} takes less space than {"a": "123456789"}.

So I tested that with Postgres, using both json and jsonb. My expectation was that this would be similar in Postgres as well.

However, it turned out that for a json column there was no difference at all (both versions show up the same with pg_total_relation_size()).

The table size with jsonb was bigger in general, but the one with the "integer" value was even bigger than the one with the "string" value.

The following little test script

create table json_length_test1 (id serial primary key, d json);
insert into json_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);

create table json_length_test2 (id serial primary key, d json);
insert into json_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);

create table jsonb_length_test1 (id serial primary key, d jsonb);
insert into jsonb_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);

create table jsonb_length_test2 (id serial primary key, d jsonb);
insert into jsonb_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);

select 'json', pg_size_pretty(pg_total_relation_size('json_length_test1')) as json_int_size,
       pg_size_pretty(pg_total_relation_size('json_length_test2')) as json_text_size
union all
select 'jsonb', pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size,
       pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size;

returns (Postgres 12, Windows 10):

 ?column? | json_int_size | json_text_size
----------+---------------+---------------
 json     | 71 MB         | 71 MB
 jsonb    | 87 MB         | 79 MB

I am a bit surprised by this (not because the jsonb sizes are generally bigger, but because the string value takes less space).

Is this caused by the fact that a string value compresses better internally?

Thomas

[1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong
> On Fri, Oct 11, 2019 at 1:40 PM Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> I am a bit surprised by this (not because the jsonb sizes are generally
> bigger, but that the string value takes less space)
>
> Is this caused by the fact that a string value compresses better internally?

Those jsonb objects are quite small, so it could be that alignment padding kicks in. As far as I remember, the jsonb header and data should be aligned on a 4-byte boundary.
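To make the alignment hypothesis concrete, here is a rough Python model of where the value of {"a": <value>} would land inside a freshly built jsonb datum. It is a sketch under assumptions, not PostgreSQL's actual code. The model assumes a 4-byte varlena header at the start of the buffer, a 4-byte container header, and one 4-byte JEntry each for the key and the value. It also assumes the key bytes are written unpadded and that numeric values (but not strings) are padded to a 4-byte boundary. The value sizes come from the breakdown later in this thread: 12 bytes for the numeric and 10 for the string. The function name jsonb_single_pair_size is hypothetical.

```python
VARHDRSZ = 4          # assumed: varlena header reserved at the start of the buffer
CONTAINER_HEADER = 4  # assumed: jsonb container header (count + flags)
JENTRY_SIZE = 4       # assumed: one JEntry per key and one per value


def intalign(offset: int) -> int:
    """Round offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def jsonb_single_pair_size(key: str, value_len: int, value_is_numeric: bool) -> tuple[int, int]:
    """Hypothetical model: (total datum size, padding bytes) for a one-pair object."""
    offset = VARHDRSZ + CONTAINER_HEADER + 2 * JENTRY_SIZE
    offset += len(key.encode("utf-8"))  # key bytes are written without padding
    padding = 0
    if value_is_numeric:
        aligned = intalign(offset)  # assumed: numeric values start on a 4-byte boundary
        padding = aligned - offset
        offset = aligned
    return offset + value_len, padding


print(jsonb_single_pair_size("a", 12, True))   # numeric 1234567890
print(jsonb_single_pair_size("a", 10, False))  # string "1234567890"
```

Under these assumptions the numeric version carries 3 bytes of padding and ends up 5 bytes larger than the string version (32 vs. 27 bytes). A difference of that size could then be amplified by alignment at the row level.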
On 10/11/19 4:40 AM, Thomas Kellerer wrote:
> [...]
> Returns (Postgres 12, Windows 10)
>
>  ?column? | json_int_size | json_text_size
> ----------+---------------+---------------
>  json     | 71 MB         | 71 MB
>  jsonb    | 87 MB         | 79 MB
>
> I am a bit surprised by this (not because the jsonb sizes are generally bigger, but that the string value takes less space)
>
> Is this caused by the fact that a string value compresses better internally?

Not sure if it applies here:

https://www.postgresql.org/docs/11/datatype-json.html

"When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. ..."

Table 8.23.

JSON primitive type | PostgreSQL type | Notes
...
number              | numeric         | NaN and infinity values are disallowed
...

-- 
Adrian Klaver
adrian.klaver@aklaver.com
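The sketch below illustrates what mapping a JSON number onto numeric means for storage. It models how a non-negative integer would be split into numeric "digits". The model rests on several assumptions that are not taken from the PostgreSQL source. Each digit holds four decimal digits (base 10000) and takes 2 bytes, and trailing zero digits are dropped. A 2-byte short numeric header applies, which holds only for values of modest magnitude. Inside jsonb, the datum keeps a full 4-byte varlena header. The function names are hypothetical.

```python
def numeric_digits(n: int) -> list[int]:
    """Split a non-negative integer into base-10000 digits, most significant first."""
    digits = []
    while n > 0:
        n, d = divmod(n, 10000)
        digits.append(d)
    digits.reverse()
    # Assumed: trailing zero digits are not stored (the weight records the magnitude)
    while digits and digits[-1] == 0:
        digits.pop()
    return digits


def jsonb_numeric_size(n: int) -> int:
    """Hypothetical size model of a numeric datum as kept inside jsonb."""
    VARHDRSZ = 4              # assumed: full, unpacked varlena header
    NUMERIC_SHORT_HEADER = 2  # assumed: sign/weight/scale in the short format
    return VARHDRSZ + NUMERIC_SHORT_HEADER + 2 * len(numeric_digits(n))


def jsonb_string_size(s: str) -> int:
    """String bytes only; the length lives in the jsonb value offsets."""
    return len(s.encode("utf-8"))


print(numeric_digits(1234567890))  # [12, 3456, 7890]
print(jsonb_numeric_size(1234567890), jsonb_string_size("1234567890"))  # 12 10
```

Under this model, 1234567890 becomes the three digits 12, 3456 and 7890, giving a 12-byte datum versus 10 bytes for the string "1234567890". A value such as 100000000 would collapse to a single stored digit.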
>>>>> "Thomas" == Thomas Kellerer <spam_eater@gmx.net> writes:

 Thomas> The table size with jsonb was bigger in general, but the one
 Thomas> with the "integer" value was even bigger than the one with the
 Thomas> "string" storage.

jsonb stores numeric values as "numeric", not as integers or floats, so the storage needed will depend on the number of decimal digits.

The size results you're seeing are mainly a consequence of the fact that jsonb stores the whole Numeric datum, varlena header included (and without packing the header), so there's an extra 4 bytes you might not have accounted for. 1234567890 is three numeric "digits" (2 bytes each) plus a 2-byte numeric header (for weight/scale/sign) plus the 4-byte varlena header, for 12 bytes total. By contrast, "1234567890" takes only 10 bytes, since the length is encoded in the jsonb value offsets. Furthermore, there may be up to 3 padding bytes before the numeric value.

I think that in your test, these extra bytes are pushing the size of a single row up to the next multiple of MAXALIGN, so you're getting slightly fewer rows per page. I don't know what Windows is doing, but on my system (FreeBSD amd64) I get 136 rows/page vs. 120 rows/page, which would make a million rows take 57MB or 65MB.

(Your use of pg_total_relation_size includes the pkey index, which confuses the results a bit.)

-- 
Andrew (irc:RhodiumToad)
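The rows-per-page figures can be approximated with a back-of-the-envelope model of the heap. This is a sketch under assumptions, not how PostgreSQL computes anything:

- 8 kB pages with a 24-byte page header and a 4-byte line pointer per row.
- A 23-byte tuple header padded to 24 bytes (no null bitmap).
- MAXALIGN of 8 and the default fillfactor.
- The jsonb value stored on disk with a 1-byte short varlena header.

The per-row data sizes are estimates of the layout, not measured values. Only the heap is modeled, not the pkey index or the other forks that pg_total_relation_size also counts.

```python
import math

PAGE_SIZE = 8192   # assumed default block size
PAGE_HEADER = 24   # assumed page header size
LINE_POINTER = 4   # one item pointer per row
TUPLE_HEADER = 24  # assumed: 23-byte tuple header padded to MAXALIGN, no null bitmap
MAXALIGN = 8       # typical on 64-bit platforms


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def rows_per_page(data_bytes: int) -> int:
    """Rows fitting on one heap page, ignoring fillfactor and free-space details."""
    tuple_len = maxalign(TUPLE_HEADER + data_bytes)
    return (PAGE_SIZE - PAGE_HEADER) // (LINE_POINTER + tuple_len)


def heap_mib(rows: int, per_page: int) -> float:
    """Heap size in MiB for a given row count."""
    pages = math.ceil(rows / per_page)
    return pages * PAGE_SIZE / (1024 * 1024)


# Estimated row data: 4-byte id + jsonb datum with a 1-byte on-disk header.
# String:  1 + 4 (container header) + 8 (two JEntries) + 1 ("a") + 10 = 24
# Numeric: 1 + 4 + 8 + 1 + 3 (padding) + 12 (numeric datum)          = 29
STRING_ROW = 4 + 24
NUMERIC_ROW = 4 + 29

for label, data in (("string", STRING_ROW), ("numeric", NUMERIC_ROW)):
    per_page = rows_per_page(data)
    print(f"{label}: {per_page} rows/page, {heap_mib(1_000_000, per_page):.0f} MiB")
```

With these assumptions, the 5-byte difference moves the tuple from 56 to 64 bytes. The script prints 136 and 120 rows per page, or roughly 57 and 65 MiB for a million rows, in line with the FreeBSD figures above.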