Thread: Why is writing JSONB faster than just JSON?
Hi!

I have a project where, among other data, we want to store static JSON objects that can get pretty large (10-100 KB). I was evaluating how it would work if we simply stored them as an additional column in a PostgreSQL database, so I made a benchmark [1]. The results surprised me a bit and I am writing here because I would like to understand them. Namely, it looks like writing into a jsonb typed column is 30% faster than writing into a json typed column. Why is that? Doesn't jsonb require parsing and conversion of the JSON? That should be slower than just storing the blob as-is?

[1] https://gitlab.com/mitar/benchmark-pg-json

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
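P.S. The benchmark boils down to roughly the following (a sketch with illustrative names, not the exact code from the repository):

    CREATE TABLE docs_json  (id serial PRIMARY KEY, doc json);
    CREATE TABLE docs_jsonb (id serial PRIMARY KEY, doc jsonb);
    CREATE TABLE docs_text  (id serial PRIMARY KEY, doc text);

    -- the app inserts a large compact-formatted payload 10k times into
    -- each table and measures the total time per column type
    INSERT INTO docs_jsonb (doc)
    VALUES ('{"key1": "9b74c9897bac770ffc029102a200c5de"}');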
> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
> I have a project where, among other data, we want to store static JSON
> objects that can get pretty large (10-100 KB). I was evaluating how it
> would work if we simply stored them as an additional column in a
> PostgreSQL database, so I made a benchmark [1]. ... Namely, it looks
> like writing into a jsonb typed column is 30% faster than writing into
> a json typed column. Why is that? Doesn't jsonb require parsing and
> conversion of the JSON? That should be slower than just storing the
> blob as-is?
>
> [1] https://gitlab.com/mitar/benchmark-pg-json

Interesting. I've tried to reproduce the schema and queries from the repository above (with different generated data, though, and without the app itself) and indeed got somewhat slower (not 30%, but visibly) execution for the json column than for jsonb. There are a couple of important points:

* Parsing happens in both cases; for json it's done for validation purposes.

* Compression of the json data actually dominates the runtime for large json objects, making other factors less visible and reducing the difference in size (jsonb should also be a bit bigger, which is why I would have expected it to be slower).

* At the same time, perf shows that json compression spends a bit more time in pglz_find_match (mostly around accessing and checking history entries), which is a compression optimization via a history table. So probably, due to differences in data layout, this optimization works slightly worse for plain json than for jsonb?
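The generated data was along these lines (a sketch, not my exact queries): documents with the same structure but different pseudo-random hex-string values:

    -- one ~80 KB document per row, 10k rows; md5() yields 32 hex chars
    INSERT INTO docs_jsonb (doc)
    SELECT (SELECT jsonb_object_agg(i, md5(n || '-' || i))
            FROM generate_series(1, 2000) AS i)
    FROM generate_series(1, 10000) AS n;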
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
>> ... Namely, it looks like writing into a jsonb typed column is 30%
>> faster than writing into a json typed column. Why is that? Doesn't
>> jsonb require parsing and conversion of the JSON? That should be
>> slower than just storing the blob as-is?

> * Parsing happens in both cases; for json it's done for validation
> purposes.

Right.

> * Compression of the json data actually dominates the runtime for
> large json objects, making other factors less visible and reducing the
> difference in size (jsonb should also be a bit bigger, which is why I
> would have expected it to be slower).
>
> * At the same time, perf shows that json compression spends a bit more
> time in pglz_find_match (mostly around accessing and checking history
> entries), which is a compression optimization via a history table. So
> probably, due to differences in data layout, this optimization works
> slightly worse for plain json than for jsonb?

Interesting. I recall that we made some last-minute changes in the JSONB physical representation after finding that the original choices resulted in sucky compression behavior. But I think we were only looking at the resultant compressed size, not time-to-compress.

My own guess about this, without having tried to reproduce it, is that JSONB might end up physically smaller than JSON, resulting in less work to push the toasted datum out to disk. This'd depend a lot on your formatting habits for JSON, of course. But in any case, it'd be worth comparing pg_column_size() results to see what's up with that.

			regards, tom lane
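For example, something like this (illustrative table and column names):

    -- per-value on-disk size, after any compression/TOAST
    SELECT avg(pg_column_size(doc)) FROM docs_json;
    SELECT avg(pg_column_size(doc)) FROM docs_jsonb;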
> On Wed, Apr 14, 2021 at 10:26:23AM -0400, Tom Lane wrote:
> My own guess about this, without having tried to reproduce it, is that
> JSONB might end up physically smaller than JSON, resulting in less work
> to push the toasted datum out to disk. This'd depend a lot on your
> formatting habits for JSON, of course. But in any case, it'd be worth
> comparing pg_column_size() results to see what's up with that.

Oh, of course: I've missed that the input I was using was indeed formatted. Without formatting, both cases perform equally well and I can't reproduce the issue. Although, if I understand correctly, the original code in question doesn't actually do any formatting.
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> Oh, of course: I've missed that the input I was using was indeed
> formatted. Without formatting, both cases perform equally well and I
> can't reproduce the issue. Although, if I understand correctly, the
> original code in question doesn't actually do any formatting.

My point was that for JSON, after validating that the input is syntactically correct, we just store it as-received. So in particular the amount of whitespace in the value would depend on how the client had chosen to format the JSON. This'd affect the stored size of course, and I think it would have an effect on compression time too.

			regards, tom lane
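A quick way to see the difference (json keeps the input text verbatim, while jsonb stores a normalized form):

    SELECT '{ "a" : 1 }'::json::text;   -- '{ "a" : 1 }', whitespace preserved
    SELECT '{ "a" : 1 }'::jsonb::text;  -- '{"a": 1}', whitespace discarded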
> On Thu, Apr 15, 2021 at 12:47:25PM -0400, Tom Lane wrote:
> My point was that for JSON, after validating that the input is
> syntactically correct, we just store it as-received. So in particular
> the amount of whitespace in the value would depend on how the client
> had chosen to format the JSON. This'd affect the stored size of
> course, and I think it would have an effect on compression time too.

Yes, I got it and just wanted to confirm you were right: this was the reason for the slowdown I observed when trying to reproduce the report.
Hi!

On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > My point was that for JSON, after validating that the input is
> > syntactically correct, we just store it as-received. ...
>
> Yes, I got it and just wanted to confirm you were right: this was the
> reason for the slowdown I observed when trying to reproduce the report.

Thank you for trying to reproduce the report. I did a bit more digging myself and I am still confused.

First, it is important to note that the JSON I am using contains primarily random strings as values, so not really something which is easy to compress. See the example at [1]. I have realized, though, that in the previous benchmark I was inserting the same JSON document multiple times, so compression might have been working across documents or something. So I ran a version of the benchmark with a different JSON inserted each time (with the same structure; just the values are different random strings). There was no difference.

Second, as you can see from [1], the JSON representation I am using is really compact and has no extra spaces. I also used pg_total_relation_size to get the size of the table after inserting 10k rows, and the numbers are similar, with JSONB being slightly larger than the others. So I think the compression theory does not hold.

So I do not know what is happening and why you cannot reproduce it. Maybe explain a bit how you are trying to reproduce it? Directly from the psql console? Are you using the same version as me (13.2)?

Inserting the same large JSON 10k times:

  Type   Mean      Stddev              pg_total_relation_size
  jsonb  200243.1  1679.7741187433503  4611792896
  json   256938.5  2471.9909890612466  4597833728
  text   248175.3  376.677594236769    4597833728

Inserting a different JSON 10k times:

  Type   Mean      Stddev              pg_total_relation_size
  jsonb  202794.5  978.5346442512907   4611792896
  json   259437.9  1785.8411155531167  4597833728
  text   250060.5  912.9207249263213   4597833728

[1] https://gitlab.com/mitar/benchmark-pg-json/-/blob/master/example.json

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
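P.S. The sizes above come from a query along these lines (table names illustrative):

    -- total relation size, including TOAST and any indexes
    SELECT pg_total_relation_size('docs_jsonb');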
Just a note:

On Fri, Apr 23, 2021 at 10:57 AM Mitar <mmitar@gmail.com> wrote:
> First, it is important to note that the JSON I am using contains
> primarily random strings as values, so not really something which is
> easy to compress. See the example at [1].

A fast look at the link: it seems to be long strings of random LOWER CASE HEX DIGITS. A simple Huffman coder can probably put them in 5 bits per char, and a more sophisticated algorithm can probably approach 4.

FOS.
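Back-of-the-envelope: 16 equiprobable symbols carry log2(16) = 4 bits of entropy per character, but each hex digit is stored as an 8-bit byte, so even "random" hex text is roughly 2x compressible in principle.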
Hi!

On Fri, Apr 23, 2021 at 10:28 AM Francisco Olarte <folarte@peoplecall.com> wrote:
> A fast look at the link: it seems to be long strings of random LOWER
> CASE HEX DIGITS. A simple Huffman coder can probably put them in 5
> bits per char, and a more sophisticated algorithm can probably
> approach 4.

But this compressibility would apply to both the JSONB and JSON column types, no? Moreover, it looks like the JSONB column type ends up larger on disk.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Mitar:

On Fri, Apr 23, 2021 at 7:33 PM Mitar <mmitar@gmail.com> wrote:
> But this compressibility would apply to both the JSONB and JSON column
> types, no? Moreover, it looks like the JSONB column type ends up
> larger on disk.

Of course. I did not follow the thread too deeply, just pointed it out in case you were assuming the data was not going to be stored compressed.

Also, I am not surprised JSONB ends up being fatter. When no binary data is transmitted and no compression is used, compact JSON has very little redundancy, about 6 chars per k/v pair (in "key":"value", the two pairs of quotes, the colon and the comma are the only structural overhead); this is hard to beat without using sophisticated encodings, especially with long data. And if jsonb starts out bigger, and IIRC it does trade off compactness to be easily indexable and other stuff, it will normally end up bigger. But I am not an expert on this.

FOS.
Hi!

On Fri, Apr 23, 2021 at 10:49 AM Francisco Olarte <folarte@peoplecall.com> wrote:
> Of course. I did not follow the thread too deeply, just pointed it out
> in case you were assuming the data was not going to be stored
> compressed.

Thanks for pointing that out. I was just trying to make sure I am understanding you correctly and that we are all on the same page about the implications. It seems we are.

> Also, I am not surprised JSONB ends up being fatter.

Yes, by itself this is not surprising. The reason I mentioned it is that in my original post in this thread I wrote that I am surprised that inserting into a JSONB column seems observably faster than into a JSON or TEXT column (for the same data), and I wonder why that is. One theory presented was that JSONB might compress better, so there is less IO and insertion is faster. But JSONB does not look more compressed (I had not measured the size in my original benchmark), so now I am searching for other explanations for the results of my benchmark.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
> On Fri, Apr 23, 2021 at 01:56:57AM -0700, Mitar wrote:
> Thank you for trying to reproduce the report. I did a bit more digging
> myself and I am still confused.
>
> ...
>
> So I do not know what is happening and why you cannot reproduce it.

Could you maybe get a profile with perf for both cases? Since they're executed within a single backend, you can profile just that one pid. A reasonable profiling frequency, --call-graph dwarf, and probably limiting events to user space only with precise tagging (cycles:uppp) should give an impression of what's going on.
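Something along these lines (a sketch; adjust the frequency to taste). First find the backend to attach to:

    SELECT pg_backend_pid();

then, while the benchmark runs:

    perf record -F 999 --call-graph dwarf -e cycles:uppp -p <backend_pid>
    perf report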