Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id CAO_YK0VzGrtndbQiUL67-aRxD0t2-YsC9KREgZmJCOJAz6AKhg@mail.gmail.com
Whole thread Raw
In response to Re: jsonb format is pessimal for toast compression  (Josh Berkus <josh@agliodbs.com>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers

On Thu, Sep 11, 2014 at 10:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
So, I finally got time to test Tom's latest patch on this.

TLDR: we want to go with Tom's latest patch and release beta3.

Figures:

So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
checked uncompressed times for JSONB against compressed times.  This
changed the picture considerably.

TABLE SIZES
-----------

HEAD

      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 393 MB
 jsonb: compressed   | 1147 MB
 jsonb: uncompressed | 1221 MB

PATCHED

      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 394 MB
 jsonb: compressed   | 525 MB
 jsonb: uncompressed | 1200 MB


EXTRACTION TIMES
----------------

HEAD

Q1 (search via GIN index followed by extracting 100,000 values from rows):

jsonb compressed: 4000
jsonb uncompressed: 3250


Q2 (seq scan and extract 200,000 values from rows):

json: 11700
jsonb compressed: 3150
jsonb uncompressed: 2700


PATCHED

Q1:

jsonb compressed: 6750
jsonb uncompressed: 3350

Q2:

json: 11796
jsonb compressed: 4700
jsonb uncompressed: 2650

----------------------

Conclusion: with Tom's patch, compressed JSONB is 55% smaller when
compressed (EXTENDED).  Extraction times are 50% to 70% slower, but this
appears to be almost entirely due to decompression overhead.  When not
compressing (EXTERNAL), extraction times for patch versions are
statistically the same as HEAD, and file sizes are similar to HEAD.

USER REACTION
-------------

I polled at both PDXpgDay and at FOSS4G, asking some ~~ 80 Postgres
users how they would feel about a compression vs. extraction time
tradeoff.  The audience was evenly split.

However, with the current patch, the user can choose.  Users who know
enough for performance tuning can set JSONB columns to EXTERNAL, and the
the same performance as the unpatched version.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


The compression ratio difference is exaggerated in this case but it does support that Tom's patch alleviates the extraction penalty.

In my testings with the github archive data the savings <-> performance-penalty was fine, but I'm not confident in those results since there were only 8 top level keys.
For comparison, some twitter api objects[1] have 30+ top level keys. If I have time in the next couple of days I'll conduct some testings with the public twitter fire-hose data.

[1] https://dev.twitter.com/rest/reference/get/statuses/home_timeline

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: vacuumdb --all --analyze-in-stages - wrong order?
Next
From: Alvaro Herrera
Date:
Subject: Re: pg_dump refactor patch to remove global variables