On Thu, Sep 11, 2014 at 10:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
So, I finally got time to test Tom's latest patch on this.
TLDR: we want to go with Tom's latest patch and release beta3.
Figures:
So I tested HEAD against the latest lengths patch. Per Arthur Silva, I checked uncompressed times for JSONB against compressed times. This changed the picture considerably.
TABLE SIZES -----------
HEAD
?column? | pg_size_pretty ---------------------+---------------- json text format | 393 MB jsonb: compressed | 1147 MB jsonb: uncompressed | 1221 MB
PATCHED
?column? | pg_size_pretty ---------------------+---------------- json text format | 394 MB jsonb: compressed | 525 MB jsonb: uncompressed | 1200 MB
EXTRACTION TIMES ----------------
HEAD
Q1 (search via GIN index followed by extracting 100,000 values from rows):
jsonb compressed: 4000 jsonb uncompressed: 3250
Q2 (seq scan and extract 200,000 values from rows):
Conclusion: with Tom's patch, compressed JSONB is 55% smaller when compressed (EXTENDED). Extraction times are 50% to 70% slower, but this appears to be almost entirely due to decompression overhead. When not compressing (EXTERNAL), extraction times for patch versions are statistically the same as HEAD, and file sizes are similar to HEAD.
USER REACTION -------------
I polled at both PDXpgDay and at FOSS4G, asking some ~~ 80 Postgres users how they would feel about a compression vs. extraction time tradeoff. The audience was evenly split.
However, with the current patch, the user can choose. Users who know enough for performance tuning can set JSONB columns to EXTERNAL, and the the same performance as the unpatched version. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
The compression ratio difference is exaggerated in this case but it does support that Tom's patch alleviates the extraction penalty.
In my testings with the github archive data the savings <-> performance-penalty was fine, but I'm not confident in those results since there were only 8 top level keys. For comparison, some twitter api objects[1] have 30+ top level keys. If I have time in the next couple of days I'll conduct some testings with the public twitter fire-hose data.