Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Larry White
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id CAMdbzVi_KfSfyUHBt9Q4LNonmtJ47dVWdHDYwNx1vXcftLt_bQ@mail.gmail.com
In response to Re: jsonb format is pessimal for toast compression  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
I was not complaining; I think JSONB is awesome. 

But I am one of those people who would like to put hundreds of GB (or more) of JSON files into Postgres, and I am concerned about file size and possible future changes to the format.


On Fri, Aug 8, 2014 at 7:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Aug 8, 2014 at 12:06 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Once we ship 9.4, many users are going to load 100's of GB into JSONB
> fields.  Even if we fix the compressibility issue in 9.5, those users
> won't be able to fix the compression without rewriting all their data,
> which could be prohibitive.  And we'll be in a position where we have
> to support the 9.4 JSONB format/compression technique for years so that
> users aren't blocked from upgrading.

FWIW, if we take the delicious JSON data as representative, a table
storing that data as jsonb is 1374 MB in size, whereas an equivalent
table storing the same data using the original json datatype (with
white space differences more or less ignored, because it was created
using a jsonb -> json cast) is 1352 MB.

Larry's complaint is valid; this is a real problem, and I'd like to
fix it before 9.4 is out. However, let us not lose sight of the fact
that JSON data is usually a poor target for TOAST compression. With
idiomatic usage, redundancy is very much more likely to appear across
rows, and not within individual Datums. Frankly, we aren't doing a
very good job there, and doing better requires an alternative
strategy.
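
To illustrate the point about redundancy appearing across rows rather than
within individual Datums, here is a rough sketch. It is not how TOAST works:
zlib stands in for pglz, and the rows are made-up documents that share keys
but differ in values. It only compares compressing each document on its own
against compressing all of them as one stream.

```python
import json
import zlib

# Hypothetical rows (an assumption for illustration): every document has the
# same keys, but the values differ, so most redundancy is between rows.
rows = [
    json.dumps({
        "id": i,
        "author": f"user{i}",
        "link": f"http://example.com/{i}",
        "tags": ["a", "b"],
    })
    for i in range(1000)
]


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression."""
    return len(zlib.compress(data))


# Uncompressed size of all rows.
raw_total = sum(len(r.encode()) for r in rows)

# Each row compressed independently, loosely analogous to per-Datum compression.
per_row_total = sum(compressed_size(r.encode()) for r in rows)

# All rows compressed as a single stream, so cross-row redundancy is visible.
combined_total = compressed_size("\n".join(rows).encode())

print(f"raw:               {raw_total} bytes")
print(f"per-row zlib:      {per_row_total} bytes")
print(f"single-stream zlib: {combined_total} bytes")
```

With data shaped like this, the per-row total stays close to the raw size,
while the single-stream figure is much smaller, since the repeated keys only
become visible to the compressor when it sees more than one row at a time.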

--
Peter Geoghegan
