Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Tom Lane
Subject Re: jsonb format is pessimal for toast compression
Msg-id 29828.1407545120@sss.pgh.pa.us
In response to Re: jsonb format is pessimal for toast compression  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Stephen Frost <sfrost@snowman.net> writes:
> What about considering how large the object is when we are analyzing if
> it compresses well overall?

Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out in terms of a fraction of the object size instead of a fixed
limit.  However, that risks expending a large amount of work before we
bail, if we have a very large incompressible object --- which is not
exactly an unlikely case.  Consider for example JPEG images stored as
bytea, which I believe I've heard of people doing.  Another issue is
that it's not real clear that that fixes the problem for any fractional
size we'd want to use.  In Larry's example of a jsonb value that fails
to compress, the header size is 940 bytes out of about 12K, so we'd be
needing to trial-compress about 10% of the object before we reach
compressible data --- and I doubt his example is worst-case.
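
As a rough illustration of the arithmetic above, here is a minimal sketch
of a bail-out limit defined as a percentage of the object size.  The
function name bailout_limit and the percentages tried are assumptions made
for illustration, not anything in the PostgreSQL source; "about 12K" is
taken as 12 KiB.

```python
def bailout_limit(object_size: int, percent: int) -> int:
    """Bytes to trial-compress before giving up (hypothetical helper)."""
    return object_size * percent // 100


HEADER_BYTES = 940          # header size quoted for Larry's example
OBJECT_BYTES = 12 * 1024    # "about 12K", assumed here to mean 12 KiB

# A 5% limit gives up before the compressible data is even reached.
five_percent = bailout_limit(OBJECT_BYTES, 5)
print(five_percent, five_percent >= HEADER_BYTES)

# A 10% limit gets past the header for this particular object.
ten_percent = bailout_limit(OBJECT_BYTES, 10)
print(ten_percent, ten_percent >= HEADER_BYTES)

# The same 10% rule applied to a 10 MiB incompressible value (a JPEG
# stored as bytea, say) means about 1 MiB of wasted compression effort.
jpeg_limit = bailout_limit(10 * 1024 * 1024, 10)
print(jpeg_limit)
```

Both halves of the objection show up here: a fraction small enough to be
cheap on large values is too small to get past this header, and a fraction
large enough to get past it is expensive on large incompressible values.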

>> 1. The real problem here is that jsonb is emitting quite a bit of
>> fundamentally-nonrepetitive data, even when the user-visible input is very
>> repetitive.  That's a compression-unfriendly transformation by anyone's
>> measure.

> I disagree that another algorithm wouldn't be able to manage better on
> this data than pglz.  pglz, from my experience, is notoriously bad at
> certain data sets which other algorithms are not as poorly impacted by.

Well, I used to be considered a compression expert, and I'm going to
disagree with you here.  It's surely possible that other algorithms would
be able to get some traction where pglz fails to get any, but that doesn't
mean that presenting them with hard-to-compress data in the first place is
a good design decision.  There is no scenario in which data like this is
going to be friendly to a general-purpose compression algorithm.  It'd
be necessary to have explicit knowledge that the data consists of an
increasing series of four-byte integers to be able to do much with it.
And then such an assumption would break down once you got past the
header ...
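
The point about explicit knowledge can be sketched as follows.  This is
only an illustration under stated assumptions: zlib stands in for a
general-purpose compressor (it is not pglz), the header is modeled as
nothing but 235 increasing little-endian four-byte integers (940 bytes),
and every item is assumed to be 10 bytes long.  None of this reproduces
the real jsonb layout.

```python
import struct
import zlib

ENTRY_COUNT = 235   # 940 header bytes / 4 bytes per integer
ITEM_LENGTH = 10    # assumed: very repetitive input, all items equal length

# Increasing series of end offsets: 10, 20, 30, ...
offsets = [ITEM_LENGTH * (i + 1) for i in range(ENTRY_COUNT)]

# The same information as differences between successive offsets.
# Producing this requires knowing the data is an increasing series.
deltas = [offsets[0]] + [b - a for a, b in zip(offsets, offsets[1:])]


def pack_u32(values):
    """Pack a list of ints as little-endian unsigned 32-bit integers."""
    return struct.pack(f"<{len(values)}I", *values)


raw_offsets = pack_u32(offsets)
raw_deltas = pack_u32(deltas)

packed_offsets = zlib.compress(raw_offsets)
packed_deltas = zlib.compress(raw_deltas)

# Raw size, then compressed size of offsets and of deltas.
print(len(raw_offsets), len(packed_offsets), len(packed_deltas))
```

The offsets never repeat, so the compressor has little to match against
beyond the zero high-order bytes, while the differenced form is one
four-byte pattern repeated and collapses to almost nothing.  The
differencing step is exactly the data-specific knowledge a generic
compressor does not have.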

> Perhaps another option would be a new storage type which basically says
> "just compress it, no matter what"?  We'd be able to make that the
> default for jsonb columns too, no?

Meh.  We could do that, but it would still require adding arguments to
toast_compress_datum() that aren't there now.  In any case, this is a
band-aid solution; and as Josh notes, once we ship 9.4 we are going to
be stuck with jsonb's on-disk representation pretty much forever.

			regards, tom lane