Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: jsonb format is pessimal for toast compression
Msg-id CAA4eK1+SFKLEB8psumNxV_M5yh9kaeZek_=qetA6TpLLkwgN6g@mail.gmail.com
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephen Frost <sfrost@snowman.net> writes:
> > What about considering how large the object is when we are analyzing if
> > it compresses well overall?
>
> Hmm, yeah, that's a possibility: we could redefine the limit at which
> we bail out in terms of a fraction of the object size instead of a fixed
> limit.  However, that risks expending a large amount of work before we
> bail, if we have a very large incompressible object --- which is not
> exactly an unlikely case.  Consider for example JPEG images stored as
> bytea, which I believe I've heard of people doing.  Another issue is
> that it's not real clear that that fixes the problem for any fractional
> size we'd want to use.  In Larry's example of a jsonb value that fails
> to compress, the header size is 940 bytes out of about 12K, so we'd be
> needing to trial-compress about 10% of the object before we reach
> compressible data --- and I doubt his example is worst-case.
>
> >> 1. The real problem here is that jsonb is emitting quite a bit of
> >> fundamentally-nonrepetitive data, even when the user-visible input is very
> >> repetitive.  That's a compression-unfriendly transformation by anyone's
> >> measure.
>
> > I disagree that another algorithm wouldn't be able to manage better on
> > this data than pglz.  pglz, from my experience, is notoriously bad a
> > certain data sets which other algorithms are not as poorly impacted by.
>
> Well, I used to be considered a compression expert, and I'm going to
> disagree with you here.  It's surely possible that other algorithms would
> be able to get some traction where pglz fails to get any, 

During my previous work in this area, I have seen that some algorithms
use skipping logic, which can be useful when incompressible data is
followed by compressible data, and in the general case as well.  One
such technique could be: if we don't find any match for the first 4
bytes, skip 4 bytes; if we again don't find a match for the next 8
bytes, skip 8 bytes; and keep doing the same until we find the first
match, at which point we go back to the beginning of the data.  We
could follow this logic until we have actually compared a total of
first_success_by bytes.  There can be caveats in this particular
skipping scheme, but I just wanted to mention the skipping idea in
general as a way to reduce the number of situations where we bail out
even though there is a lot of compressible data.
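
To make the idea a bit more concrete, here is a rough C sketch of that
kind of exponential-skip probe.  This is not pglz code; the helper names
(match_exists_earlier, looks_compressible), the exact skip sizes, and
the budget parameter (standing in for first_success_by) are only
illustrative:

/* Rough sketch only -- not pglz code.  The idea: probe for a 4-byte
 * match; each time a probe fails, jump ahead by an exponentially
 * growing amount, so a long incompressible prefix costs only a few
 * probes.  Only the bytes we actually compare count against the
 * budget (the analogue of first_success_by). */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Naive stand-in for a history-table lookup: does the 4-byte sequence
 * at data[pos] occur anywhere earlier in the buffer? */
static bool
match_exists_earlier(const char *data, size_t pos)
{
    for (size_t i = 0; i + 4 <= pos; i++)
    {
        if (memcmp(data + i, data + pos, 4) == 0)
            return true;
    }
    return false;
}

/* Returns true as soon as one match is found, in which case the caller
 * would go back and run real compression from the beginning of the
 * data.  Returns false once 'budget' compared bytes yield no match. */
static bool
looks_compressible(const char *data, size_t len, size_t budget)
{
    size_t pos = 0;
    size_t skip = 4;        /* first failed probe skips 4 bytes ... */
    size_t compared = 0;

    while (pos + 4 <= len && compared < budget)
    {
        if (match_exists_earlier(data, pos))
            return true;    /* found a match, worth compressing */

        compared += 4;      /* we compared 4 bytes at this probe */
        pos += skip;        /* no match: jump ahead ... */
        skip *= 2;          /* ... and double the next jump */
    }
    return false;           /* give up, treat as incompressible */
}

Whether the skip distance should reset after a match, or grow more
slowly than doubling, is among the caveats I mentioned; the sketch is
only meant to show the shape of the idea.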

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
