On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I looked into the issue reported in bug #11109. The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed. (first_success_by is 1024 in the default set of
> compression parameters.)
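For anyone reading along without the source handy, the heuristic in question boils down to roughly the following. This is an illustrative sketch only, not the actual pg_lzcompress.c code; worth_compressing() and the MIN_MATCH value are made up for the example.

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define FIRST_SUCCESS_BY  1024   /* default give-up threshold */
    #define MIN_MATCH         4      /* illustrative only; real pglz has its own minimum */

    /* Give up unless some MIN_MATCH-byte window within the first
     * FIRST_SUCCESS_BY bytes repeats an earlier window -- the check that a
     * leading run of ever-increasing offset bytes tends to fail. */
    static bool
    worth_compressing(const char *src, size_t srclen)
    {
        size_t limit = srclen < FIRST_SUCCESS_BY ? srclen : FIRST_SUCCESS_BY;
        size_t i, j;

        for (i = MIN_MATCH; i + MIN_MATCH <= limit; i++)
            for (j = 0; j + MIN_MATCH <= i; j++)
                if (memcmp(src + i, src + j, MIN_MATCH) == 0)
                    return true;
        return false;
    }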
I haven't looked at this in any detail, so take this with a grain of salt, but what about teaching pglz_compress to use an offset farther into the data if the incoming data is quite a bit larger than 1k? This is just a test to see if it's worthwhile to keep going, no?

I wonder if this might even be able to be provided as a type-specific option, to avoid changing the behavior for types other than jsonb in this regard.
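Just to make that concrete, something along these lines, purely hypothetical and building on the sketch above rather than the real pglz code; the 8x threshold and the srclen/4 starting point are numbers pulled out of thin air:

    /* Hypothetical "probe farther in" variant: for values much larger than
     * the probe window, start the compressibility check past the leading
     * portion instead of at byte 0.  Reuses worth_compressing() from the
     * sketch above. */
    static bool
    worth_compressing_at_offset(const char *src, size_t srclen)
    {
        size_t start = 0;

        if (srclen >= 8 * FIRST_SUCCESS_BY)
            start = srclen / 4;      /* arbitrary choice of starting offset */

        return worth_compressing(src + start, srclen - start);
    }

The type-specific flavor of this would presumably mean letting a datatype tell the toast code where (or whether) to run that probe, rather than hard-wiring the offset.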
+1 for offset. Or sample the data at the beginning, middle and end. Obviously one could always come up with a worst case, but still.

(I'm imagining a boolean saying "pick a random sample", or perhaps a function which can be called that'll return "here's where you wanna test if this thing is gonna compress at all".)
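In the same hand-waving spirit, the sampling variant might look like this, again hypothetical and reusing worth_compressing() from the earlier sketch; a pathological value can of course still dodge all three probes:

    /* Hypothetical sampling variant: probe the beginning, the middle and
     * the end, and compress if any of the three windows looks compressible. */
    static bool
    worth_compressing_sampled(const char *src, size_t srclen)
    {
        size_t probes[3];
        int    i;

        probes[0] = 0;                                   /* beginning */
        probes[1] = srclen / 2;                          /* middle */
        probes[2] = (srclen > FIRST_SUCCESS_BY) ?
                     srclen - FIRST_SUCCESS_BY : 0;      /* end */

        for (i = 0; i < 3; i++)
            if (worth_compressing(src + probes[i], srclen - probes[i]))
                return true;
        return false;
    }

The per-type function idea would presumably just replace the hard-coded probes[] with whatever the datatype says is a representative spot to test.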
I'm rather disinclined to change the on-disk format because of this specific test; that feels a bit like the tail wagging the dog to me, especially as I do hope that some day we'll figure out a way to use a better compression algorithm than pglz.
Thanks,
Stephen
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company