Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Marti Raudsepp
Subject Re: jsonb format is pessimal for toast compression
Msg-id CABRT9RDKfOF7+8gonQggcPXSvu8TwXOTGJKvV4=u=SHBq8Dspg@mail.gmail.com
In response to Re: jsonb format is pessimal for toast compression  (Hannu Krosing <hannu@2ndQuadrant.com>)
List pgsql-hackers
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> How hard and how expensive would it be to teach pg_lzcompress to
> apply a delta filter on suitable data ?
>
> So that instead of integers their deltas will be fed to the "real"
> compressor

Has anyone given this more thought? I know this might not be 9.4
material, but to me it sounds like the most promising approach, if
it's workable. This isn't a made-up idea, either: the 7z and LZMA
formats also have an optional delta filter.
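A minimal sketch of what such a 4-byte-wide delta filter could look like, assuming the input is a run of little-endian uint32 values. The names `delta_encode` and `delta_decode` are hypothetical; this is not pg_lzcompress code, only an illustration of the transform that would run before (and be undone after) the "real" compressor. Arithmetic modulo 2**32 keeps it lossless even when values decrease.

```python
import struct

MASK32 = 0xFFFFFFFF

def delta_encode(buf: bytes) -> bytes:
    """Hypothetical 4-byte-wide delta filter: treat buf as little-endian
    uint32 values and replace each with its difference from the previous
    one, modulo 2**32 so the transform is exactly reversible."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    values = struct.unpack(f"<{len(buf) // 4}I", buf)
    out, prev = [], 0
    for v in values:
        out.append((v - prev) & MASK32)
        prev = v
    return struct.pack(f"<{len(out)}I", *out)

def delta_decode(buf: bytes) -> bytes:
    """Inverse filter: a running sum modulo 2**32 restores the values."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    deltas = struct.unpack(f"<{len(buf) // 4}I", buf)
    out, acc = [], 0
    for d in deltas:
        acc = (acc + d) & MASK32
        out.append(acc)
    return struct.pack(f"<{len(out)}I", *out)

# Usage: a decreasing step (15 -> 7) wraps around but still round-trips.
raw = struct.pack("<5I", 10, 11, 12, 15, 7)
enc = delta_encode(raw)
print(struct.unpack("<5I", enc))  # (10, 1, 1, 3, 4294967288)
assert delta_decode(enc) == raw
```

The filter itself saves nothing; the point is that slowly changing integers turn into long runs of small, repeated byte patterns, which an LZ-style compressor handles far better than the original values.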

Of course, with JSONB the problem is figuring out which parts to apply
the delta filter to and which to leave alone.
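One way to picture "which parts" is a filter that only touches caller-chosen byte ranges and copies everything else through unchanged. This is a sketch under loud assumptions: `delta_filter_ranges` is a hypothetical helper, and the buffer in the usage example (a made-up 4-byte header, an array of increasing offsets, then string data) is for illustration only and does not describe the actual JSONB on-disk layout.

```python
import struct

def delta_filter_ranges(buf: bytes, ranges) -> bytes:
    """Apply a 4-byte-wide delta filter only inside the given (start, end)
    byte ranges; bytes outside them are copied through unchanged. Deciding
    the ranges is left to a hypothetical caller."""
    out = bytearray(buf)
    for start, end in ranges:
        if (end - start) % 4:
            raise ValueError("range length must be a multiple of 4")
        prev = 0
        for off in range(start, end, 4):
            (cur,) = struct.unpack_from("<I", buf, off)
            struct.pack_into("<I", out, off, (cur - prev) & 0xFFFFFFFF)
            prev = cur
    return bytes(out)

# Illustrative buffer: header, increasing offsets, then text.
header = b"HDR!"
offsets = struct.pack("<4I", 4, 8, 12, 16)
text = b"some string data"
filtered = delta_filter_ranges(header + offsets + text, [(4, 20)])
print(struct.unpack("<4I", filtered[4:20]))  # (4, 4, 4, 4)
```

In this toy buffer, increasing offsets become repeated lengths, while the header and text are left alone because filtering them would only add noise.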

This would also help with integer arrays containing, for example,
foreign key values that reference a serial column. There's bound to be
some redundancy, as nearby serial values are likely to end up close
together. In one of my past projects we used to store large arrays of
integer fkeys, deliberately sorted for duplicate elimination.
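To see why sorted fkey arrays suit a delta filter, here is a small sketch with made-up values: after sorting and de-duplicating, every element past the first becomes a small gap. In a 4-byte representation, any gap below 256 has three zero high bytes, which is the redundancy a compressor could then exploit. The function name `sorted_fkey_deltas` and the sample ids are hypothetical.

```python
def sorted_fkey_deltas(fkeys):
    """Sort and de-duplicate foreign-key values, then return the first
    value followed by the gaps between successive values."""
    uniq = sorted(set(fkeys))
    return [cur - prev for prev, cur in zip([0] + uniq, uniq)]

# Hypothetical fkeys referencing a serial column; nearby ids cluster.
fkeys = [1005, 1001, 1002, 1002, 1010, 1003, 1001]
print(sorted_fkey_deltas(fkeys))  # [1001, 1, 1, 2, 5]
```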

For an ideal-case comparison: compressed with a 4-byte-wide delta
filter, intar2 could be as small as intar1, because the deltas of
1..1000000 are all 1, which is exactly intar1's content:

create table intar1 as select array(select 1::int from
generate_series(1,1000000)) a;
create table intar2 as select array(select generate_series(1,1000000)::int) a;

In PostgreSQL 9.3 the sizes are:
select pg_column_size(a) from intar1;         45810
select pg_column_size(a) from intar2;       4000020

That's roughly a factor of 87 difference.
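The ideal case above can be checked outside the database with a rough sketch. It builds the payloads of the two tables as plain int4 values (little-endian, as on x86, by assumption), confirms that delta-filtering intar2 yields exactly intar1's values, and compares compressed sizes. zlib is only a stand-in for pglz here, so its byte counts will not match the `pg_column_size` figures.

```python
import struct
import zlib

N = 1_000_000

# Payloads mirroring the two test tables: intar1 holds N copies of 1,
# intar2 holds 1..N.
intar1 = [1] * N
intar2 = list(range(1, N + 1))

# 4-byte-wide delta filter: each value minus the previous one. All deltas
# are positive here, so no modular wraparound is needed.
deltas = [cur - prev for prev, cur in zip([0] + intar2, intar2)]
assert deltas == intar1  # filtered intar2 is identical to intar1

def packed(values):
    """Serialize as little-endian int4, roughly like the array payload."""
    return struct.pack(f"<{len(values)}i", *values)

# zlib stands in for pglz; absolute sizes differ from pg_column_size.
raw_size = len(zlib.compress(packed(intar2)))
filtered_size = len(zlib.compress(packed(deltas)))
print(raw_size, filtered_size)
```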

Regards,
Marti
