From Andrew Dunstan
Subject Re: jsonb format is pessimal for toast compression
Msg-id 53E4EE5F.5090904@dunslane.net
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109.  The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
> That's not really the issue here, I think.  The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.


It would certainly be worth validating that changing this would fix the 
problem.

I don't know how invasive that would be - I suspect (without looking 
very closely) not terribly much.
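
A rough way to check the offsets-versus-lengths claim quoted above, sketched
in Python. This is only an illustration under stated assumptions: zlib stands
in for pg_lzcompress (a different LZ-family compressor, so absolute sizes
will not match what PostgreSQL produces), the header is modeled as bare
little-endian 32-bit integers rather than the real jsonb layout, and
pack_u32 and header_sizes are hypothetical helper names.

```python
import struct
import zlib
from itertools import accumulate


def pack_u32(values):
    """Pack integers as little-endian unsigned 32-bit values."""
    return struct.pack(f"<{len(values)}I", *values)


def header_sizes(lengths):
    """Compare how well a lengths header and an offsets header compress.

    ASSUMPTION: zlib is a stand-in for pg_lzcompress, and this flat integer
    array is a simplified model, not the actual jsonb on-disk format.
    """
    # Offsets are the running totals of the lengths, so they strictly
    # increase even when every element has the same length.
    offsets = list(accumulate(lengths))
    raw_lengths = pack_u32(lengths)
    raw_offsets = pack_u32(offsets)
    return {
        "raw": len(raw_lengths),
        "lengths_compressed": len(zlib.compress(raw_lengths)),
        "offsets_compressed": len(zlib.compress(raw_offsets)),
    }


# Repetitive input: 1000 elements that are all 8 bytes long.
sizes = header_sizes([8] * 1000)
print(sizes)
```

With repetitive input the lengths header is the same four bytes over and
over, while the offsets header never repeats a full value, so the second
should come out noticeably larger after compression. Whether the same holds
for pg_lzcompress on real jsonb data is exactly what would need validating.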

> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
>
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me.  We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.

Yeah, it would be a bit painful, but after all, finding out this sort of
thing is why we have betas.


cheers

andrew
