On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
>> On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > We've got a clear example of someone, quite reasonably, expecting their
>> > JSONB object to be compressed using the normal TOAST mechanism, and
>> > we're failing to do that in cases where it's actually a win to do so.
>> > That's the focus of this discussion and what needs to be addressed
>> > before 9.4 goes out.
>>
>> Sure. I'm not trying to minimize that. We should fix it, certainly.
>> However, it does bear considering that JSON data, with each document
>> stored in a row is not an effective target for TOAST compression in
>> general, even as text.
>
> Seems we have two issues:
>
> 1) the header makes testing for compression likely to fail
> 2) use of pointers rather than offsets reduces compression potential
I do think the best solution for (2) is what's been proposed already:
delta-coding the pointers in chunks (i.e., 1 pointer, 15 deltas,
repeat). It does make binary search quite a bit more complex, though.
Alternatively, it could be compressed somewhat as follows:
Segment = 1 pointer head + 15 deltas
Pointer head = pointers[0]
delta[i] = pointers[i] - pointers[0] for i in 1..15
(each delta is relative to the segment head, not to the previous value)
Now you can have 4 types of segments, according to the width of the
deltas: 8, 16, 32 or 64 bits. That gets you between 8x and 1x
compression of the pointer array, and even at 1x (no compression) the
small deltas make it easier for pglz to find something compressible.
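
Very roughly, the encoding could look like this (just a sketch, not
actual JSONB code; the struct and function names are invented for
illustration, and little-endian byte order is assumed):

#include <stdint.h>
#include <string.h>

#define SEG_DELTAS 15           /* deltas per segment, 16 offsets total */

typedef struct SegmentHeader
{
    uint64_t    head;           /* absolute offset of the first element */
    uint8_t     delta_width;    /* 1, 2, 4 or 8 bytes per delta */
} SegmentHeader;

/* Pick the smallest delta width that can hold every delta in the segment. */
static uint8_t
choose_delta_width(const uint64_t *offsets, int n)
{
    /* offsets are ascending, so the largest delta is the last one */
    uint64_t    max_delta = (n > 1) ? offsets[n - 1] - offsets[0] : 0;

    if (max_delta <= UINT8_MAX)
        return 1;
    if (max_delta <= UINT16_MAX)
        return 2;
    if (max_delta <= UINT32_MAX)
        return 4;
    return 8;
}

/* Encode up to 16 ascending offsets; returns the size of the packed deltas. */
static size_t
encode_segment(const uint64_t *offsets, int n,
               SegmentHeader *hdr, uint8_t *out)
{
    size_t      pos = 0;
    int         i;

    hdr->head = offsets[0];
    hdr->delta_width = choose_delta_width(offsets, n);

    for (i = 1; i < n; i++)
    {
        uint64_t    delta = offsets[i] - offsets[0];

        /* store only the low-order bytes (little-endian assumed) */
        memcpy(out + pos, &delta, hdr->delta_width);
        pos += hdr->delta_width;
    }
    return pos;
}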
Accessing it is also simple, if you have a segment index (that's the
tough part here).
Replace the 15 with whatever makes such a segment index very compact ;)
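
To illustrate the access path (same invented names as above): because
each delta is relative to the segment head, fetching element i is a
single read plus an add, which is also the primitive a binary search
over the offsets would need.

/* Absolute offset of element i (0..SEG_DELTAS) within one segment. */
static uint64_t
segment_offset(const SegmentHeader *hdr, const uint8_t *deltas, int i)
{
    uint64_t    delta = 0;

    if (i == 0)
        return hdr->head;

    /* deltas are relative to the head, so no summing over the segment;
     * again only the low-order bytes are stored (little-endian assumed) */
    memcpy(&delta, deltas + (size_t) (i - 1) * hdr->delta_width,
           hdr->delta_width);
    return hdr->head + delta;
}

/* elem -> (segment, position); the segment index supplies hdrs/delta_ptrs. */
static uint64_t
lookup_offset(const SegmentHeader *hdrs, uint8_t *const *delta_ptrs, int elem)
{
    int         seg = elem / (SEG_DELTAS + 1);
    int         pos = elem % (SEG_DELTAS + 1);

    return segment_offset(&hdrs[seg], delta_ptrs[seg], pos);
}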