On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
>> On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > We've got a clear example of someone, quite reasonably, expecting their
>> > JSONB object to be compressed using the normal TOAST mechanism, and
>> > we're failing to do that in cases where it's actually a win to do so.
>> > That's the focus of this discussion and what needs to be addressed
>> > before 9.4 goes out.
>>
>> Sure. I'm not trying to minimize that. We should fix it, certainly.
>> However, it does bear considering that JSON data, with each document
>> stored in a row is not an effective target for TOAST compression in
>> general, even as text.
>
> Seems we have two issues:
>
> 1) the header makes testing for compression likely to fail
> 2) use of pointers rather than offsets reduces compression potential
I do think the best solution for (2) is what's been proposed already:
delta-coding the pointers in chunks (i.e., 1 pointer, 15 deltas,
repeat). It does make binary search quite a bit more complex, though.
Alternatively, it could be compressed somewhat as follows:
Segment = 1 pointer head + 15 deltas
Pointer head = pointers[0]
delta[i] = pointers[i] - pointers[0] for i in 1..15
(each delta is relative to the segment head, not to the previous value)
Now you can have 4 types of segments, according to the width of the
deltas: 8, 16, 32 or 64 bits. That gets you between 8x and 1x
compression of the pointer array, and even at 1x (no compression) the
small deltas make it easier for pglz to find something compressible.
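
Very roughly, the encoding could look like this (just a sketch, not
actual JSONB code; the struct and function names are invented for
illustration, and little-endian byte order is assumed):

#include <stdint.h>
#include <string.h>

#define SEG_DELTAS 15           /* deltas per segment, 16 offsets total */

typedef struct SegmentHeader
{
    uint64_t    head;           /* absolute offset of the first element */
    uint8_t     delta_width;    /* 1, 2, 4 or 8 bytes per delta */
} SegmentHeader;

/* Pick the smallest delta width that can hold every delta in the segment. */
static uint8_t
choose_delta_width(const uint64_t *offsets, int n)
{
    /* offsets are ascending, so the largest delta is the last one */
    uint64_t    max_delta = (n > 1) ? offsets[n - 1] - offsets[0] : 0;

    if (max_delta <= UINT8_MAX)
        return 1;
    if (max_delta <= UINT16_MAX)
        return 2;
    if (max_delta <= UINT32_MAX)
        return 4;
    return 8;
}

/* Encode up to 16 ascending offsets; returns the size of the packed deltas. */
static size_t
encode_segment(const uint64_t *offsets, int n,
               SegmentHeader *hdr, uint8_t *out)
{
    size_t      pos = 0;
    int         i;

    hdr->head = offsets[0];
    hdr->delta_width = choose_delta_width(offsets, n);

    for (i = 1; i < n; i++)
    {
        uint64_t    delta = offsets[i] - offsets[0];

        /* store only the low-order bytes (little-endian assumed) */
        memcpy(out + pos, &delta, hdr->delta_width);
        pos += hdr->delta_width;
    }
    return pos;
}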
Accessing it is also simple, if you have a segment index (that's the
tough part here).
Replace the 15 with whatever makes such a segment index very compact ;)
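
To illustrate the access path (same invented names as above): because
each delta is relative to the segment head, fetching element i is a
single read plus an add, which is also the primitive a binary search
over the offsets would need.

/* Absolute offset of element i (0..SEG_DELTAS) within one segment. */
static uint64_t
segment_offset(const SegmentHeader *hdr, const uint8_t *deltas, int i)
{
    uint64_t    delta = 0;

    if (i == 0)
        return hdr->head;

    /* deltas are relative to the head, so no summing over the segment;
     * again only the low-order bytes are stored (little-endian assumed) */
    memcpy(&delta, deltas + (size_t) (i - 1) * hdr->delta_width,
           hdr->delta_width);
    return hdr->head + delta;
}

/* elem -> (segment, position); the segment index supplies hdrs/delta_ptrs. */
static uint64_t
lookup_offset(const SegmentHeader *hdrs, uint8_t *const *delta_ptrs, int elem)
{
    int         seg = elem / (SEG_DELTAS + 1);
    int         pos = elem % (SEG_DELTAS + 1);

    return segment_offset(&hdrs[seg], delta_ptrs[seg], pos);
}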