Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: jsonb format is pessimal for toast compression
Msg-id CAO_YK0VZRG6yjBgyrfQ7Rz=K7j-xq6HYcfVQZcTO3WNhXHCwSA@mail.gmail.com
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers

I'm still getting up to speed on Postgres development, but I'd like to offer an opinion.

We should add some sort of versioning to the jsonb format. That would leave room to explore format changes in the future.

As for the current problem, we should explore the directory-at-the-end option. It should improve compression while keeping good access performance.

A 4-byte header is sufficient to store the directory offset and some versioning bits.

On 15/08/2014 17:39, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
> On 08/14/2014 07:24 PM, Tom Lane wrote:
>> We can certainly reduce that.  The question was whether it would be
>> worth the effort to try.  At this point, with three different test
>> data sets having shown clear space savings, I think it is worth
>> the effort.  I'll poke into it tomorrow or over the weekend, unless
>> somebody beats me to it.

> Note that I specifically created that data set to be a worst case: many
> top-level keys, no nesting, and small values.  However, I don't think
> it's an unrealistic worst case.

> Interestingly, even on the unpatched, 1GB table case, the *index* on the
> JSONB is only 60MB.  Which shows just how terrific the improvement in
> GIN index size/performance is.

I've been poking at this, and I think the main explanation for your result
is that with more JSONB documents being subject to compression, we're
spending more time in pglz_decompress.  There's no free lunch in that
department: if you want compressed storage it's gonna cost ya to
decompress.  The only way I can get decompression and TOAST access to not
dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
PLAIN.  However, when I do that, I do see my test patch running about 25%
slower overall than HEAD on an "explain analyze select jfield -> 'key'
from table" type of query with 200-key documents with narrow fields (see
attached perl script that generates the test data).

It seems difficult to improve much on that for this test case.  I put some
logic into findJsonbValueFromContainer to calculate the offset sums just
once not once per binary-search iteration, but that only improved matters
5% at best.  I still think it'd be worth modifying the JsonbIterator code
to avoid repetitive offset calculations, but that's not too relevant to
this test case.

Having said all that, I think this test is something of a contrived worst
case.  More realistic cases are likely to have many fewer keys (so that
speed of the binary search loop is less of an issue) or else to have total
document sizes large enough that inline PLAIN storage isn't an option,
meaning that detoast+decompression costs will dominate.

                        regards, tom lane



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
