Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Arthur Silva
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id CAO_YK0VZRG6yjBgyrfQ7Rz=K7j-xq6HYcfVQZcTO3WNhXHCwSA@mail.gmail.com
Whole thread Raw
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
<p dir="ltr">I'm still getting up to speed on postgres development but I'd like to leave an opinion. <p dir="ltr">We
shouldadd some sort of versionning to the jsonb format. This can be explored in the future in many ways.<p dir="ltr">As
forthe current problem, we should explore the directory at the end option. It should improve compression and keep good
accessperformance. <p dir="ltr">A 4 byte header is sufficient to store the directory offset and some versionning
bits.<br/><div class="gmail_quote">Em 15/08/2014 17:39, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>escreveu:<br type="attribution" /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Josh Berkus <<a
href="mailto:josh@agliodbs.com">josh@agliodbs.com</a>>writes:<br /> > On 08/14/2014 07:24 PM, Tom Lane wrote:<br
/>>> We can certainly reduce that.  The question was whether it would be<br /> >> worth the effort to try. 
Atthis point, with three different test<br /> >> data sets having shown clear space savings, I think it is
worth<br/> >> the effort.  I'll poke into it tomorrow or over the weekend, unless<br /> >> somebody beats
meto it.<br /><br /> > Note that I specifically created that data set to be a worst case: many<br /> > top-level
keys,no nesting, and small values.  However, I don't think<br /> > it's an unrealistic worst case.<br /><br /> >
Interestingly,even on the unpatched, 1GB table case, the *index* on the<br /> > JSONB is only 60MB.  Which shows
justhow terrific the improvement in<br /> > GIN index size/performance is.<br /><br /> I've been poking at this, and
Ithink the main explanation for your result<br /> is that with more JSONB documents being subject to compression,
we're<br/> spending more time in pglz_decompress.  There's no free lunch in that<br /> department: if you want
compressedstorage it's gonna cost ya to<br /> decompress.  The only way I can get decompression and TOAST access to
not<br/> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE<br /> PLAIN.  However, when I do
that,I do see my test patch running about 25%<br /> slower overall than HEAD on an "explain analyze select jfield ->
'key'<br/> from table" type of query with 200-key documents with narrow fields (see<br /> attached perl script that
generatesthe test data).<br /><br /> It seems difficult to improve much on that for this test case.  I put some<br />
logicinto findJsonbValueFromContainer to calculate the offset sums just<br /> once not once per binary-search
iteration,but that only improved matters<br /> 5% at best.  I still think it'd be worth modifying the JsonbIterator
code<br/> to avoid repetitive offset calculations, but that's not too relevant to<br /> this test case.<br /><br />
Havingsaid all that, I think this test is something of a contrived worst<br /> case.  More realistic cases are likely
tohave many fewer keys (so that<br /> speed of the binary search loop is less of an issue) or else to have total<br />
documentsizes large enough that inline PLAIN storage isn't an option,<br /> meaning that detoast+decompression costs
willdominate.<br /><br />                         regards, tom lane<br /><br /><br /><br /> --<br /> Sent via
pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To
makechanges to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div> 

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Supporting Windows SChannel as OpenSSL replacement
Next
From: worthy7
Date:
Subject: GIST create index very very slow