Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Larry White
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id CAMdbzVgYBGtOaGQpo8tU6PC7zL5T2gqT6kme40BHpzEi3uWyOQ@mail.gmail.com
Whole thread Raw
In response to jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
<div dir="ltr">Apologies if this is a ridiculous suggestion, but I believe that swapping out the compression algorithm
(forSnappy, for example) has been discussed in the past. I wonder if that algorithm is sufficiently different that it
wouldproduce a better result, and if that might not be preferable to some of the other options. </div><div
class="gmail_extra"><br/><br /><div class="gmail_quote">On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane <span
dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I looked into
theissue reported in bug #11109.  The problem appears to be<br /> that jsonb's on-disk format is designed in such a way
thatthe leading<br /> portion of any JSON array or object will be fairly incompressible, because<br /> it consists
mostlyof a strictly-increasing series of integer offsets.<br /> This interacts poorly with the code in pglz_compress()
thatgives up if<br /> it's found nothing compressible in the first first_success_by bytes of a<br />
value-to-be-compressed. (first_success_by is 1024 in the default set of<br /> compression parameters.)<br /><br /> As
anexample, here's gdb's report of the bitwise representation of the<br /> example JSON value in the bug thread:<br
/><br/> 0x2ab85ac:      0x20000005      0x00000004      0x50003098      0x0000309f<br /> 0x2ab85bc:      0x000030ae    
 0x000030b8     0x000030cf      0x000030da<br /> 0x2ab85cc:      0x000030df      0x000030ee      0x00003105    
 0x6b6e756a<br/> 0x2ab85dc:      0x400000de      0x00000034      0x00000068      0x0000009c<br /> 0x2ab85ec:    
 0x000000d0     0x00000104      0x00000138      0x0000016c<br /> 0x2ab85fc:      0x000001a0      0x000001d4    
 0x00000208     0x0000023c<br /> 0x2ab860c:      0x00000270      0x000002a4      0x000002d8      0x0000030c<br />
0x2ab861c:     0x00000340      0x00000374      0x000003a8      0x000003dc<br /> 0x2ab862c:      0x00000410    
 0x00000444     0x00000478      0x000004ac<br /> 0x2ab863c:      0x000004e0      0x00000514      0x00000548    
 0x0000057c<br/> 0x2ab864c:      0x000005b0      0x000005e4      0x00000618      0x0000064c<br /> 0x2ab865c:    
 0x00000680     0x000006b4      0x000006e8      0x0000071c<br /> 0x2ab866c:      0x00000750      0x00000784    
 0x000007b8     0x000007ec<br /> 0x2ab867c:      0x00000820      0x00000854      0x00000888      0x000008bc<br />
0x2ab868c:     0x000008f0      0x00000924      0x00000958      0x0000098c<br /> 0x2ab869c:      0x000009c0    
 0x000009f4     0x00000a28      0x00000a5c<br /> 0x2ab86ac:      0x00000a90      0x00000ac4      0x00000af8    
 0x00000b2c<br/> 0x2ab86bc:      0x00000b60      0x00000b94      0x00000bc8      0x00000bfc<br /> 0x2ab86cc:    
 0x00000c30     0x00000c64      0x00000c98      0x00000ccc<br /> 0x2ab86dc:      0x00000d00      0x00000d34    
 0x00000d68     0x00000d9c<br /> 0x2ab86ec:      0x00000dd0      0x00000e04      0x00000e38      0x00000e6c<br />
0x2ab86fc:     0x00000ea0      0x00000ed4      0x00000f08      0x00000f3c<br /> 0x2ab870c:      0x00000f70    
 0x00000fa4     0x00000fd8      0x0000100c<br /> 0x2ab871c:      0x00001040      0x00001074      0x000010a8    
 0x000010dc<br/> 0x2ab872c:      0x00001110      0x00001144      0x00001178      0x000011ac<br /> 0x2ab873c:    
 0x000011e0     0x00001214      0x00001248      0x0000127c<br /> 0x2ab874c:      0x000012b0      0x000012e4    
 0x00001318     0x0000134c<br /> 0x2ab875c:      0x00001380      0x000013b4      0x000013e8      0x0000141c<br />
0x2ab876c:     0x00001450      0x00001484      0x000014b8      0x000014ec<br /> 0x2ab877c:      0x00001520    
 0x00001554     0x00001588      0x000015bc<br /> 0x2ab878c:      0x000015f0      0x00001624      0x00001658    
 0x0000168c<br/> 0x2ab879c:      0x000016c0      0x000016f4      0x00001728      0x0000175c<br /> 0x2ab87ac:    
 0x00001790     0x000017c4      0x000017f8      0x0000182c<br /> 0x2ab87bc:      0x00001860      0x00001894    
 0x000018c8     0x000018fc<br /> 0x2ab87cc:      0x00001930      0x00001964      0x00001998      0x000019cc<br />
0x2ab87dc:     0x00001a00      0x00001a34      0x00001a68      0x00001a9c<br /> 0x2ab87ec:      0x00001ad0    
 0x00001b04     0x00001b38      0x00001b6c<br /> 0x2ab87fc:      0x00001ba0      0x00001bd4      0x00001c08    
 0x00001c3c<br/> 0x2ab880c:      0x00001c70      0x00001ca4      0x00001cd8      0x00001d0c<br /> 0x2ab881c:    
 0x00001d40     0x00001d74      0x00001da8      0x00001ddc<br /> 0x2ab882c:      0x00001e10      0x00001e44    
 0x00001e78     0x00001eac<br /> 0x2ab883c:      0x00001ee0      0x00001f14      0x00001f48      0x00001f7c<br />
0x2ab884c:     0x00001fb0      0x00001fe4      0x00002018      0x0000204c<br /> 0x2ab885c:      0x00002080    
 0x000020b4     0x000020e8      0x0000211c<br /> 0x2ab886c:      0x00002150      0x00002184      0x000021b8    
 0x000021ec<br/> 0x2ab887c:      0x00002220      0x00002254      0x00002288      0x000022bc<br /> 0x2ab888c:    
 0x000022f0     0x00002324      0x00002358      0x0000238c<br /> 0x2ab889c:      0x000023c0      0x000023f4    
 0x00002428     0x0000245c<br /> 0x2ab88ac:      0x00002490      0x000024c4      0x000024f8      0x0000252c<br />
0x2ab88bc:     0x00002560      0x00002594      0x000025c8      0x000025fc<br /> 0x2ab88cc:      0x00002630    
 0x00002664     0x00002698      0x000026cc<br /> 0x2ab88dc:      0x00002700      0x00002734      0x00002768    
 0x0000279c<br/> 0x2ab88ec:      0x000027d0      0x00002804      0x00002838      0x0000286c<br /> 0x2ab88fc:    
 0x000028a0     0x000028d4      0x00002908      0x0000293c<br /> 0x2ab890c:      0x00002970      0x000029a4    
 0x000029d8     0x00002a0c<br /> 0x2ab891c:      0x00002a40      0x00002a74      0x00002aa8      0x00002adc<br />
0x2ab892c:     0x00002b10      0x00002b44      0x00002b78      0x00002bac<br /> 0x2ab893c:      0x00002be0    
 0x00002c14     0x00002c48      0x00002c7c<br /> 0x2ab894c:      0x00002cb0      0x00002ce4      0x00002d18    
 0x32343231<br/> 0x2ab895c:      0x74653534      0x74656577      0x33746577      0x77673534<br /> 0x2ab896c:    
 0x74657274     0x33347477      0x72777120      0x20717771<br /> 0x2ab897c:      0x65727771      0x20777120    
 0x66647372     0x73616b6c<br /> 0x2ab898c:      0x33353471      0x71772035      0x72777172      0x71727771<br />
0x2ab899c:     0x77203277      0x72777172      0x71727771      0x33323233<br /> 0x2ab89ac:      0x6b207732    
 0x20657773     0x73616673      0x73207372<br /> 0x2ab89bc:      0x64736664      0x32343231      0x74653534    
 0x74656577<br/> 0x2ab89cc:      0x33746577      0x77673534      0x74657274      0x33347477<br /> 0x2ab89dc:    
 0x72777120     0x20717771      0x65727771      0x20777120<br /> 0x2ab89ec:      0x66647372      0x73616b6c    
 0x33353471     0x71772035<br /> 0x2ab89fc:      0x72777172      0x71727771      0x77203277      0x72777172<br />
0x2ab8a0c:     0x71727771      0x33323233      0x6b207732      0x20657773<br /> 0x2ab8a1c:      0x73616673    
 0x73207372     0x64736664      0x32343231<br /> 0x2ab8a2c:      0x74653534      0x74656577      0x33746577    
 0x77673534<br/> 0x2ab8a3c:      0x74657274      0x33347477      0x72777120      0x20717771<br /> 0x2ab8a4c:    
 0x65727771     0x20777120      0x66647372      0x73616b6c<br /> 0x2ab8a5c:      0x33353471      0x71772035    
 0x72777172     0x71727771<br /> 0x2ab8a6c:      0x77203277      0x72777172      0x71727771      0x33323233<br />
0x2ab8a7c:     0x6b207732      0x20657773      0x73616673      0x73207372<br /> 0x2ab8a8c:      0x64736664    
 0x32343231     0x74653534      0x74656577<br /> 0x2ab8a9c:      0x33746577      0x77673534      0x74657274    
 0x33347477<br/> 0x2ab8aac:      0x72777120      0x20717771      0x65727771      0x20777120<br /> 0x2ab8abc:    
 0x66647372     0x73616b6c      0x33353471      0x71772035<br /> 0x2ab8acc:      0x72777172      0x71727771    
 0x77203277     0x72777172<br /> 0x2ab8adc:      0x71727771      0x33323233      0x6b207732      0x20657773<br />
0x2ab8aec:     0x73616673      0x73207372      0x64736664      0x32343231<br /> 0x2ab8afc:      0x74653534    
 0x74656577     0x33746577      0x77673534<br /> ...<br /> 0x2abb61c:      0x74657274      0x33347477      0x72777120  
  0x20717771<br /> 0x2abb62c:      0x65727771      0x20777120      0x66647372      0x73616b6c<br /> 0x2abb63c:    
 0x33353471     0x71772035      0x72777172      0x71727771<br /> 0x2abb64c:      0x77203277      0x72777172    
 0x71727771     0x33323233<br /> 0x2abb65c:      0x6b207732      0x20657773      0x73616673      0x73207372<br />
0x2abb66c:     0x64736664      0x537a6962      0x41706574      0x73756220<br /> 0x2abb67c:      0x73656e69    
 0x74732073     0x45617065      0x746e6576<br /> 0x2abb68c:      0x656d6954      0x34313032      0x2d38302d    
 0x32203730<br/> 0x2abb69c:      0x33323a31      0x2e33333a      0x62393434      0x6f4c7a69<br /> 0x2abb6ac:    
 0x69746163     0x61506e6f      0x74736972      0x736e6172<br /> 0x2abb6bc:      0x69746361      0x61446e6f    
 0x30326574     0x302d3431<br /> 0x2abb6cc:      0x37302d38      0x3a313220      0x333a3332      0x34342e33<br /><br />
Thereis plenty of compressible data once we get into the repetitive<br /> strings in the payload part --- but that
startsat offset 944, and up to<br /> that point there is nothing that pg_lzcompress can get a handle on.  There<br />
are,by definition, no sequences of 4 or more repeated bytes in that area.<br /> I think in principle pg_lzcompress
coulddecide to compress the 3-byte<br /> sequences consisting of the high-order 24 bits of each offset; but it<br />
doesn'tchoose to do so, probably because of the way its lookup hash table<br /> works:<br /><br />  * pglz_hist_idx
-<br/>  *<br />  *              Computes the history table slot for the lookup by the next 4<br />  *            
 charactersin the input.<br />  *<br />  * NB: because we use the next 4 characters, we are not guaranteed to<br />  *
find3-character matches; they very possibly will be in the wrong<br />  * hash list.  This seems an acceptable tradeoff
forspreading out the<br />  * hash keys more.<br /><br /> For jsonb header data, the "next 4 characters" are *always*
different,so<br /> only a chance hash collision can result in a match.  There is therefore a<br /> pretty good chance
thatno compression will occur before it gives up<br /> because of first_success_by.<br /><br /> I'm not sure if there
isany easy fix for this.  We could possibly change<br /> the default first_success_by value, but I think that'd just be
postponing<br/> the problem to larger jsonb objects/arrays, and it would hurt performance<br /> for genuinely
incompressibledata.  A somewhat painful, but not yet<br /> out-of-the-question, alternative is to change the jsonb
on-disk<br/> representation.  Perhaps the JEntry array could be defined as containing<br /> element lengths instead of
elementending offsets.  Not sure though if<br /> that would break binary searching for JSON object keys.<br /><br />  
                     regards, tom lane<br /></blockquote></div><br /></div> 

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Specifying the unit in storage parameter
Next
From: Amit Kapila
Date:
Subject: Re: postgresql.auto.conf and reload