Thread: jsonb format is pessimal for toast compression

jsonb format is pessimal for toast compression

From
Tom Lane
Date:
I looked into the issue reported in bug #11109.  The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This interacts poorly with the code in pglz_compress() that gives up if
it's found nothing compressible in the first first_success_by bytes of a
value-to-be-compressed.  (first_success_by is 1024 in the default set of
compression parameters.)
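
As a toy illustration of that early-exit rule (standalone C, not the
actual pg_lzcompress.c code; the constant name and the numbers below are
invented):

/*
 * If the compressor has found nothing to match by the time it has
 * consumed first_success_by bytes of input, it stores the value
 * uncompressed.  Simplified sketch of that control flow.
 */
#include <stdio.h>

#define DEMO_FIRST_SUCCESS_BY 1024

static int
should_give_up(int bytes_consumed, int matches_found)
{
	return matches_found == 0 && bytes_consumed >= DEMO_FIRST_SUCCESS_BY;
}

int
main(void)
{
	int		first_match_at = 1200;	/* hypothetical: first repetition shows
									 * up only past the 1kB mark */

	for (int pos = 0; pos < 12288; pos++)
	{
		int		matches = (pos > first_match_at) ? 1 : 0;

		if (should_give_up(pos, matches))
		{
			printf("gave up at byte %d; value stored uncompressed\n", pos);
			return 0;
		}
	}
	printf("kept going; value would be compressed\n");
	return 0;
}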

As an example, here's gdb's report of the bitwise representation of the
example JSON value in the bug thread:

0x2ab85ac:      0x20000005      0x00000004      0x50003098      0x0000309f
0x2ab85bc:      0x000030ae      0x000030b8      0x000030cf      0x000030da
0x2ab85cc:      0x000030df      0x000030ee      0x00003105      0x6b6e756a
0x2ab85dc:      0x400000de      0x00000034      0x00000068      0x0000009c
0x2ab85ec:      0x000000d0      0x00000104      0x00000138      0x0000016c
0x2ab85fc:      0x000001a0      0x000001d4      0x00000208      0x0000023c
0x2ab860c:      0x00000270      0x000002a4      0x000002d8      0x0000030c
0x2ab861c:      0x00000340      0x00000374      0x000003a8      0x000003dc
0x2ab862c:      0x00000410      0x00000444      0x00000478      0x000004ac
0x2ab863c:      0x000004e0      0x00000514      0x00000548      0x0000057c
0x2ab864c:      0x000005b0      0x000005e4      0x00000618      0x0000064c
0x2ab865c:      0x00000680      0x000006b4      0x000006e8      0x0000071c
0x2ab866c:      0x00000750      0x00000784      0x000007b8      0x000007ec
0x2ab867c:      0x00000820      0x00000854      0x00000888      0x000008bc
0x2ab868c:      0x000008f0      0x00000924      0x00000958      0x0000098c
0x2ab869c:      0x000009c0      0x000009f4      0x00000a28      0x00000a5c
0x2ab86ac:      0x00000a90      0x00000ac4      0x00000af8      0x00000b2c
0x2ab86bc:      0x00000b60      0x00000b94      0x00000bc8      0x00000bfc
0x2ab86cc:      0x00000c30      0x00000c64      0x00000c98      0x00000ccc
0x2ab86dc:      0x00000d00      0x00000d34      0x00000d68      0x00000d9c
0x2ab86ec:      0x00000dd0      0x00000e04      0x00000e38      0x00000e6c
0x2ab86fc:      0x00000ea0      0x00000ed4      0x00000f08      0x00000f3c
0x2ab870c:      0x00000f70      0x00000fa4      0x00000fd8      0x0000100c
0x2ab871c:      0x00001040      0x00001074      0x000010a8      0x000010dc
0x2ab872c:      0x00001110      0x00001144      0x00001178      0x000011ac
0x2ab873c:      0x000011e0      0x00001214      0x00001248      0x0000127c
0x2ab874c:      0x000012b0      0x000012e4      0x00001318      0x0000134c
0x2ab875c:      0x00001380      0x000013b4      0x000013e8      0x0000141c
0x2ab876c:      0x00001450      0x00001484      0x000014b8      0x000014ec
0x2ab877c:      0x00001520      0x00001554      0x00001588      0x000015bc
0x2ab878c:      0x000015f0      0x00001624      0x00001658      0x0000168c
0x2ab879c:      0x000016c0      0x000016f4      0x00001728      0x0000175c
0x2ab87ac:      0x00001790      0x000017c4      0x000017f8      0x0000182c
0x2ab87bc:      0x00001860      0x00001894      0x000018c8      0x000018fc
0x2ab87cc:      0x00001930      0x00001964      0x00001998      0x000019cc
0x2ab87dc:      0x00001a00      0x00001a34      0x00001a68      0x00001a9c
0x2ab87ec:      0x00001ad0      0x00001b04      0x00001b38      0x00001b6c
0x2ab87fc:      0x00001ba0      0x00001bd4      0x00001c08      0x00001c3c
0x2ab880c:      0x00001c70      0x00001ca4      0x00001cd8      0x00001d0c
0x2ab881c:      0x00001d40      0x00001d74      0x00001da8      0x00001ddc
0x2ab882c:      0x00001e10      0x00001e44      0x00001e78      0x00001eac
0x2ab883c:      0x00001ee0      0x00001f14      0x00001f48      0x00001f7c
0x2ab884c:      0x00001fb0      0x00001fe4      0x00002018      0x0000204c
0x2ab885c:      0x00002080      0x000020b4      0x000020e8      0x0000211c
0x2ab886c:      0x00002150      0x00002184      0x000021b8      0x000021ec
0x2ab887c:      0x00002220      0x00002254      0x00002288      0x000022bc
0x2ab888c:      0x000022f0      0x00002324      0x00002358      0x0000238c
0x2ab889c:      0x000023c0      0x000023f4      0x00002428      0x0000245c
0x2ab88ac:      0x00002490      0x000024c4      0x000024f8      0x0000252c
0x2ab88bc:      0x00002560      0x00002594      0x000025c8      0x000025fc
0x2ab88cc:      0x00002630      0x00002664      0x00002698      0x000026cc
0x2ab88dc:      0x00002700      0x00002734      0x00002768      0x0000279c
0x2ab88ec:      0x000027d0      0x00002804      0x00002838      0x0000286c
0x2ab88fc:      0x000028a0      0x000028d4      0x00002908      0x0000293c
0x2ab890c:      0x00002970      0x000029a4      0x000029d8      0x00002a0c
0x2ab891c:      0x00002a40      0x00002a74      0x00002aa8      0x00002adc
0x2ab892c:      0x00002b10      0x00002b44      0x00002b78      0x00002bac
0x2ab893c:      0x00002be0      0x00002c14      0x00002c48      0x00002c7c
0x2ab894c:      0x00002cb0      0x00002ce4      0x00002d18      0x32343231
0x2ab895c:      0x74653534      0x74656577      0x33746577      0x77673534
0x2ab896c:      0x74657274      0x33347477      0x72777120      0x20717771
0x2ab897c:      0x65727771      0x20777120      0x66647372      0x73616b6c
0x2ab898c:      0x33353471      0x71772035      0x72777172      0x71727771
0x2ab899c:      0x77203277      0x72777172      0x71727771      0x33323233
0x2ab89ac:      0x6b207732      0x20657773      0x73616673      0x73207372
0x2ab89bc:      0x64736664      0x32343231      0x74653534      0x74656577
0x2ab89cc:      0x33746577      0x77673534      0x74657274      0x33347477
0x2ab89dc:      0x72777120      0x20717771      0x65727771      0x20777120
0x2ab89ec:      0x66647372      0x73616b6c      0x33353471      0x71772035
0x2ab89fc:      0x72777172      0x71727771      0x77203277      0x72777172
0x2ab8a0c:      0x71727771      0x33323233      0x6b207732      0x20657773
0x2ab8a1c:      0x73616673      0x73207372      0x64736664      0x32343231
0x2ab8a2c:      0x74653534      0x74656577      0x33746577      0x77673534
0x2ab8a3c:      0x74657274      0x33347477      0x72777120      0x20717771
0x2ab8a4c:      0x65727771      0x20777120      0x66647372      0x73616b6c
0x2ab8a5c:      0x33353471      0x71772035      0x72777172      0x71727771
0x2ab8a6c:      0x77203277      0x72777172      0x71727771      0x33323233
0x2ab8a7c:      0x6b207732      0x20657773      0x73616673      0x73207372
0x2ab8a8c:      0x64736664      0x32343231      0x74653534      0x74656577
0x2ab8a9c:      0x33746577      0x77673534      0x74657274      0x33347477
0x2ab8aac:      0x72777120      0x20717771      0x65727771      0x20777120
0x2ab8abc:      0x66647372      0x73616b6c      0x33353471      0x71772035
0x2ab8acc:      0x72777172      0x71727771      0x77203277      0x72777172
0x2ab8adc:      0x71727771      0x33323233      0x6b207732      0x20657773
0x2ab8aec:      0x73616673      0x73207372      0x64736664      0x32343231
0x2ab8afc:      0x74653534      0x74656577      0x33746577      0x77673534
...
0x2abb61c:      0x74657274      0x33347477      0x72777120      0x20717771
0x2abb62c:      0x65727771      0x20777120      0x66647372      0x73616b6c
0x2abb63c:      0x33353471      0x71772035      0x72777172      0x71727771
0x2abb64c:      0x77203277      0x72777172      0x71727771      0x33323233
0x2abb65c:      0x6b207732      0x20657773      0x73616673      0x73207372
0x2abb66c:      0x64736664      0x537a6962      0x41706574      0x73756220
0x2abb67c:      0x73656e69      0x74732073      0x45617065      0x746e6576
0x2abb68c:      0x656d6954      0x34313032      0x2d38302d      0x32203730
0x2abb69c:      0x33323a31      0x2e33333a      0x62393434      0x6f4c7a69
0x2abb6ac:      0x69746163      0x61506e6f      0x74736972      0x736e6172
0x2abb6bc:      0x69746361      0x61446e6f      0x30326574      0x302d3431
0x2abb6cc:      0x37302d38      0x3a313220      0x333a3332      0x34342e33

There is plenty of compressible data once we get into the repetitive
strings in the payload part --- but that starts at offset 944, and up to
that point there is nothing that pg_lzcompress can get a handle on.  There
are, by definition, no sequences of 4 or more repeated bytes in that area.
I think in principle pg_lzcompress could decide to compress the 3-byte
sequences consisting of the high-order 24 bits of each offset; but it
doesn't choose to do so, probably because of the way its lookup hash table
works:

 * pglz_hist_idx -
 *
 *        Computes the history table slot for the lookup by the next 4
 *        characters in the input.
 *
 * NB: because we use the next 4 characters, we are not guaranteed to
 * find 3-character matches; they very possibly will be in the wrong
 * hash list.  This seems an acceptable tradeoff for spreading out the
 * hash keys more.

For jsonb header data, the "next 4 characters" are *always* different, so
only a chance hash collision can result in a match.  There is therefore a
pretty good chance that no compression will occur before it gives up
because of first_success_by.
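
To make the "always different" point concrete, here is a small standalone
demo (plain C, not PostgreSQL code) that builds a buffer shaped like the
header above -- a strictly increasing series of 4-byte offsets -- and
counts how many 4-byte windows have occurred earlier in the buffer:

/*
 * For this kind of input no 4-byte window repeats, so a matcher keyed on
 * "the next 4 characters", as pglz_hist_idx is, finds nothing to work
 * with.  The offsets mimic the dump above (increments of 0x34).
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	uint32_t	offsets[256];
	unsigned char buf[sizeof(offsets)];
	int			repeats = 0;

	for (int i = 0; i < 256; i++)
		offsets[i] = 52 * (uint32_t) (i + 1);	/* strictly increasing */
	memcpy(buf, offsets, sizeof(offsets));

	for (size_t i = 4; i + 4 <= sizeof(buf); i++)
		for (size_t j = 0; j < i; j++)
			if (memcmp(buf + i, buf + j, 4) == 0)
			{
				repeats++;
				break;
			}

	/* prints 0 for this input: 1024 header bytes, nothing to match */
	printf("repeated 4-byte windows in %zu bytes: %d\n",
		   sizeof(buf), repeats);
	return 0;
}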

I'm not sure if there is any easy fix for this.  We could possibly change
the default first_success_by value, but I think that'd just be postponing
the problem to larger jsonb objects/arrays, and it would hurt performance
for genuinely incompressible data.  A somewhat painful, but not yet
out-of-the-question, alternative is to change the jsonb on-disk
representation.  Perhaps the JEntry array could be defined as containing
element lengths instead of element ending offsets.  Not sure though if
that would break binary searching for JSON object keys.
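
As a toy illustration of why lengths would compress where offsets do not
(standalone C, not the jsonb implementation; the numbers are invented),
the same JEntry-style information can be stored either way:

/*
 * Cumulative end offsets are all distinct, but for similar-sized elements
 * the per-element lengths repeat -- exactly what an LZ-style compressor
 * can latch onto.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t	end_offsets[8] = {52, 104, 156, 208, 260, 312, 364, 416};
	uint32_t	lengths[8];

	for (int i = 0; i < 8; i++)
		lengths[i] = end_offsets[i] - (i > 0 ? end_offsets[i - 1] : 0);

	for (int i = 0; i < 8; i++)
		printf("entry %d: end offset %u, length %u\n", i,
			   (unsigned) end_offsets[i], (unsigned) lengths[i]);
	return 0;
}
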
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Larry White
Date:
<div dir="ltr">Apologies if this is a ridiculous suggestion, but I believe that swapping out the compression algorithm
(forSnappy, for example) has been discussed in the past. I wonder if that algorithm is sufficiently different that it
wouldproduce a better result, and if that might not be preferable to some of the other options. </div><div
class="gmail_extra"><br/><br /><div class="gmail_quote">On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane <span
dir="ltr"><<ahref="mailto:tgl@sss.pgh.pa.us" target="_blank">tgl@sss.pgh.pa.us</a>></span> wrote:<br
> [snip]

Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)

I haven't looked at this in any detail, so take this with a grain of
salt, but what about teaching pglz_compress about using an offset
farther into the data, if the incoming data is quite a bit larger than
1k?  This is just a test to see if it's worthwhile to keep going, no?  I
wonder if this might even be able to be provided as a type-specific
option, to avoid changing the behavior for types other than jsonb in
this regard.

(I'm imagining a boolean saying "pick a random sample", or perhaps a
function which can be called that'll return "here's where you wanna test
if this thing is gonna compress at all")

I'm rather disinclined to change the on-disk format because of this
specific test, that feels a bit like the tail wagging the dog to me,
especially as I do hope that some day we'll figure out a way to use a
better compression algorithm than pglz.
        Thanks,

                Stephen

Re: jsonb format is pessimal for toast compression

From
Ashutosh Bapat
Date:



On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)

I haven't looked at this in any detail, so take this with a grain of
salt, but what about teaching pglz_compress about using an offset
farther into the data, if the incoming data is quite a bit larger than
1k?  This is just a test to see if it's worthwhile to keep going, no?  I
wonder if this might even be able to be provided as a type-specific
option, to avoid changing the behavior for types other than jsonb in
this regard.


+1 for the offset. Or sample the data at the beginning, middle and end. Obviously one could always come up with a worst case, but still.
 
(I'm imagining a boolean saying "pick a random sample", or perhaps a
function which can be called that'll return "here's where you wanna test
if this thing is gonna compress at all")

I'm rather disinclined to change the on-disk format because of this
specific test, that feels a bit like the tail wagging the dog to me,
especially as I do hope that some day we'll figure out a way to use a
better compression algorithm than pglz.

        Thanks,

                Stephen



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/07/2014 11:17 PM, Tom Lane wrote:
> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)

[snip]

> There is plenty of compressible data once we get into the repetitive
> strings in the payload part --- but that starts at offset 944, and up to
> that point there is nothing that pg_lzcompress can get a handle on.  There
> are, by definition, no sequences of 4 or more repeated bytes in that area.
> I think in principle pg_lzcompress could decide to compress the 3-byte
> sequences consisting of the high-order 24 bits of each offset; but it
> doesn't choose to do so, probably because of the way its lookup hash table
> works:
>
>   * pglz_hist_idx -
>   *
>   *        Computes the history table slot for the lookup by the next 4
>   *        characters in the input.
>   *
>   * NB: because we use the next 4 characters, we are not guaranteed to
>   * find 3-character matches; they very possibly will be in the wrong
>   * hash list.  This seems an acceptable tradeoff for spreading out the
>   * hash keys more.
>
> For jsonb header data, the "next 4 characters" are *always* different, so
> only a chance hash collision can result in a match.  There is therefore a
> pretty good chance that no compression will occur before it gives up
> because of first_success_by.
>
> I'm not sure if there is any easy fix for this.  We could possibly change
> the default first_success_by value, but I think that'd just be postponing
> the problem to larger jsonb objects/arrays, and it would hurt performance
> for genuinely incompressible data.  A somewhat painful, but not yet
> out-of-the-question, alternative is to change the jsonb on-disk
> representation.  Perhaps the JEntry array could be defined as containing
> element lengths instead of element ending offsets.  Not sure though if
> that would break binary searching for JSON object keys.
>
>             


Ouch.

Back when this structure was first presented at pgCon 2013, I wondered 
if we shouldn't extract the strings into a dictionary, because of key 
repetition, and convinced myself that this shouldn't be necessary 
because in significant cases TOAST would take care of it.

Maybe we should have pglz_compress() look at the *last* 1024 bytes if it 
can't find anything worth compressing in the first, for values larger 
than a certain size.

It's worth noting that this is a fairly pathological case. AIUI the 
example you constructed has an array with 100k string elements. I don't 
think that's typical. So I suspect that unless I've misunderstood the 
statement of the problem we're going to find that almost all the jsonb 
we will be storing is still compressible.

cheers

andrew



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I looked into the issue reported in bug #11109.  The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.
>> This interacts poorly with the code in pglz_compress() that gives up if
>> it's found nothing compressible in the first first_success_by bytes of a
>> value-to-be-compressed.  (first_success_by is 1024 in the default set of
>> compression parameters.)

> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k?  This is just a test to see if it's worthwhile to keep going, no?

Well, the point of the existing approach is that it's a *nearly free*
test to see if it's worthwhile to keep going; there's just one if-test
added in the outer loop of the compression code.  (cf commit ad434473ebd2,
which added that along with some other changes.)  AFAICS, what we'd have
to do to do it as you suggest would be to execute compression on some subset
of the data and then throw away that work entirely.  I do not find that
attractive, especially when for most datatypes there's no particular
reason to look at one subset instead of another.

> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.

I'm unimpressed by that argument too, for a number of reasons:

1. The real problem here is that jsonb is emitting quite a bit of
fundamentally-nonrepetitive data, even when the user-visible input is very
repetitive.  That's a compression-unfriendly transformation by anyone's
measure.  Assuming that some future replacement for pg_lzcompress() will
nonetheless be able to compress the data strikes me as mostly wishful
thinking.  Besides, we'd more than likely have a similar early-exit rule
in any substitute implementation, so that we'd still be at risk even if
it usually worked.

2. Are we going to ship 9.4 without fixing this?  I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.

Considering all the hype that's built up around jsonb, shipping a design
with a fundamental performance handicap doesn't seem like a good plan
to me.  We could perhaps band-aid around it by using different compression
parameters for jsonb, although that would require some painful API changes
since toast_compress_datum() doesn't know what datatype it's operating on.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/07/2014 11:17 PM, Tom Lane wrote:
>> I looked into the issue reported in bug #11109.  The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.

> Ouch.

> Back when this structure was first presented at pgCon 2013, I wondered 
> if we shouldn't extract the strings into a dictionary, because of key 
> repetition, and convinced myself that this shouldn't be necessary 
> because in significant cases TOAST would take care of it.

That's not really the issue here, I think.  The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather than a series of lengths, produces
nonrepetitive data even when the original input is repetitive.

> Maybe we should have pglz_compress() look at the *last* 1024 bytes if it 
> can't find anything worth compressing in the first, for values larger 
> than a certain size.

Not possible with anything like the current implementation, since it's
just an on-the-fly status check not a trial compression.

> It's worth noting that this is a fairly pathological case. AIUI the 
> example you constructed has an array with 100k string elements. I don't 
> think that's typical. So I suspect that unless I've misunderstood the 
> statement of the problem we're going to find that almost all the jsonb 
> we will be storing is still compressible.

Actually, the 100K-string example I constructed *did* compress.  Larry's
example that's not compressing is only about 12kB.  AFAICS, the threshold
for trouble is in the vicinity of 256 array or object entries (resulting
in a 1kB JEntry array).  That doesn't seem especially high.  There is a
probabilistic component as to whether the early-exit case will actually
fire, since any chance hash collision will probably result in some 3-byte
offset prefix getting compressed.  But the fact that a beta tester tripped
over this doesn't leave me with a warm feeling about the odds that it
won't happen much in the field.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109.  The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
> That's not really the issue here, I think.  The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.


It would certainly be worth validating that changing this would fix the 
problem.

I don't know how invasive that would be - I suspect (without looking 
very closely) not terribly much.

> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
>
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me.  We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.
>
>             


Yeah, it would be a bit painful, but after all finding out this sort of 
thing is why we have betas.


cheers

andrew



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/08/2014 11:18 AM, Tom Lane wrote:
>> That's not really the issue here, I think.  The problem is that a
>> relatively minor aspect of the representation, namely the choice to store
>> a series of offsets rather than a series of lengths, produces
>> nonrepetitive data even when the original input is repetitive.

> It would certainly be worth validating that changing this would fix the 
> problem.
> I don't know how invasive that would be - I suspect (without looking 
> very closely) not terribly much.

I took a quick look and saw that this wouldn't be that easy to get around.
As I'd suspected upthread, there are places that do random access into a
JEntry array, such as the binary search in findJsonbValueFromContainer().
If we have to add up all the preceding lengths to locate the corresponding
value part, we lose the performance advantages of binary search.  AFAICS
that's applied directly to the on-disk representation.  I'd thought
perhaps there was always a transformation step to build a pointer list,
but nope.
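
To spell out the access-cost difference (simplified stand-ins, not the
actual JsonbContainer/JEntry code):

/*
 * With stored end offsets, finding where element i's data starts is O(1),
 * so a binary search over the entries stays cheap.  With only lengths
 * stored, the same lookup needs the sum of all preceding lengths.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t
start_with_offsets(const uint32_t *end_offsets, int i)
{
	return i == 0 ? 0 : end_offsets[i - 1];
}

static uint32_t
start_with_lengths(const uint32_t *lengths, int i)
{
	uint32_t	start = 0;

	for (int j = 0; j < i; j++)
		start += lengths[j];	/* O(i) walk replaces an array fetch */
	return start;
}

int
main(void)
{
	uint32_t	end_offsets[5] = {10, 25, 27, 40, 52};
	uint32_t	lengths[5] = {10, 15, 2, 13, 12};

	for (int i = 0; i < 5; i++)
		printf("element %d starts at %u (offsets) / %u (lengths)\n", i,
			   (unsigned) start_with_offsets(end_offsets, i),
			   (unsigned) start_with_lengths(lengths, i));
	return 0;
}
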
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
John W Higgins
Date:



On Fri, Aug 8, 2014 at 8:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:

> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.

I'm unimpressed by that argument too, for a number of reasons:

1. The real problem here is that jsonb is emitting quite a bit of
fundamentally-nonrepetitive data, even when the user-visible input is very
repetitive.  That's a compression-unfriendly transformation by anyone's
measure.  Assuming that some future replacement for pg_lzcompress() will
nonetheless be able to compress the data strikes me as mostly wishful
thinking.  Besides, we'd more than likely have a similar early-exit rule
in any substitute implementation, so that we'd still be at risk even if
it usually worked.

Would an answer be to switch the location of the jsonb "header" data to the end of the field as opposed to the beginning of the field? That would allow pglz to see what it wants to see early on and go to work when possible? 

Add an offset at the top of the field to show where to look - but then it would be the same in terms of functionality outside of that? Or pretty close?

John

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
John W Higgins <wishdev@gmail.com> writes:
> Would an answer be to switch the location of the jsonb "header" data to the
> end of the field as opposed to the beginning of the field? That would allow
> pglz to see what it wants to see early on and go to work when possible?

Hm, might work.  Seems a bit odd, but it would make pglz_compress happier.

OTOH, the big-picture issue here is that jsonb is generating
noncompressible data in the first place.  Putting it somewhere else in the
Datum doesn't change the fact that we're going to have bloated storage,
even if we dodge the early-exit problem.  (I suspect the compression
disadvantage vs text/plain-json that I showed yesterday is coming largely
from that offset array.)  But I don't currently see how to avoid that and
still preserve the fast binary-search key lookup property, which is surely
a nice thing to have.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/08/2014 12:04 PM, John W Higgins wrote:

>
> Would an answer be to switch the location of the jsonb "header" data 
> to the end of the field as opposed to the beginning of the field? That 
> would allow pglz to see what it wants to see early on and go to work 
> when possible?
>
> Add an offset at the top of the field to show where to look - but then 
> it would be the same in terms of functionality outside of that? Or 
> pretty close?
>

That might make building up jsonb structures piece by piece as we do 
difficult.

cheers

andrew






Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/08/2014 11:54 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/08/2014 11:18 AM, Tom Lane wrote:
>>> That's not really the issue here, I think.  The problem is that a
>>> relatively minor aspect of the representation, namely the choice to store
>>> a series of offsets rather than a series of lengths, produces
>>> nonrepetitive data even when the original input is repetitive.
>> It would certainly be worth validating that changing this would fix the
>> problem.
>> I don't know how invasive that would be - I suspect (without looking
>> very closely) not terribly much.
> I took a quick look and saw that this wouldn't be that easy to get around.
> As I'd suspected upthread, there are places that do random access into a
> JEntry array, such as the binary search in findJsonbValueFromContainer().
> If we have to add up all the preceding lengths to locate the corresponding
> value part, we lose the performance advantages of binary search.  AFAICS
> that's applied directly to the on-disk representation.  I'd thought
> perhaps there was always a transformation step to build a pointer list,
> but nope.
>
>             


It would be interesting to know what the performance hit would be if we 
calculated the offsets/pointers on the fly, especially if we could cache 
it somehow. The main benefit of binary search is in saving on 
comparisons, especially of strings, ISTM, and that could still be 
available - this would just be a bit of extra arithmetic.
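
A rough sketch of that caching idea (invented names; nothing like this
exists in the jsonb code under discussion): keep lengths on disk, but
materialize the cumulative offsets once per container and reuse them.

/*
 * The one-time pass below is the "bit of extra arithmetic"; after it, a
 * binary search can proceed exactly as it does today.  Error handling
 * omitted for brevity.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
	int			nentries;
	const uint32_t *lengths;	/* hypothetical on-disk form */
	uint32_t   *end_offsets;	/* built lazily, then cached */
} DemoContainer;

static const uint32_t *
get_end_offsets(DemoContainer *c)
{
	if (c->end_offsets == NULL)
	{
		uint32_t	sum = 0;

		c->end_offsets = malloc(c->nentries * sizeof(uint32_t));
		for (int i = 0; i < c->nentries; i++)
		{
			sum += c->lengths[i];
			c->end_offsets[i] = sum;
		}
	}
	return c->end_offsets;
}

int
main(void)
{
	uint32_t	lengths[4] = {10, 15, 2, 13};
	DemoContainer c = {4, lengths, NULL};
	const uint32_t *offs = get_end_offsets(&c);	/* builds the cache */
	int			ok = (offs[3] == 40);

	(void) get_end_offsets(&c);	/* later lookups reuse the cached array */
	free(c.end_offsets);
	return ok ? 0 : 1;
}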

cheers

andrew




Re: jsonb format is pessimal for toast compression

From
Teodor Sigaev
Date:
> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> compression parameters.)

Curious idea: we could swap the JEntry array and the values: values at the
beginning of the datum would be caught by pg_lzcompress. But we would need
to know the offset of the JEntry array, so the header would grow to 8
bytes (actually, it would be a varlena header!)


-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/



Re: jsonb format is pessimal for toast compression

From
Teodor Sigaev
Date:
> Curious idea: we could swap the JEntry array and the values: values at
> the beginning of the datum would be caught by pg_lzcompress. But we
> would need to know the offset of the JEntry array, so the header would
> grow to 8 bytes (actually, it would be a varlena header!)

Maybe I wasn't clear: the jsonb datum would start with the string
collection instead of the JEntry array, and the JEntry array would be
placed at the end of the object/array.  So pg_lzcompress would find
repeatable 4-byte pieces in the first 1024 bytes of the jsonb.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Fri, Aug  8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.

FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4
betas and report the problem JSONB columns.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/08/2014 08:02 AM, Tom Lane wrote:
> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
> 
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me.  We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.

I would rather ship late than ship a noncompressible JSONB.

Once we ship 9.4, many users are going to load 100's of GB into JSONB
fields.  Even if we fix the compressibility issue in 9.5, those users
won't be able to fix the compression without rewriting all their data,
which could be prohibitive.  And we'll be in a position where we have
to support the 9.4 JSONB format/compression technique for years so that
users aren't blocked from upgrading.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Alexander Korotkov
Date:
On Fri, Aug 8, 2014 at 9:14 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Curious idea: we could swap the JEntry array and the values: values at
> the beginning of the datum would be caught by pg_lzcompress. But we
> would need to know the offset of the JEntry array, so the header would
> grow to 8 bytes (actually, it would be a varlena header!)
>
> Maybe I wasn't clear: the jsonb datum would start with the string
> collection instead of the JEntry array, and the JEntry array would be
> placed at the end of the object/array.  So pg_lzcompress would find
> repeatable 4-byte pieces in the first 1024 bytes of the jsonb.

Another idea I have: storing an offset in every JEntry is not necessary to
get the benefit of binary search. Namely, what if we store an offset in
every 8th JEntry and a length in the others? The speed of binary search
will be about the same: the only overhead is calculating offsets within an
8-entry chunk. But the lengths will probably repeat.
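
A sketch of that every-8th-entry scheme (purely illustrative; the layout
and names are invented, not actual jsonb code):

/*
 * entries[i] holds a cumulative end offset when i % STRIDE == 0 and the
 * element's length otherwise.  Reconstructing any element's end offset
 * costs at most STRIDE - 1 additions from the nearest stored offset.
 */
#include <stdint.h>
#include <stdio.h>

#define STRIDE 8

static uint32_t
end_offset(const uint32_t *entries, int i)
{
	int			base = i - (i % STRIDE);
	uint32_t	off = entries[base];

	for (int j = base + 1; j <= i; j++)
		off += entries[j];
	return off;
}

int
main(void)
{
	uint32_t	entries[12];

	/* twelve elements of 52 bytes each */
	for (int i = 0; i < 12; i++)
		entries[i] = (i % STRIDE == 0) ? 52 * (uint32_t) (i + 1) : 52;

	for (int i = 0; i < 12; i++)
		printf("element %d ends at %u\n", i,
			   (unsigned) end_offset(entries, i));
	return 0;
}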

------
With best regards,
Alexander Korotkov. 
  

Re: jsonb format is pessimal for toast compression

From
Ants Aasma
Date:
On Fri, Aug 8, 2014 at 7:35 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I took a quick look and saw that this wouldn't be that easy to get around.
>> As I'd suspected upthread, there are places that do random access into a
>> JEntry array, such as the binary search in findJsonbValueFromContainer().
>> If we have to add up all the preceding lengths to locate the corresponding
>> value part, we lose the performance advantages of binary search.  AFAICS
>> that's applied directly to the on-disk representation.  I'd thought
>> perhaps there was always a transformation step to build a pointer list,
>> but nope.
>
> It would be interesting to know what the performance hit would be if we
> calculated the offsets/pointers on the fly, especially if we could cache it
> somehow. The main benefit of binary search is in saving on comparisons,
> especially of strings, ISTM, and that could still be available - this would
> just be a bit of extra arithmetic.

I don't think binary search is the main problem here. Objects are
usually reasonably sized, while arrays are more likely to be huge. To
make matters worse, jsonb -> int goes from O(1) to O(n).

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: jsonb format is pessimal for toast compression

From
Hannu Krosing
Date:
On 08/08/2014 06:17 AM, Tom Lane wrote:
> I looked into the issue reported in bug #11109.  The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
How hard and how expensive would it be to teach pg_lzcompress to
apply a delta filter on suitable data?

So that instead of the integers themselves, their deltas would be fed to
the "real" compressor.
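
A minimal sketch of such a delta filter (standalone C; pg_lzcompress has
no hook like this, so the function names are invented):

/*
 * Subtract each 4-byte value from its predecessor before compression and
 * add it back after decompression.  A strictly increasing offset series
 * turns into small, often-identical deltas.
 */
#include <stdint.h>
#include <stdio.h>

static void
delta_encode(uint32_t *v, int n)
{
	for (int i = n - 1; i > 0; i--)
		v[i] -= v[i - 1];
}

static void
delta_decode(uint32_t *v, int n)
{
	for (int i = 1; i < n; i++)
		v[i] += v[i - 1];
}

int
main(void)
{
	uint32_t	offsets[6] = {52, 104, 156, 208, 260, 312};

	delta_encode(offsets, 6);
	for (int i = 0; i < 6; i++)
		printf("%u ", (unsigned) offsets[i]);	/* 52 52 52 52 52 52 */
	printf("\n");

	delta_decode(offsets, 6);	/* round-trips back to the original */
	return 0;
}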


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ




Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Fri, Aug 8, 2014 at 12:41 PM, Ants Aasma <ants@cybertec.at> wrote:
> I don't think binary search is the main problem here. Objects are
> usually reasonably sized, while arrays are more likely to be huge. To
> make matters worse, jsonb -> int goes from O(1) to O(n).

I don't think it's true that arrays are more likely to be huge. That
regression would be bad, but jsonb -> int is not the most compelling
operator by far. The indexable operators (in particular, @>) don't
support subscripting arrays like that, and with good reason.

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Fri, Aug 8, 2014 at 12:06 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Once we ship 9.4, many users are going to load 100's of GB into JSONB
> fields.  Even if we fix the compressibility issue in 9.5, those users
> won't be able to fix the compression without rewriting all their data,
> which could be prohibitive.  And we'll be in a position where we have
> to support the 9.4 JSONB format/compression technique for years so that
> users aren't blocked from upgrading.

FWIW, if we take the delicious JSON data as representative, a table
storing that data as jsonb is 1374 MB in size. Whereas an equivalent
table with the data typed using the original json datatype (but with
white space differences more or less ignored, because it was created
using a jsonb -> json cast), the same data is 1352 MB.

Larry's complaint is valid; this is a real problem, and I'd like to
fix it before 9.4 is out. However, let us not lose sight of the fact
that JSON data is usually a poor target for TOAST compression. With
idiomatic usage, redundancy is very much more likely to appear across
rows, and not within individual Datums. Frankly, we aren't doing a
very good job there, and doing better requires an alternative
strategy.

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Larry White
Date:
I was not complaining; I think JSONB is awesome. 

But I am one of those people who would like to put 100's of GB (or more) of JSON data into Postgres, and I am concerned about file size and possible future changes to the format.


On Fri, Aug 8, 2014 at 7:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
> [snip]

Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I looked into the issue reported in bug #11109.  The problem appears to be
> >> that jsonb's on-disk format is designed in such a way that the leading
> >> portion of any JSON array or object will be fairly incompressible, because
> >> it consists mostly of a strictly-increasing series of integer offsets.
> >> This interacts poorly with the code in pglz_compress() that gives up if
> >> it's found nothing compressible in the first first_success_by bytes of a
> >> value-to-be-compressed.  (first_success_by is 1024 in the default set of
> >> compression parameters.)
>
> > I haven't looked at this in any detail, so take this with a grain of
> > salt, but what about teaching pglz_compress about using an offset
> > farther into the data, if the incoming data is quite a bit larger than
> > 1k?  This is just a test to see if it's worthwhile to keep going, no?
>
> Well, the point of the existing approach is that it's a *nearly free*
> test to see if it's worthwhile to keep going; there's just one if-test
> added in the outer loop of the compression code.  (cf commit ad434473ebd2,
> which added that along with some other changes.)  AFAICS, what we'd have
> to do to do it as you suggest would be to execute compression on some subset
> of the data and then throw away that work entirely.  I do not find that
> attractive, especially when for most datatypes there's no particular
> reason to look at one subset instead of another.

Ah, I see - we use the first block because it means we can reuse the
work done on it if we decide to continue with the compression.  Makes
sense.  We could possibly arrange to have the amount attempted depend on
the data type, but you point out that we can't do that without teaching
lower components about types, which is less than ideal.

What about considering how large the object is when we are analyzing if
it compresses well overall?  That is- for a larger object, make a larger
effort to compress it.  There's clearly a pessimistic case which could
arise from that, but it may be better than the current situation.
There's a clear risk that such an algorithm may well be very type
specific, meaning that we make things worse for some types (eg: for
byteas which never end up compressing well, we'd likely spend more CPU
time trying than we do today).
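
For what it's worth, a sketch of that kind of scaling rule (the divisor,
caps and names are invented; this is not an actual pglz strategy):

/*
 * Scale the give-up point with the input size instead of using a flat
 * 1024 bytes; cap it so a huge incompressible datum doesn't cost too
 * much wasted work.
 */
#include <stddef.h>
#include <stdio.h>

static size_t
first_success_by_for(size_t srclen)
{
	size_t		limit = srclen / 8;	/* probe the first 12.5% of the datum */

	if (limit < 1024)
		limit = 1024;			/* never below today's default */
	if (limit > 65536)
		limit = 65536;			/* bound the effort on huge inputs */
	return limit;
}

int
main(void)
{
	size_t		sizes[] = {2048, 12288, 1048576};

	for (int i = 0; i < 3; i++)
		printf("input of %zu bytes -> give up after %zu bytes\n",
			   sizes[i], first_success_by_for(sizes[i]));
	return 0;
}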

> 1. The real problem here is that jsonb is emitting quite a bit of
> fundamentally-nonrepetitive data, even when the user-visible input is very
> repetitive.  That's a compression-unfriendly transformation by anyone's
> measure.  Assuming that some future replacement for pg_lzcompress() will
> nonetheless be able to compress the data strikes me as mostly wishful
> thinking.  Besides, we'd more than likely have a similar early-exit rule
> in any substitute implementation, so that we'd still be at risk even if
> it usually worked.

I agree that jsonb ends up being nonrepetitive in part, which is why
I've been trying to push the discussion in the direction of making it
more likely for the highly-compressible data to be considered rather
than the start of the jsonb object.  I don't care for our compression
algorithm having to be catered to in this regard in general though as
the exact same problem could, and likely does, exist in some real life
bytea-using PG implementations.

I disagree that another algorithm wouldn't be able to manage better on
this data than pglz.  pglz, from my experience, is notoriously bad at
certain data sets which other algorithms are not as poorly impacted by.

> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.

I'd really hate to ship 9.4 without a fix for this, but I have a similar
hard time with shipping 9.4 without the binary search component..

> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me.  We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.

I don't like the idea of shipping with this handicap either.

Perhaps another options would be a new storage type which basically says
"just compress it, no matter what"?  We'd be able to make that the
default for jsonb columns too, no?  Again- I'll admit this is shooting
from the hip, but I wanted to get these thoughts out and I won't have
much more time tonight.
        Thanks!

                Stephen

Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Aug  8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
> > 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> > replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> > jsonb is still within the bounds of reason.
>
> FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4
> betas and report the problem JSONB columns.

That is *not* a good solution..
        Thanks,

                Stephen

Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Josh Berkus (josh@agliodbs.com) wrote:
> On 08/08/2014 08:02 AM, Tom Lane wrote:
> > 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> > replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> > jsonb is still within the bounds of reason.
> >
> > Considering all the hype that's built up around jsonb, shipping a design
> > with a fundamental performance handicap doesn't seem like a good plan
> > to me.  We could perhaps band-aid around it by using different compression
> > parameters for jsonb, although that would require some painful API changes
> > since toast_compress_datum() doesn't know what datatype it's operating on.
>
> I would rather ship late than ship a noncompressible JSONB.
>
> Once we ship 9.4, many users are going to load 100's of GB into JSONB
> fields.  Even if we fix the compressibility issue in 9.5, those users
> won't be able to fix the compression without rewriting all their data,
> which could be prohibitive.  And we'll be in a position where we have
> to support the 9.4 JSONB format/compression technique for years so that
> users aren't blocked from upgrading.

Would you accept removing the binary-search capability from jsonb just
to make it compressible?  I certainly wouldn't.  I'd hate to ship late
also, but I'd be willing to support that if we can find a good solution
to keep both compressibility and binary search (and provided it doesn't
delay us many months..).
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> What about considering how large the object is when we are analyzing if
> it compresses well overall?

Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out in terms of a fraction of the object size instead of a fixed
limit.  However, that risks expending a large amount of work before we
bail, if we have a very large incompressible object --- which is not
exactly an unlikely case.  Consider for example JPEG images stored as
bytea, which I believe I've heard of people doing.  Another issue is
that it's not real clear that that fixes the problem for any fractional
size we'd want to use.  In Larry's example of a jsonb value that fails
to compress, the header size is 940 bytes out of about 12K, so we'd be
needing to trial-compress about 10% of the object before we reach
compressible data --- and I doubt his example is worst-case.

>> 1. The real problem here is that jsonb is emitting quite a bit of
>> fundamentally-nonrepetitive data, even when the user-visible input is very
>> repetitive.  That's a compression-unfriendly transformation by anyone's
>> measure.

> I disagree that another algorithm wouldn't be able to manage better on
> this data than pglz.  pglz, from my experience, is notoriously bad at
> certain data sets which other algorithms are not as poorly impacted by.

Well, I used to be considered a compression expert, and I'm going to
disagree with you here.  It's surely possible that other algorithms would
be able to get some traction where pglz fails to get any, but that doesn't
mean that presenting them with hard-to-compress data in the first place is
a good design decision.  There is no scenario in which data like this is
going to be friendly to a general-purpose compression algorithm.  It'd
be necessary to have explicit knowledge that the data consists of an
increasing series of four-byte integers to be able to do much with it.
And then such an assumption would break down once you got past the
header ...

> Perhaps another option would be a new storage type which basically says
> "just compress it, no matter what"?  We'd be able to make that the
> default for jsonb columns too, no?

Meh.  We could do that, but it would still require adding arguments to
toast_compress_datum() that aren't there now.  In any case, this is a
band-aid solution; and as Josh notes, once we ship 9.4 we are going to
be stuck with jsonb's on-disk representation pretty much forever.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/08/2014 08:45 PM, Tom Lane wrote:
>> Perhaps another option would be a new storage type which basically says
>> "just compress it, no matter what"?  We'd be able to make that the
>> default for jsonb columns too, no?
> Meh.  We could do that, but it would still require adding arguments to
> toast_compress_datum() that aren't there now.  In any case, this is a
> band-aid solution; and as Josh notes, once we ship 9.4 we are going to
> be stuck with jsonb's on-disk representation pretty much forever.
>


Yeah, and almost any other solution is likely to mean non-jsonb users 
potentially paying a penalty for fixing this for jsonb. So if we can 
adjust the jsonb layout to fix this problem I think we should do so.

cheers

andrew



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > What about considering how large the object is when we are analyzing if
> > it compresses well overall?
>
> Hmm, yeah, that's a possibility: we could redefine the limit at which
> we bail out in terms of a fraction of the object size instead of a fixed
> limit.  However, that risks expending a large amount of work before we
> bail, if we have a very large incompressible object --- which is not
> exactly an unlikely case.  Consider for example JPEG images stored as
> bytea, which I believe I've heard of people doing.  Another issue is
> that it's not real clear that that fixes the problem for any fractional
> size we'd want to use.  In Larry's example of a jsonb value that fails
> to compress, the header size is 940 bytes out of about 12K, so we'd be
> needing to trial-compress about 10% of the object before we reach
> compressible data --- and I doubt his example is worst-case.

Agreed- I tried to allude to that in my prior mail, there's clearly a
concern that we'd make things worse in certain situations.  Then again,
at least for that case, we could recommend changing the storage type to
EXTERNAL.

> >> 1. The real problem here is that jsonb is emitting quite a bit of
> >> fundamentally-nonrepetitive data, even when the user-visible input is very
> >> repetitive.  That's a compression-unfriendly transformation by anyone's
> >> measure.
>
> > I disagree that another algorithm wouldn't be able to manage better on
> > this data than pglz.  pglz, from my experience, is notoriously bad at
> > certain data sets which other algorithms are not as poorly impacted by.
>
> Well, I used to be considered a compression expert, and I'm going to
> disagree with you here.  It's surely possible that other algorithms would
> be able to get some traction where pglz fails to get any, but that doesn't
> mean that presenting them with hard-to-compress data in the first place is
> a good design decision.  There is no scenario in which data like this is
> going to be friendly to a general-purpose compression algorithm.  It'd
> be necessary to have explicit knowledge that the data consists of an
> increasing series of four-byte integers to be able to do much with it.
> And then such an assumption would break down once you got past the
> header ...

I've wondered previously whether we, perhaps, missed the boat pretty
badly by not providing an explicitly versioned per-type compression
capability, such that we wouldn't be stuck with one compression
algorithm for all types, and would be able to version compression
schemes in a way that lets us change them over time, provided the
newer code always understands how to decode X-4 (or whatever) versions
back.

I do agree that it'd be great to represent every type in a highly
compressible way for the sake of the compression algorithm, but
I've not seen any good suggestions for how to make that happen and I've
got a hard time seeing how we could completely change the jsonb storage
format, retain the capabilities it has today, make it highly
compressible, and get 9.4 out this calendar year.

I expect we could trivially add padding into the jsonb header to make it
compress better, for the sake of this particular check, but then we're
going to always be compressing jsonb, even when the user data isn't
actually terribly good for compression, spending a good bit of CPU time
while we're at it.

> > Perhaps another option would be a new storage type which basically says
> > "just compress it, no matter what"?  We'd be able to make that the
> > default for jsonb columns too, no?
>
> Meh.  We could do that, but it would still require adding arguments to
> toast_compress_datum() that aren't there now.  In any case, this is a
> band-aid solution; and as Josh notes, once we ship 9.4 we are going to
> be stuck with jsonb's on-disk representation pretty much forever.

I agree that we need to avoid changing jsonb's on-disk representation.
Have I missed where a good suggestion has been made about how to do that
which preserves the binary-search capabilities and doesn't make the code
much more difficult?  Trying to move the header to the end just for the
sake of this doesn't strike me as a good solution as it'll make things
quite a bit more complicated.  Is there a way we could interleave the
likely-compressible user data in with the header instead?  I've not
looked, but it seems like that's the only reasonable approach to address
this issue in this manner.  If that's simple to do, then great, but it
strikes me as unlikely to be..

I'll just throw out a bit of a counter-point to all this also though- we
don't try to focus on making our on-disk representation of data,
generally, very compressible even though there are filesystems, such as
ZFS, which might benefit from certain rearrangements of our on-disk
formats (no, I don't have any specific recommendations in this vein, but
I certainly don't see anyone else asking after it or asking for us to be
concerned about it).  Compression is great and I'd hate to see us ship a
format that won't work well with it when compression would be beneficial
in many cases, but I feel the fault here lies with the compression
algorithm and the decisions made as part of that operation, not really
with this particular data structure.
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I agree that we need to avoid changing jsonb's on-disk representation.

... post-release, I assume you mean.

> Have I missed where a good suggestion has been made about how to do that
> which preserves the binary-search capabilities and doesn't make the code
> much more difficult?

We don't have one yet, but we've only been thinking about this for a few
hours.

> Trying to move the header to the end just for the
> sake of this doesn't strike me as a good solution as it'll make things
> quite a bit more complicated.  Is there a way we could interleave the
> likely-compressible user data in with the header instead?

Yeah, I was wondering about that too, but I don't immediately see how to
do it without some sort of preprocessing step when we read the object
(which'd be morally equivalent to converting a series of lengths into a
pointer array).  Binary search isn't going to work if the items it's
searching in aren't all the same size.

Having said that, I am not sure that a preprocessing step is a
deal-breaker.  It'd be O(N), but with a pretty darn small constant factor,
and for plausible sizes of objects I think the binary search might still
dominate.  Worth investigation perhaps.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I agree that we need to avoid changing jsonb's on-disk representation.
>
> ... post-release, I assume you mean.

Yes.

> > Have I missed where a good suggestion has been made about how to do that
> > which preserves the binary-search capabilities and doesn't make the code
> > much more difficult?
>
> We don't have one yet, but we've only been thinking about this for a few
> hours.

Fair enough.

> > Trying to move the header to the end just for the
> > sake of this doesn't strike me as a good solution as it'll make things
> > quite a bit more complicated.  Is there a way we could interleave the
> > likely-compressible user data in with the header instead?
>
> Yeah, I was wondering about that too, but I don't immediately see how to
> do it without some sort of preprocessing step when we read the object
> (which'd be morally equivalent to converting a series of lengths into a
> pointer array).  Binary search isn't going to work if the items it's
> searching in aren't all the same size.
>
> Having said that, I am not sure that a preprocessing step is a
> deal-breaker.  It'd be O(N), but with a pretty darn small constant factor,
> and for plausible sizes of objects I think the binary search might still
> dominate.  Worth investigation perhaps.

For my part, I'm less concerned about a preprocessing step which happens
when we store the data and more concerned about ensuring that we're able
to extract data quickly.  Perhaps that's simply because I'm used to
writes being more expensive than reads, but I'm not alone in that
regard either.  I doubt I'll have time in the next couple of weeks to
look into this, and if we're going to want this change for 9.4, we really
need someone working on it sooner rather than later.  (To the crowd:) do
we have any takers for this investigation?
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Amit Kapila
Date:
On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephen Frost <sfrost@snowman.net> writes:
> > What about considering how large the object is when we are analyzing if
> > it compresses well overall?
>
> Hmm, yeah, that's a possibility: we could redefine the limit at which
> we bail out in terms of a fraction of the object size instead of a fixed
> limit.  However, that risks expending a large amount of work before we
> bail, if we have a very large incompressible object --- which is not
> exactly an unlikely case.  Consider for example JPEG images stored as
> bytea, which I believe I've heard of people doing.  Another issue is
> that it's not real clear that that fixes the problem for any fractional
> size we'd want to use.  In Larry's example of a jsonb value that fails
> to compress, the header size is 940 bytes out of about 12K, so we'd be
> needing to trial-compress about 10% of the object before we reach
> compressible data --- and I doubt his example is worst-case.
>
> >> 1. The real problem here is that jsonb is emitting quite a bit of
> >> fundamentally-nonrepetitive data, even when the user-visible input is very
> >> repetitive.  That's a compression-unfriendly transformation by anyone's
> >> measure.
>
> > I disagree that another algorithm wouldn't be able to manage better on
> > this data than pglz.  pglz, from my experience, is notoriously bad at
> > certain data sets which other algorithms are not as poorly impacted by.
>
> Well, I used to be considered a compression expert, and I'm going to
> disagree with you here.  It's surely possible that other algorithms would
> be able to get some traction where pglz fails to get any, 

During my previous work in this area, I have seen that some algorithms
use skipping logic, which can be useful for incompressible data followed
by compressible data, or in general.  One technique could be: if we don't
find any match in the first 4 bytes, skip 4 bytes; if we then don't find
a match in the next 8 bytes, skip 8 bytes; and keep doing the same until
we find the first match, at which point we go back to the beginning of
the data.  We could follow this logic until we have actually examined a
total of first_success_by bytes.  There are caveats in this particular
skipping scheme, but I wanted to mention the general idea of skipping as
a way to reduce the number of situations where we bail out even though
there is a lot of compressible data.
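
To make that concrete, here is a rough sketch of the skip-and-double
control flow (a sketch only: the naive memcmp probe stands in for pglz's
real history lookup, and the function names are made up):

#include <stdbool.h>
#include <string.h>

/*
 * Sketch only: does the window starting at 'pos' repeat any earlier bytes?
 * A real implementation would consult pglz's history table instead of this
 * brute-force scan.
 */
static bool
window_has_match(const char *data, int pos, int window, int len)
{
    int     probe;

    if (pos + window > len)
        window = len - pos;
    for (probe = 0; probe < pos; probe++)
        if (memcmp(data + probe, data + pos, window) == 0)
            return true;
    return false;
}

/*
 * Examine 4 bytes, skip 4; examine 8, skip 8; and so on, until either a
 * match is found (then go back and compress from the beginning) or we
 * have examined a total of first_success_by bytes (then bail out, as
 * pglz does today).
 */
static bool
worth_compressing(const char *data, int len, int first_success_by)
{
    int     pos = 0;
    int     window = 4;
    int     examined = 0;

    while (pos < len && examined < first_success_by)
    {
        if (window_has_match(data, pos, window, len))
            return true;
        examined += window;
        pos += 2 * window;      /* skip as much as we just examined */
        window *= 2;
    }
    return false;
}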

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Fri, Aug  8, 2014 at 08:25:04PM -0400, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Aug  8, 2014 at 11:02:26AM -0400, Tom Lane wrote:
> > > 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> > > replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> > > jsonb is still within the bounds of reason.
> > 
> > FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4
> > betas and report the problem JSONB columns.
> 
> That is *not* a good solution..

If you change the JSONB binary format, and we can't read the old format,
that is the only option.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
David G Johnston
Date:
akapila wrote
> On Sat, Aug 9, 2014 at 6:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Stephen Frost <sfrost@snowman.net> writes:
>> > What about considering how large the object is when we are analyzing if
>> > it compresses well overall?
>>
>> Hmm, yeah, that's a possibility: we could redefine the limit at which
>> we bail out in terms of a fraction of the object size instead of a fixed
>> limit.  However, that risks expending a large amount of work before we
>> bail, if we have a very large incompressible object --- which is not
>> exactly an unlikely case.  Consider for example JPEG images stored as
>> bytea, which I believe I've heard of people doing.  Another issue is
>> that it's not real clear that that fixes the problem for any fractional
>> size we'd want to use.  In Larry's example of a jsonb value that fails
>> to compress, the header size is 940 bytes out of about 12K, so we'd be
>> needing to trial-compress about 10% of the object before we reach
>> compressible data --- and I doubt his example is worst-case.
>>
>> >> 1. The real problem here is that jsonb is emitting quite a bit of
>> >> fundamentally-nonrepetitive data, even when the user-visible input is very
>> >> repetitive.  That's a compression-unfriendly transformation by anyone's
>> >> measure.
>>
>> > I disagree that another algorithm wouldn't be able to manage better on
>> > this data than pglz.  pglz, from my experience, is notoriously bad at
>> > certain data sets which other algorithms are not as poorly impacted by.
>>
>> Well, I used to be considered a compression expert, and I'm going to
>> disagree with you here.  It's surely possible that other algorithms would
>> be able to get some traction where pglz fails to get any,
> 
> During my previous work in this area, I had seen that some algorithms
> use skipping logic which can be useful for incompressible data followed
> by compressible data or in general as well. 

Random thought from the sideline...

This particular data type has the novel (within PostgreSQL) design of both a
(feature oriented - and sizeable) header and a payload.  Is there some way
to add that model into the storage system so that, at a higher level,
separate attempts are made to compress each section, and then the
compressed (or not) results are written out adjacently, with a small
header indicating the length of the stored header and other metadata,
like whether each part is compressed and even the type that data
represents?  For reading
back into memory the header-payload generic type is populated and then the
header and payload decompressed - as needed - then the two parts are fed
into the appropriate type constructor that understands and accepts the two
pieces.
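
Something like the following envelope is what I have in mind, purely as a
sketch (every name here is invented; nothing like it exists in the storage
layer today):

/* Hypothetical two-part envelope stored in place of today's single
 * compressed blob: a tiny fixed header followed by the (individually
 * compressed, or not) header and payload sections. */
typedef struct TwoPartEnvelope
{
    uint32      section1_len;   /* stored length of the type's header section  */
    uint32      section2_len;   /* stored length of the type's payload section */
    uint8       flags;          /* which sections are compressed, etc.         */
} TwoPartEnvelope;

#define TWOPART_SECTION1_COMPRESSED   0x01
#define TWOPART_SECTION2_COMPRESSED   0x02

/* On-disk layout (sketch):
 *   TwoPartEnvelope | section 1 bytes | section 2 bytes
 * On read, each section is decompressed independently (and lazily, if
 * desired) before both are handed to the type's reconstruction code. */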

Just hoping to spark an idea here - I don't know enough about the internals
to even guess how close I am to something feasible.

David J.








Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Aug  8, 2014 at 08:25:04PM -0400, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > FYI, pg_upgrade could be taught to refuse to upgrade from earlier 9.4
> > > betas and report the problem JSONB columns.
> >
> > That is *not* a good solution..
>
> If you change the JSONB binary format, and we can't read the old format,
> that is the only option.

Apologies- I had thought you were suggesting this for a 9.4 -> 9.5
conversion, not for just 9.4beta to 9.4.  Adding that to pg_upgrade to
address folks upgrading from betas would certainly be fine.
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Stephen Frost <sfrost@snowman.net> writes:

>> Trying to move the header to the end just for the sake of this
>> doesn't strike me as a good solution as it'll make things quite
>> a bit more complicated.

Why is that?  How much harder would it be to add a single offset
field to the front to point to the part we're shifting to the end?
It is not all that unusual to put a directory at the end, like in
the .zip file format.



>> Is there a way we could interleave the likely-compressible user
>> data in with the header instead?
>
> Yeah, I was wondering about that too, but I don't immediately see
> how to do it without some sort of preprocessing step when we read
> the object (which'd be morally equivalent to converting a series
> of lengths into a pointer array).

That sounds far more complex and fragile than just moving the
indexes to the end.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> Trying to move the header to the end just for the sake of this
>>> doesn't strike me as a good solution as it'll make things quite
>>> a bit more complicated.

> Why is that?  How much harder would it be to add a single offset
> field to the front to point to the part we're shifting to the end?
> It is not all that unusual to put a directory at the end, like in
> the .zip file format.

Yeah, I was wondering that too.  Arguably, directory-at-the-end would
be easier to work with for on-the-fly creation, not that we do any
such thing at the moment.  I think the main thing that's bugging Stephen
is that doing that just to make pglz_compress happy seems like a kluge
(and I have to agree).

Here's a possibly more concrete thing to think about: we may very well
someday want to support JSONB object field or array element extraction
without reading all blocks of a large toasted JSONB value, if the value is
stored external without compression.  We already went to the trouble of
creating analogous logic for substring extraction from a long uncompressed
text or bytea value, so I think this is a plausible future desire.  With
the current format you could imagine grabbing the first TOAST chunk, and
then if you see the header is longer than that you can grab the remainder
of the header without any wasted I/O, and for the array-subscripting case
you'd now have enough info to fetch the element value from the body of
the JSONB without any wasted I/O.  With directory-at-the-end you'd
have to read the first chunk just to get the directory pointer, and this
would most likely not give you any of the directory proper; but at least
you'd know exactly how big the directory is before you go to read it in.
The former case is probably slightly better.  However, if you're doing an
object key lookup not an array element fetch, neither of these formats are
really friendly at all, because each binary-search probe probably requires
bringing in one or two toast chunks from the body of the JSONB value so
you can look at the key text.  I'm not sure if there's a way to redesign
the format to make that less painful/expensive --- but certainly, having
the key texts scattered through the JSONB value doesn't seem like a great
thing from this standpoint.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Sat, Aug 9, 2014 at 3:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>>> Stephen Frost <sfrost@snowman.net> writes:
>>>> Trying to move the header to the end just for the sake of this
>>>> doesn't strike me as a good solution as it'll make things quite
>>>> a bit more complicated.
>
>> Why is that?  How much harder would it be to add a single offset
>> field to the front to point to the part we're shifting to the end?
>> It is not all that unusual to put a directory at the end, like in
>> the .zip file format.
>
> Yeah, I was wondering that too.  Arguably, directory-at-the-end would
> be easier to work with for on-the-fly creation, not that we do any
> such thing at the moment.  I think the main thing that's bugging Stephen
> is that doing that just to make pglz_compress happy seems like a kluge
> (and I have to agree).
>
> Here's a possibly more concrete thing to think about: we may very well
> someday want to support JSONB object field or array element extraction
> without reading all blocks of a large toasted JSONB value, if the value is
> stored external without compression.  We already went to the trouble of
> creating analogous logic for substring extraction from a long uncompressed
> text or bytea value, so I think this is a plausible future desire.  With
> the current format you could imagine grabbing the first TOAST chunk, and
> then if you see the header is longer than that you can grab the remainder
> of the header without any wasted I/O, and for the array-subscripting case
> you'd now have enough info to fetch the element value from the body of
> the JSONB without any wasted I/O.  With directory-at-the-end you'd
> have to read the first chunk just to get the directory pointer, and this
> would most likely not give you any of the directory proper; but at least
> you'd know exactly how big the directory is before you go to read it in.
> The former case is probably slightly better.  However, if you're doing an
> object key lookup not an array element fetch, neither of these formats are
> really friendly at all, because each binary-search probe probably requires
> bringing in one or two toast chunks from the body of the JSONB value so
> you can look at the key text.  I'm not sure if there's a way to redesign
> the format to make that less painful/expensive --- but certainly, having
> the key texts scattered through the JSONB value doesn't seem like a great
> thing from this standpoint.

I think that's a good point.

On the general topic, I don't think it's reasonable to imagine that
we're going to come up with a single heuristic that works well for
every kind of input data.  What pglz is doing - assuming that if the
beginning of the data is incompressible then the rest probably is too
- is fundamentally reasonable, notwithstanding the fact that it
doesn't happen to work out well for JSONB.  We might be able to tinker
with that general strategy in some way that seems to fix this case and
doesn't appear to break others, but there's some risk in that, and
there's no obvious reason in my mind why PGLZ should be required to fly
blind.  So I think it would be a better idea to arrange some method by
which JSONB (and perhaps other data types) can provide compression
hints to pglz.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ... I think it would be a better idea to arrange some method by
> which JSONB (and perhaps other data types) can provide compression
> hints to pglz.

I agree with that as a long-term goal, but not sure if it's sane to
push into 9.4.

What we could conceivably do now is (a) add a datatype OID argument to
toast_compress_datum, and (b) hard-wire the selection of a different
compression-parameters struct if it's JSONBOID.  The actual fix would
then be to increase the first_success_by field of this alternate struct.
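
Concretely, a sketch of (a) plus (b) might look like this (the extra
argument and the jsonb-specific strategy do not exist today; only
first_success_by would differ from the default strategy):

Datum
toast_compress_datum(Datum value, Oid typid)    /* (a): new type-OID argument */
{
    static PGLZ_Strategy jsonb_strategy;
    const PGLZ_Strategy *strategy = PGLZ_strategy_default;

    if (typid == JSONBOID)                      /* (b): hard-wired special case */
    {
        /* same knobs as the default strategy, but never give up early */
        jsonb_strategy = *PGLZ_strategy_default;
        jsonb_strategy.first_success_by = INT_MAX;
        strategy = &jsonb_strategy;
    }

    /* ... existing body unchanged, except that 'strategy' is passed to
     * pglz_compress() instead of the hard-wired default ... */
}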

I had been worrying about API-instability risks associated with (a),
but on reflection it seems unlikely that any third-party code calls
toast_compress_datum directly, and anyway it's not something we'd
be back-patching to before 9.4.

The main objection to (b) is that it wouldn't help for domains over jsonb
(and no, I don't want to insert a getBaseType call there to fix that).

A longer-term solution would be to make this some sort of type property
that domains could inherit, like typstorage is already.  (Somebody
suggested dealing with this by adding more typstorage values, but
I don't find that attractive; typstorage is known in too many places.)
We'd need some thought about exactly what we want to expose, since
the specific knobs that pglz_compress has today aren't necessarily
good for the long term.

This is all kinda ugly really, but since I'm not hearing brilliant
ideas for redesigning jsonb's storage format, maybe this is the
best we can do for now.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's a good point.

I think that there may be something to be said for the current layout.
Having adjacent keys and values could take better advantage of CPU
cache characteristics. I've heard of approaches to improving B-Tree
locality that forced keys and values to be adjacent on individual
B-Tree pages [1], for example. I've heard of this more than once. And
FWIW, I believe based on earlier research of user requirements in this
area that very large jsonb datums are not considered all that
compelling. Document database systems have considerable limitations
here.

> On the general topic, I don't think it's reasonable to imagine that
> we're going to come up with a single heuristic that works well for
> every kind of input data.  What pglz is doing - assuming that if the
> beginning of the data is incompressible then the rest probably is too
> - is fundamentally reasonable, notwithstanding the fact that it
> doesn't happen to work out well for JSONB.  We might be able to tinker
> with that general strategy in some way that seems to fix this case and
> doesn't appear to break others, but there's some risk in that, and
> there's no obvious reason in my mind why PGLZ should be required to fly
> blind.  So I think it would be a better idea to arrange some method by
> which JSONB (and perhaps other data types) can provide compression
> hints to pglz.

If there is to be any effort to make jsonb a more effective target for
compression, I imagine that that would have to target redundancy
between JSON documents. With idiomatic usage, we can expect plenty of
it.

[1] http://www.vldb.org/conf/1999/P7.pdf , "We also forced each key
and child pointer to be adjacent to each other physically"
-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... I think it would be a better idea to arrange some method by
> > which JSONB (and perhaps other data types) can provide compression
> > hints to pglz.
>
> I agree with that as a long-term goal, but not sure if it's sane to
> push into 9.4.

Agreed.

> What we could conceivably do now is (a) add a datatype OID argument to
> toast_compress_datum, and (b) hard-wire the selection of a different
> compression-parameters struct if it's JSONBOID.  The actual fix would
> then be to increase the first_success_by field of this alternate struct.

Isn't the offset-to-compressible-data variable though, depending on the
number of keys, etc?  Would we be increasing first_success_by based off
of some function which inspects the object?

> I had been worrying about API-instability risks associated with (a),
> but on reflection it seems unlikely that any third-party code calls
> toast_compress_datum directly, and anyway it's not something we'd
> be back-patching to before 9.4.

Agreed.

> The main objection to (b) is that it wouldn't help for domains over jsonb
> (and no, I don't want to insert a getBaseType call there to fix that).

While not ideal, that seems like an acceptable compromise for 9.4 to me.

> A longer-term solution would be to make this some sort of type property
> that domains could inherit, like typstorage is already.  (Somebody
> suggested dealing with this by adding more typstorage values, but
> I don't find that attractive; typstorage is known in too many places.)

I think that was me, and having it be something which domains can inherit
makes sense.  Would we be able to use this approach to introduce
type-specific compression algorithms (inherited by domains over that
type), perhaps?  Or even get to a point where we could have a chunk-based
compression scheme for certain types of objects (such as JSONB) where we
keep track of which keys exist at which points in the compressed object,
allowing us to skip to the specific chunk which contains the requested
key, similar to what we do with uncompressed data?

> We'd need some thought about exactly what we want to expose, since
> the specific knobs that pglz_compress has today aren't necessarily
> good for the long term.

Agreed.

> This is all kinda ugly really, but since I'm not hearing brilliant
> ideas for redesigning jsonb's storage format, maybe this is the
> best we can do for now.

This would certainly be an improvement over what's going on now, and I
love the idea of possibly being able to expand this in the future to do
more.  What I'd hate to see is having all of this and only ever using it
to say "skip ahead another 1k for JSONB".
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> What we could conceivably do now is (a) add a datatype OID argument to
>> toast_compress_datum, and (b) hard-wire the selection of a different
>> compression-parameters struct if it's JSONBOID.  The actual fix would
>> then be to increase the first_success_by field of this alternate struct.

> Isn't the offset-to-compressible-data variable though, depending on the
> number of keys, etc?  Would we be increasing first_success_by based off
> of some function which inspects the object?

Given that this is a short-term hack, I'd be satisfied with setting it
to INT_MAX.

If we got more ambitious, we could consider improving the cutoff logic
so that it gives up at "x% of the object or n bytes, whichever comes
first"; but I'd want to see some hard evidence that that was useful
before adding any more cycles to pglz_compress.
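
For what it's worth, the check itself would be tiny; something along
these lines inside pglz_compress() (the variable names are illustrative,
not pglz's actual state, and the 10% is only a placeholder):

/* Give up only if nothing compressible was found within the first
 * min(10% of the input, first_success_by) bytes. */
static bool
should_give_up(int bytes_examined, int source_len, bool found_anything,
               int first_success_by)
{
    int     limit = source_len / 10;        /* "x% of the object" */

    if (limit > first_success_by)
        limit = first_success_by;           /* "... or n bytes, whichever comes first" */

    return !found_anything && bytes_examined >= limit;
}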
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Peter Geoghegan (pg@heroku.com) wrote:
> If there is to be any effort to make jsonb a more effective target for
> compression, I imagine that that would have to target redundancy
> between JSON documents. With idiomatic usage, we can expect plenty of
> it.

While I certainly agree, that's a rather different animal to address and
doesn't hold a lot of relevance to the current problem.  Or, to put it
another way, I don't think anyone is going to be surprised that two rows
containing the same data (even if they're inserted in the same
transaction and have the same visibility information) are not compressed
together in some fashion.

We've got a clear example of someone, quite reasonably, expecting their
JSONB object to be compressed using the normal TOAST mechanism, and
we're failing to do that in cases where it's actually a win to do so.
That's the focus of this discussion and what needs to be addressed
before 9.4 goes out.
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
> We've got a clear example of someone, quite reasonably, expecting their
> JSONB object to be compressed using the normal TOAST mechanism, and
> we're failing to do that in cases where it's actually a win to do so.
> That's the focus of this discussion and what needs to be addressed
> before 9.4 goes out.

Sure. I'm not trying to minimize that. We should fix it, certainly.
However, it does bear considering that JSON data, with each document
stored in a row is not an effective target for TOAST compression in
general, even as text.

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Marti Raudsepp
Date:
On Fri, Aug 8, 2014 at 10:50 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> How hard and how expensive would it be to teach pg_lzcompress to
> apply a delta filter on suitable data ?
>
> So that instead of integers their deltas will be fed to the "real"
> compressor

Has anyone given this more thought? I know this might not be 9.4
material, but to me it sounds like the most promising approach, if
it's workable. This isn't a made up thing, the 7z and LZMA formats
also have an optional delta filter.

Of course with JSONB the problem is figuring out which parts to apply
the delta filter to, and which parts not.

This would also help with integer arrays, containing for example
foreign key values to a serial column. There's bound to be some
redundancy, as nearby serial values are likely to end up close
together. In one of my past projects we used to store large arrays of
integer fkeys, deliberately sorted for duplicate elimination.

For an ideal-case comparison, intar2 could compress to roughly the same
size as intar1 if compressed with a 4-byte-wide delta filter:

create table intar1 as select array(select 1::int from
generate_series(1,1000000)) a;
create table intar2 as select array(select generate_series(1,1000000)::int) a;

In PostgreSQL 9.3 the sizes are:
select pg_column_size(a) from intar1;         45810
select pg_column_size(a) from intar2;       4000020

So a factor of 87 difference.
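
The filter itself is trivial and exactly reversible; the hard part, as
noted above, is knowing which parts to apply it to.  A sketch for a plain
array of int32 values (using PostgreSQL's int32 typedef; not tied to any
existing code):

/* Rewrite an int32 array in place as first value + successive differences,
 * so that runs of nearby values (serial keys, increasing offsets, ...)
 * become small, repetitive deltas that an LZ-style compressor can exploit. */
static void
delta_encode(int32 *vals, int n)
{
    int32   prev = 0;
    int     i;

    for (i = 0; i < n; i++)
    {
        int32   cur = vals[i];

        vals[i] = cur - prev;
        prev = cur;
    }
}

/* Inverse transform: a running sum restores the original values. */
static void
delta_decode(int32 *vals, int n)
{
    int     i;

    for (i = 1; i < n; i++)
        vals[i] += vals[i - 1];
}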

Regards,
Marti



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Mon, Aug 11, 2014 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> ... I think it would be a better idea to arrange some method by
>> which JSONB (and perhaps other data types) can provide compression
>> hints to pglz.
>
> I agree with that as a long-term goal, but not sure if it's sane to
> push into 9.4.
>
> What we could conceivably do now is (a) add a datatype OID argument to
> toast_compress_datum, and (b) hard-wire the selection of a different
> compression-parameters struct if it's JSONBOID.  The actual fix would
> then be to increase the first_success_by field of this alternate struct.

I think it would be perfectly sane to do that for 9.4.  It may not be
perfect, but neither is what we have now.

> A longer-term solution would be to make this some sort of type property
> that domains could inherit, like typstorage is already.  (Somebody
> suggested dealing with this by adding more typstorage values, but
> I don't find that attractive; typstorage is known in too many places.)
> We'd need some thought about exactly what we want to expose, since
> the specific knobs that pglz_compress has today aren't necessarily
> good for the long term.

What would really be ideal here is if the JSON code could inform the
toast compression code "this many initial bytes are likely
incompressible, just pass them through without trying, and then start
compressing at byte N", where N is the byte following the TOC.  But I
don't know that there's a reasonable way to implement that.
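
If we ever did want to pass such a hint down, the interface could be as
small as this (entirely hypothetical; nothing like it exists in the toast
code today):

/* Hypothetical per-type callback: return how many leading bytes of this
 * datum are likely incompressible (for jsonb, the size of the JEntry
 * table, i.e. the TOC).  The toast code would copy that prefix through
 * verbatim, run the compressor only on the remainder, and record the
 * prefix length so decompression can reassemble the datum.  Returning 0
 * would mean "behave exactly as today". */
typedef Size (*toast_compression_hint_fn) (Datum value);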

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
> On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > We've got a clear example of someone, quite reasonably, expecting their
> > JSONB object to be compressed using the normal TOAST mechanism, and
> > we're failing to do that in cases where it's actually a win to do so.
> > That's the focus of this discussion and what needs to be addressed
> > before 9.4 goes out.
> 
> Sure. I'm not trying to minimize that. We should fix it, certainly.
> However, it does bear considering that JSON data, with each document
> stored in a row is not an effective target for TOAST compression in
> general, even as text.

Seems we have two issues:

1)  the header makes testing for compression likely to fail
2)  use of offsets rather than lengths reduces compression potential

I understand we are focusing on #1, but how much does compression reduce
the storage size with and without #2?  Seems we need to know that answer
before deciding if it is worth reducing the ability to do fast lookups
with #2.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
>> On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > We've got a clear example of someone, quite reasonably, expecting their
>> > JSONB object to be compressed using the normal TOAST mechanism, and
>> > we're failing to do that in cases where it's actually a win to do so.
>> > That's the focus of this discussion and what needs to be addressed
>> > before 9.4 goes out.
>>
>> Sure. I'm not trying to minimize that. We should fix it, certainly.
>> However, it does bear considering that JSON data, with each document
>> stored in a row is not an effective target for TOAST compression in
>> general, even as text.
>
> Seems we have two issues:
>
> 1)  the header makes testing for compression likely to fail
> 2)  use of offsets rather than lengths reduces compression potential

I do think the best solution for 2 is what's been proposed already, to
do delta-coding of the pointers in chunks (ie, 1 pointer, 15 deltas,
repeat).

But it does make binary search quite a bit more complex.

Alternatively, it could be somewhat compressed as follows:

Segment = 1 pointer head, 15 deltas
Pointer head = pointers[0]
delta[i] = pointers[i] - pointers[0] for i in 1..15

(delta to segment head, not previous value)

Now, you can have 4 types of segments: 8, 16, 32, or 64 bits, which is
the size of the deltas.  You achieve between 8x and 1x compression, and
even at 1x (no compression) you make it easier for pglz to find
something compressible.

Accessing it is also simple, if you have a segment index (that's the
tough part here).

Replace the 15 with something that makes such a segment index very compact ;)
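
For illustration, lookup under that layout could work roughly like this
(all names and the exact packing are invented; PostgreSQL's unsigned
integer typedefs and little-endian delta storage are assumed):

#include <string.h>

/* One segment: an absolute offset for its first entry plus 15 packed
 * deltas relative to that head, all stored at the same width. */
typedef struct JsonbSegment
{
    uint32      head;           /* absolute offset of the segment's first entry */
    uint8       delta_width;    /* 1, 2, 4 or 8 bytes per delta */
    const char *deltas;         /* 15 packed deltas */
} JsonbSegment;

static uint64
read_delta(const JsonbSegment *seg, int slot)
{
    uint64      delta = 0;

    /* slot is 1..15; little-endian packing assumed */
    memcpy(&delta, seg->deltas + (slot - 1) * seg->delta_width, seg->delta_width);
    return delta;
}

/* Absolute offset of entry i: its segment's head plus the stored delta. */
static uint64
entry_offset(const JsonbSegment *segments, int i)
{
    const JsonbSegment *seg = &segments[i / 16];
    int         slot = i % 16;

    return (slot == 0) ? seg->head : seg->head + read_delta(seg, slot);
}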



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Seems we have two issues:
> 1)  the header makes testing for compression likely to fail
> 2)  use of offsets rather than lengths reduces compression potential

> I understand we are focusing on #1, but how much does compression reduce
> the storage size with and without #2?  Seems we need to know that answer
> before deciding if it is worth reducing the ability to do fast lookups
> with #2.

That's a fair question.  I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-).  I then loaded up Larry's
example of a noncompressible JSON value, and compared pg_column_size()
which is just about the right thing here since it reports datum size after
compression.  Remembering that the textual representation is 12353 bytes:

json:                382 bytes
jsonb, using offsets:        12593 bytes
jsonb, using lengths:        406 bytes

So this confirms my suspicion that the choice of offsets not lengths
is what's killing compressibility.  If it used lengths, jsonb would be
very nearly as compressible as the original text.

Hack attached in case anyone wants to collect more thorough statistics.
We'd not actually want to do it like this because of the large expense
of recomputing the offsets on-demand all the time.  (It does pass the
regression tests, for what that's worth.)

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..2297504 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1378,1383 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1430,1437 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

--- 1428,1433 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1441,1446 ----
*************** uniqueifyJsonbObject(JsonbValue *object)
*** 1592,1594 ****
--- 1587,1600 ----
          object->val.object.nPairs = res + 1 - object->val.object.pairs;
      }
  }
+
+ uint32
+ jsonb_get_offset(const JEntry *ja, int index)
+ {
+     uint32    off = 0;
+     int i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 5f2594b..c9b18e1 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef uint32 JEntry;
*** 153,162 ****
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 153,163 ----
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            jsonb_get_offset(ja, i)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])
!
! extern uint32 jsonb_get_offset(const JEntry *ja, int index);

  /*
   * A jsonb array or object node, within a Jsonb Datum.

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
I wrote:
> That's a fair question.  I did a very very simple hack to replace the item
> offsets with item lengths -- turns out that that mostly requires removing
> some code that changes lengths to offsets ;-).  I then loaded up Larry's
> example of a noncompressible JSON value, and compared pg_column_size()
> which is just about the right thing here since it reports datum size after
> compression.  Remembering that the textual representation is 12353 bytes:

> json:                382 bytes
> jsonb, using offsets:        12593 bytes
> jsonb, using lengths:        406 bytes

Oh, one more result: if I leave the representation alone, but change
the compression parameters to set first_success_by to INT_MAX, this
value takes up 1397 bytes.  So that's better, but still more than a
3X penalty compared to using lengths.  (Admittedly, this test value
probably is an outlier compared to normal practice, since it's a hundred
or so repetitions of the same two strings.)
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andrew Dunstan
Date:
On 08/13/2014 09:01 PM, Tom Lane wrote:
> I wrote:
>> That's a fair question.  I did a very very simple hack to replace the item
>> offsets with item lengths -- turns out that that mostly requires removing
>> some code that changes lengths to offsets ;-).  I then loaded up Larry's
>> example of a noncompressible JSON value, and compared pg_column_size()
>> which is just about the right thing here since it reports datum size after
>> compression.  Remembering that the textual representation is 12353 bytes:
>> json:                382 bytes
>> jsonb, using offsets:        12593 bytes
>> jsonb, using lengths:        406 bytes
> Oh, one more result: if I leave the representation alone, but change
> the compression parameters to set first_success_by to INT_MAX, this
> value takes up 1397 bytes.  So that's better, but still more than a
> 3X penalty compared to using lengths.  (Admittedly, this test value
> probably is an outlier compared to normal practice, since it's a hundred
> or so repetitions of the same two strings.)
>
>             



What does changing to lengths do to the speed of other operations?

cheers

andrew




Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 08/13/2014 09:01 PM, Tom Lane wrote:
>>> That's a fair question.  I did a very very simple hack to replace the item
>>> offsets with item lengths -- turns out that that mostly requires removing
>>> some code that changes lengths to offsets ;-).

> What does changing to lengths do to the speed of other operations?

This was explicitly *not* an attempt to measure the speed issue.  To do
a fair trial of that, you'd have to work a good bit harder, methinks.
Examining each of N items would involve O(N^2) work with the patch as
posted, but presumably you could get it down to less in most or all
cases --- in particular, sequential traversal could be done with little
added cost.  But it'd take a lot more hacking.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 08/14/2014 04:01 AM, Tom Lane wrote:
> I wrote:
>> That's a fair question.  I did a very very simple hack to replace the item
>> offsets with item lengths -- turns out that that mostly requires removing
>> some code that changes lengths to offsets ;-).  I then loaded up Larry's
>> example of a noncompressible JSON value, and compared pg_column_size()
>> which is just about the right thing here since it reports datum size after
>> compression.  Remembering that the textual representation is 12353 bytes:
>
>> json:                382 bytes
>> jsonb, using offsets:        12593 bytes
>> jsonb, using lengths:        406 bytes
>
> Oh, one more result: if I leave the representation alone, but change
> the compression parameters to set first_success_by to INT_MAX, this
> value takes up 1397 bytes.  So that's better, but still more than a
> 3X penalty compared to using lengths.  (Admittedly, this test value
> probably is an outlier compared to normal practice, since it's a hundred
> or so repetitions of the same two strings.)

For comparison, here's a patch that implements the scheme that Alexander
Korotkov suggested, where we store an offset every 8th element, and a
length in the others. It compresses Larry's example to 525 bytes.
Increasing the "stride" from 8 to 16 entries, it compresses to 461 bytes.

A nice thing about this patch is that it's on-disk compatible with the
current format, hence initdb is not required.

(The current comments claim that the first element in an array always
has the JENTRY_ISFIRST flag set; that is wrong, there is no such flag.
I removed the flag in commit d9daff0e, but apparently failed to update
the comment and the accompanying JBE_ISFIRST macro. Sorry about that,
will fix. This patch uses the bit that used to be JENTRY_ISFIRST to mark
entries that store a length instead of an end offset.)
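
For reference, offset lookup under a stride scheme like that comes down
to a short backward walk, roughly as below (a sketch with invented names
and a made-up flag bit, not the patch's actual macros; the assumption is
that unflagged entries store their end offset and flagged entries store
lengths):

typedef uint32 JEntry;

#define JB_OFFSET_STRIDE        8           /* an absolute offset every 8th entry */
#define JENTRY_HAS_LEN          0x80000000  /* hypothetical: entry stores a length */
#define JENTRY_VALUEMASK        0x0FFFFFFF  /* hypothetical value mask */

#define entry_is_offset(je)     (((je) & JENTRY_HAS_LEN) == 0)
#define entry_value(je)         ((je) & JENTRY_VALUEMASK)

static uint32
get_entry_offset(const JEntry *ja, int index)
{
    uint32      off = 0;
    int         i;

    /*
     * The start offset of entry 'index' equals the end offset of entry
     * index-1.  Walk backwards, summing lengths, until an entry that
     * stores an absolute end offset is reached; at most
     * JB_OFFSET_STRIDE - 1 lengths ever need to be added.
     */
    for (i = index - 1; i >= 0; i--)
    {
        off += entry_value(ja[i]);
        if (entry_is_offset(ja[i]))
            break;
    }
    return off;
}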

- Heikki


Attachment

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> For comparison, here's a patch that implements the scheme that Alexander 
> Korotkov suggested, where we store an offset every 8th element, and a 
> length in the others. It compresses Larry's example to 525 bytes. 
> Increasing the "stride" from 8 to 16 entries, it compresses to 461 bytes.

> A nice thing about this patch is that it's on-disk compatible with the 
> current format, hence initdb is not required.

TBH, I think that's about the only nice thing about it :-(.  It's
conceptually a mess.  And while I agree that this way avoids creating
a big-O performance issue for large arrays/objects, I think the micro
performance is probably going to be not so good.  The existing code is
based on the assumption that JBE_OFF() and JBE_LEN() are negligibly cheap;
but with a solution like this, it's guaranteed that one or the other is
going to be not-so-cheap.

I think if we're going to do anything to the representation at all,
we need to refactor the calling code; at least fixing the JsonbIterator
logic so that it tracks the current data offset rather than expecting to
able to compute it at no cost.

The difficulty in arguing about this is that unless we have an agreed-on
performance benchmark test, it's going to be a matter of unsupported
opinions whether one solution is faster than another.  Have we got
anything that stresses key lookup and/or array indexing?
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Wed, Aug 13, 2014 at 09:01:43PM -0400, Tom Lane wrote:
> I wrote:
> > That's a fair question.  I did a very very simple hack to replace the item
> > offsets with item lengths -- turns out that that mostly requires removing
> > some code that changes lengths to offsets ;-).  I then loaded up Larry's
> > example of a noncompressible JSON value, and compared pg_column_size()
> > which is just about the right thing here since it reports datum size after
> > compression.  Remembering that the textual representation is 12353 bytes:
> 
> > json:                382 bytes
> > jsonb, using offsets:        12593 bytes
> > jsonb, using lengths:        406 bytes
> 
> Oh, one more result: if I leave the representation alone, but change
> the compression parameters to set first_success_by to INT_MAX, this
> value takes up 1397 bytes.  So that's better, but still more than a
> 3X penalty compared to using lengths.  (Admittedly, this test value
> probably is an outlier compared to normal practice, since it's a hundred
> or so repetitions of the same two strings.)

Uh, can we get compression for actual documents, rather than duplicate
strings?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Uh, can we get compression for actual documents, rather than duplicate
> strings?

[ shrug... ]  What's your proposed set of "actual documents"?
I don't think we have any corpus of JSON docs that are all large
enough to need compression.

This gets back to the problem of what test case are we going to consider
while debating what solution to adopt.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Uh, can we get compression for actual documents, rather than duplicate
> > strings?
> 
> [ shrug... ]  What's your proposed set of "actual documents"?
> I don't think we have any corpus of JSON docs that are all large
> enough to need compression.
> 
> This gets back to the problem of what test case are we going to consider
> while debating what solution to adopt.

Uh, we just need one 12k JSON document from somewhere.  Clearly this
is something we can easily get.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Merlin Moncure
Date:
On Thu, Aug 14, 2014 at 11:52 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>> > Uh, can we get compression for actual documents, rather than duplicate
>> > strings?
>>
>> [ shrug... ]  What's your proposed set of "actual documents"?
>> I don't think we have any corpus of JSON docs that are all large
>> enough to need compression.
>>
>> This gets back to the problem of what test case are we going to consider
>> while debating what solution to adopt.
>
> Uh, we just need one 12k JSON document from somewhere.  Clearly this
> is something we can easily get.

it's trivial to make a large json[b] document:
select length(to_json(array(select row(a.*) from pg_attribute a))::TEXT);
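
Stuffing a bunch of those into a table makes it easy to watch what the toast
code does to them; a minimal sketch (table and column names are just
placeholders):

    create table jtest as
      select to_json(array(select row(a.*) from pg_attribute a))::text::jsonb as doc
      from generate_series(1, 100);

    select min(pg_column_size(doc)), max(pg_column_size(doc)),
           round(avg(pg_column_size(doc))) as avg
    from jtest;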




Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
>> This gets back to the problem of what test case are we going to consider
>> while debating what solution to adopt.

> Uh, we just need one 12k JSON document from somewhere.  Clearly this
> is something we can easily get.

I would put little faith in a single document as being representative.

To try to get some statistics about a real-world case, I looked at the
delicio.us dataset that someone posted awhile back (1252973 JSON docs).
These have a minimum length (in text representation) of 604 bytes and
a maximum length of 5949 bytes, which means that they aren't going to
tell us all that much about large JSON docs, but this is better than
no data at all.

Since documents of only a couple hundred bytes aren't going to be subject
to compression, I made a table of four columns each containing the same
JSON data, so that each row would be long enough to force the toast logic
to try to do something.  (Note that none of these documents are anywhere
near big enough to hit the refuses-to-compress problem.)  Given that,
I get the following statistics for pg_column_size():
                                min     max     avg

JSON (text) representation      382     1155    526.5
HEAD's JSONB representation     493     1485    695.1
all-lengths representation      440     1257    615.3
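
(For anyone wanting to reproduce the setup: it amounted to roughly the
following, with "delicious" and "wide4" standing in for whatever table
names you use.)

    create table wide4 as
      select doc as j1, doc as j2, doc as j3, doc as j4 from delicious;

    select min(pg_column_size(j1)), max(pg_column_size(j1)),
           round(avg(pg_column_size(j1)), 1) as avg
    from wide4;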

So IOW, on this dataset the existing JSONB representation creates about
32% bloat compared to just storing the (compressed) user-visible text,
and switching to all-lengths would about halve that penalty.

Maybe this is telling us it's not worth changing the representation,
and we should just go do something about the first_success_by threshold
and be done.  I'm hesitant to draw such conclusions on the basis of a
single use-case though, especially one that doesn't really have that
much use for compression in the first place.  Do we have other JSON
corpuses to look at?
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe this is telling us it's not worth changing the representation,
> and we should just go do something about the first_success_by threshold
> and be done.  I'm hesitant to draw such conclusions on the basis of a
> single use-case though, especially one that doesn't really have that
> much use for compression in the first place.  Do we have other JSON
> corpuses to look at?


Yes. Pavel posted some representative JSON data a while back:
http://pgsql.cz/data/data.dump.gz (it's a plain dump)

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote:
> Maybe this is telling us it's not worth changing the representation,
> and we should just go do something about the first_success_by threshold
> and be done.  I'm hesitant to draw such conclusions on the basis of a
> single use-case though, especially one that doesn't really have that
> much use for compression in the first place.  Do we have other JSON
> corpuses to look at?

Yes, that is what I was expecting --- once the whitespace and syntactic
sugar are gone in JSONB, I was unclear how much compression would help.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/14/2014 11:13 AM, Bruce Momjian wrote:
> On Thu, Aug 14, 2014 at 01:57:14PM -0400, Tom Lane wrote:
>> Maybe this is telling us it's not worth changing the representation,
>> and we should just go do something about the first_success_by threshold
>> and be done.  I'm hesitant to draw such conclusions on the basis of a
>> single use-case though, especially one that doesn't really have that
>> much use for compression in the first place.  Do we have other JSON
>> corpuses to look at?
> 
> Yes, that is what I was expecting --- once the whitespace and syntax
> sugar is gone in JSONB, I was unclear how much compression would help.

I thought the destruction case was when we have enough top-level keys
that the offsets are more than 1K total, though, yes?

So we need to test that set ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Oleg Bartunov
Date:
I did a quick test on the same bookmarks data to compare the performance of
9.4beta2 and 9.4beta2+patch.

The query was the same one we used in the PGCon presentation:
SELECT count(*) FROM jb WHERE jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;

                  table size  |  time (ms)
9.4beta2:            1374 MB  |  1160
9.4beta2+patch:      1373 MB  |  1213

Yes, performance degrades, but not by much.  There is also a small win in
table size, but the bookmarks aren't big, so it's hard to say much about
compression.

Oleg




Re: jsonb format is pessimal for toast compression

From
Larry White
Date:
I attached a json file of approximately 513K. It contains two repetitions of a single json structure. The values are quasi-random. It might make a decent test case of meaningfully sized data.

best




Attachment

Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Thu, Aug 14, 2014 at 3:49 PM, Larry White <ljw1001@gmail.com> wrote:
> I attached a json file of approximately 513K. It contains two repetitions of
> a single json structure. The values are quasi-random. It might make a decent
> test case of meaningfully sized data.


I have a 59M in plain SQL (10M compressed, 51M on-disk table size)
collection of real-world JSON data.

This data is mostly counters and ancillary info stored in json for the
flexibility, more than anything else, since it's otherwise quite
structured: most values share a lot between each other (in key names)
but there's not much redundancy within single rows.

Value length stats (in text format):

min: 14
avg: 427
max: 23239

If anyone's interested, contact me personally (I gotta anonymize the
info a bit first, since it's production info, and it's too big to
attach on the ML).



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Thu, Aug 14, 2014 at 4:24 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 3:49 PM, Larry White <ljw1001@gmail.com> wrote:
>> I attached a json file of approximately 513K. It contains two repetitions of
>> a single json structure. The values are quasi-random. It might make a decent
>> test case of meaningfully sized data.
>
>
> I have a 59M in plain SQL (10M compressed, 51M on-disk table size)
> collection of real-world JSON data.
>
> This data is mostly counters and ancillary info stored in json for the
> flexibility, more than anything else, since it's otherwise quite
> structured: most values share a lot between each other (in key names)
> but there's not much redundancy within single rows.
>
> Value length stats (in text format):
>
> min: 14
> avg: 427
> max: 23239
>
> If anyone's interested, contact me personally (I gotta anonymize the
> info a bit first, since it's production info, and it's too big to
> attach on the ML).

Oh, that one has a 13k toast, not very interesting.

But I've got another (very similar), 47M table, 40M toast, length distribution:

min: 19
avg: 474
max: 20370

Not sure why it's got a bigger toast having a similar distribution.
Tells just how meaningless min/avg/max stats are :(



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe this is telling us it's not worth changing the representation,
>> and we should just go do something about the first_success_by threshold
>> and be done.  I'm hesitant to draw such conclusions on the basis of a
>> single use-case though, especially one that doesn't really have that
>> much use for compression in the first place.  Do we have other JSON
>> corpuses to look at?

> Yes. Pavel posted some representative JSON data a while back:
> http://pgsql.cz/data/data.dump.gz (it's a plain dump)

I did some quick stats on that.  206560 rows:

                                          min     max     avg

external text representation              220     172685  880.3
JSON representation (compressed text)     224     78565   541.3
pg_column_size, JSONB HEAD repr.          225     82540   639.0
pg_column_size, all-lengths repr.         225     66794   531.1

So in this data, there definitely is some scope for compression:
just compressing the text gets about 38% savings.  The all-lengths
hack is able to beat that slightly, but the all-offsets format is
well behind at 27%.

Not sure what to conclude.  It looks from both these examples like
we're talking about a 10 to 20 percent size penalty for JSON objects
that are big enough to need compression.  Is that beyond our threshold
of pain?  I'm not sure, but there is definitely room to argue that the
extra I/O costs will swamp any savings we get from faster access to
individual fields or array elements.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Gavin Flower
Date:
On 15/08/14 09:47, Tom Lane wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Maybe this is telling us it's not worth changing the representation,
>>> and we should just go do something about the first_success_by threshold
>>> and be done.  I'm hesitant to draw such conclusions on the basis of a
>>> single use-case though, especially one that doesn't really have that
>>> much use for compression in the first place.  Do we have other JSON
>>> corpuses to look at?
>> Yes. Pavel posted some representative JSON data a while back:
>> http://pgsql.cz/data/data.dump.gz (it's a plain dump)
> I did some quick stats on that.  206560 rows:
>
>                                           min     max     avg
>
> external text representation              220     172685  880.3
> JSON representation (compressed text)     224     78565   541.3
> pg_column_size, JSONB HEAD repr.          225     82540   639.0
> pg_column_size, all-lengths repr.         225     66794   531.1
>
> So in this data, there definitely is some scope for compression:
> just compressing the text gets about 38% savings.  The all-lengths
> hack is able to beat that slightly, but the all-offsets format is
> well behind at 27%.
>
> Not sure what to conclude.  It looks from both these examples like
> we're talking about a 10 to 20 percent size penalty for JSON objects
> that are big enough to need compression.  Is that beyond our threshold
> of pain?  I'm not sure, but there is definitely room to argue that the
> extra I/O costs will swamp any savings we get from faster access to
> individual fields or array elements.
>
>             regards, tom lane
>
>
Curious, would adding the standard deviation help in characterising the 
distribution of data values?

Also you might like to consider using the median value, and possibly the
25% & 75% (or some such) values.  I assume the 'avg' in your table refers
to the arithmetic mean.  Sometimes the median is a better measure of
'normal' than the arithmetic mean, and it can be useful to note the
difference between the two!

Graphing the values may also be useful.  You might have 2, or more, 
distinct populations which might show up as several distinct peaks - in 
which case, this might suggest changes to the algorithm.

Many moons ago, I did a 400 level statistics course at University, of
which I've forgotten most.  However, I'm aware of other potentially
useful measures, but I suspect that they would be too esoteric for the
current problem!


Cheers,
Gavin




Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
So, here's a destruction test case:

200,000 JSON values (plus 2 key columns)
Average width 4K (+/- 1K)
183 keys per JSON value
keys 10 to 30 characters
105 float values
70 integer values
8 text and date values
no nesting

The "jsonic" table is JSON
The "jsonbish" table is JSONB

(I can't share this data set, but it makes a good test case)

And, we see the effect:

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 1147 MB
(1 row)

So, pretty bad; JSONB is 200% larger than JSON.

I don't think having 183 top-level keys is all that unreasonable of a
use case.  Some folks will be migrating from Mongo, Redis or Couch to
PostgreSQL, and might have a whole denormalized schema in JSON.

BTW, I find this peculiar:

postgres=# select pg_size_pretty(pg_relation_size('jsonic'));
 pg_size_pretty
----------------
 383 MB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('jsonbish'));
 pg_size_pretty
----------------
 11 MB
(1 row)

Next up: Tom's patch and indexing!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> So, here's a destruction test case:
> 200,000 JSON values (plus 2 key columns)
> Average width 4K (+/- 1K)
> 183 keys per JSON value

Is that 183 keys exactly each time, or is 183 the average?
If so, what's the min/max number of keys?

I ask because 183 would be below the threshold where I'd expect the
no-compression behavior to kick in.

> And, we see the effect:

> postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
>  pg_size_pretty
> ----------------
>  394 MB
> (1 row)

> postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
>  pg_size_pretty
> ----------------
>  1147 MB
> (1 row)

> So, pretty bad; JSONB is 200% larger than JSON.

Ouch.  But it's not clear how much of this is from the first_success_by
threshold and how much is from having poor compression even though we
escaped that trap.

> BTW, I find this peculiar:

> postgres=# select pg_size_pretty(pg_relation_size('jsonic'));

>  pg_size_pretty
> ----------------
>  383 MB
> (1 row)

> postgres=# select pg_size_pretty(pg_relation_size('jsonbish'));

>  pg_size_pretty
> ----------------
>  11 MB
> (1 row)

pg_relation_size is just the main data fork; it excludes TOAST.
So what we can conclude is that most of the data got toasted out-of-line
in jsonb, while very little did in json.  That probably just comes from
the average datum size being close to the push-out-of-line threshold,
so that worse compression puts it over the edge.
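
One easy way to see that split, if you want to check (pg_table_size counts
the TOAST table but not indexes):

    select pg_size_pretty(pg_relation_size('jsonbish')) as main_fork,
           pg_size_pretty(pg_table_size('jsonbish') -
                          pg_relation_size('jsonbish')) as toast_and_maps;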

It would be useful to see min/max/avg of pg_column_size() in both
these cases.
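
i.e. something like this, with "jcol" standing in for whatever the json
or jsonb column is actually called:

    select min(pg_column_size(jcol)), max(pg_column_size(jcol)),
           round(avg(pg_column_size(jcol)), 1) as avg
    from jsonbish;
    -- and likewise for the json table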
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/14/2014 04:02 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> So, here's a destruction test case:
>> 200,000 JSON values (plus 2 key columns)
>> Average width 4K (+/- 1K)
>> 183 keys per JSON value
> 
> Is that 183 keys exactly each time, or is 183 the average?

Each time exactly.

> It would be useful to see min/max/avg of pg_column_size() in both
> these cases.

Well, this is 9.4, so I can do better than that.  How about quartiles?

 thetype |    colsize_distribution
---------+----------------------------
 json    | {1777,1803,1890,1940,4424}
 jsonb   | {5902,5926,5978,6002,6208}

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 08/14/2014 04:02 PM, Tom Lane wrote:
>> It would be useful to see min/max/avg of pg_column_size() in both
>> these cases.

> Well, this is 9.4, so I can do better than that.  How about quartiles?

>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1777,1803,1890,1940,4424}
>  jsonb   | {5902,5926,5978,6002,6208}

OK.  That matches with the observation about being mostly toasted or not
--- the threshold for pushing out-of-line would be something a little
under 2KB depending on the other columns you had in the table.

What's more, it looks like the jsonb data is pretty much never getting
compressed --- the min is too high for that.  So I'm guessing that this
example is mostly about the first_success_by threshold preventing any
compression from happening.  Please, before looking at my other patch,
try this: in src/backend/utils/adt/pg_lzcompress.c, change line 221
thusly:

-    1024,                        /* Give up if no compression in the first 1KB */
+    INT_MAX,                    /* Give up if no compression in the first 1KB */

then reload the jsonb data and give us the same stats on that.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
> What's more, it looks like the jsonb data is pretty much never getting
> compressed --- the min is too high for that.  So I'm guessing that this
> example is mostly about the first_success_by threshold preventing any
> compression from happening.  Please, before looking at my other patch,
> try this: in src/backend/utils/adt/pg_lzcompress.c, change line 221
> thusly:
> 
> -    1024,                        /* Give up if no compression in the first 1KB */
> +    INT_MAX,                    /* Give up if no compression in the first 1KB */
> 
> then reload the jsonb data and give us the same stats on that.

That helped things, but not as much as you'd think:

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 801 MB
(1 row)

What I find really strange is that the column size distribution is
exactly the same:

 thetype |    colsize_distribution
---------+----------------------------
 json    | {1777,1803,1890,1940,4424}
 jsonb   | {5902,5926,5978,6002,6208}

Shouldn't the lower end stuff be smaller?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/14/2014 04:47 PM, Josh Berkus wrote:
>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1777,1803,1890,1940,4424}
>  jsonb   | {5902,5926,5978,6002,6208}

Just realized my query was counting the whole row size instead of just
the column size.  Here's just the JSON column:

Before changing to INT_MAX:

 thetype |    colsize_distribution
---------+----------------------------
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3551,5866,5910,5958,6168}

After:

 thetype |    colsize_distribution
---------+----------------------------
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3515,3543,3636,3690,4038}

So that did improve things, just not as much as we'd like.
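
In case anyone wants to replicate that measurement, a query along these
lines (using the 9.4 ordered-set aggregates) produces the same shape of
output:

    select 'jsonb' as thetype,
           percentile_disc(array[0, 0.25, 0.5, 0.75, 1])
             within group (order by pg_column_size(row_to_json))
             as colsize_distribution
    from jsonbish;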

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
> Before changing to INT_MAX:
> 
>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1741,1767,1854,1904,2292}
>  jsonb   | {3551,5866,5910,5958,6168}
> 
> After:
> 
>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1741,1767,1854,1904,2292}
>  jsonb   | {3515,3543,3636,3690,4038}
> 
> So that did improve things, just not as much as we'd like.

And with Tom's test patch:

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 541 MB
(1 row)

 thetype |    colsize_distribution
---------+----------------------------
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {2037,2114,2288,2348,2746}

Since that improved things a *lot*, just +40% instead of +200%, I
thought I'd test some select queries.  I decided to test a GIN lookup
and value extraction, since indexed lookup is really what I care about.
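
For reference, a GIN index like the one used below can be built with the
default jsonb opclass, i.e. something along the lines of:

    create index jsonbish_row_to_json_idx on jsonbish using gin (row_to_json);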

9.4b2 no patches:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonbish  (cost=29.55..582.92 rows=200 width=18) (actual time=20.814..2845.454 rows=100423 loops=1)
   Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
   Heap Blocks: exact=1471
   ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50 rows=200 width=0) (actual time=20.551..20.551 rows=100423 loops=1)
         Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
 Planning time: 0.102 ms
 Execution time: 2856.179 ms


9.4b2 TL patch:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonbish  (cost=29.55..582.92 rows=200 width=18) (actual time=24.071..5201.687 rows=100423 loops=1)
   Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
   Heap Blocks: exact=1471
   ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50 rows=200 width=0) (actual time=23.779..23.779 rows=100423 loops=1)
         Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
 Planning time: 0.098 ms
 Execution time: 5214.212 ms

... so, an 80% increase in lookup and extraction time for swapping
offsets for lengths.  That's actually all extraction time; I tried
removing the extraction from the query, and without it the difference
between the two queries is statistically insignificant.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> And with Tom's test patch:
> ...
> Since that improved things a *lot*, just +40% instead of +200%, I
> thought I'd test some select queries.

That test patch is not meant to be fast, its ambition was only to see
what the effects on storage size would be.  So I find this unsurprising:

> ... so, an 80% increase in lookup and extraction time for swapping
> offsets for lengths.

We can certainly reduce that.  The question was whether it would be
worth the effort to try.  At this point, with three different test
data sets having shown clear space savings, I think it is worth
the effort.  I'll poke into it tomorrow or over the weekend, unless
somebody beats me to it.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/14/2014 07:24 PM, Tom Lane wrote:

> We can certainly reduce that.  The question was whether it would be
> worth the effort to try.  At this point, with three different test
> data sets having shown clear space savings, I think it is worth
> the effort.  I'll poke into it tomorrow or over the weekend, unless
> somebody beats me to it.

Note that I specifically created that data set to be a worst case: many
top-level keys, no nesting, and small values.  However, I don't think
it's an unrealistic worst case.

Interestingly, even on the unpatched, 1GB table case, the *index* on the
JSONB is only 60MB.  Which shows just how terrific the improvement in
GIN index size/performance is.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 08/14/2014 07:24 PM, Tom Lane wrote:
>> We can certainly reduce that.  The question was whether it would be
>> worth the effort to try.  At this point, with three different test
>> data sets having shown clear space savings, I think it is worth
>> the effort.  I'll poke into it tomorrow or over the weekend, unless
>> somebody beats me to it.

> Note that I specifically created that data set to be a worst case: many
> top-level keys, no nesting, and small values.  However, I don't think
> it's an unrealistic worst case.

> Interestingly, even on the unpatched, 1GB table case, the *index* on the
> JSONB is only 60MB.  Which shows just how terrific the improvement in
> GIN index size/performance is.

I've been poking at this, and I think the main explanation for your result
is that with more JSONB documents being subject to compression, we're
spending more time in pglz_decompress.  There's no free lunch in that
department: if you want compressed storage it's gonna cost ya to
decompress.  The only way I can get decompression and TOAST access to not
dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
PLAIN.  However, when I do that, I do see my test patch running about 25%
slower overall than HEAD on an "explain analyze select jfield -> 'key'
from table" type of query with 200-key documents with narrow fields (see
attached perl script that generates the test data).
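
If you want to replicate that locally, the rough recipe, with made-up
table/column names, is:

    create table jtab (jfield jsonb);
    alter table jtab alter column jfield set storage plain;
    \copy jtab from 'testdata.json'
    explain analyze select jfield -> 'k100' from jtab;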

It seems difficult to improve much on that for this test case.  I put some
logic into findJsonbValueFromContainer to calculate the offset sums just
once not once per binary-search iteration, but that only improved matters
5% at best.  I still think it'd be worth modifying the JsonbIterator code
to avoid repetitive offset calculations, but that's not too relevant to
this test case.

Having said all that, I think this test is something of a contrived worst
case.  More realistic cases are likely to have many fewer keys (so that
speed of the binary search loop is less of an issue) or else to have total
document sizes large enough that inline PLAIN storage isn't an option,
meaning that detoast+decompression costs will dominate.

            regards, tom lane

#! /usr/bin/perl
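# Generate 100000 one-line JSON documents, each with 200 keys (k1..k200)
# holding single-digit random integers.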

for (my $i = 0; $i < 100000; $i++) {
    print "{";
    for (my $k = 1; $k <= 200; $k++) {
    print ", " if $k > 1;
    printf "\"k%d\": %d", $k, int(rand(10));
    }
    print "}\n";
}

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
<p dir="ltr">I'm still getting up to speed on postgres development but I'd like to leave an opinion. <p dir="ltr">We
shouldadd some sort of versionning to the jsonb format. This can be explored in the future in many ways.<p dir="ltr">As
forthe current problem, we should explore the directory at the end option. It should improve compression and keep good
accessperformance. <p dir="ltr">A 4 byte header is sufficient to store the directory offset and some versionning
bits.<br/><div class="gmail_quote">Em 15/08/2014 17:39, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>escreveu:<br type="attribution" /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Josh Berkus <<a
href="mailto:josh@agliodbs.com">josh@agliodbs.com</a>>writes:<br /> > On 08/14/2014 07:24 PM, Tom Lane wrote:<br
/>>> We can certainly reduce that.  The question was whether it would be<br /> >> worth the effort to try. 
Atthis point, with three different test<br /> >> data sets having shown clear space savings, I think it is
worth<br/> >> the effort.  I'll poke into it tomorrow or over the weekend, unless<br /> >> somebody beats
meto it.<br /><br /> > Note that I specifically created that data set to be a worst case: many<br /> > top-level
keys,no nesting, and small values.  However, I don't think<br /> > it's an unrealistic worst case.<br /><br /> >
Interestingly,even on the unpatched, 1GB table case, the *index* on the<br /> > JSONB is only 60MB.  Which shows
justhow terrific the improvement in<br /> > GIN index size/performance is.<br /><br /> I've been poking at this, and
Ithink the main explanation for your result<br /> is that with more JSONB documents being subject to compression,
we're<br/> spending more time in pglz_decompress.  There's no free lunch in that<br /> department: if you want
compressedstorage it's gonna cost ya to<br /> decompress.  The only way I can get decompression and TOAST access to
not<br/> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE<br /> PLAIN.  However, when I do
that,I do see my test patch running about 25%<br /> slower overall than HEAD on an "explain analyze select jfield ->
'key'<br/> from table" type of query with 200-key documents with narrow fields (see<br /> attached perl script that
generatesthe test data).<br /><br /> It seems difficult to improve much on that for this test case.  I put some<br />
logicinto findJsonbValueFromContainer to calculate the offset sums just<br /> once not once per binary-search
iteration,but that only improved matters<br /> 5% at best.  I still think it'd be worth modifying the JsonbIterator
code<br/> to avoid repetitive offset calculations, but that's not too relevant to<br /> this test case.<br /><br />
Havingsaid all that, I think this test is something of a contrived worst<br /> case.  More realistic cases are likely
tohave many fewer keys (so that<br /> speed of the binary search loop is less of an issue) or else to have total<br />
documentsizes large enough that inline PLAIN storage isn't an option,<br /> meaning that detoast+decompression costs
willdominate.<br /><br />                         regards, tom lane<br /><br /><br /><br /> --<br /> Sent via
pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To
makechanges to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/><br /></blockquote></div> 

Re: [Bad Attachment] Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/15/2014 01:38 PM, Tom Lane wrote:
> I've been poking at this, and I think the main explanation for your result
> is that with more JSONB documents being subject to compression, we're
> spending more time in pglz_decompress.  There's no free lunch in that
> department: if you want compressed storage it's gonna cost ya to
> decompress.  The only way I can get decompression and TOAST access to not
> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
> PLAIN.  However, when I do that, I do see my test patch running about 25%
> slower overall than HEAD on an "explain analyze select jfield -> 'key'
> from table" type of query with 200-key documents with narrow fields (see
> attached perl script that generates the test data).

Ok, that probably falls under the heading of "acceptable tradeoffs" then.

> Having said all that, I think this test is something of a contrived worst
> case.  More realistic cases are likely to have many fewer keys (so that
> speed of the binary search loop is less of an issue) or else to have total
> document sizes large enough that inline PLAIN storage isn't an option,
> meaning that detoast+decompression costs will dominate.

This was intended to be a worst case.  However, I don't think that it's
the last time we'll see the case of having 100 to 200 keys each with
short values.  That case was actually from some XML data which I'd
already converted into a regular table (hence every row having 183
keys), but if JSONB had been available when I started the project, I
might have chosen to store it as JSONB instead.  It occurs to me that
the matching data from a personals website would very much fit the
pattern of having between 50 and 200 keys, each of which has a short value.

So we don't need to *optimize* for that case, but it also shouldn't be
disastrously slow or 300% of the size of comparable TEXT.  Mind you, I
don't find +80% to be disastrously slow (especially not with a space
savings of 60%), so maybe that's good enough.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Arthur Silva <arthurprs@gmail.com> writes:
> We should add some sort of versioning to the jsonb format. This can be
> explored in the future in many ways.

If we end up making an incompatible change to the jsonb format, I would
support taking the opportunity to stick a version ID in there.  But
I don't want to force a dump/reload cycle *only* to do that.

> As for the current problem, we should explore the directory at the end
> option. It should improve compression and keep good access performance.

Meh.  Pushing the directory to the end is just a band-aid, and since it
would still force a dump/reload, it's not a very enticing band-aid.
The only thing it'd really fix is the first_success_by issue, which
we could fix *without* a dump/reload by using different compression
parameters for jsonb.  Moving the directory to the end, by itself,
does nothing to fix the problem that the directory contents aren't
compressible --- and we now have pretty clear evidence that that is a
significant issue.  (See for instance Josh's results that increasing
first_success_by did very little for the size of his dataset.)

I think the realistic alternatives at this point are either to
switch to all-lengths as in my test patch, or to use the hybrid approach
of Heikki's test patch.  IMO the major attraction of Heikki's patch
is that it'd be upward compatible with existing beta installations,
ie no initdb required (but thus, no opportunity to squeeze in a version
identifier either).  It's not showing up terribly well in the performance
tests I've been doing --- it's about halfway between HEAD and my patch on
that extract-a-key-from-a-PLAIN-stored-column test.  But, just as with my
patch, there are things that could be done to micro-optimize it by
touching a bit more code.

I did some quick stats comparing compressed sizes for the delicio.us
data, printing quartiles as per Josh's lead:

all-lengths     {440,569,609,655,1257}
Heikki's patch  {456,582,624,671,1274}
HEAD            {493,636,684,744,1485}

(As before, this is pg_column_size of the jsonb within a table whose rows
are wide enough to force tuptoaster.c to try to compress the jsonb;
otherwise many of these values wouldn't get compressed.)  These documents
don't have enough keys to trigger the first_success_by issue, so that
HEAD doesn't look too awful, but still there's about an 11% gain from
switching from offsets to lengths.  Heikki's method captures much of
that but not all.

Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy.  Others might well disagree.

In case anyone else wants to do measurements on some more data sets,
attached is a copy of Heikki's patch updated to apply against git tip.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..47b2998 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         if (i > 0)
              meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1378,1387 ----
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         if (i % JBE_STORE_LEN_STRIDE == 0)
              meta = (meta & ~JENTRY_POSMASK) | totallen;
+         else
+             meta |= JENTRY_HAS_LEN;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
*************** convertJsonbObject(StringInfo buffer, JE
*** 1430,1440 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         if (i > 0)
              meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

          convertJsonbValue(buffer, &meta, &pair->value, level);
          len = meta & JENTRY_POSMASK;
          totallen += len;
--- 1432,1445 ----
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         if (i % JBE_STORE_LEN_STRIDE == 0)
              meta = (meta & ~JENTRY_POSMASK) | totallen;
+         else
+             meta |= JENTRY_HAS_LEN;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

+         /* put value */
          convertJsonbValue(buffer, &meta, &pair->value, level);
          len = meta & JENTRY_POSMASK;
          totallen += len;
*************** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1450,1456 ----
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

!         meta |= JENTRY_HAS_LEN;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
*************** uniqueifyJsonbObject(JsonbValue *object)
*** 1592,1594 ****
--- 1597,1635 ----
          object->val.object.nPairs = res + 1 - object->val.object.pairs;
      }
  }
+
+ uint32
+ jsonb_get_offset(const JEntry *ja, int index)
+ {
+     uint32        off = 0;
+     int            i;
+
+     /*
+      * Each absolute entry contains the *end* offset. Start offset of this
+      * entry is equal to the end offset of the previous entry.
+      */
+     for (i = index - 1; i >= 0; i--)
+     {
+         off += JBE_POSFLD(ja[i]);
+         if (!JBE_HAS_LEN(ja[i]))
+             break;
+     }
+     return off;
+ }
+
+ uint32
+ jsonb_get_length(const JEntry *ja, int index)
+ {
+     uint32        off;
+     uint32        len;
+
+     if (JBE_HAS_LEN(ja[index]))
+         len = JBE_POSFLD(ja[index]);
+     else
+     {
+         off = jsonb_get_offset(ja, index);
+         len = JBE_POSFLD(ja[index]) - off;
+     }
+
+     return len;
+ }
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..10a07bb 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 102,112 ****
   * to JB_FSCALAR | JB_FARRAY.
   *
   * To encode the length and offset of the variable-length portion of each
!  * node in a compact way, the JEntry stores only the end offset within the
!  * variable-length portion of the container node. For the first JEntry in the
!  * container's JEntry array, that equals to the length of the node data.  The
!  * begin offset and length of the rest of the entries can be calculated using
!  * the end offset of the previous JEntry in the array.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
--- 102,113 ----
   * to JB_FSCALAR | JB_FARRAY.
   *
   * To encode the length and offset of the variable-length portion of each
!  * node in a compact way, the JEntry stores either the length of the element,
!  * or its end offset within the variable-length portion of the container node.
!  * Entries that store a length are marked with the JENTRY_HAS_LEN flag, other
!  * entries store an end offset. The begin offset and length of each entry
!  * can be calculated by scanning backwards to the previous entry storing an
!  * end offset, and adding up the lengths of the elements in between.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
*************** typedef struct JsonbValue JsonbValue;
*** 120,134 ****
  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

  #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
  #define JENTRY_ISSTRING            0x00000000
--- 121,136 ----
  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset or the length of the
!  * entry, depending on whether the JENTRY_HAS_LEN flag is set (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The other three bits
!  * are used to store the type of the entry.
   */
  typedef uint32 JEntry;

  #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000
+ #define JENTRY_HAS_LEN            0x80000000

  /* values stored in the type bits */
  #define JENTRY_ISSTRING            0x00000000
*************** typedef uint32 JEntry;
*** 146,160 ****
  #define JBE_ISBOOL_TRUE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_TRUE)
  #define JBE_ISBOOL_FALSE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_FALSE)
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 148,170 ----
  #define JBE_ISBOOL_TRUE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_TRUE)
  #define JBE_ISBOOL_FALSE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_FALSE)
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))
+ #define JBE_HAS_LEN(je_)        (((je_) & JENTRY_HAS_LEN) != 0)

  /*
!  * Macros for getting the offset and length of an element.
   */
! #define JBE_POSFLD(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            jsonb_get_offset(ja, i)
! #define JBE_LEN(ja, i)            jsonb_get_length(ja, i)
!
! /*
!  * Store an absolute end offset every JBE_STORE_LEN_STRIDE elements (for an
!  * array) or key/value pairs (for an object). Others are stored as lengths.
!  */
! #define JBE_STORE_LEN_STRIDE    8
!
! extern uint32 jsonb_get_offset(const JEntry *ja, int index);
! extern uint32 jsonb_get_length(const JEntry *ja, int index);

  /*
   * A jsonb array or object node, within a Jsonb Datum.

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:

I agree that versioning might sound silly at this point, but let's keep it
in mind.

Row-level compression is quite slow in itself, so it seems odd to me to pay
a 25% performance penalty everywhere for the sake of a better compression
ratio in the directory area.

Consider, for example, an optimization that stuffs integers (up to 28 bits)
inside the JEntry itself.  That alone would save 8 bytes for each integer.


Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/15/2014 04:19 PM, Tom Lane wrote:
> Personally I'd prefer to go to the all-lengths approach, but a large
> part of that comes from a subjective assessment that the hybrid approach
> is too messy.  Others might well disagree.
> 
> In case anyone else wants to do measurements on some more data sets,
> attached is a copy of Heikki's patch updated to apply against git tip.

Note that this is not 100% comparable because I'm running it against a git
clone, and the earlier tests were against beta2.  However, the Heikki
patch looks like a bust on this dataset -- see below.

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 542 MB

Extraction Test:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jsonbish  (cost=29.55..582.92 rows=200 width=18) (actual time=22.742..5281.823 rows=100423 loops=1)
   Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
   Heap Blocks: exact=1471
   ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50 rows=200 width=0) (actual time=22.445..22.445 rows=100423 loops=1)
         Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
 Planning time: 0.095 ms
 Execution time: 5292.047 ms
(7 rows)

So, that extraction test is about 1% *slower* than the basic Tom Lane
lengths-only patch, and still 80% slower than original JSONB.  And it's
the same size as the lengths-only version.

Huh?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 08/15/2014 04:19 PM, Tom Lane wrote:
>> Personally I'd prefer to go to the all-lengths approach, but a large
>> part of that comes from a subjective assessment that the hybrid approach
>> is too messy.  Others might well disagree.

> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
> lengths-only patch, and still 80% slower than original JSONB.  And it's
> the same size as the lengths-only version.

Since it's looking like this might be the direction we want to go, I took
the time to flesh out my proof-of-concept patch.  The attached version
takes care of cosmetic issues (like fixing the comments), and includes
code to avoid O(N^2) penalties in findJsonbValueFromContainer and
JsonbIteratorNext.  I'm not sure whether those changes will help
noticeably on Josh's test case; for me, they seemed worth making, but
they do not bring the code back to full speed parity with the all-offsets
version.  But as we've been discussing, it seems likely that those costs
would be swamped by compression and I/O considerations in most scenarios
with large documents; and of course for small documents it hardly matters.

Even if we don't go this way, there are parts of this patch that would
need to get committed.  I found for instance that convertJsonbArray and
convertJsonbObject have insufficient defenses against overflowing the
overall length field for the array or object.

For my own part, I'm satisfied with the patch as attached (modulo the
need to teach pg_upgrade about the incompatibility).  There remains the
question of whether to take this opportunity to add a version ID to the
binary format.  I'm not as excited about that idea as I originally was;
having now studied the code more carefully, I think that any expansion
would likely happen by adding more type codes and/or commandeering the
currently-unused high-order bit of JEntrys.  We don't need a version ID
in the header for that.  Moreover, if we did have such an ID, it would be
notationally painful to get it to most of the places that might need it.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*************** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_POSMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_POSMASK)));

      return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_LENMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_LENMASK)));

      return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..e47eaea 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
                 JsonbValue *result);
! static bool    equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
                 JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
*************** JsonbValueToJsonb(JsonbValue *val)
*** 108,113 ****
--- 109,135 ----
  }

  /*
+  * Get the offset of the variable-length portion of a Jsonb node within
+  * the variable-length-data part of its container.  The node is identified
+  * by index within the container's JEntry array.
+  *
+  * We do this by adding up the lengths of all the previous nodes'
+  * variable-length portions.  It's best to avoid using this function when
+  * iterating through all the nodes in a container, since that would result
+  * in O(N^2) work.
+  */
+ uint32
+ getJsonbOffset(const JEntry *ja, int index)
+ {
+     uint32        off = 0;
+     int            i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
+
+ /*
   * BT comparator worker function.  Returns an integer less than, equal to, or
   * greater than zero, indicating whether a is less than, equal to, or greater
   * than b.  Consistent with the requirements for a B-Tree operator class
*************** findJsonbValueFromContainer(JsonbContain
*** 279,324 ****
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopMiddle;

!         /* Object key past by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < count)
          {
              int            index;
              int            difference;
              JsonbValue    candidate;

              /*
!              * Note how we compensate for the fact that we're iterating
!              * through pairs (not entries) throughout.
               */
-             stopMiddle = stopLow + (count - stopLow) / 2;
-
              index = stopMiddle * 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + JBE_OFF(children, index);
              candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);
--- 301,370 ----
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
+         uint32        offset = 0;
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, offset, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
+
+             offset += JBE_LEN(children, i);
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
+         uint32       *offsets;
+         uint32        lastoff;
+         int            lastoffpos;
          uint32        stopLow = 0,
!                     stopHigh = count;

!         /* Object key passed by caller must be a string */
          Assert(key->type == jbvString);

+         /*
+          * We use a cache to avoid redundant getJsonbOffset() computations
+          * inside the search loop.  Note that count may well be zero at this
+          * point; to avoid an ugly special case for initializing lastoff and
+          * lastoffpos, we allocate one extra array element.
+          */
+         offsets = (uint32 *) palloc((count * 2 + 1) * sizeof(uint32));
+         offsets[0] = lastoff = 0;
+         lastoffpos = 0;
+
          /* Binary search on object/pair keys *only* */
!         while (stopLow < stopHigh)
          {
+             uint32        stopMiddle;
              int            index;
              int            difference;
              JsonbValue    candidate;

+             stopMiddle = stopLow + (stopHigh - stopLow) / 2;
+
              /*
!              * Compensate for the fact that we're searching through pairs (not
!              * entries).
               */
              index = stopMiddle * 2;

+             /* Update the offsets cache through at least index+1 */
+             while (lastoffpos <= index)
+             {
+                 lastoff += JBE_LEN(children, lastoffpos);
+                 offsets[++lastoffpos] = lastoff;
+             }
+
              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[index];
              candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);
*************** findJsonbValueFromContainer(JsonbContain
*** 326,333 ****
              if (difference == 0)
              {
                  /* Found our key, return value */
!                 fillJsonbValue(children, index + 1, base_addr, result);

                  return result;
              }
              else
--- 372,382 ----
              if (difference == 0)
              {
                  /* Found our key, return value */
!                 fillJsonbValue(children, index + 1,
!                                base_addr, offsets[index + 1],
!                                result);

+                 pfree(offsets);
                  return result;
              }
              else
*************** findJsonbValueFromContainer(JsonbContain
*** 335,343 ****
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     count = stopMiddle;
              }
          }
      }

      /* Not found */
--- 384,394 ----
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     stopHigh = stopMiddle;
              }
          }
+
+         pfree(offsets);
      }

      /* Not found */
*************** getIthJsonbValueFromContainer(JsonbConta
*** 368,374 ****

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr, result);

      return result;
  }
--- 419,427 ----

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr,
!                    getJsonbOffset(container->children, i),
!                    result);

      return result;
  }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 377,387 ****
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
  {
      JEntry        entry = children[index];

--- 430,446 ----
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
+  * The node's JEntry is at children[index], and its variable-length data
+  * is at base_addr + offset.  We make the caller determine the offset since
+  * in many cases the caller can amortize the work across multiple children.
+  *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
!                JsonbValue *result)
  {
      JEntry        entry = children[index];

*************** fillJsonbValue(JEntry *children, int ind
*** 392,405 ****
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + JBE_OFF(children, index);
!         result->val.string.len = JBE_LEN(children, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
--- 451,464 ----
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + offset;
!         result->val.string.len = JBE_LENFLD(entry);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 415,422 ****
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
!         result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children, index));
      }
  }

--- 474,482 ----
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         /* Remove alignment padding from data pointer and len */
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
!         result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
      }
  }

*************** recurse:
*** 668,680 ****
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
--- 728,741 ----
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
*************** recurse:
*** 686,692 ****
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);

              if (!IsAJsonbScalar(val) && !skipNested)
              {
--- 747,758 ----
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->curIndex,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children, (*it)->curIndex);
!             (*it)->curIndex++;

              if (!IsAJsonbScalar(val) && !skipNested)
              {
*************** recurse:
*** 712,724 ****
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
--- 778,791 ----
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
*************** recurse:
*** 732,738 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

--- 799,807 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex * 2,
!                                (*it)->dataProper, (*it)->curDataOffset,
!                                val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

*************** recurse:
*** 745,752 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
!                            (*it)->dataProper, val);

              /*
               * Value may be a container, in which case we recurse with new,
--- 814,829 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2);
!
!             fillJsonbValue((*it)->children, (*it)->curIndex * 2 + 1,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2 + 1);
!             (*it)->curIndex++;

              /*
               * Value may be a container, in which case we recurse with new,
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1340,1353 ****
      int            totallen;
      uint32        header;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry, stored in the beginning of the variable-
!      * length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
--- 1417,1431 ----
      int            totallen;
      uint32        header;

!     /* Remember where variable-length data starts for this array */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1358,1364 ****
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!     /* reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
--- 1436,1443 ----
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!
!     /* Reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1368,1391 ****
          int            len;
          JEntry        meta;

          convertJsonbValue(buffer, &meta, elem, level + 1);
-         len = meta & JENTRY_POSMASK;
-         totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
--- 1447,1485 ----
          int            len;
          JEntry        meta;

+         /*
+          * Convert element, producing a JEntry and appending its
+          * variable-length data to buffer
+          */
          convertJsonbValue(buffer, &meta, elem, level + 1);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         len = JBE_LENFLD(meta);
!         totallen += len;
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1393,1457 ****
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
-     uint32        header;
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

!     /* Initialize header */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* reserve space for the JEntries of the keys and values */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair *pair = &val->val.object.pairs[i];
!         int len;
!         JEntry meta;

!         /* put key */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = meta & JENTRY_POSMASK;
          totallen += len;

-         if (totallen > JENTRY_POSMASK)
-             ereport(ERROR,
-                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-                             JENTRY_POSMASK)));
-
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         convertJsonbValue(buffer, &meta, &pair->value, level);
!         len = meta & JENTRY_POSMASK;
!         totallen += len;

!         if (totallen > JENTRY_POSMASK)
!             ereport(ERROR,
!                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      *pheader = JENTRY_ISCONTAINER | totallen;
  }

--- 1487,1570 ----
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;
+     uint32        header;

!     /* Remember where variable-length data starts for this object */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

!     /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
!      */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;

!         /*
!          * Convert key, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = JBE_LENFLD(meta);
          totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         /*
!          * Convert value, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
!         convertJsonbValue(buffer, &meta, &pair->value, level + 1);

!         len = JBE_LENFLD(meta);
!         totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
+
+         /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.  But it
+          * should be sufficient to check once per iteration, since
+          * JENTRY_LENMASK is several bits narrower than int.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
+     /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..b9a4314 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 83,91 ****
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header, and a variable-length content.  The JEntry header indicates what
!  * kind of a node it is, e.g. a string or an array, and the offset and length
!  * of its variable-length portion within the container.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
--- 83,91 ----
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header and a variable-length content (possibly of zero size).  The JEntry
!  * header indicates what kind of a node it is, e.g. a string or an array,
!  * and includes the length of its variable-length portion.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
*************** typedef struct JsonbValue JsonbValue;
*** 95,133 ****
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begins with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
-  * To encode the length and offset of the variable-length portion of each
-  * node in a compact way, the JEntry stores only the end offset within the
-  * variable-length portion of the container node. For the first JEntry in the
-  * container's JEntry array, that equals to the length of the node data.  The
-  * begin offset and length of the rest of the entries can be calculated using
-  * the end offset of the previous JEntry in the array.
-  *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
--- 95,126 ----
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begin with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte so that the actual numeric data is 4-byte aligned.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the data length of the entry (see
!  * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
!  * of the entry. The most significant bit is reserved for future use, and
!  * should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_LENMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
*************** typedef uint32 JEntry;
*** 148,160 ****
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 141,150 ----
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the data length of a JEntry.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_LENMASK)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])

  /*
   * A jsonb array or object node, within a Jsonb Datum.
*************** typedef struct JsonbIterator
*** 287,306 ****
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will be
!                                  * nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;

      /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            i;

!     /*
!      * Data proper.  This points just past end of children array.
!      * We use the JBE_OFF() macro on the Jentrys to find offsets of each
!      * child in this area.
!      */
!     char       *dataProper;

      /* Private state */
      JsonbIterState state;
--- 277,294 ----
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will
!                                  * be nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;        /* JEntrys for child nodes */
!     /* Data proper.  This points just past end of children array */
!     char       *dataProper;

      /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            curIndex;

!     /* Data offset corresponding to current item */
!     uint32        curDataOffset;

      /* Private state */
      JsonbIterState state;
*************** extern Datum gin_consistent_jsonb_path(P
*** 344,349 ****
--- 332,338 ----
  extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);

  /* Support functions */
+ extern uint32 getJsonbOffset(const JEntry *ja, int index);
  extern int    compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
  extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
                              uint32 flags,

Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/20/2014 08:29 AM, Tom Lane wrote:
> Since it's looking like this might be the direction we want to go, I took
> the time to flesh out my proof-of-concept patch.  The attached version
> takes care of cosmetic issues (like fixing the comments), and includes
> code to avoid O(N^2) penalties in findJsonbValueFromContainer and
> JsonbIteratorNext

OK, will test.

This means we need a beta3, no?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> This means we need a beta3, no?

If we change the on-disk format, I'd say so.  So we don't want to wait
around too long before deciding.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/20/2014 08:29 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 08/15/2014 04:19 PM, Tom Lane wrote:
>>> Personally I'd prefer to go to the all-lengths approach, but a large
>>> part of that comes from a subjective assessment that the hybrid approach
>>> is too messy.  Others might well disagree.
> 
>> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
>> lengths-only patch, and still 80% slower than original JSONB.  And it's
>> the same size as the lengths-only version.
> 
> Since it's looking like this might be the direction we want to go, I took
> the time to flesh out my proof-of-concept patch.  The attached version
> takes care of cosmetic issues (like fixing the comments), and includes
> code to avoid O(N^2) penalties in findJsonbValueFromContainer and
> JsonbIteratorNext.  I'm not sure whether those changes will help
> noticeably on Josh's test case; for me, they seemed worth making, but
> they do not bring the code back to full speed parity with the all-offsets
> version.  But as we've been discussing, it seems likely that those costs
> would be swamped by compression and I/O considerations in most scenarios
> with large documents; and of course for small documents it hardly matters.

Table sizes and extraction times are unchanged from the prior patch
based on my workload.

We should be comparing all-lengths vs length-and-offset maybe using
another workload as well ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
What data are you using right now Josh?

There's the github archive http://www.githubarchive.org/
Here's some sample data https://gist.github.com/igrigorik/2017462




--
Arthur Silva



On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 08/20/2014 08:29 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 08/15/2014 04:19 PM, Tom Lane wrote:
>>> Personally I'd prefer to go to the all-lengths approach, but a large
>>> part of that comes from a subjective assessment that the hybrid approach
>>> is too messy.  Others might well disagree.
>
>> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
>> lengths-only patch, and still 80% slower than original JSONB.  And it's
>> the same size as the lengths-only version.
>
> Since it's looking like this might be the direction we want to go, I took
> the time to flesh out my proof-of-concept patch.  The attached version
> takes care of cosmetic issues (like fixing the comments), and includes
> code to avoid O(N^2) penalties in findJsonbValueFromContainer and
> JsonbIteratorNext.  I'm not sure whether those changes will help
> noticeably on Josh's test case; for me, they seemed worth making, but
> they do not bring the code back to full speed parity with the all-offsets
> version.  But as we've been discussing, it seems likely that those costs
> would be swamped by compression and I/O considerations in most scenarios
> with large documents; and of course for small documents it hardly matters.

Table sizes and extraction times are unchanged from the prior patch
based on my workload.

We should be comparing all-lengths vs length-and-offset maybe using
another workload as well ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/20/2014 03:42 PM, Arthur Silva wrote:
> What data are you using right now Josh?

The same data as upthread.

Can you test the three patches (9.4 head, 9.4 with Tom's cleanup of
Heikki's patch, and 9.4 with Tom's latest lengths-only) on your workload?

I'm concerned that my workload is unusual and don't want us to make this
decision based entirely on it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
On Thu, Aug 21, 2014 at 6:20 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 08/20/2014 03:42 PM, Arthur Silva wrote:
> What data are you using right now Josh?

The same data as upthread.

Can you test the three patches (9.4 head, 9.4 with Tom's cleanup of
Heikki's patch, and 9.4 with Tom's latest lengths-only) on your workload?

I'm concerned that my workload is unusual and don't want us to make this
decision based entirely on it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Here's my test results so far with the github archive data.

It's important to keep in mind that the PushEvent event objects that I use in the queries only contain a small number of keys (8 to be precise), so these tests don't really stress the changed code.

Anyway, in this dataset (with the small objects) using the all-lengths patch provides small compression savings, but the overhead is minimal.

----------------

Test data: 610MB of Json -- 341969 items

Index size (jsonb_ops): 331MB

Test query 1: SELECT data->'url', data->'actor' FROM t_json WHERE data @> '{"type": "PushEvent"}'
Test query 1 items: 169732

Test query 2: SELECT data FROM t_json WHERE data @> '{"type": "PushEvent"}'
Test query 2 items:

----------------
HEAD (aka, all offsets) EXTENDED
Size: 374MB
Toast Size: 145MB

Test query 1 runtime: 680ms
Test query 2 runtime: 405ms
----------------
HEAD (aka, all offsets) EXTERNAL
Size: 366MB
Toast Size: 333MB

Test query 1 runtime: 505ms
Test query 2 runtime: 350ms
----------------
All Lengths (Tom Lane patch) EXTENDED
Size: 379MB
Toast Size: 108MB

Test query 1 runtime: 720ms
Test query 2 runtime: 420ms
----------------
All Lengths (Tom Lane patch) EXTERNAL
Size: 366MB
Toast Size: 333MB

Test query 1 runtime: 525ms
Test query 2 runtime: 355ms


--
Arthur Silva


Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 08/16/2014 02:19 AM, Tom Lane wrote:
> I think the realistic alternatives at this point are either to
> switch to all-lengths as in my test patch, or to use the hybrid approach
> of Heikki's test patch.  IMO the major attraction of Heikki's patch
> is that it'd be upward compatible with existing beta installations,
> ie no initdb required (but thus, no opportunity to squeeze in a version
> identifier either).  It's not showing up terribly well in the performance
> tests I've been doing --- it's about halfway between HEAD and my patch on
> that extract-a-key-from-a-PLAIN-stored-column test.  But, just as with my
> patch, there are things that could be done to micro-optimize it by
> touching a bit more code.
>
> I did some quick stats comparing compressed sizes for the delicio.us
> data, printing quartiles as per Josh's lead:
>
> all-lengths    {440,569,609,655,1257}
> Heikki's patch    {456,582,624,671,1274}
> HEAD        {493,636,684,744,1485}
>
> (As before, this is pg_column_size of the jsonb within a table whose rows
> are wide enough to force tuptoaster.c to try to compress the jsonb;
> otherwise many of these values wouldn't get compressed.)  These documents
> don't have enough keys to trigger the first_success_by issue, so that
> HEAD doesn't look too awful, but still there's about an 11% gain from
> switching from offsets to lengths.  Heikki's method captures much of
> that but not all.
>
> Personally I'd prefer to go to the all-lengths approach, but a large
> part of that comes from a subjective assessment that the hybrid approach
> is too messy.  Others might well disagree.

It's not too pretty, no. But it would be nice to not have to make a 
tradeoff between lookup speed and compressibility.

Yet another idea is to store all lengths, but add an additional array of 
offsets to JsonbContainer. The array would contain the offset of, say, 
every 16th element. It would be very small compared to the lengths 
array, but would greatly speed up random access on a large array/object.
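
To make that concrete, here is a minimal sketch of how such a lookup could work, assuming a hypothetical offsets[] checkpoint array stored alongside the all-lengths JEntry array (names are illustrative only, not from any posted patch):

#define JB_OFFSET_STRIDE 16

static uint32
getJsonbOffsetStride(const JEntry *children, const uint32 *offsets, int index)
{
    int         base = index / JB_OFFSET_STRIDE;
    uint32      off = offsets[base];    /* nearest stored checkpoint */
    int         i;

    /* walk at most JB_OFFSET_STRIDE - 1 lengths, instead of all of them */
    for (i = base * JB_OFFSET_STRIDE; i < index; i++)
        off += JBE_LEN(children, i);
    return off;
}

The checkpoint array would add one uint32 per 16 children, so the space overhead stays small relative to the lengths themselves.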

- Heikki




Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 08/16/2014 02:19 AM, Tom Lane wrote:
>> I think the realistic alternatives at this point are either to
>> switch to all-lengths as in my test patch, or to use the hybrid approach
>> of Heikki's test patch. ...
>> Personally I'd prefer to go to the all-lengths approach, but a large
>> part of that comes from a subjective assessment that the hybrid approach
>> is too messy.  Others might well disagree.

> It's not too pretty, no. But it would be nice to not have to make a 
> tradeoff between lookup speed and compressibility.

> Yet another idea is to store all lengths, but add an additional array of 
> offsets to JsonbContainer. The array would contain the offset of, say, 
> every 16th element. It would be very small compared to the lengths 
> array, but would greatly speed up random access on a large array/object.

That does nothing to address my basic concern about the patch, which is
that it's too complicated and therefore bug-prone.  Moreover, it'd lose
on-disk compatibility which is really the sole saving grace of the
proposal.

My feeling about it at this point is that the apparent speed gain from
using offsets is illusory: in practically all real-world cases where there
are enough keys or array elements for it to matter, costs associated with
compression (or rather failure to compress) will dominate any savings we
get from offset-assisted lookups.  I agree that the evidence for this
opinion is pretty thin ... but the evidence against it is nonexistent.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/26/2014 07:51 AM, Tom Lane wrote:
> My feeling about it at this point is that the apparent speed gain from
> using offsets is illusory: in practically all real-world cases where there
> are enough keys or array elements for it to matter, costs associated with
> compression (or rather failure to compress) will dominate any savings we
> get from offset-assisted lookups.  I agree that the evidence for this
> opinion is pretty thin ... but the evidence against it is nonexistent.

Well, I have shown one test case which shows where lengths is a net
penalty.  However, for that to be the case, you have to have the
following conditions *all* be true:

* lots of top-level keys
* short values
* rows which are on the borderline for TOAST
* table which fits in RAM

... so that's a "special case" and if it's sub-optimal, no biggie.  Also,
it's not like it's an order-of-magnitude slower.

Anyway, I called for feedback on my blog, and have gotten some:

http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Anyway, I called for feedback on my blog, and have gotten some:
> http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html

I was hoping you'd get some useful data from that, but so far it seems
like a rehash of points made in the on-list thread :-(
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/26/2014 11:40 AM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Anyway, I called for feedback on my blog, and have gotten some:
>> http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html
> 
> I was hoping you'd get some useful data from that, but so far it seems
> like a rehash of points made in the on-list thread :-(
> 
>             regards, tom lane

yah, me too. :-(

Unfortunately even the outside commentors don't seem to understand that
storage size *is* related to speed, it's exchanging I/O speed for CPU speed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 08/26/2014 11:40 AM, Tom Lane wrote:
>> I was hoping you'd get some useful data from that, but so far it seems
>> like a rehash of points made in the on-list thread :-(

> Unfortunately even the outside commentors don't seem to understand that
> storage size *is* related to speed, it's exchanging I/O speed for CPU speed.

Yeah, exactly.  Given current hardware trends, data compression is
becoming more of a win not less as time goes on: CPU cycles are cheap
even compared to main memory access, let alone mass storage.  So I'm
thinking we want to adopt a compression-friendly data format even if
it measures out as a small loss currently.

I wish it were cache-friendly too, per the upthread tangent about having
to fetch keys from all over the place within a large JSON object.

... and while I was typing that sentence, lightning struck.  The existing
arrangement of object subfields with keys and values interleaved is just
plain dumb.  We should rearrange that as all the keys in order, then all
the values in the same order.  Then the keys are naturally adjacent in
memory and object-key searches become much more cache-friendly: you
probably touch most of the key portion of the object, but none of the
values portion, until you know exactly what part of the latter to fetch.
This approach might complicate the lookup logic marginally but I bet not
very much; and it will be a huge help if we ever want to do smart access
to EXTERNAL (non-compressed) JSON values.
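
As a rough illustration (hypothetical helpers, not from any posted patch): with all keys stored first and all values second, in the same pair order, the JEntry index arithmetic for pair i in an object of nPairs pairs would look like this:

static inline int
objectKeyIndex(int i)
{
    return i;               /* interleaved layout: i * 2 */
}

static inline int
objectValueIndex(int i, int nPairs)
{
    return nPairs + i;      /* interleaved layout: i * 2 + 1 */
}

A key search would then touch only the contiguous run of key JEntrys and key data, and the matching value would be located with a single jump into the second half.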

I will go prototype that just to see how much code rearrangement is
required.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > On 08/26/2014 11:40 AM, Tom Lane wrote:
> >> I was hoping you'd get some useful data from that, but so far it seems
> >> like a rehash of points made in the on-list thread :-(
> 
> > Unfortunately even the outside commentors don't seem to understand that
> > storage size *is* related to speed, it's exchanging I/O speed for CPU speed.
> 
> Yeah, exactly.  Given current hardware trends, data compression is
> becoming more of a win not less as time goes on: CPU cycles are cheap
> even compared to main memory access, let alone mass storage.  So I'm
> thinking we want to adopt a compression-friendly data format even if
> it measures out as a small loss currently.

On the other hand, the majority of databases these days fit into main
memory due to its increasing size, and postgres is more often CPU than
IO bound.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Laurence Rowe
Date:
On 26 August 2014 11:34, Josh Berkus <josh@agliodbs.com> wrote:
On 08/26/2014 07:51 AM, Tom Lane wrote:
> My feeling about it at this point is that the apparent speed gain from
> using offsets is illusory: in practically all real-world cases where there
> are enough keys or array elements for it to matter, costs associated with
> compression (or rather failure to compress) will dominate any savings we
> get from offset-assisted lookups.  I agree that the evidence for this
> opinion is pretty thin ... but the evidence against it is nonexistent.

Well, I have shown one test case which shows where lengths is a net
penalty.  However, for that to be the case, you have to have the
following conditions *all* be true:

* lots of top-level keys
* short values
* rows which are on the borderline for TOAST
* table which fits in RAM

... so that's a "special case" and if it's sub-optimal, no biggie.  Also,
it's not like it's an order-of-magnitude slower.

Anyway, I called for feedback on my blog, and have gotten some:

http://www.databasesoup.com/2014/08/the-great-jsonb-tradeoff.html

It would be really interesting to see your results with column STORAGE EXTERNAL for that benchmark. I think it is important to separate out the slowdown due to decompression now being needed vs that inherent in the new format; we can always switch off compression on a per-column basis using STORAGE EXTERNAL.


My JSON data has smallish objects with a small number of keys; it barely compresses at all with the patch and shows similar results to Arthur's data. Across ~500K rows I get:

encoded=# select count(properties->>'submitted_by') from compressed;
 count  
--------
 431948
(1 row)

Time: 250.512 ms

encoded=# select count(properties->>'submitted_by') from uncompressed;
 count  
--------
 431948
(1 row)

Time: 218.552 ms


Laurence

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
>> Yeah, exactly.  Given current hardware trends, data compression is
>> becoming more of a win not less as time goes on: CPU cycles are cheap
>> even compared to main memory access, let alone mass storage.  So I'm
>> thinking we want to adopt a compression-friendly data format even if
>> it measures out as a small loss currently.

> On the other hand, the majority of databases these days fit into main
> memory due to its increasing size, and postgres is more often CPU than
> IO bound.

Well, better data compression helps make that true ;-).  And don't forget
cache effects; actual main memory is considered slow these days.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Tue, Aug 26, 2014 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 08/26/2014 11:40 AM, Tom Lane wrote:
>>> I was hoping you'd get some useful data from that, but so far it seems
>>> like a rehash of points made in the on-list thread :-(
>
>> Unfortunately even the outside commentors don't seem to understand that
>> storage size *is* related to speed, it's exchanging I/O speed for CPU speed.
>
> Yeah, exactly.  Given current hardware trends, data compression is
> becoming more of a win not less as time goes on: CPU cycles are cheap
> even compared to main memory access, let alone mass storage.  So I'm
> thinking we want to adopt a compression-friendly data format even if
> it measures out as a small loss currently.
>
> I wish it were cache-friendly too, per the upthread tangent about having
> to fetch keys from all over the place within a large JSON object.


What about my earlier proposal?

An in-memory compressed representation would greatly help cache
locality, more so if you pack keys as you mentioned.



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-08-26 15:17:13 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-08-26 15:01:27 -0400, Tom Lane wrote:
> >> Yeah, exactly.  Given current hardware trends, data compression is
> >> becoming more of a win not less as time goes on: CPU cycles are cheap
> >> even compared to main memory access, let alone mass storage.  So I'm
> >> thinking we want to adopt a compression-friendly data format even if
> >> it measures out as a small loss currently.
> 
> > On the other hand, the majority of databases these days fit into main
> > memory due to its increasing size, and postgres is more often CPU than
> > IO bound.
> 
> Well, better data compression helps make that true ;-).

People disable toast compression though because it results in better
performance :(. Part of that could be fixed by a faster compression
method, part of it by decompressing less often. But still.

> And don't forget cache effects; actual main memory is considered slow
> these days.

Right. But that plays the other way round too. Compressed datums need to
be copied to be accessed uncompressed. Whereas at least in comparison to
inline compressed datums that's not necessary.

Anyway, that's just to say that I don't really agree that CPU overhead
is a worthy price to pay for storage efficiency if the gains are small.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Tue, Aug 26, 2014 at 12:27 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Anyway, that's just to say that I don't really agree that CPU overhead
> is a worthy price to pay for storage efficiency if the gains are small.

+1

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 08/26/2014 12:27 PM, Andres Freund wrote:
> Anyway, that's just to say that I don't really agree that CPU overhead
> is a worthy price to pay for storage efficiency if the gains are small.

But in this case the gains aren't small; we're talking up to 60% smaller
storage.

Testing STORAGE EXTENDED soon.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
I wrote:
> I wish it were cache-friendly too, per the upthread tangent about having
> to fetch keys from all over the place within a large JSON object.

> ... and while I was typing that sentence, lightning struck.  The existing
> arrangement of object subfields with keys and values interleaved is just
> plain dumb.  We should rearrange that as all the keys in order, then all
> the values in the same order.  Then the keys are naturally adjacent in
> memory and object-key searches become much more cache-friendly: you
> probably touch most of the key portion of the object, but none of the
> values portion, until you know exactly what part of the latter to fetch.
> This approach might complicate the lookup logic marginally but I bet not
> very much; and it will be a huge help if we ever want to do smart access
> to EXTERNAL (non-compressed) JSON values.

> I will go prototype that just to see how much code rearrangement is
> required.

This looks pretty good from a coding point of view.  I have not had time
yet to see if it affects the speed of the benchmark cases we've been
trying.  I suspect that it won't make much difference in them.  I think
if we do decide to make an on-disk format change, we should seriously
consider including this change.

The same concept could be applied to offset-based storage of course,
although I rather doubt that we'd make that combination of choices since
it would be giving up on-disk compatibility for benefits that are mostly
in the future.
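To make the keys-then-values arrangement concrete, here is a toy C sketch
(purely illustrative, not taken from the attached patches: it uses plain
pointer arrays instead of JEntry length words and a flat data area) of why
a key lookup only has to touch the densely packed key region until the
matching slot is known:

    #include <stdio.h>
    #include <string.h>

    /*
     * Toy "object": all keys stored first (sorted, adjacent in memory),
     * then all values, with values[i] corresponding to keys[i].
     */
    typedef struct ToyObject
    {
        int          npairs;
        const char **keys;      /* key region: scanned by the search */
        const char **values;    /* value region: touched only on a hit */
    } ToyObject;

    /* Binary search over the key region only. */
    static const char *
    toy_lookup(const ToyObject *obj, const char *key)
    {
        int     lo = 0;
        int     hi = obj->npairs;

        while (lo < hi)
        {
            int     mid = lo + (hi - lo) / 2;
            int     cmp = strcmp(obj->keys[mid], key);

            if (cmp == 0)
                return obj->values[mid];    /* first touch of the value region */
            else if (cmp < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return NULL;
    }

    int
    main(void)
    {
        const char *keys[]   = {"actor", "created_at", "id", "payload", "repo", "type"};
        const char *values[] = {"igrigorik", "2012-03-11", "153", "{...}", "gh", "PushEvent"};
        ToyObject   obj = {6, keys, values};

        printf("type = %s\n", toy_lookup(&obj, "type"));
        return 0;
    }

In the real format the keys are variable-length strings located via JEntry
lengths rather than pointers, but the access pattern is the same: the
binary search stays within the keys region, and only the final fetch
reaches into the values region.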

Attached are two patches: one is a "delta" against the last jsonb-lengths
patch I posted, and the other is a "merged" patch showing the total change
from HEAD, for ease of application.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index e47eaea..4e7fe67 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,33 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
--- 26,33 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_LENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
*************** findJsonbValueFromContainer(JsonbContain
*** 294,303 ****
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result = palloc(sizeof(JsonbValue));

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
--- 294,309 ----
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result;

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

+     /* Quick out without a palloc cycle if object/array is empty */
+     if (count <= 0)
+         return NULL;
+
+     result = palloc(sizeof(JsonbValue));
+
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
*************** findJsonbValueFromContainer(JsonbContain
*** 323,329 ****
          char       *base_addr = (char *) (children + count * 2);
          uint32       *offsets;
          uint32        lastoff;
!         int            lastoffpos;
          uint32        stopLow = 0,
                      stopHigh = count;

--- 329,335 ----
          char       *base_addr = (char *) (children + count * 2);
          uint32       *offsets;
          uint32        lastoff;
!         int            i;
          uint32        stopLow = 0,
                      stopHigh = count;

*************** findJsonbValueFromContainer(JsonbContain
*** 332,379 ****

          /*
           * We use a cache to avoid redundant getJsonbOffset() computations
!          * inside the search loop.  Note that count may well be zero at this
!          * point; to avoid an ugly special case for initializing lastoff and
!          * lastoffpos, we allocate one extra array element.
           */
!         offsets = (uint32 *) palloc((count * 2 + 1) * sizeof(uint32));
!         offsets[0] = lastoff = 0;
!         lastoffpos = 0;

          /* Binary search on object/pair keys *only* */
          while (stopLow < stopHigh)
          {
              uint32        stopMiddle;
-             int            index;
              int            difference;
              JsonbValue    candidate;

              stopMiddle = stopLow + (stopHigh - stopLow) / 2;

-             /*
-              * Compensate for the fact that we're searching through pairs (not
-              * entries).
-              */
-             index = stopMiddle * 2;
-
-             /* Update the offsets cache through at least index+1 */
-             while (lastoffpos <= index)
-             {
-                 lastoff += JBE_LEN(children, lastoffpos);
-                 offsets[++lastoffpos] = lastoff;
-             }
-
              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[index];
!             candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return value */
!                 fillJsonbValue(children, index + 1,
!                                base_addr, offsets[index + 1],
                                 result);

                  pfree(offsets);
--- 338,383 ----

          /*
           * We use a cache to avoid redundant getJsonbOffset() computations
!          * inside the search loop.  The entire cache can be filled immediately
!          * since we expect to need the last offset for value access.  (This
!          * choice could lose if the key is not present, but avoiding extra
!          * logic inside the search loop probably makes up for that.)
           */
!         offsets = (uint32 *) palloc(count * sizeof(uint32));
!         lastoff = 0;
!         for (i = 0; i < count; i++)
!         {
!             offsets[i] = lastoff;
!             lastoff += JBE_LEN(children, i);
!         }
!         /* lastoff now has the offset of the first value item */

          /* Binary search on object/pair keys *only* */
          while (stopLow < stopHigh)
          {
              uint32        stopMiddle;
              int            difference;
              JsonbValue    candidate;

              stopMiddle = stopLow + (stopHigh - stopLow) / 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[stopMiddle];
!             candidate.val.string.len = JBE_LEN(children, stopMiddle);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return corresponding value */
!                 int            index = stopMiddle + count;
!
!                 /* navigate to appropriate offset */
!                 for (i = count; i < index; i++)
!                     lastoff += JBE_LEN(children, i);
!
!                 fillJsonbValue(children, index,
!                                base_addr, lastoff,
                                 result);

                  pfree(offsets);
*************** recurse:
*** 730,735 ****
--- 734,740 ----
              val->val.array.rawScalar = (*it)->isScalar;
              (*it)->curIndex = 0;
              (*it)->curDataOffset = 0;
+             (*it)->curValueOffset = 0;    /* not actually used */
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;
*************** recurse:
*** 780,785 ****
--- 785,792 ----
               */
              (*it)->curIndex = 0;
              (*it)->curDataOffset = 0;
+             (*it)->curValueOffset = getJsonbOffset((*it)->children,
+                                                    (*it)->nElems);
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;
*************** recurse:
*** 799,805 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex * 2,
                                 (*it)->dataProper, (*it)->curDataOffset,
                                 val);
                  if (val->type != jbvString)
--- 806,812 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex,
                                 (*it)->dataProper, (*it)->curDataOffset,
                                 val);
                  if (val->type != jbvString)
*************** recurse:
*** 814,828 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2);
!
!             fillJsonbValue((*it)->children, (*it)->curIndex * 2 + 1,
!                            (*it)->dataProper, (*it)->curDataOffset,
                             val);

              (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2 + 1);
              (*it)->curIndex++;

              /*
--- 821,834 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
!                            (*it)->dataProper, (*it)->curValueOffset,
                             val);

              (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex);
!             (*it)->curValueOffset += JBE_LEN((*it)->children,
!                                              (*it)->curIndex + (*it)->nElems);
              (*it)->curIndex++;

              /*
*************** convertJsonbObject(StringInfo buffer, JE
*** 1509,1514 ****
--- 1515,1524 ----
      /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

+     /*
+      * Iterate over the keys, then over the values, since that is the ordering
+      * we want in the on-disk representation.
+      */
      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
*************** convertJsonbObject(StringInfo buffer, JE
*** 1529,1534 ****
--- 1539,1561 ----
          metaoffset += sizeof(JEntry);

          /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
+     }
+     for (i = 0; i < val->val.object.nPairs; i++)
+     {
+         JsonbPair  *pair = &val->val.object.pairs[i];
+         int            len;
+         JEntry        meta;
+
+         /*
           * Convert value, producing a JEntry and appending its variable-length
           * data to buffer
           */
*************** convertJsonbObject(StringInfo buffer, JE
*** 1543,1551 ****
          /*
           * Bail out if total variable-length data exceeds what will fit in a
           * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.  But it
!          * should be sufficient to check once per iteration, since
!          * JENTRY_LENMASK is several bits narrower than int.
           */
          if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
--- 1570,1576 ----
          /*
           * Bail out if total variable-length data exceeds what will fit in a
           * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
           */
          if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index b9a4314..f9472af 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef uint32 JEntry;
*** 149,156 ****
  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element. An object has two children for
!  * each key/value pair.
   */
  typedef struct JsonbContainer
  {
--- 149,160 ----
  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element, stored in array order.
!  *
!  * An object has two children for each key/value pair.  The keys all appear
!  * first, in key sort order; then the values appear, in an order matching the
!  * key order.  This arrangement keeps the keys compact in memory, making a
!  * search for a particular key more cache-friendly.
   */
  typedef struct JsonbContainer
  {
*************** typedef struct JsonbContainer
*** 162,169 ****
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF
! #define JB_FSCALAR                0x10000000
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

--- 166,173 ----
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF        /* mask for count field */
! #define JB_FSCALAR                0x10000000        /* flag bits */
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

*************** struct JsonbValue
*** 238,255 ****
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Pair within an Object.
   *
!  * Pairs with duplicate keys are de-duplicated.  We store the order for the
!  * benefit of doing so in a well-defined way with respect to the original
!  * observed order (which is "last observed wins").  This is only used briefly
!  * when originally constructing a Jsonb.
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* preserves order of pairs with equal keys */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
--- 242,261 ----
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Key/value pair within an Object.
   *
!  * This struct type is only used briefly while constructing a Jsonb; it is
!  * *not* the on-disk representation.
!  *
!  * Pairs with duplicate keys are de-duplicated.  We store the originally
!  * observed pair ordering for the purpose of removing duplicates in a
!  * well-defined way (which is "last observed wins").
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* Pair's index in original sequence */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
*************** typedef struct JsonbIterator
*** 284,295 ****
      /* Data proper.  This points just past end of children array */
      char       *dataProper;

!     /* Current item in buffer (up to nElems, but must * 2 for objects) */
      int            curIndex;

      /* Data offset corresponding to current item */
      uint32        curDataOffset;

      /* Private state */
      JsonbIterState state;

--- 290,308 ----
      /* Data proper.  This points just past end of children array */
      char       *dataProper;

!     /* Current item in buffer (up to nElems) */
      int            curIndex;

      /* Data offset corresponding to current item */
      uint32        curDataOffset;

+     /*
+      * If the container is an object, we want to return keys and values
+      * alternately; so curDataOffset points to the current key, and
+      * curValueOffset points to the current value.
+      */
+     uint32        curValueOffset;
+
      /* Private state */
      JsonbIterState state;

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*************** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_POSMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_POSMASK)));

      return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_LENMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_LENMASK)));

      return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..4e7fe67 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
                 JsonbValue *result);
! static bool    equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_LENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
                 JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
*************** static void convertJsonbArray(StringInfo
*** 42,48 ****
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
--- 43,49 ----
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int    reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
*************** JsonbValueToJsonb(JsonbValue *val)
*** 108,113 ****
--- 109,135 ----
  }

  /*
+  * Get the offset of the variable-length portion of a Jsonb node within
+  * the variable-length-data part of its container.  The node is identified
+  * by index within the container's JEntry array.
+  *
+  * We do this by adding up the lengths of all the previous nodes'
+  * variable-length portions.  It's best to avoid using this function when
+  * iterating through all the nodes in a container, since that would result
+  * in O(N^2) work.
+  */
+ uint32
+ getJsonbOffset(const JEntry *ja, int index)
+ {
+     uint32        off = 0;
+     int            i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
+
+ /*
   * BT comparator worker function.  Returns an integer less than, equal to, or
   * greater than zero, indicating whether a is less than, equal to, or greater
   * than b.  Consistent with the requirements for a B-Tree operator class
*************** compareJsonbContainers(JsonbContainer *a
*** 201,207 ****
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say).  They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
--- 223,229 ----
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say). They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
*************** findJsonbValueFromContainer(JsonbContain
*** 272,333 ****
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result = palloc(sizeof(JsonbValue));

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopMiddle;

!         /* Object key past by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < count)
          {
!             int            index;
              int            difference;
              JsonbValue    candidate;

!             /*
!              * Note how we compensate for the fact that we're iterating
!              * through pairs (not entries) throughout.
!              */
!             stopMiddle = stopLow + (count - stopLow) / 2;
!
!             index = stopMiddle * 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + JBE_OFF(children, index);
!             candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return value */
!                 fillJsonbValue(children, index + 1, base_addr, result);

                  return result;
              }
              else
--- 294,386 ----
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result;

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

+     /* Quick out without a palloc cycle if object/array is empty */
+     if (count <= 0)
+         return NULL;
+
+     result = palloc(sizeof(JsonbValue));
+
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
+         uint32        offset = 0;
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, offset, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
+
+             offset += JBE_LEN(children, i);
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
+         uint32       *offsets;
+         uint32        lastoff;
+         int            i;
          uint32        stopLow = 0,
!                     stopHigh = count;

!         /* Object key passed by caller must be a string */
          Assert(key->type == jbvString);

+         /*
+          * We use a cache to avoid redundant getJsonbOffset() computations
+          * inside the search loop.  The entire cache can be filled immediately
+          * since we expect to need the last offset for value access.  (This
+          * choice could lose if the key is not present, but avoiding extra
+          * logic inside the search loop probably makes up for that.)
+          */
+         offsets = (uint32 *) palloc(count * sizeof(uint32));
+         lastoff = 0;
+         for (i = 0; i < count; i++)
+         {
+             offsets[i] = lastoff;
+             lastoff += JBE_LEN(children, i);
+         }
+         /* lastoff now has the offset of the first value item */
+
          /* Binary search on object/pair keys *only* */
!         while (stopLow < stopHigh)
          {
!             uint32        stopMiddle;
              int            difference;
              JsonbValue    candidate;

!             stopMiddle = stopLow + (stopHigh - stopLow) / 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[stopMiddle];
!             candidate.val.string.len = JBE_LEN(children, stopMiddle);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return corresponding value */
!                 int            index = stopMiddle + count;
!
!                 /* navigate to appropriate offset */
!                 for (i = count; i < index; i++)
!                     lastoff += JBE_LEN(children, i);
!
!                 fillJsonbValue(children, index,
!                                base_addr, lastoff,
!                                result);

+                 pfree(offsets);
                  return result;
              }
              else
*************** findJsonbValueFromContainer(JsonbContain
*** 335,343 ****
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     count = stopMiddle;
              }
          }
      }

      /* Not found */
--- 388,398 ----
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     stopHigh = stopMiddle;
              }
          }
+
+         pfree(offsets);
      }

      /* Not found */
*************** getIthJsonbValueFromContainer(JsonbConta
*** 368,374 ****

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr, result);

      return result;
  }
--- 423,431 ----

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr,
!                    getJsonbOffset(container->children, i),
!                    result);

      return result;
  }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 377,387 ****
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
  {
      JEntry        entry = children[index];

--- 434,450 ----
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
+  * The node's JEntry is at children[index], and its variable-length data
+  * is at base_addr + offset.  We make the caller determine the offset since
+  * in many cases the caller can amortize the work across multiple children.
+  *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
!                JsonbValue *result)
  {
      JEntry        entry = children[index];

*************** fillJsonbValue(JEntry *children, int ind
*** 392,405 ****
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + JBE_OFF(children, index);
!         result->val.string.len = JBE_LEN(children, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
--- 455,468 ----
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + offset;
!         result->val.string.len = JBE_LENFLD(entry);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 415,422 ****
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
!         result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children, index));
      }
  }

--- 478,486 ----
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         /* Remove alignment padding from data pointer and len */
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
!         result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
      }
  }

*************** recurse:
*** 668,680 ****
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
--- 732,746 ----
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = 0;    /* not actually used */
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
*************** recurse:
*** 686,692 ****
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);

              if (!IsAJsonbScalar(val) && !skipNested)
              {
--- 752,763 ----
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->curIndex,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children, (*it)->curIndex);
!             (*it)->curIndex++;

              if (!IsAJsonbScalar(val) && !skipNested)
              {
*************** recurse:
*** 697,704 ****
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't
!                  * want us to recurse into it.
                   */
                  return WJB_ELEM;
              }
--- 768,775 ----
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't want
!                  * us to recurse into it.
                   */
                  return WJB_ELEM;
              }
*************** recurse:
*** 712,724 ****
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
--- 783,798 ----
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = getJsonbOffset((*it)->children,
!                                                    (*it)->nElems);
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
*************** recurse:
*** 732,738 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

--- 806,814 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex,
!                                (*it)->dataProper, (*it)->curDataOffset,
!                                val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

*************** recurse:
*** 745,752 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
!                            (*it)->dataProper, val);

              /*
               * Value may be a container, in which case we recurse with new,
--- 821,835 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
!                            (*it)->dataProper, (*it)->curValueOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex);
!             (*it)->curValueOffset += JBE_LEN((*it)->children,
!                                              (*it)->curIndex + (*it)->nElems);
!             (*it)->curIndex++;

              /*
               * Value may be a container, in which case we recurse with new,
*************** reserveFromBuffer(StringInfo buffer, int
*** 1209,1216 ****
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us;
!      * it seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

--- 1292,1299 ----
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us; it
!      * seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

*************** convertToJsonb(JsonbValue *val)
*** 1284,1291 ****

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what
!      * kind of value it is.
       */

      res = (Jsonb *) buffer.data;
--- 1367,1374 ----

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what kind
!      * of value it is.
       */

      res = (Jsonb *) buffer.data;
*************** convertJsonbValue(StringInfo buffer, JEn
*** 1315,1324 ****
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary,
!      * and neither should any of its sub-components. Those values will be
!      * produced by convertJsonbArray and convertJsonbObject, the results of
!      * which will not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
--- 1398,1407 ----
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary, and
!      * neither should any of its sub-components. Those values will be produced
!      * by convertJsonbArray and convertJsonbObject, the results of which will
!      * not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1340,1353 ****
      int            totallen;
      uint32        header;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry, stored in the beginning of the variable-
!      * length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
--- 1423,1437 ----
      int            totallen;
      uint32        header;

!     /* Remember where variable-length data starts for this array */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1358,1364 ****
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!     /* reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
--- 1442,1449 ----
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!
!     /* Reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1368,1391 ****
          int            len;
          JEntry        meta;

          convertJsonbValue(buffer, &meta, elem, level + 1);
-         len = meta & JENTRY_POSMASK;
-         totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
--- 1453,1491 ----
          int            len;
          JEntry        meta;

+         /*
+          * Convert element, producing a JEntry and appending its
+          * variable-length data to buffer
+          */
          convertJsonbValue(buffer, &meta, elem, level + 1);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         len = JBE_LENFLD(meta);
!         totallen += len;
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1393,1457 ****
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
-     uint32        header;
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

!     /* Initialize header */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* reserve space for the JEntries of the keys and values */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair *pair = &val->val.object.pairs[i];
!         int len;
!         JEntry meta;

!         /* put key */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = meta & JENTRY_POSMASK;
          totallen += len;

-         if (totallen > JENTRY_POSMASK)
-             ereport(ERROR,
-                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-                             JENTRY_POSMASK)));
-
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         convertJsonbValue(buffer, &meta, &pair->value, level);
!         len = meta & JENTRY_POSMASK;
!         totallen += len;
!
!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      *pheader = JENTRY_ISCONTAINER | totallen;
  }

--- 1493,1595 ----
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;
+     uint32        header;

!     /* Remember where variable-length data starts for this object */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

!     /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
!      */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

+     /*
+      * Iterate over the keys, then over the values, since that is the ordering
+      * we want in the on-disk representation.
+      */
      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;

!         /*
!          * Convert key, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = JBE_LENFLD(meta);
          totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));
!     }
!     for (i = 0; i < val->val.object.nPairs; i++)
!     {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;
!
!         /*
!          * Convert value, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
!         convertJsonbValue(buffer, &meta, &pair->value, level + 1);
!
!         len = JBE_LENFLD(meta);
!         totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
+
+         /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
+     /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..f9472af 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 83,91 ****
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header, and a variable-length content.  The JEntry header indicates what
!  * kind of a node it is, e.g. a string or an array, and the offset and length
!  * of its variable-length portion within the container.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
--- 83,91 ----
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header and a variable-length content (possibly of zero size).  The JEntry
!  * header indicates what kind of a node it is, e.g. a string or an array,
!  * and includes the length of its variable-length portion.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
*************** typedef struct JsonbValue JsonbValue;
*** 95,133 ****
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begins with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
-  * To encode the length and offset of the variable-length portion of each
-  * node in a compact way, the JEntry stores only the end offset within the
-  * variable-length portion of the container node. For the first JEntry in the
-  * container's JEntry array, that equals to the length of the node data.  The
-  * begin offset and length of the rest of the entries can be calculated using
-  * the end offset of the previous JEntry in the array.
-  *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
--- 95,126 ----
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begin with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte so that the actual numeric data is 4-byte aligned.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the data length of the entry (see
!  * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
!  * of the entry. The most significant bit is reserved for future use, and
!  * should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_LENMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
*************** typedef uint32 JEntry;
*** 148,166 ****
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element. An object has two children for
!  * each key/value pair.
   */
  typedef struct JsonbContainer
  {
--- 141,160 ----
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the data length of a JEntry.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_LENMASK)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element, stored in array order.
!  *
!  * An object has two children for each key/value pair.  The keys all appear
!  * first, in key sort order; then the values appear, in an order matching the
!  * key order.  This arrangement keeps the keys compact in memory, making a
!  * search for a particular key more cache-friendly.
   */
  typedef struct JsonbContainer
  {
*************** typedef struct JsonbContainer
*** 172,179 ****
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF
! #define JB_FSCALAR                0x10000000
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

--- 166,173 ----
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF        /* mask for count field */
! #define JB_FSCALAR                0x10000000        /* flag bits */
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

*************** struct JsonbValue
*** 248,265 ****
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Pair within an Object.
   *
!  * Pairs with duplicate keys are de-duplicated.  We store the order for the
!  * benefit of doing so in a well-defined way with respect to the original
!  * observed order (which is "last observed wins").  This is only used briefly
!  * when originally constructing a Jsonb.
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* preserves order of pairs with equal keys */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
--- 242,261 ----
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Key/value pair within an Object.
   *
!  * This struct type is only used briefly while constructing a Jsonb; it is
!  * *not* the on-disk representation.
!  *
!  * Pairs with duplicate keys are de-duplicated.  We store the originally
!  * observed pair ordering for the purpose of removing duplicates in a
!  * well-defined way (which is "last observed wins").
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* Pair's index in original sequence */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
*************** typedef struct JsonbIterator
*** 287,306 ****
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will be
!                                  * nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;

!     /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            i;

      /*
!      * Data proper.  This points just past end of children array.
!      * We use the JBE_OFF() macro on the Jentrys to find offsets of each
!      * child in this area.
       */
!     char       *dataProper;

      /* Private state */
      JsonbIterState state;
--- 283,307 ----
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will
!                                  * be nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;        /* JEntrys for child nodes */
!     /* Data proper.  This points just past end of children array */
!     char       *dataProper;

!     /* Current item in buffer (up to nElems) */
!     int            curIndex;
!
!     /* Data offset corresponding to current item */
!     uint32        curDataOffset;

      /*
!      * If the container is an object, we want to return keys and values
!      * alternately; so curDataOffset points to the current key, and
!      * curValueOffset points to the current value.
       */
!     uint32        curValueOffset;

      /* Private state */
      JsonbIterState state;
*************** extern Datum gin_consistent_jsonb_path(P
*** 344,349 ****
--- 345,351 ----
  extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);

  /* Support functions */
+ extern uint32 getJsonbOffset(const JEntry *ja, int index);
  extern int    compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
  extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
                              uint32 flags,
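
For reference, a minimal sketch (assumed here for illustration, not copied from the patch) of what the new getJsonbOffset() support function has to do under the all-lengths format: a child's starting offset is no longer stored, so it is recomputed by summing the data lengths of the preceding JEntrys.

#include "postgres.h"
#include "utils/jsonb.h"

uint32
getJsonbOffset(const JEntry *ja, int index)
{
    uint32      offset = 0;
    int         i;

    /* sum the lengths of all children before "index" */
    for (i = 0; i < index; i++)
        offset += JBE_LEN(ja, i);

    return offset;
}

This is also where the compression win comes from: a run of similar values now produces repeated (compressible) lengths instead of ever-increasing offsets, at the cost of O(n) offset reconstruction.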

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
Tom, here's the results with github data (8 top level keys) only. Here's a sample object https://gist.github.com/igrigorik/2017462

All-Lengths + Cache-Aware EXTERNAL
Query 1: 516ms
Query 2: 350ms

The difference is small but it's definitely faster, which makes sense since cache line misses are probably slightly reduced.
As in the previous runs, I ran the query a dozen times and took the average after excluding runs with a high deviation.

compare to (copied from my previous email)

HEAD (aka, all offsets) EXTERNAL
Test query 1 runtime: 505ms
Test query 2 runtime: 350ms


All Lengths (Tom Lane patch) EXTERNAL
Test query 1 runtime: 525ms
Test query 2 runtime: 355ms


--
Arthur Silva



On Tue, Aug 26, 2014 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> I wish it were cache-friendly too, per the upthread tangent about having
> to fetch keys from all over the place within a large JSON object.

> ... and while I was typing that sentence, lightning struck.  The existing
> arrangement of object subfields with keys and values interleaved is just
> plain dumb.  We should rearrange that as all the keys in order, then all
> the values in the same order.  Then the keys are naturally adjacent in
> memory and object-key searches become much more cache-friendly: you
> probably touch most of the key portion of the object, but none of the
> values portion, until you know exactly what part of the latter to fetch.
> This approach might complicate the lookup logic marginally but I bet not
> very much; and it will be a huge help if we ever want to do smart access
> to EXTERNAL (non-compressed) JSON values.

> I will go prototype that just to see how much code rearrangement is
> required.

This looks pretty good from a coding point of view.  I have not had time
yet to see if it affects the speed of the benchmark cases we've been
trying.  I suspect that it won't make much difference in them.  I think
if we do decide to make an on-disk format change, we should seriously
consider including this change.
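
To make the cache-friendliness argument concrete, here is a rough, hypothetical sketch of an object-key lookup under the keys-then-values layout (the names are made up, and the real lookup binary-searches the sorted keys; a linear scan keeps the sketch short):

#include "postgres.h"
#include "utils/jsonb.h"

static JEntry *
object_find_key(JsonbContainer *jc, const char *key, int keylen)
{
    int         nPairs = jc->header & JB_CMASK;
    char       *base = (char *) &jc->children[nPairs * 2];  /* data proper */
    uint32      off = 0;
    int         i;

    for (i = 0; i < nPairs; i++)
    {
        uint32      len = JBE_LEN(jc->children, i);

        /* keys are children 0..nPairs-1, packed together at "base" */
        if (len == (uint32) keylen && memcmp(base + off, key, keylen) == 0)
            return &jc->children[i + nPairs];   /* matching value's JEntry */
        off += len;
    }
    return NULL;
}

The search touches only the JEntrys and the key strings, which now sit contiguously; the values region is not read until the match is found.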

The same concept could be applied to offset-based storage of course,
although I rather doubt that we'd make that combination of choices since
it would be giving up on-disk compatibility for benefits that are mostly
in the future.

Attached are two patches: one is a "delta" against the last jsonb-lengths
patch I posted, and the other is a "merged" patch showing the total change
from HEAD, for ease of application.

                        regards, tom lane


Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva <arthurprs@gmail.com> wrote:
> The difference is small but I's definitely faster, which makes sense since
> cache line misses are probably slightly reduced.
> As in the previous runs, I ran the query a dozen times and took the average
> after excluding runs with a high deviation.

I'm not surprised that it hasn't beaten HEAD. I haven't studied the
problem in detail, but I don't think that the "cache awareness" of the
new revision is necessarily a distinct advantage.

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> I'm not surprised that it hasn't beaten HEAD. I haven't studied the
> problem in detail, but I don't think that the "cache awareness" of the
> new revision is necessarily a distinct advantage.

I doubt it's a significant advantage in the current state of the code;
I'm happy if it's not a loss.  I was looking ahead to someday fetching key
values efficiently from large EXTERNAL (ie out-of-line-but-not-compressed)
JSON values, analogously to the existing optimization for fetching text
substrings from EXTERNAL text values.  As mentioned upthread, the current
JSONB representation would be seriously unfriendly to such a thing.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
It won't be faster by any means, but it should definitely be incorporated if any format changes are made (like Tom already suggested).

I think it's important we gather at least 2 more things before making any calls:
* Josh's tests w/ the cache-aware patch, which should confirm cache-aware is indeed preferred
* Tests with toast hacked to use lz4 instead, which might ease any decisions


--
Arthur Silva



On Wed, Aug 27, 2014 at 12:53 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva <arthurprs@gmail.com> wrote:
> The difference is small but I's definitely faster, which makes sense since
> cache line misses are probably slightly reduced.
> As in the previous runs, I ran the query a dozen times and took the average
> after excluding runs with a high deviation.

I'm not surprised that it hasn't beaten HEAD. I haven't studied the
problem in detail, but I don't think that the "cache awareness" of the
new revision is necessarily a distinct advantage.

--
Peter Geoghegan

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
On Wed, Aug 27, 2014 at 1:09 AM, Arthur Silva <arthurprs@gmail.com> wrote:
It won't be faster by any means, but it should definitely be incorporated if any format changes are made (like Tom already suggested).

I think it's important we gather at least 2 more things before making any calls:
* Josh's tests w/ the cache-aware patch, which should confirm cache-aware is indeed preferred
* Tests with toast hacked to use lz4 instead, which might ease any decisions


--
Arthur Silva



On Wed, Aug 27, 2014 at 12:53 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Tue, Aug 26, 2014 at 8:41 PM, Arthur Silva <arthurprs@gmail.com> wrote:
> The difference is small but I's definitely faster, which makes sense since
> cache line misses are probably slightly reduced.
> As in the previous runs, I ran the query a dozen times and took the average
> after excluding runs with a high deviation.

I'm not surprised that it hasn't beaten HEAD. I haven't studied the
problem in detail, but I don't think that the "cache awareness" of the
new revision is necessarily a distinct advantage.

--
Peter Geoghegan


I'm attaching a quick-n-dirty patch that uses lz4 compression instead of pglz in case someone wants to experiment with it. It seems to work in my test environment; I'll run more tests when I get home.

PS: gotta love gmail's fixed default of top-posting...
Attachment

Re: jsonb format is pessimal for toast compression

From
Jan Wieck
Date:
On 08/12/2014 10:58 AM, Robert Haas wrote:
> What would really be ideal here is if the JSON code could inform the
> toast compression code "this many initial bytes are likely
> incompressible, just pass them through without trying, and then start
> compressing at byte N", where N is the byte following the TOC.  But I
> don't know that there's a reasonable way to implement that.
>

Sorry, being late for the party.

Anyhow, this strikes me as a good basic direction of thought. But I 
think we should put the burden on the data type, not on toast. To do 
that, data types could have an optional toast_hint_values() function, 
which the toast code can call with the actual datum at hand and its 
default parameter array. The hint-values function can then modify that 
parameter array, telling toast how much to skip, how hard to try (or not 
at all), and so on. A data-type-specific function should know much better 
how to figure out how compressible a particular datum may be.
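
As a purely hypothetical sketch of what such a hook could look like for jsonb -- no such API exists today, and the function name, signature, and parameter handling below are invented:

#include "postgres.h"
#include "utils/jsonb.h"
#include "utils/pg_lzcompress.h"

static void
jsonb_toast_hint_values(struct varlena *value, PGLZ_Strategy *strategy)
{
    JsonbContainer *jc = (JsonbContainer *) VARDATA_ANY(value);
    uint32      count = jc->header & JB_CMASK;

    /* objects store two JEntrys per pair; take that as the upper bound */
    Size        header_bytes = offsetof(JsonbContainer, children) +
                               2 * count * sizeof(JEntry);

    /* don't let pglz give up before it has looked past the JEntry array */
    strategy->first_success_by = Max(strategy->first_success_by,
                                     (int32) header_bytes + 1024);
}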

Certainly nothing for 9.4, but it might require changing the toast API 
in a different way than just handing it an OID and hard-coding the 
JSONBOID case into toast for 9.4. If we are going to change the API, we 
might as well do it right.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: jsonb format is pessimal for toast compression

From
Jan Wieck
Date:
On 08/08/2014 10:21 AM, Andrew Dunstan wrote:
>
> On 08/07/2014 11:17 PM, Tom Lane wrote:
>> I looked into the issue reported in bug #11109.  The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.
>> This interacts poorly with the code in pglz_compress() that gives up if
>> it's found nothing compressible in the first first_success_by bytes of a
>> value-to-be-compressed.  (first_success_by is 1024 in the default set of
>> compression parameters.)
>
> [snip]
>
>> There is plenty of compressible data once we get into the repetitive
>> strings in the payload part --- but that starts at offset 944, and up to
>> that point there is nothing that pg_lzcompress can get a handle on.  There
>> are, by definition, no sequences of 4 or more repeated bytes in that area.
>> I think in principle pg_lzcompress could decide to compress the 3-byte
>> sequences consisting of the high-order 24 bits of each offset; but it
>> doesn't choose to do so, probably because of the way its lookup hash table
>> works:
>>
>>   * pglz_hist_idx -
>>   *
>>   *        Computes the history table slot for the lookup by the next 4
>>   *        characters in the input.
>>   *
>>   * NB: because we use the next 4 characters, we are not guaranteed to
>>   * find 3-character matches; they very possibly will be in the wrong
>>   * hash list.  This seems an acceptable tradeoff for spreading out the
>>   * hash keys more.
>>
>> For jsonb header data, the "next 4 characters" are *always* different, so
>> only a chance hash collision can result in a match.  There is therefore a
>> pretty good chance that no compression will occur before it gives up
>> because of first_success_by.
>>
>> I'm not sure if there is any easy fix for this.  We could possibly change
>> the default first_success_by value, but I think that'd just be postponing
>> the problem to larger jsonb objects/arrays, and it would hurt performance
>> for genuinely incompressible data.  A somewhat painful, but not yet
>> out-of-the-question, alternative is to change the jsonb on-disk
>> representation.  Perhaps the JEntry array could be defined as containing
>> element lengths instead of element ending offsets.  Not sure though if
>> that would break binary searching for JSON object keys.
>>
>>             
>
>
> Ouch.
>
> Back when this structure was first presented at pgCon 2013, I wondered
> if we shouldn't extract the strings into a dictionary, because of key
> repetition, and convinced myself that this shouldn't be necessary
> because in significant cases TOAST would take care of it.
>
> Maybe we should have pglz_compress() look at the *last* 1024 bytes if it
> can't find anything worth compressing in the first, for values larger
> than a certain size.
>
> It's worth noting that this is a fairly pathological case. AIUI the
> example you constructed has an array with 100k string elements. I don't
> think that's typical. So I suspect that unless I've misunderstood the
> statement of the problem we're going to find that almost all the jsonb
> we will be storing is still compressible.

I also think that a substantial part of the problem with coming up with a 
"representative" data sample is that the size of the incompressible 
data at the beginning is somewhat tied to the overall size of the datum 
itself. This may or may not be true in any particular use case, but as a 
general rule of thumb I would assume that the larger the JSONB document, 
the larger the offset array at the beginning.

Would changing 1024 to a fraction of the datum length for the time being 
give us enough room to come up with a proper solution for 9.5?
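
Purely as an illustration of that idea (the helper name is made up and nothing like it exists in pglz today), the give-up threshold could scale with the input instead of being a flat 1024 bytes:

static int
adaptive_first_success_by(int datum_len)
{
    int     threshold = datum_len / 8;  /* look at the first ~12.5% */

    if (threshold < 1024)
        threshold = 1024;               /* never less than today's default */
    return threshold;
}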


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: jsonb format is pessimal for toast compression

From
Jan Wieck
Date:
On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109.  The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Ouch.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
>
> That's not really the issue here, I think.  The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.

This is only because the input data was exact copies of the same strings 
over and over again. PGLZ can very well compress slightly less identical 
strings of varying lengths too. Not as well, but well enough. But I 
suspect such input data would make it fail again, even with lengths.


Regards,
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: jsonb format is pessimal for toast compression

From
"David E. Wheeler"
Date:
On Sep 4, 2014, at 7:26 PM, Jan Wieck <jan@wi3ck.info> wrote:

> This is only because the input data was exact copies of the same strings over and over again. PGLZ can very well
> compress slightly less identical strings of varying lengths too. Not as well, but well enough. But I suspect such
> input data would make it fail again, even with lengths.

We had a bit of discussion about JSONB compression at PDXPUG Day this morning. Josh polled the room, and about half
thought we should apply the patch for better compression, while the other half seemed to want faster access operations.
(Some folks no doubt voted for both.) But in the ensuing discussion, I started to think that maybe we should leave it
as it is, for two reasons:

1. There has been a fair amount of discussion about ways to better deal with this in future releases, such as hints to
TOAST about how to compress, or the application of different compression algorithms (or pluggable compression). I’m
assuming that leaving it as-is does not remove those possibilities.

2. The major advantage of JSONB is fast access operations. If those are not as important for a given use case as
storage space, there’s still the JSON type, which *does* compress reasonably well. IOW, we already have a JSON
alternative that compresses well. So why make the same (or similar) trade-offs with JSONB?

Just my $0.02. I would like to see some consensus on this, soon, though, as I am eager to get 9.4 and JSONB, regardless
of the outcome!

Best,

David


Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
So, I finally got time to test Tom's latest patch on this.

TLDR: we want to go with Tom's latest patch and release beta3.

Figures:

So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
checked uncompressed times for JSONB against compressed times.  This
changed the picture considerably.

TABLE SIZES
-----------

HEAD
      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 393 MB
 jsonb: compressed   | 1147 MB
 jsonb: uncompressed | 1221 MB


PATCHED

      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 394 MB
 jsonb: compressed   | 525 MB
 jsonb: uncompressed | 1200 MB


EXTRACTION TIMES
----------------

HEAD

Q1 (search via GIN index followed by extracting 100,000 values from rows):

jsonb compressed: 4000
jsonb uncompressed: 3250


Q2 (seq scan and extract 200,000 values from rows):

json: 11700
jsonb compressed: 3150
jsonb uncompressed: 2700


PATCHED

Q1:

jsonb compressed: 6750
jsonb uncompressed: 3350

Q2:

json: 11796
jsonb compressed: 4700
jsonb uncompressed: 2650

----------------------

Conclusion: with Tom's patch, JSONB is 55% smaller when compressed
(EXTENDED).  Extraction times are 50% to 70% slower, but this
appears to be almost entirely due to decompression overhead.  When not
compressing (EXTERNAL), extraction times for patch versions are
statistically the same as HEAD, and file sizes are similar to HEAD.

USER REACTION
-------------

I polled at both PDXpgDay and at FOSS4G, asking some ~80 Postgres
users how they would feel about a compression vs. extraction time
tradeoff.  The audience was evenly split.

However, with the current patch, the user can choose.  Users who know
enough for performance tuning can set JSONB columns to EXTERNAL, and get
the same performance as the unpatched version.
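
For example (table and column names here are only illustrative), switching a jsonb column to uncompressed out-of-line storage is a one-liner:

-- keeps out-of-line TOAST storage but disables compression for the column
ALTER TABLE t_json ALTER COLUMN data SET STORAGE EXTERNAL;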

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Josh Berkus (josh@agliodbs.com) wrote:
> TLDR: we want to go with Tom's latest patch and release beta3.

Having not even read the rest -- yes please.  We really need to get beta3
out and figure out when we're going to actually release 9.4...
Admittedly, the last month has been good and we've been fixing issues,
but it'd really be good to wrap 9.4 up.

> Conclusion: with Tom's patch, compressed JSONB is 55% smaller when
> compressed (EXTENDED).  Extraction times are 50% to 70% slower, but this
> appears to be almost entirely due to decompression overhead.  When not
> compressing (EXTERNAL), extraction times for patch versions are
> statistically the same as HEAD, and file sizes are similar to HEAD.

Not really a surprise.

> I polled at both PDXpgDay and at FOSS4G, asking some ~~ 80 Postgres
> users how they would feel about a compression vs. extraction time
> tradeoff.  The audience was evenly split.

Also not a surprise.

> However, with the current patch, the user can choose.  Users who know
> enough for performance tuning can set JSONB columns to EXTERNAL, and the
> the same performance as the unpatched version.

Agreed.
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:

On Thu, Sep 11, 2014 at 10:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
So, I finally got time to test Tom's latest patch on this.

TLDR: we want to go with Tom's latest patch and release beta3.

Figures:

So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
checked uncompressed times for JSONB against compressed times.  This
changed the picture considerably.

TABLE SIZES
-----------

HEAD

      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 393 MB
 jsonb: compressed   | 1147 MB
 jsonb: uncompressed | 1221 MB

PATCHED

      ?column?       | pg_size_pretty
---------------------+----------------
 json text format    | 394 MB
 jsonb: compressed   | 525 MB
 jsonb: uncompressed | 1200 MB


EXTRACTION TIMES
----------------

HEAD

Q1 (search via GIN index followed by extracting 100,000 values from rows):

jsonb compressed: 4000
jsonb uncompressed: 3250


Q2 (seq scan and extract 200,000 values from rows):

json: 11700
jsonb compressed: 3150
jsonb uncompressed: 2700


PATCHED

Q1:

jsonb compressed: 6750
jsonb uncompressed: 3350

Q2:

json: 11796
jsonb compressed: 4700
jsonb uncompressed: 2650

----------------------

Conclusion: with Tom's patch, compressed JSONB is 55% smaller when
compressed (EXTENDED).  Extraction times are 50% to 70% slower, but this
appears to be almost entirely due to decompression overhead.  When not
compressing (EXTERNAL), extraction times for patch versions are
statistically the same as HEAD, and file sizes are similar to HEAD.

USER REACTION
-------------

I polled at both PDXpgDay and at FOSS4G, asking some ~~ 80 Postgres
users how they would feel about a compression vs. extraction time
tradeoff.  The audience was evenly split.

However, with the current patch, the user can choose.  Users who know
enough for performance tuning can set JSONB columns to EXTERNAL, and the
the same performance as the unpatched version.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


The compression ratio difference is exaggerated in this case, but it does support the conclusion that Tom's patch alleviates the extraction penalty.

In my tests with the github archive data the savings <-> performance-penalty tradeoff was fine, but I'm not confident in those results since there were only 8 top-level keys.
For comparison, some twitter api objects[1] have 30+ top-level keys. If I have time in the next couple of days I'll run some tests with the public twitter firehose data.

[1] https://dev.twitter.com/rest/reference/get/statuses/home_timeline

Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/11/2014 06:56 PM, Arthur Silva wrote:
> 
> In my testings with the github archive data the savings <->
> performance-penalty was fine, but I'm not confident in those results
> since there were only 8 top level keys.

Well, we did want to see that the patch doesn't create a regression with
data which doesn't fall into the problem case area, and your test did
that nicely.

> For comparison, some twitter api objects[1] have 30+ top level keys. If
> I have time in the next couple of days I'll conduct some testings with
> the public twitter fire-hose data.

Yah, if we have enough time for me to get the Mozilla Socorro test
environment working, I can also test with Mozilla crash data.  That has
some deep nesting and very large values.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
> So, I finally got time to test Tom's latest patch on this.
>
> TLDR: we want to go with Tom's latest patch and release beta3.
>
> Figures:
>
> So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
> checked uncompressed times for JSONB against compressed times.  This
> changed the picture considerably.

Did you


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> So, I finally got time to test Tom's latest patch on this.
>>
>> TLDR: we want to go with Tom's latest patch and release beta3.
>>
>> Figures:
>>
>> So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
>> checked uncompressed times for JSONB against compressed times.  This
>> changed the picture considerably.
>
> Did you

Blah.

Did you test Heikki's patch from here?

http://www.postgresql.org/message-id/53EC8194.4020804@vmware.com

Tom didn't like it, but I thought it was rather clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/12/2014 10:00 AM, Robert Haas wrote:
> On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> So, I finally got time to test Tom's latest patch on this.
>>>
>>> TLDR: we want to go with Tom's latest patch and release beta3.
>>>
>>> Figures:
>>>
>>> So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
>>> checked uncompressed times for JSONB against compressed times.  This
>>> changed the picture considerably.
>>
>> Did you
> 
> Blah.
> 
> Did you test Heikki's patch from here?
> 
> http://www.postgresql.org/message-id/53EC8194.4020804@vmware.com
> 
> Tom didn't like it, but I thought it was rather clever.
> 

Yes, I posted the results for that a couple weeks ago; Tom had posted a
cleaned-up version of that patch, but materially it made no difference
in sizes or extraction times compared with Tom's lengths-only patch.
Same for Arthur's tests.

It's certainly possible that there is a test case for which Heikki's
approach is superior, but if so we haven't seen it.  And since its
approach is also more complicated, sticking with the simpler
lengths-only approach seems like the way to go.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/12/2014 10:00 AM, Robert Haas wrote:
>> On Fri, Sep 12, 2014 at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Thu, Sep 11, 2014 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> So, I finally got time to test Tom's latest patch on this.
>>>>
>>>> TLDR: we want to go with Tom's latest patch and release beta3.
>>>>
>>>> Figures:
>>>>
>>>> So I tested HEAD against the latest lengths patch.  Per Arthur Silva, I
>>>> checked uncompressed times for JSONB against compressed times.  This
>>>> changed the picture considerably.
>>>
>>> Did you
>>
>> Blah.
>>
>> Did you test Heikki's patch from here?
>>
>> http://www.postgresql.org/message-id/53EC8194.4020804@vmware.com
>>
>> Tom didn't like it, but I thought it was rather clever.
>
> Yes, I posted the results for that a couple weeks ago; Tom had posted a
> cleaned-up version of that patch, but materially it made no difference
> in sizes or extraction times compared with Tom's lengths-only patch.
> Same for Arthur's tests.
>
> It's certainly possible that there is a test case for which Heikki's
> approach is superior, but if so we haven't seen it.  And since it's
> approach is also more complicated, sticking with the simpler
> lengths-only approach seems like the way to go.

Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.

Thanks for following up on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> It's certainly possible that there is a test case for which Heikki's
>> approach is superior, but if so we haven't seen it.  And since it's
>> approach is also more complicated, sticking with the simpler
>> lengths-only approach seems like the way to go.

> Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.

The argument for Heikki's patch was never that it would offer better
performance; it's obvious (at least to me) that it won't.  The argument
was that it'd be upward-compatible with what we're doing now, so that
we'd not have to force an on-disk compatibility break with 9.4beta2.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/12/2014 08:52 PM, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Sep 12, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> It's certainly possible that there is a test case for which Heikki's
>>> approach is superior, but if so we haven't seen it.  And since it's
>>> approach is also more complicated, sticking with the simpler
>>> lengths-only approach seems like the way to go.
>
>> Huh, OK.  I'm slightly surprised, but that's why we benchmark these things.
>
> The argument for Heikki's patch was never that it would offer better
> performance; it's obvious (at least to me) that it won't.

Performance was one argument for sure. It's not hard to come up with a 
case where the all-lengths approach is much slower: take a huge array 
with, say, a million elements, and fetch the last element in a tight loop. 
And do that in a PL/pgSQL function without storing the datum to disk, so 
that it doesn't get toasted. Not a very common thing to do in real life, 
although something like that might come up if you do a lot of json 
processing in PL/pgSQL, but storing offsets makes that faster.

IOW, something like this:

do $$
declare
  ja jsonb;
  i int4;
begin
  select json_agg(g) into ja from generate_series(1, 100000) g;
  for i in 1..100000 loop
    perform ja ->> 90000;
  end loop;
end;
$$;

should perform much better with current git master or "my patch", than 
with the all-lengths patch.

I'm OK with going for the all-lengths approach anyway; it's simpler, and 
working with huge arrays is hopefully not that common. But it's not a 
completely open-and-shut case.

- Heikki




Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/12/2014 01:30 PM, Heikki Linnakangas wrote:
> 
> Performance was one argument for sure. It's not hard to come up with a
> case where the all-lengths approach is much slower: take a huge array
> with, say, million elements, and fetch the last element in a tight loop.
> And do that in a PL/pgSQL function without storing the datum to disk, so
> that it doesn't get toasted. Not a very common thing to do in real life,
> although something like that might come up if you do a lot of json
> processing in PL/pgSQL. but storing offsets makes that faster.

While I didn't post the results (because they were uninteresting), I did
specifically test the "last element" in a set of 200 elements for
all-lengths vs. original offsets for JSONB, and the results were not
statistically different.

I did not test against your patch; is there some reason why your patch
would be faster for the "last element" case than the original offsets
version?

If not, I think the corner case is so obscure as to be not worth
optimizing for.  I can't imagine that more than a tiny minority of our
users are going to have thousands of keys per datum.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Mon, Sep 15, 2014 at 2:12 PM, Josh Berkus <josh@agliodbs.com> wrote:
> If not, I think the corner case is so obscure as to be not worth
> optimizing for.  I can't imagine that more than a tiny minority of our
> users are going to have thousands of keys per datum.

Worst case is linear cost scaling vs. the number of keys, so how
expensive it is depends on the number of keys.

It would have an effect only on uncompressed jsonb, since compressed
jsonb already pays a linear cost for decompression.

I'd suggest testing performance with a large number of small keys in
uncompressed form. It's bound to have a noticeable regression there.

Now, "a large number of small keys" could be 200 or 2000, or even 20k. I'd
guess several sizes should be tested to find the shape of the curve.
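
As a sketch (illustrative, not a vetted benchmark), a single jsonb value with N small keys can be generated like this, here with N = 2000:

-- json_object_agg() is available in 9.4; go through text to get jsonb
SELECT (json_object_agg('k' || g, g)::text)::jsonb AS doc
FROM generate_series(1, 2000) AS g;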



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/15/2014 10:23 AM, Claudio Freire wrote:
> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
> several should be tested to find the shape of the curve.

Well, we know that it's not noticeable with 200, and that it is
noticeable with 100K.  It's only worth testing further if we think that
having more than 200 top-level keys in one JSONB value is going to be a
use case for more than 0.1% of our users.  I personally do not.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Mon, Sep 15, 2014 at 4:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>> several should be tested to find the shape of the curve.
>
> Well, we know that it's not noticeable with 200, and that it is
> noticeable with 100K.  It's only worth testing further if we think that
> having more than 200 top-level keys in one JSONB value is going to be a
> use case for more than 0.1% of our users.  I personally do not.

Yes, but bear in mind that the worst case is exactly at the use case
jsonb was designed to speed up: element access within relatively big
json documents.

Having them uncompressed is to be expected, because people using jsonb
will often favor speed over compactness if it's a tradeoff (otherwise
they'd use plain json).

So while you're right that it's perhaps above what would be a common
use case, the range "somewhere between 200 and 100K" for the tipping
point seems overly imprecise to me.



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/15/2014 12:15 PM, Claudio Freire wrote:
> So while you're right that it's perhaps above what would be a common
> use case, the range "somewhere between 200 and 100K" for the tipping
> point seems overly imprecise to me.

Well, then, you know how to solve that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/15/2014 12:15 PM, Claudio Freire wrote:
>> So while you're right that it's perhaps above what would be a common
>> use case, the range "somewhere between 200 and 100K" for the tipping
>> point seems overly imprecise to me.
>
> Well, then, you know how to solve that.


I was hoping testing with other numbers was as simple as hitting a key
for someone else.

But sure. I'll set something up.



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/15/2014 12:25 PM, Claudio Freire wrote:
> On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 09/15/2014 12:15 PM, Claudio Freire wrote:
>>> So while you're right that it's perhaps above what would be a common
>>> use case, the range "somewhere between 200 and 100K" for the tipping
>>> point seems overly imprecise to me.
>>
>> Well, then, you know how to solve that.
> 
> 
> I was hoping testing with other numbers was a simple hitting a key for
> someone else.

Nope.  My test case has a fixed size.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>> several should be tested to find the shape of the curve.
>
> Well, we know that it's not noticeable with 200, and that it is
> noticeable with 100K.  It's only worth testing further if we think that
> having more than 200 top-level keys in one JSONB value is going to be a
> use case for more than 0.1% of our users.  I personally do not.

FWIW, I have written one (1) application that uses JSONB and it has
one sub-object (not the top-level object) that in the most typical
configuration contains precisely 270 keys.  Now, granted, that is not
the top-level object, if that distinction is actually relevant here,
but color me just a bit skeptical of this claim anyway.  This was just
a casual thing I did for my own use, not anything industrial strength,
so it's hard to believe I'm stressing the system more than 99.9% of
users will.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/15/2014 02:16 PM, Robert Haas wrote:
> On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>>> several should be tested to find the shape of the curve.
>>
>> Well, we know that it's not noticeable with 200, and that it is
>> noticeable with 100K.  It's only worth testing further if we think that
>> having more than 200 top-level keys in one JSONB value is going to be a
>> use case for more than 0.1% of our users.  I personally do not.
> 
> FWIW, I have written one (1) application that uses JSONB and it has
> one sub-object (not the top-level object) that in the most typical
> configuration contains precisely 270 keys.  Now, granted, that is not
> the top-level object, if that distinction is actually relevant here,
> but color me just a bit skeptical of this claim anyway.  This was just
> a casual thing I did for my own use, not anything industrial strength,
> so it's hard to believe I'm stressing the system more than 99.9% of
> users will.

Actually, having the keys all at the same level *is* relevant for the
issue we're discussing.  If those 270 keys are organized in a tree, it's
not the same as having them all on one level (and not as problematic).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Actually, having the keys all at the same level *is* relevant for the
> issue we're discussing.  If those 270 keys are organized in a tree, it's
> not the same as having them all on one level (and not as problematic).

I believe Robert meant that the 270 keys are not at the top level, but
are at some level (in other words, some object has 270 pairs). That is
equivalent to having them at the top level for the purposes of this
discussion.

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
I couldn't get my hands on the twitter data, so I'm generating my own. The json template is http://paste2.org/wJ1dfcjw and the data was generated with http://www.json-generator.com/. It has 35 top-level keys, in case someone is wondering.
I generated 10000 random objects and inserted them repeatedly until I had 320k rows.

Test query: SELECT data->>'name', data->>'email' FROM t_json
Test storage: EXTERNAL
Test jsonb lengths quartiles: {1278,1587,1731,1871,2231}
Tom's lengths+cache aware: 455ms
HEAD: 440ms

This is a realistic-ish workload in my opinion and Tom's patch performs within 4% of HEAD.

Due to the overall lengths I couldn't really test compressibility, so I re-ran the test. This time I inserted an array of 2 objects in each row, as in: [obj, obj].
The objects were taken in sequence from the 10000-object pool, so contents match in both tests.

Test query: SELECT data #> '{0, name}', data #> '{0, email}', data #> '{1, name}', data #> '{1, email}' FROM t_json
Test storage: EXTENDED
HEAD: 17mb table + 878mb toast
HEAD size quartiles: {2015,2500,2591,2711,3483}
HEAD query runtime: 15s
Tom's: 220mb table + 580mb toast
Tom's size quartiles: {1665,1984,2061,2142.25,2384}
Tom's query runtime: 13s

This is an intriguing edge case in which Tom's patch actually outperforms the base implementation for 3~4kb jsons.

Re: jsonb format is pessimal for toast compression

From
Craig Ringer
Date:
On 09/16/2014 07:44 AM, Peter Geoghegan wrote:
> FWIW, I am slightly concerned about weighing use cases around very
> large JSON documents too heavily. Having enormous jsonb documents just
> isn't going to work out that well, but neither will equivalent designs
> in popular document database systems for similar reasons. For example,
> the maximum BSON document size supported by MongoDB is 16 megabytes,
> and that seems to be something that their users don't care too much
> about. Having 270 pairs in an object isn't unreasonable, but it isn't
> going to be all that common either.

Also, at a certain size the fact that Pg must rewrite the whole document
for any change to it starts to introduce other practical changes.

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data.

Is it worth paying a byte per value to save on possible upgrade pain?

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Actually, having the keys all at the same level *is* relevant for the
>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>> not the same as having them all on one level (and not as problematic).
>
> I believe Robert meant that the 270 keys are not at the top level, but
> are at some level (in other words, some object has 270 pairs). That is
> equivalent to having them at the top level for the purposes of this
> discussion.

Yes, that's exactly what I meant.

> FWIW, I am slightly concerned about weighing use cases around very
> large JSON documents too heavily. Having enormous jsonb documents just
> isn't going to work out that well, but neither will equivalent designs
> in popular document database systems for similar reasons. For example,
> the maximum BSON document size supported by MongoDB is 16 megabytes,
> and that seems to be something that their users don't care too much
> about. Having 270 pairs in an object isn't unreasonable, but it isn't
> going to be all that common either.

The JSON documents in this case were not particularly large.  These
objects were < 100kB; they just had a lot of keys.   I'm a little
baffled by the apparent theme that people think that (object size) /
(# of keys) will tend to be large.  Maybe there will be some instances
where that's the case, but it's not what I'd expect.  I would expect
people to use JSON to serialize structured data in situations where
normalizing would be unwieldy.

For example, pick your favorite Facebook or Smartphone game - Plants
vs. Zombies, Farmville, Candy Crush Saga, whatever.  Or even a
traditional board game like chess.  Think about what the game state
looks like as an abstract object.  Almost without exception, you've
got some kind of game board with a bunch of squares and then you have
a bunch of pieces (plants, crops, candies, pawns) that are positioned
on those squares.  Now you want to store this in a database.  You're
certainly not going to have a table column per square, and EAV would
be stupid, so what's left?  You could use an array, but an array of
strings might not be descriptive enough; for a square in Farmville,
for example, you might need to know the type of crop, and whether it
was fertilized with special magic fertilizer, and when it's going to
be ready to harvest, and when it'll wither if not harvested.  So a
JSON is a pretty natural structure: an array of arrays of objects.  If
you have a 30x30 farm, you'll have 900 keys.  If you have a 50x50
farm, which probably means you're spending real money to buy imaginary
plants, you'll have 2500 keys.
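
Purely as an illustration of that shape (all keys and values below are invented), the board could be a single jsonb value, an array of rows of per-square objects, and fetching one square is just a couple of subscripting operators:

SELECT ('[
  [ {"crop": "corn",  "fertilized": true,  "ready_at": "2014-09-20"},
    {"crop": "wheat", "fertilized": false, "ready_at": "2014-09-22"} ],
  [ {"crop": null}, {"crop": "pumpkin", "fertilized": true} ]
]'::jsonb -> 1 -> 1) ->> 'crop';   -- returns 'pumpkin'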

(For the record, I have no actual knowledge of how any of these games
are implemented under the hood.  I'm just speculating on how I would
have done it.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/16/2014 06:31 AM, Robert Haas wrote:
> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Actually, having the keys all at the same level *is* relevant for the
>>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>>> not the same as having them all on one level (and not as problematic).
>>
>> I believe Robert meant that the 270 keys are not at the top level, but
>> are at some level (in other words, some object has 270 pairs). That is
>> equivalent to having them at the top level for the purposes of this
>> discussion.
> 
> Yes, that's exactly what I meant.
> 
>> FWIW, I am slightly concerned about weighing use cases around very
>> large JSON documents too heavily. Having enormous jsonb documents just
>> isn't going to work out that well, but neither will equivalent designs
>> in popular document database systems for similar reasons. For example,
>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>> and that seems to be something that their users don't care too much
>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>> going to be all that common either.

Well, I can only judge from the use cases I personally have, none of
which involve more than 100 keys at any level for most rows.  So far
I've seen some people argue hypotetical use cases involving hundreds of
keys per level, but nobody who *actually* has such a use case.  Also,
note that we currently don't know where the "last value" extraction
becomes a performance problem at this stage, except that it's somewhere
between 200 and 100,000.  Also, we don't have a test which shows the
hybrid approach (Heikki's patch) performing better with 1000's of keys.

Basically, if someone is going to make a serious case for Heikki's
hybrid approach over the simpler lengths-only approach, then please post
some test data showing the benefit ASAP, since I can't demonstrate it.
Otherwise, let's get beta 3 out the door so we can get the 9.4 release
train moving again.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Tue, Sep 16, 2014 at 12:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 09/16/2014 06:31 AM, Robert Haas wrote:
>> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Actually, having the keys all at the same level *is* relevant for the
>>>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>>>> not the same as having them all on one level (and not as problematic).
>>>
>>> I believe Robert meant that the 270 keys are not at the top level, but
>>> are at some level (in other words, some object has 270 pairs). That is
>>> equivalent to having them at the top level for the purposes of this
>>> discussion.
>>
>> Yes, that's exactly what I meant.
>>
>>> FWIW, I am slightly concerned about weighing use cases around very
>>> large JSON documents too heavily. Having enormous jsonb documents just
>>> isn't going to work out that well, but neither will equivalent designs
>>> in popular document database systems for similar reasons. For example,
>>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>>> and that seems to be something that their users don't care too much
>>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>>> going to be all that common either.
>
> Well, I can only judge from the use cases I personally have, none of
> which involve more than 100 keys at any level for most rows.  So far
> I've seen some people argue hypotetical use cases involving hundreds of
> keys per level, but nobody who *actually* has such a use case.

I already told you that I did, and that it was the one and only app I
had written for JSONB.

> Also,
> note that we currently don't know where the "last value" extraction
> becomes a performance problem at this stage, except that it's somewhere
> between 200 and 100,000.  Also, we don't have a test which shows the
> hybrid approach (Heikki's patch) performing better with 1000's of keys.

Fair point.

> Basically, if someone is going to make a serious case for Heikki's
> hybrid approach over the simpler lengths-only approach, then please post
> some test data showing the benefit ASAP, since I can't demonstrate it.
> Otherwise, let's get beta 3 out the door so we can get the 9.4 release
> train moving again.

I don't personally care about this enough to spend more time on it.  I
told you my extremely limited experience because it seems to
contradict your broader experience.  If you don't care, you don't
care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/16/2014 09:54 AM, Robert Haas wrote:
> On Tue, Sep 16, 2014 at 12:47 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 09/16/2014 06:31 AM, Robert Haas wrote:
>>> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>>> Actually, having the keys all at the same level *is* relevant for the
>>>>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>>>>> not the same as having them all on one level (and not as problematic).
>>>>
>>>> I believe Robert meant that the 270 keys are not at the top level, but
>>>> are at some level (in other words, some object has 270 pairs). That is
>>>> equivalent to having them at the top level for the purposes of this
>>>> discussion.
>>>
>>> Yes, that's exactly what I meant.
>>>
>>>> FWIW, I am slightly concerned about weighing use cases around very
>>>> large JSON documents too heavily. Having enormous jsonb documents just
>>>> isn't going to work out that well, but neither will equivalent designs
>>>> in popular document database systems for similar reasons. For example,
>>>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>>>> and that seems to be something that their users don't care too much
>>>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>>>> going to be all that common either.
>>
>> Well, I can only judge from the use cases I personally have, none of
>> which involve more than 100 keys at any level for most rows.  So far
>> I've seen some people argue hypotetical use cases involving hundreds of
>> keys per level, but nobody who *actually* has such a use case.
> 
> I already told you that I did, and that it was the only and only app I
> had written for JSONB.

Ah, ok, I thought yours was a test case.  Did you check how it performed
on the two patches at all?  My tests with 185 keys didn't show any
difference, including for a "last key" case.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/16/2014 07:47 PM, Josh Berkus wrote:
> On 09/16/2014 06:31 AM, Robert Haas wrote:
>> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>>> Actually, having the keys all at the same level *is* relevant for the
>>>> issue we're discussing.  If those 270 keys are organized in a tree, it's
>>>> not the same as having them all on one level (and not as problematic).
>>>
>>> I believe Robert meant that the 270 keys are not at the top level, but
>>> are at some level (in other words, some object has 270 pairs). That is
>>> equivalent to having them at the top level for the purposes of this
>>> discussion.
>>
>> Yes, that's exactly what I meant.
>>
>>> FWIW, I am slightly concerned about weighing use cases around very
>>> large JSON documents too heavily. Having enormous jsonb documents just
>>> isn't going to work out that well, but neither will equivalent designs
>>> in popular document database systems for similar reasons. For example,
>>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>>> and that seems to be something that their users don't care too much
>>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>>> going to be all that common either.
>
> Well, I can only judge from the use cases I personally have, none of
> which involve more than 100 keys at any level for most rows.  So far
> I've seen some people argue hypotetical use cases involving hundreds of
> keys per level, but nobody who *actually* has such a use case.  Also,
> note that we currently don't know where the "last value" extraction
> becomes a performance problem at this stage, except that it's somewhere
> between 200 and 100,000.  Also, we don't have a test which shows the
> hybrid approach (Heikki's patch) performing better with 1000's of keys.
>
> Basically, if someone is going to make a serious case for Heikki's
> hybrid approach over the simpler lengths-only approach, then please post
> some test data showing the benefit ASAP, since I can't demonstrate it.
> Otherwise, let's get beta 3 out the door so we can get the 9.4 release
> train moving again.

Are you looking for someone with a real-life scenario, or just a synthetic
test case? The latter is easy to do.

See the attached test program. It's basically the same one I posted earlier.
Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:

postgres=# select * from testtimes ;
  elem | duration_ms
------+-------------
    11 |    0.289508
    12 |    0.288122
    13 |    0.290558
    14 |    0.287889
    15 |    0.286303
    17 |    0.290415
    19 |    0.289829
    21 |    0.289783
    23 |    0.287104
    25 |    0.289834
    28 |    0.290735
    31 |    0.291844
    34 |    0.293454
    37 |    0.293866
    41 |    0.291217
    45 |    0.289243
    50 |    0.290385
    55 |    0.292085
    61 |    0.290892
    67 |    0.292335
    74 |    0.292561
    81 |    0.291416
    89 |    0.295714
    98 |     0.29844
   108 |    0.297421
   119 |    0.299471
   131 |    0.299877
   144 |    0.301604
   158 |    0.303365
   174 |    0.304203
   191 |    0.303596
   210 |    0.306526
   231 |    0.304189
   254 |    0.307782
   279 |    0.307372
   307 |    0.306873
   338 |    0.310471
   372 |      0.3151
   409 |    0.320354
   450 |     0.32038
   495 |    0.322127
   545 |    0.323256
   600 |    0.330419
   660 |    0.334226
   726 |    0.336951
   799 |     0.34108
   879 |    0.347746
   967 |    0.354275
  1064 |    0.356696
  1170 |    0.366906
  1287 |    0.375352
  1416 |    0.392952
  1558 |    0.392907
  1714 |    0.402157
  1885 |    0.412384
  2074 |    0.425958
  2281 |    0.435415
  2509 |     0.45301
  2760 |    0.469983
  3036 |    0.487329
  3340 |    0.505505
  3674 |    0.530412
  4041 |    0.552585
  4445 |    0.581815
  4890 |    0.610509
  5379 |    0.642885
  5917 |    0.680395
  6509 |    0.713849
  7160 |    0.757561
  7876 |    0.805225
  8664 |    0.856142
  9530 |    0.913255
(72 rows)

That's up to 9530 elements - it's pretty easy to extrapolate from there
to higher counts, it's O(n).

With unpatched git master, the runtime is flat, regardless of which
element is queried, at about 0.29 ms. With
jsonb-with-offsets-and-lengths-2.patch, there's no difference that I
could measure.

The difference starts to be meaningful at around 500 entries. In
practice, I doubt anyone's going to notice until you start talking about
tens of thousands of entries.
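
For reference, here is a rough sketch of the kind of synthetic test being
discussed (this is not the attached program; the table name and value
shapes are made up), timing extraction of an element near the end of one
large array:

    -- build one row holding a 10000-element jsonb array
    CREATE TABLE jsonb_timing_test AS
    SELECT (SELECT jsonb_agg(repeat('x', 40) || i::text)
              FROM generate_series(1, 10000) AS i) AS doc;

    -- fetch an element near the end many times; with an all-lengths format
    -- each fetch costs O(element index), with stored offsets it is ~O(1)
    EXPLAIN ANALYZE
    SELECT doc -> 9999 FROM jsonb_timing_test, generate_series(1, 1000);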

I'll leave it up to the jury to decide if we care or not. It seems like
a fairly unusual use case, where you push around large enough arrays or
objects to notice. Then again, I'm sure *someone* will do it. People do
strange things, and they find ways to abuse the features that the
original developers didn't think of.

- Heikki


Attachment

Re: jsonb format is pessimal for toast compression

From
Claudio Freire
Date:
On Tue, Sep 16, 2014 at 3:12 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I'll leave it up to the jury to decide if we care or not. It seems like a
> fairly unusual use case, where you push around large enough arrays or
> objects to notice. Then again, I'm sure *someone* will do it. People do
> strange things, and they find ways to abuse the features that the original
> developers didn't think of.

Again, it's not abusing the feature. It's using it. Jsonb is
supposed to be fast for this.



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Well, I can only judge from the use cases I personally have, none of
>>> which involve more than 100 keys at any level for most rows.  So far
>>> I've seen some people argue hypothetical use cases involving hundreds of
>>> keys per level, but nobody who *actually* has such a use case.
>>
>> I already told you that I did, and that it was the one and only app I
>> had written for JSONB.
>
> Ah, ok, I thought yours was a test case.  Did you check how it performed
> on the two patches at all?  My tests with 185 keys didn't show any
> difference, including for a "last key" case.

No, I didn't test it.   But I think Heikki's test results pretty much
tell us everything there is to see here.  This isn't really that
complicated; I've read a few papers on index compression over the
years and they seem to often use techniques that have the same general
flavor as what Heikki did here, adding complexity in the data format
to gain other advantages.  So I don't think we should be put off.

Basically, I think that if we make a decision to use Tom's patch
rather than Heikki's patch, we're deciding that the initial decision,
by the folks who wrote the original jsonb code, to make array access
less than O(n) was misguided.  While that could be true, I'd prefer to
bet that those folks knew what they were doing.  The only reason
we're even considering changing it is that the array of lengths
doesn't compress well, and we've got an approach that fixes that
problem while preserving the advantages of fast lookup.  We should
have a darn fine reason to say no to that approach, and "it didn't
benefit my particular use case" is not it.

In practice, I'm not very surprised that the impact doesn't seem too
bad when you're running SQL queries from the client.  There's so much
other overhead, for de-TOASTing and client communication and even just
planner and executor costs, that this gets lost in the noise.  But
think about a PL/pgsql procedure, say, where somebody might loop over
all of the elements in an array.  If those operations go from O(1) to
O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
beverage of your choice that somebody will find that behavior within a
year of release and be dismayed by it.
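
To make that concrete, here is a minimal PL/pgSQL sketch of the access
pattern I mean (purely illustrative, not from any of the patches; the
function name is made up):

    -- subscript the array element-by-element inside a loop;
    -- if each "doc ->> i" costs O(i), the whole function is O(n^2)
    CREATE OR REPLACE FUNCTION sum_lengths(doc jsonb) RETURNS int AS $$
    DECLARE
        total int := 0;
    BEGIN
        FOR i IN 0 .. jsonb_array_length(doc) - 1 LOOP
            total := total + coalesce(length(doc ->> i), 0);
        END LOOP;
        RETURN total;
    END;
    $$ LANGUAGE plpgsql;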

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
Heikki, Robert:

On 09/16/2014 11:12 AM, Heikki Linnakangas wrote:
> Are you looking for someone with a real-life scenario, or just a
> synthetic test case? The latter is easy to do.
> 
> See attached test program. It's basically the same one I posted earlier.
> Here are the results from my laptop with Tom's jsonb-lengths-merged.patch:

Thanks for that!

> postgres=# select * from testtimes ;
>  elem | duration_ms
> ------+-------------
>  3674 |    0.530412
>  4041 |    0.552585
>  4445 |    0.581815

This looks like the level at which the difference gets to be really
noticeable.  Note that this is completely swamped by the difference
between compressed vs. uncompressed though.

> With unpatched git master, the runtime is flat, regardless of which
> element is queried, at about 0.29 ms. With
> jsonb-with-offsets-and-lengths-2.patch, there's no difference that I
> could measure.

OK, thanks.

> The difference starts to be meaningful at around 500 entries. In
> practice, I doubt anyone's going to notice until you start talking about
> tens of thousands of entries.
> 
> I'll leave it up to the jury to decide if we care or not. It seems like
> a fairly unusual use case, where you push around large enough arrays or
> objects to notice. Then again, I'm sure *someone* will do it. People do
> strange things, and they find ways to abuse the features that the
> original developers didn't think of.

Right, but the question is whether it's worth having more complex code
and data structures in order to support what certainly *seems* to be a
fairly obscure use-case, that is, more than 4000 keys at the same level.
And it's not like it stops working or becomes completely unresponsive
at that level; it's just double the response time.

On 09/16/2014 12:20 PM, Robert Haas wrote:
> Basically, I think that if we make a decision to use Tom's patch
> rather than Heikki's patch, we're deciding that the initial decision,
> by the folks who wrote the original jsonb code, to make array access
> less than O(n) was misguided.  While that could be true, I'd prefer to
> bet that those folks knew what they were doing.  The only reason
> we're even considering changing it is that the array of lengths
> doesn't compress well, and we've got an approach that fixes that
> problem while preserving the advantages of fast lookup.  We should
> have a darn fine reason to say no to that approach, and "it didn't
> benefit my particular use case" is not it.

Do you feel that way *as a code maintainer*?  That is, if you ended up
maintaining the JSONB code, would you still feel that it's worth the
extra complexity?  Because that will be the main cost here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Robert Haas
Date:
On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Do you feel that way *as a code maintainer*?  That is, if you ended up
> maintaining the JSONB code, would you still feel that it's worth the
> extra complexity?  Because that will be the main cost here.

I feel that Heikki doesn't have a reputation for writing or committing
unmaintainable code.

I haven't reviewed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Petr Jelinek
Date:
On 16/09/14 21:20, Robert Haas wrote:
> In practice, I'm not very surprised that the impact doesn't seem too
> bad when you're running SQL queries from the client.  There's so much
> other overhead, for de-TOASTing and client communication and even just
> planner and executor costs, that this gets lost in the noise.  But
> think about a PL/pgsql procedure, say, where somebody might loop over
> all of the elements in an array.  If those operations go from O(1) to
> O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
> beverage of your choice that somebody will find that behavior within a
> year of release and be dismayed by it.
>

As somebody who did see a server melt (quite literally that time,
unfortunately) thanks to the CPU overhead of operations on varlena
arrays: +1 (in fact +many).

Especially if we are trying to promote the json improvements in 9.4 as a
"best of both worlds" kind of thing.

-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/16/2014 10:37 PM, Robert Haas wrote:
> On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Do you feel that way *as a code maintainer*?  That is, if you ended up
>> maintaining the JSONB code, would you still feel that it's worth the
>> extra complexity?  Because that will be the main cost here.
>
> I feel that Heikki doesn't have a reputation for writing or committing
> unmaintainable code.
>
> I haven't reviewed the patch.

The patch I posted was not pretty, but I'm sure it could be refined to 
something sensible.

There are many possible variations of the basic scheme of storing mostly 
lengths, but an offset for every N elements. I replaced the length with an 
offset on some elements and used a flag bit to indicate which it is. 
Perhaps a simpler approach would be to store lengths, but also store a 
separate, smaller array of offsets after the lengths array.

I can write a patch if we want to go that way.

- Heikki




Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
On Tue, Sep 16, 2014 at 4:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Well, I can only judge from the use cases I personally have, none of
>>> which involve more than 100 keys at any level for most rows.  So far
>>> I've seen some people argue hypothetical use cases involving hundreds of
>>> keys per level, but nobody who *actually* has such a use case.
>>
>> I already told you that I did, and that it was the one and only app I
>> had written for JSONB.
>
> Ah, ok, I thought yours was a test case.  Did you check how it performed
> on the two patches at all?  My tests with 185 keys didn't show any
> difference, including for a "last key" case.

No, I didn't test it.   But I think Heikki's test results pretty much
tell us everything there is to see here.  This isn't really that
complicated; I've read a few papers on index compression over the
years and they seem to often use techniques that have the same general
flavor as what Heikki did here, adding complexity in the data format
to gain other advantages.  So I don't think we should be put off.

I second this reasoning. Even though I ran a couple of very realistic test cases whose results support all-lengths, I do feel that the hybrid approach would be better, as it covers all bases. To put things in perspective, Tom's latest patch isn't much simpler either.

Since it would still be a breaking change, we should consider changing the layout to key-key-key-value-value-value, as it seems to pay off.


Basically, I think that if we make a decision to use Tom's patch
rather than Heikki's patch, we're deciding that the initial decision,
by the folks who wrote the original jsonb code, to make array access
less than O(n) was misguided.  While that could be true, I'd prefer to
bet that those folks knew what they were doing.  The only reason
we're even considering changing it is that the array of lengths
doesn't compress well, and we've got an approach that fixes that
problem while preserving the advantages of fast lookup.  We should
have a darn fine reason to say no to that approach, and "it didn't
benefit my particular use case" is not it.

In practice, I'm not very surprised that the impact doesn't seem too
bad when you're running SQL queries from the client.  There's so much
other overhead, for de-TOASTing and client communication and even just
planner and executor costs, that this gets lost in the noise.  But
think about a PL/pgsql procedure, say, where somebody might loop over
all of the elements in an array.  If those operations go from O(1) to
O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
beverage of your choice that somebody will find that behavior within a
year of release and be dismayed by it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: jsonb format is pessimal for toast compression

From
Любен Каравелов
Date:
----- Quote from Robert Haas (robertmhaas@gmail.com), on 16.09.2014 at 22:20 -----
>
> In practice, I'm not very surprised that the impact doesn't seem too
> bad when you're running SQL queries from the client.  There's so much
> other overhead, for de-TOASTing and client communication and even just
> planner and executor costs, that this gets lost in the noise.  But
> think about a PL/pgsql procedure, say, where somebody might loop over
> all of the elements in an array.  If those operations go from O(1) to
> O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
> beverage of your choice that somebody will find that behavior within a
> year of release and be dismayed by it.
>


Hi,

I can imagine a situation exactly like that. We could use a jsonb object to
represent sparse vectors in the database, where the key is the dimension
and the value is the value. These could easily grow to thousands of
dimensions. Once you have that in the database it is easy to go and
write some simple numeric computations on these vectors; let's say you
want a dot product of 2 sparse vectors. If the random access inside one
vector goes to O(n^2), then the dot product computation will go to
O(n^2*m^2), so not pretty.
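
For example, a dot product over two such jsonb "vectors" could look roughly
like this (just a sketch to illustrate the access pattern; the function
name is made up):

    CREATE OR REPLACE FUNCTION dot_product(a jsonb, b jsonb) RETURNS numeric
    LANGUAGE sql AS $$
        -- every (a ->> k) / (b ->> k) below is a random access by key
        SELECT coalesce(sum((a ->> k)::numeric * (b ->> k)::numeric), 0)
        FROM jsonb_object_keys(a) AS k
        WHERE b ? k;
    $$;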

I am not saying that the DB is the right place to do this type of
computation, but it is sometimes convenient to have it in the DB as well.

Regards,
luben




Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 09/16/2014 10:37 PM, Robert Haas wrote:
>> On Tue, Sep 16, 2014 at 3:24 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> Do you feel that way *as a code maintainer*?  That is, if you ended up
>>> maintaining the JSONB code, would you still feel that it's worth the
>>> extra complexity?  Because that will be the main cost here.

>> I feel that Heikki doesn't have a reputation for writing or committing
>> unmaintainable code.
>> I haven't reviewed the patch.

> The patch I posted was not pretty, but I'm sure it could be refined to 
> something sensible.

We're somewhat comparing apples and oranges here, in that I pushed my
approach to something that I think is of committable quality (and which,
not incidentally, fixes some existing bugs that we'd need to fix in any
case); while Heikki's patch was just proof-of-concept.  It would be worth
pushing Heikki's patch to committable quality so that we had a more
complete understanding of just what the complexity difference really is.

> There are many possible variations of the basic scheme of storing mostly 
> lengths, but an offset for every N elements. I replaced the length with 
> offset on some element and used a flag bit to indicate which it is. 

Aside from the complexity issue, a demerit of Heikki's solution is that it
eats up a flag bit that we may well wish we had back later.  On the other
hand, there's definitely something to be said for not breaking
pg_upgrade-ability of 9.4beta databases.

> Perhaps a simpler approach would be to store lengths, but also store a 
> separate smaller array of offsets, after the lengths array.

That way would also give up on-disk compatibility, and I'm not sure it's
any simpler in practice than your existing solution.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/16/2014 08:45 PM, Tom Lane wrote:
> We're somewhat comparing apples and oranges here, in that I pushed my
> approach to something that I think is of committable quality (and which,
> not incidentally, fixes some existing bugs that we'd need to fix in any
> case); while Heikki's patch was just proof-of-concept.  It would be worth
> pushing Heikki's patch to committable quality so that we had a more
> complete understanding of just what the complexity difference really is.

Is anyone actually working on this?

If not, I'm voting for the all-lengths patch so that we can get 9.4 out
the door.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/18/2014 07:53 PM, Josh Berkus wrote:
> On 09/16/2014 08:45 PM, Tom Lane wrote:
>> We're somewhat comparing apples and oranges here, in that I pushed my
>> approach to something that I think is of committable quality (and which,
>> not incidentally, fixes some existing bugs that we'd need to fix in any
>> case); while Heikki's patch was just proof-of-concept.  It would be worth
>> pushing Heikki's patch to committable quality so that we had a more
>> complete understanding of just what the complexity difference really is.
>
> Is anyone actually working on this?
>
> If not, I'm voting for the all-lengths patch so that we can get 9.4 out
> the door.

I'll try to write a more polished patch tomorrow. We'll then see what it 
looks like, and can decide if we want it.

- Heikki




Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
> On 09/18/2014 07:53 PM, Josh Berkus wrote:
>> On 09/16/2014 08:45 PM, Tom Lane wrote:
>>> We're somewhat comparing apples and oranges here, in that I pushed my
>>> approach to something that I think is of committable quality (and which,
>>> not incidentally, fixes some existing bugs that we'd need to fix in any
>>> case); while Heikki's patch was just proof-of-concept.  It would be worth
>>> pushing Heikki's patch to committable quality so that we had a more
>>> complete understanding of just what the complexity difference really is.
>>
>> Is anyone actually working on this?
>>
>> If not, I'm voting for the all-lengths patch so that we can get 9.4 out
>> the door.
>
> I'll try to write a more polished patch tomorrow. We'll then see what it
> looks like, and can decide if we want it.

Ok, here are two patches. One is a refined version of my earlier patch,
and the other implements the separate offsets array approach. They are
both based on Tom's jsonb-lengths-merged.patch, so they include all the
whitespace fixes etc. he mentioned.

There is no big difference in terms of code complexity between the
patches. IMHO the separate offsets array is easier to understand, but it
makes for more complicated accessor macros to find the beginning of the
variable-length data.

Unlike Tom's patch, these patches don't cache any offsets when doing a
binary search. Doesn't seem worth it, when the access time is O(1) anyway.

Both of these patches have a #define JB_OFFSET_STRIDE for the "stride
size". For the separate offsets array, the offsets array has one element
for every JB_OFFSET_STRIDE children. For the other patch, every
JB_OFFSET_STRIDE child stores the end offset, while others store the
length. A smaller value makes random access faster, at the cost of
compressibility / on-disk size. I haven't done any measurements to find
the optimal value, the values in the patches are arbitrary.

I think we should bite the bullet and break compatibility with 9.4beta2
format, even if we go with "my patch". In a jsonb object, it makes sense
to store all the keys first, like Tom did, because of cache benefits,
and the future possibility to do smart EXTERNAL access. Also, even if we
can make the on-disk format compatible, it's weird that you can get
different runtime behavior with datums created with a beta version.
Seems more clear to just require a pg_dump + restore.

Tom: You mentioned earlier that your patch fixes some existing bugs.
What were they? There were a bunch of whitespace and comment fixes that
we should apply in any case, but I couldn't see any actual bugs. I think
we should apply those fixes separately, to make sure we don't forget
about them, and to make it easier to review these patches.

- Heikki


Attachment

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> Tom: You mentioned earlier that your patch fixes some existing bugs. 
> What were they?

What I remember at the moment (sans caffeine) is that the routines for
assembling jsonb values out of field data were lacking some necessary
tests for overflow of the size/offset fields.  If you like I can apply
those fixes separately, but I think they were sufficiently integrated with
other changes in the logic that it wouldn't really help much for patch
reviewability.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/19/2014 07:07 AM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> Tom: You mentioned earlier that your patch fixes some existing bugs. 
>> What were they?
> 
> What I remember at the moment (sans caffeine) is that the routines for
> assembling jsonb values out of field data were lacking some necessary
> tests for overflow of the size/offset fields.  If you like I can apply
> those fixes separately, but I think they were sufficiently integrated with
> other changes in the logic that it wouldn't really help much for patch
> reviewability.

Where are we on this?  Do we have a patch ready for testing?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> I think we should bite the bullet and break compatibility with 9.4beta2
> format, even if we go with "my patch". In a jsonb object, it makes sense to
> store all the keys first, like Tom did, because of cache benefits, and the
> future possibility to do smart EXTERNAL access. Also, even if we can make
> the on-disk format compatible, it's weird that you can get different runtime
> behavior with datums created with a beta version. Seems more clear to just
> require a pg_dump + restore.

I vote for going with your patch, and breaking compatibility for the
reasons stated here (though I'm skeptical of the claims about cache
benefits, FWIW).

-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> On Fri, Sep 19, 2014 at 5:40 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I think we should bite the bullet and break compatibility with 9.4beta2
>> format, even if we go with "my patch". In a jsonb object, it makes sense to
>> store all the keys first, like Tom did, because of cache benefits, and the
>> future possibility to do smart EXTERNAL access. Also, even if we can make
>> the on-disk format compatible, it's weird that you can get different runtime
>> behavior with datums created with a beta version. Seems more clear to just
>> require a pg_dump + restore.

> I vote for going with your patch, and breaking compatibility for the
> reasons stated here (though I'm skeptical of the claims about cache
> benefits, FWIW).

I'm also skeptical of that, but I think the potential for smart EXTERNAL
access is a valid consideration.

I've not had time to read Heikki's updated patch yet --- has anyone
else compared the two patches for code readability?  If they're fairly
close on that score, then I'd agree his approach is the best solution.
(I will look at his code, but I'm not sure I'm the most unbiased
observer.)
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Jan Wieck
Date:
On 09/15/2014 09:46 PM, Craig Ringer wrote:
> On 09/16/2014 07:44 AM, Peter Geoghegan wrote:
>> FWIW, I am slightly concerned about weighing use cases around very
>> large JSON documents too heavily. Having enormous jsonb documents just
>> isn't going to work out that well, but neither will equivalent designs
>> in popular document database systems for similar reasons. For example,
>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>> and that seems to be something that their users don't care too much
>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>> going to be all that common either.
>
> Also, at a certain size the fact that Pg must rewrite the whole document
> for any change to it starts to introduce other practical changes.
>
> Anyway - this is looking like the change will go in, and with it a
> catversion bump. Introduction of a jsonb version/flags byte might be
> worthwhile at the same time. It seems likely that there'll be more room
> for improvement in jsonb, possibly even down to using different formats
> for different data.
>
> Is it worth paying a byte per value to save on possible upgrade pain?
>

This comment seems to have drowned in the discussion.

If there indeed has to be a catversion bump in the process of this, then 
I agree with Craig.


Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: jsonb format is pessimal for toast compression

From
Peter Geoghegan
Date:
On Tue, Sep 23, 2014 at 10:02 PM, Jan Wieck <jan@wi3ck.info> wrote:
>> Is it worth paying a byte per value to save on possible upgrade pain?
>>
>
> This comment seems to have drowned in the discussion.
>
> If there indeed has to be a catversion bump in the process of this, then I
> agree with Craig.

-1. We already have a reserved bit.


-- 
Peter Geoghegan



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Jan Wieck <jan@wi3ck.info> writes:
> On 09/15/2014 09:46 PM, Craig Ringer wrote:
>> Anyway - this is looking like the change will go in, and with it a
>> catversion bump. Introduction of a jsonb version/flags byte might be
>> worthwhile at the same time. It seems likely that there'll be more room
>> for improvement in jsonb, possibly even down to using different formats
>> for different data.
>> 
>> Is it worth paying a byte per value to save on possible upgrade pain?

> If there indeed has to be a catversion bump in the process of this, then 
> I agree with Craig.

FWIW, I don't really.  To begin with, it wouldn't be a byte per value,
it'd be four bytes, because we need word-alignment of the jsonb contents
so there's noplace to squeeze in an ID byte for free.  Secondly, as I
wrote in <15378.1408548595@sss.pgh.pa.us>:

: There remains the
: question of whether to take this opportunity to add a version ID to the
: binary format.  I'm not as excited about that idea as I originally was;
: having now studied the code more carefully, I think that any expansion
: would likely happen by adding more type codes and/or commandeering the
: currently-unused high-order bit of JEntrys.  We don't need a version ID
: in the header for that.  Moreover, if we did have such an ID, it would be
: notationally painful to get it to most of the places that might need it.

Heikki's patch would eat up the high-order JEntry bits, but the other
points remain.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Heikki Linnakangas
Date:
On 09/24/2014 08:16 AM, Tom Lane wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> On 09/15/2014 09:46 PM, Craig Ringer wrote:
>>> Anyway - this is looking like the change will go in, and with it a
>>> catversion bump. Introduction of a jsonb version/flags byte might be
>>> worthwhile at the same time. It seems likely that there'll be more room
>>> for improvement in jsonb, possibly even down to using different formats
>>> for different data.
>>>
>>> Is it worth paying a byte per value to save on possible upgrade pain?
>
>> If there indeed has to be a catversion bump in the process of this, then
>> I agree with Craig.
>
> FWIW, I don't really.  To begin with, it wouldn't be a byte per value,
> it'd be four bytes, because we need word-alignment of the jsonb contents
> so there's noplace to squeeze in an ID byte for free.  Secondly, as I
> wrote in <15378.1408548595@sss.pgh.pa.us>:
>
> : There remains the
> : question of whether to take this opportunity to add a version ID to the
> : binary format.  I'm not as excited about that idea as I originally was;
> : having now studied the code more carefully, I think that any expansion
> : would likely happen by adding more type codes and/or commandeering the
> : currently-unused high-order bit of JEntrys.  We don't need a version ID
> : in the header for that.  Moreover, if we did have such an ID, it would be
> : notationally painful to get it to most of the places that might need it.
>
> Heikki's patch would eat up the high-order JEntry bits, but the other
> points remain.

If we don't need to be backwards-compatible with the 9.4beta on-disk 
format, we don't necessarily need to eat the high-order JEntry bit. You 
can just assume that every nth element is stored as an offset, and 
the rest as lengths. Although it would be nice to have the flag for it 
explicitly.

There are also a few free bits in the JsonbContainer header that can be 
used as a version ID in the future. So I don't think we need to change 
the format to add an explicit version ID field.

- Heikki




Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 09/24/2014 08:16 AM, Tom Lane wrote:
>> Heikki's patch would eat up the high-order JEntry bits, but the other
>> points remain.

> If we don't need to be backwards-compatible with the 9.4beta on-disk 
> format, we don't necessarily need to eat the high-order JEntry bit. You 
> can just assume that every nth element is stored as an offset, and 
> the rest as lengths. Although it would be nice to have the flag for it 
> explicitly.

If we go with this approach, I think that we *should* eat the high bit
for it.  The main reason I want to do that is that it avoids having to
engrave the value of N on stone tablets.  I think that we should use
a pretty large value of N --- maybe 32 or so --- and having the freedom
to change it later based on experience seems like a good thing.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-09-19 15:40:14 +0300, Heikki Linnakangas wrote:
> On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
> >I'll try to write a more polished patch tomorrow. We'll then see what it
> >looks like, and can decide if we want it.
> 
> Ok, here are two patches. One is a refined version of my earlier patch, and
> the other implements the separate offsets array approach. They are both
> based on Tom's jsonb-lengths-merged.patch, so they include all the
> whitespace fixes etc. he mentioned.
> 
> There is no big difference in terms of code complexity between the patches.
> IMHO the separate offsets array is easier to understand, but it makes for
> more complicated accessor macros to find the beginning of the
> variable-length data.

I personally am pretty clearly in favor of Heikki's version. I think it
could stand to slightly expand the reasoning behind the mixed
length/offset format; it's not immediately obvious why the offsets are
problematic for compression. Otherwise, based on a cursory look, it
looks good.

But independent of which version is chosen, we *REALLY* need to make the
decision soon. This issue has held up the next beta (like jsonb has
blocked previous beta) for *weeks*.

Personally it doesn't make me very happy that Heikki and Tom had to be
the people stepping up to fix this.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 09:01 AM, Andres Freund wrote:
> But independent of which version is chosen, we *REALLY* need to make the
> decision soon. This issue has held up the next beta (like jsonb has
> blocked previous beta) for *weeks*.

Yes, please!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
> But independent of which version is chosen, we *REALLY* need to make the
> decision soon. This issue has held up the next beta (like jsonb has
> blocked previous beta) for *weeks*.
> 
> Personally it doesn't make me very happy that Heikki and Tom had to be
> the people stepping up to fix this.

I think there are a few reasons this has been delayed, aside from the
scheduling ones:
1.  compression issues were a surprise, and we are wondering if
    there are any other surprises
2.  pg_upgrade makes future data format changes problematic
3.  9.3 multi-xact bugs spooked us into being more careful

I am not sure what we can do to increase our speed based on these items.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 10:14 AM, Bruce Momjian wrote:
> On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
>> But independent of which version is chosen, we *REALLY* need to make the
>> decision soon. This issue has held up the next beta (like jsonb has
>> blocked previous beta) for *weeks*.
>>
>> Personally it doesn't make me very happy that Heikki and Tom had to be
>> the people stepping up to fix this.
> 
> I think there are a few reasons this has been delayed, aside from the
> scheduling ones:
> 
>     1.  compression issues were a surprise, and we are wondering if
>         there are any other surprises
>     2.  pg_upgrade makes future data format changes problematic
>     3.  9.3 multi-xact bugs spooked us into being more careful
> 
> I am not sure what we can do to increase our speed based on these items.

Alternately, this is delayed because:

1. We have one tested patch to fix the issue.

2. However, people are convinced that there's a better patch possible.

3. But nobody is working on this better patch except "in their spare time".

Given this, I once again vote for releasing based on Tom's lengths-only
patch, which is done, tested, and ready to go.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-09-25 10:18:24 -0700, Josh Berkus wrote:
> On 09/25/2014 10:14 AM, Bruce Momjian wrote:
> > On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
> >> But independent of which version is chosen, we *REALLY* need to make the
> >> decision soon. This issue has held up the next beta (like jsonb has
> >> blocked previous beta) for *weeks*.
> >>
> >> Personally it doesn't make me very happy that Heikki and Tom had to be
> >> the people stepping up to fix this.
> > 
> > I think there are a few reasons this has been delayed, aside from the
> > scheduling ones:
> > 
> >     1.  compression issues were a surprise, and we are wondering if
> >         there are any other surprises
> >     2.  pg_upgrade makes future data format changes problematic
> >     3.  9.3 multi-xact bugs spooked us into being more careful
> > 
> > I am not sure what we can do to increase our speed based on these items.
> 
> Alternately, this is delayed because:
> 
> 1. We have one tested patch to fix the issue.
> 
> 2. However, people are convinced that there's a better patch possible.
> 
> 3. But nobody is working on this better patch except "in their spare time".
> 
> Given this, I once again vote for releasing based on Tom's lengths-only
> patch, which is done, tested, and ready to go.

Heikki's patch is there and polished.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 10:20 AM, Andres Freund wrote:
> On 2014-09-25 10:18:24 -0700, Josh Berkus wrote:
>> On 09/25/2014 10:14 AM, Bruce Momjian wrote:
>>> On Thu, Sep 25, 2014 at 06:01:08PM +0200, Andres Freund wrote:
>>>> But independent of which version is chosen, we *REALLY* need to make the
>>>> decision soon. This issue has held up the next beta (like jsonb has
>>>> blocked previous beta) for *weeks*.
>>>>
>>>> Personally it doesn't make me very happy that Heikki and Tom had to be
>>>> the people stepping up to fix this.
>>>
>>> I think there are a few reasons this has been delayed, aside from the
>>> scheduling ones:
>>>
>>>     1.  compression issues were a surprise, and we are wondering if
>>>         there are any other surprises
>>>     2.  pg_upgrade makes future data format changes problematic
>>>     3.  9.3 multi-xact bugs spooked us into being more careful
>>>
>>> I am not sure what we can do to increase our speed based on these items.
>>
>> Alternately, this is delayed because:
>>
>> 1. We have one tested patch to fix the issue.
>>
>> 2. However, people are convinced that there's a better patch possible.
>>
>> 3. But nobody is working on this better patch except "in their spare time".
>>
>> Given this, I once again vote for releasing based on Tom's lengths-only
>> patch, which is done, tested, and ready to go.
> 
> Heikki's patch is there and polished.

If Heikki says it's ready, I'll test.  So far he's said that it wasn't
done yet.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
> If Heikki says it's ready, I'll test.  So far he's said that it wasn't
> done yet.

http://www.postgresql.org/message-id/541C242E.3030004@vmware.com

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 10:26 AM, Andres Freund wrote:
> On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
>> If Heikki says it's ready, I'll test.  So far he's said that it wasn't
>> done yet.
> 
> http://www.postgresql.org/message-id/541C242E.3030004@vmware.com

Yeah, and that didn't include some of Tom's bug fixes apparently, per
the succeeding message.  Which is why I asked Heikki if he was done, to
which he has not replied.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-09-25 10:29:51 -0700, Josh Berkus wrote:
> On 09/25/2014 10:26 AM, Andres Freund wrote:
> > On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
> >> If Heikki says it's ready, I'll test.  So far he's said that it wasn't
> >> done yet.
> > 
> > http://www.postgresql.org/message-id/541C242E.3030004@vmware.com
> 
> Yeah, and that didn't include some of Tom's bug fixes apparently, per
> the succeeding message.  Which is why I asked Heikki if he was done, to
> which he has not replied.

Well, Heikki said he doesn't see any fixes in Tom's patch. But either
way, this isn't anything that should prevent you from testing.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 09/25/2014 10:26 AM, Andres Freund wrote:
>> On 2014-09-25 10:25:24 -0700, Josh Berkus wrote:
>>> If Heikki says it's ready, I'll test.  So far he's said that it wasn't
>>> done yet.

>> http://www.postgresql.org/message-id/541C242E.3030004@vmware.com

> Yeah, and that didn't include some of Tom's bug fixes apparently, per
> the succeeding message.  Which is why I asked Heikki if he was done, to
> which he has not replied.

I took a quick look at the two patches Heikki posted.  I find the
"separate offsets array" approach unappealing.  It takes more space
than the other approaches, and that space will be filled with data
that we already know will not be at all compressible.  Moreover,
AFAICS we'd have to engrave the stride on stone tablets, which as
I already mentioned I'd really like to not do.

The "offsets-and-lengths" patch seems like the approach we ought to
compare to my patch, but it looks pretty unfinished to me: AFAICS it
includes logic to understand offsets sprinkled into a mostly-lengths
array, but no logic that would actually *store* any such offsets,
which means it's going to act just like my patch for performance
purposes.

In the interests of pushing this forward, I will work today on
trying to finish and review Heikki's offsets-and-lengths patch
so that we have something we can do performance testing on.
I doubt that the performance testing will tell us anything we
don't expect, but we should do it anyway.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 11:22 AM, Tom Lane wrote:
> In the interests of pushing this forward, I will work today on
> trying to finish and review Heikki's offsets-and-lengths patch
> so that we have something we can do performance testing on.
> I doubt that the performance testing will tell us anything we
> don't expect, but we should do it anyway.

OK.  I'll spend some time trying to get Socorro with JSONB working so
that I'll have a second test case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
BTW, it seems like there is consensus that we ought to reorder the items
in a jsonb object to have keys first and then values, independently of the
other issues under discussion.  This means we *will* be breaking on-disk
compatibility with 9.4beta2, which means pg_upgrade will need to be taught
to refuse an upgrade if the database contains any jsonb columns.  Bruce,
do you have time to crank out a patch for that?
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
> BTW, it seems like there is consensus that we ought to reorder the items
> in a jsonb object to have keys first and then values, independently of the
> other issues under discussion.  This means we *will* be breaking on-disk
> compatibility with 9.4beta2, which means pg_upgrade will need to be taught
> to refuse an upgrade if the database contains any jsonb columns.  Bruce,
> do you have time to crank out a patch for that?

Yes, I can do that easily.  Tell me when you want it --- I just need a
catalog version number to trigger on.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Andres Freund
Date:
On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote:
> On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
> > BTW, it seems like there is consensus that we ought to reorder the items
> > in a jsonb object to have keys first and then values, independently of the
> > other issues under discussion.  This means we *will* be breaking on-disk
> > compatibility with 9.4beta2, which means pg_upgrade will need to be taught
> > to refuse an upgrade if the database contains any jsonb columns.  Bruce,
> > do you have time to crank out a patch for that?
> 
> Yes, I can do that easily.  Tell me when you want it --- I just need a
> catalog version number to trigger on.

Do you plan to make it conditional on jsonb being used in the database?
That'd not be bad to reduce the pain for testers that haven't used jsonb.

Greetings,

Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Thu, Sep 25, 2014 at 09:00:07PM +0200, Andres Freund wrote:
> On 2014-09-25 14:46:18 -0400, Bruce Momjian wrote:
> > On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
> > > BTW, it seems like there is consensus that we ought to reorder the items
> > > in a jsonb object to have keys first and then values, independently of the
> > > other issues under discussion.  This means we *will* be breaking on-disk
> > > compatibility with 9.4beta2, which means pg_upgrade will need to be taught
> > > to refuse an upgrade if the database contains any jsonb columns.  Bruce,
> > > do you have time to crank out a patch for that?
> > 
> > Yes, I can do that easily.  Tell me when you want it --- I just need a
> > catalog version number to trigger on.
> 
> Do you plan to make it conditional on jsonb being used in the database?
> That'd not be bad to reduce the pain for testers that haven't used jsonb.

Yes, I already have code that scans pg_attribute looking for columns
with problematic data types, outputs them to a file, and then throws an
error.
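
Something along these lines would find the affected columns (a sketch of
the idea only, not the actual pg_upgrade code):

    SELECT n.nspname, c.relname, a.attname
    FROM pg_catalog.pg_attribute a
    JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
    JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE a.atttypid = 'pg_catalog.jsonb'::regtype
      AND a.attnum > 0
      AND NOT a.attisdropped
      AND n.nspname NOT IN ('pg_catalog', 'information_schema');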

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: jsonb format is pessimal for toast compression

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

>     3.  9.3 multi-xact bugs spooked us into being more careful

Uh.  Multixact changes in 9.3 were infinitely more invasive than the
jsonb changes will ever be.  a) they touched basic visibility design and routines,
which are complex, understood by very few people, and have remained
mostly unchanged for ages; b) they changed on-disk format for an
underlying support structure, requiring pg_upgrade to handle the
conversion; c) they added new catalog infrastructure to keep track of
required freezing; d) they introduced new uint32 counters subject to
wraparound; e) they introduced a novel user of slru.c with 5-char long
filenames; f) they messed with tuple locking protocol and EvalPlanQual
logic for traversing update chains.  Maybe I'm forgetting others.

JSONB has none of these properties.  As far as I can see, the only hairy
issue here (other than getting Josh Berkus to actually test the proposed
patches) is that JSONB is changing on-disk format; but we're avoiding
most issues there by dictating that people with existing JSONB databases
need to pg_dump them, i.e. there is no conversion step being written for
pg_upgrade.

It's good to be careful; it's even better to be more careful.  I too
have learned a lesson there.

Anyway I have no opinion on the JSONB stuff, other than considering that
ignoring performance for large arrays and large objects seems to run
counter to the whole point of JSONB in the first place (and of course
failing to compress is part of that, too.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
I wrote:
> The "offsets-and-lengths" patch seems like the approach we ought to
> compare to my patch, but it looks pretty unfinished to me: AFAICS it
> includes logic to understand offsets sprinkled into a mostly-lengths
> array, but no logic that would actually *store* any such offsets,
> which means it's going to act just like my patch for performance
> purposes.

> In the interests of pushing this forward, I will work today on
> trying to finish and review Heikki's offsets-and-lengths patch
> so that we have something we can do performance testing on.
> I doubt that the performance testing will tell us anything we
> don't expect, but we should do it anyway.

I've now done that, and attached is what I think would be a committable
version.  Having done this work, I no longer think that this approach
is significantly messier code-wise than the all-lengths version, and
it does have the merit of not degrading on very large objects/arrays.
So at the moment I'm leaning to this solution not the all-lengths one.

To get a sense of the compression effects of varying the stride distance,
I repeated the compression measurements I'd done on 14 August with Pavel's
geometry data (<24077.1408052877@sss.pgh.pa.us>).  The upshot of that was

                                         min      max       avg

external text representation             220      172685    880.3
JSON representation (compressed text)    224      78565     541.3
pg_column_size, JSONB HEAD repr.         225      82540     639.0
pg_column_size, all-lengths repr.        225      66794     531.1

Here's what I get with this patch and different stride distances:

JB_OFFSET_STRIDE = 8            225    68551    559.7
JB_OFFSET_STRIDE = 16            225    67601    552.3
JB_OFFSET_STRIDE = 32            225    67120    547.4
JB_OFFSET_STRIDE = 64            225    66886    546.9
JB_OFFSET_STRIDE = 128            225    66879    546.9
JB_OFFSET_STRIDE = 256            225    66846    546.8

So at least for that test data, 32 seems like the sweet spot.
We are giving up a couple percent of space in comparison to the
all-lengths version, but this is probably an acceptable tradeoff
for not degrading on very large arrays.

I've not done any speed testing.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..9beebb3 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*************** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_POSMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_POSMASK)));

      return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_OFFLENMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_OFFLENMASK)));

      return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..f157df3 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
                 JsonbValue *result);
! static bool    equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_OFFLENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JsonbContainer *container, int index,
!                char *base_addr, uint32 offset,
                 JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
*************** static void convertJsonbArray(StringInfo
*** 42,48 ****
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
--- 43,49 ----
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int    reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
*************** JsonbValueToJsonb(JsonbValue *val)
*** 108,113 ****
--- 109,166 ----
  }

  /*
+  * Get the offset of the variable-length portion of a Jsonb node within
+  * the variable-length-data part of its container.  The node is identified
+  * by index within the container's JEntry array.
+  */
+ uint32
+ getJsonbOffset(const JsonbContainer *jc, int index)
+ {
+     uint32        offset = 0;
+     int            i;
+
+     /*
+      * Start offset of this entry is equal to the end offset of the previous
+      * entry.  Walk backwards to the most recent entry stored as an end
+      * offset, returning that offset plus any lengths in between.
+      */
+     for (i = index - 1; i >= 0; i--)
+     {
+         offset += JBE_OFFLENFLD(jc->children[i]);
+         if (JBE_HAS_OFF(jc->children[i]))
+             break;
+     }
+
+     return offset;
+ }
+
+ /*
+  * Get the length of the variable-length portion of a Jsonb node.
+  * The node is identified by index within the container's JEntry array.
+  */
+ uint32
+ getJsonbLength(const JsonbContainer *jc, int index)
+ {
+     uint32        off;
+     uint32        len;
+
+     /*
+      * If the length is stored directly in the JEntry, just return it.
+      * Otherwise, get the begin offset of the entry, and subtract that from
+      * the stored end+1 offset.
+      */
+     if (JBE_HAS_OFF(jc->children[index]))
+     {
+         off = getJsonbOffset(jc, index);
+         len = JBE_OFFLENFLD(jc->children[index]) - off;
+     }
+     else
+         len = JBE_OFFLENFLD(jc->children[index]);
+
+     return len;
+ }
+
+ /*
   * BT comparator worker function.  Returns an integer less than, equal to, or
   * greater than zero, indicating whether a is less than, equal to, or greater
   * than b.  Consistent with the requirements for a B-Tree operator class
*************** compareJsonbContainers(JsonbContainer *a
*** 201,207 ****
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say).  They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
--- 254,260 ----
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say). They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
*************** findJsonbValueFromContainer(JsonbContain
*** 272,295 ****
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result = palloc(sizeof(JsonbValue));

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
          }
      }
      else if (flags & JB_FOBJECT & container->header)
--- 325,357 ----
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result;

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

+     /* Quick out without a palloc cycle if object/array is empty */
+     if (count <= 0)
+         return NULL;
+
+     result = palloc(sizeof(JsonbValue));
+
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
+         uint32        offset = 0;
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(container, i, base_addr, offset, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
+
+             JBE_ADVANCE_OFFSET(offset, children[i]);
          }
      }
      else if (flags & JB_FOBJECT & container->header)
*************** findJsonbValueFromContainer(JsonbContain
*** 297,332 ****
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopMiddle;

!         /* Object key past by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < count)
          {
!             int            index;
              int            difference;
              JsonbValue    candidate;

!             /*
!              * Note how we compensate for the fact that we're iterating
!              * through pairs (not entries) throughout.
!              */
!             stopMiddle = stopLow + (count - stopLow) / 2;
!
!             index = stopMiddle * 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + JBE_OFF(children, index);
!             candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return value */
!                 fillJsonbValue(children, index + 1, base_addr, result);

                  return result;
              }
--- 359,393 ----
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopHigh = count;

!         /* Object key passed by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < stopHigh)
          {
!             uint32        stopMiddle;
              int            difference;
              JsonbValue    candidate;

!             stopMiddle = stopLow + (stopHigh - stopLow) / 2;

              candidate.type = jbvString;
!             candidate.val.string.val =
!                 base_addr + getJsonbOffset(container, stopMiddle);
!             candidate.val.string.len = getJsonbLength(container, stopMiddle);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return corresponding value */
!                 int            index = stopMiddle + count;
!
!                 fillJsonbValue(container, index, base_addr,
!                                getJsonbOffset(container, index),
!                                result);

                  return result;
              }
*************** findJsonbValueFromContainer(JsonbContain
*** 335,341 ****
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     count = stopMiddle;
              }
          }
      }
--- 396,402 ----
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     stopHigh = stopMiddle;
              }
          }
      }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 368,374 ****

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr, result);

      return result;
  }
--- 429,437 ----

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container, i, base_addr,
!                    getJsonbOffset(container, i),
!                    result);

      return result;
  }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 377,389 ****
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
  {
!     JEntry        entry = children[index];

      if (JBE_ISNULL(entry))
      {
--- 440,459 ----
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
+  * The node's JEntry is at container->children[index], and its variable-length
+  * data is at base_addr + offset.  We make the caller determine the offset
+  * since in many cases the caller can amortize that work across multiple
+  * children.  When it can't, it can just call getJsonbOffset().
+  *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JsonbContainer *container, int index,
!                char *base_addr, uint32 offset,
!                JsonbValue *result)
  {
!     JEntry        entry = container->children[index];

      if (JBE_ISNULL(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 392,405 ****
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + JBE_OFF(children, index);
!         result->val.string.len = JBE_LEN(children, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
--- 462,475 ----
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + offset;
!         result->val.string.len = getJsonbLength(container, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 415,422 ****
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
!         result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children, index));
      }
  }

--- 485,494 ----
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         /* Remove alignment padding from data pointer and length */
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
!         result->val.binary.len = getJsonbLength(container, index) -
!             (INTALIGN(offset) - offset);
      }
  }

*************** recurse:
*** 668,680 ****
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
--- 740,754 ----
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = 0;    /* not actually used */
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
*************** recurse:
*** 686,692 ****
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);

              if (!IsAJsonbScalar(val) && !skipNested)
              {
--- 760,772 ----
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->container, (*it)->curIndex,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             JBE_ADVANCE_OFFSET((*it)->curDataOffset,
!                                (*it)->children[(*it)->curIndex]);
!             (*it)->curIndex++;

              if (!IsAJsonbScalar(val) && !skipNested)
              {
*************** recurse:
*** 697,704 ****
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't
!                  * want us to recurse into it.
                   */
                  return WJB_ELEM;
              }
--- 777,784 ----
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't want
!                  * us to recurse into it.
                   */
                  return WJB_ELEM;
              }
*************** recurse:
*** 712,724 ****
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
--- 792,807 ----
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = getJsonbOffset((*it)->container,
!                                                    (*it)->nElems);
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
*************** recurse:
*** 732,738 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

--- 815,823 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->container, (*it)->curIndex,
!                                (*it)->dataProper, (*it)->curDataOffset,
!                                val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

*************** recurse:
*** 745,752 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
!                            (*it)->dataProper, val);

              /*
               * Value may be a container, in which case we recurse with new,
--- 830,844 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->container, (*it)->curIndex + (*it)->nElems,
!                            (*it)->dataProper, (*it)->curValueOffset,
!                            val);
!
!             JBE_ADVANCE_OFFSET((*it)->curDataOffset,
!                                (*it)->children[(*it)->curIndex]);
!             JBE_ADVANCE_OFFSET((*it)->curValueOffset,
!                            (*it)->children[(*it)->curIndex + (*it)->nElems]);
!             (*it)->curIndex++;

              /*
               * Value may be a container, in which case we recurse with new,
*************** iteratorFromContainer(JsonbContainer *co
*** 795,805 ****
              break;

          case JB_FOBJECT:
-
-             /*
-              * Offset reflects that nElems indicates JsonbPairs in an object.
-              * Each key and each value contain Jentry metadata just the same.
-              */
              it->dataProper =
                  (char *) it->children + it->nElems * sizeof(JEntry) * 2;
              it->state = JBI_OBJECT_START;
--- 887,892 ----
*************** reserveFromBuffer(StringInfo buffer, int
*** 1209,1216 ****
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us;
!      * it seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

--- 1296,1303 ----
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us; it
!      * seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

*************** convertToJsonb(JsonbValue *val)
*** 1284,1291 ****

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what
!      * kind of value it is.
       */

      res = (Jsonb *) buffer.data;
--- 1371,1378 ----

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what kind
!      * of value it is.
       */

      res = (Jsonb *) buffer.data;
*************** convertToJsonb(JsonbValue *val)
*** 1298,1307 ****
  /*
   * Subroutine of convertJsonb: serialize a single JsonbValue into buffer.
   *
!  * The JEntry header for this node is returned in *header. It is filled in
!  * with the length of this value, but if it is stored in an array or an
!  * object (which is always, except for the root node), it is the caller's
!  * responsibility to adjust it with the offset within the container.
   *
   * If the value is an array or an object, this recurses. 'level' is only used
   * for debugging purposes.
--- 1385,1394 ----
  /*
   * Subroutine of convertJsonb: serialize a single JsonbValue into buffer.
   *
!  * The JEntry header for this node is returned in *header.  It is filled in
!  * with the length of this value and appropriate type bits.  If we wish to
!  * store an end offset rather than a length, it is the caller's responsibility
!  * to adjust for that.
   *
   * If the value is an array or an object, this recurses. 'level' is only used
   * for debugging purposes.
*************** convertJsonbValue(StringInfo buffer, JEn
*** 1315,1324 ****
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary,
!      * and neither should any of its sub-components. Those values will be
!      * produced by convertJsonbArray and convertJsonbObject, the results of
!      * which will not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
--- 1402,1411 ----
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary, and
!      * neither should any of its sub-components. Those values will be produced
!      * by convertJsonbArray and convertJsonbObject, the results of which will
!      * not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
*************** convertJsonbValue(StringInfo buffer, JEn
*** 1334,1457 ****
  static void
  convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
!     int            offset;
!     int            metaoffset;
      int            i;
      int            totallen;
      uint32        header;

!     /* Initialize pointer into conversion buffer at this level */
!     offset = buffer->len;

      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry, stored in the beginning of the variable-
!      * length payload.
       */
!     header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
      {
!         Assert(val->val.array.nElems == 1);
          Assert(level == 0);
          header |= JB_FSCALAR;
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!     /* reserve space for the JEntries of the elements. */
!     metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
!     for (i = 0; i < val->val.array.nElems; i++)
      {
          JsonbValue *elem = &val->val.array.elems[i];
          int            len;
          JEntry        meta;

          convertJsonbValue(buffer, &meta, elem, level + 1);
!         len = meta & JENTRY_POSMASK;
          totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

!         if (i > 0)
!             meta = (meta & ~JENTRY_POSMASK) | totallen;
!         copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
!         metaoffset += sizeof(JEntry);
      }

!     totallen = buffer->len - offset;

!     /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
!     uint32        header;
!     int            offset;
!     int            metaoffset;
      int            i;
      int            totallen;

!     /* Initialize pointer into conversion buffer at this level */
!     offset = buffer->len;

      padBufferToInt(buffer);

!     /* Initialize header */
!     header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* reserve space for the JEntries of the keys and values */
!     metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
!     for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair *pair = &val->val.object.pairs[i];
!         int len;
!         JEntry meta;

!         /* put key */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = meta & JENTRY_POSMASK;
          totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

!         if (i > 0)
!             meta = (meta & ~JENTRY_POSMASK) | totallen;
!         copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
!         metaoffset += sizeof(JEntry);

!         convertJsonbValue(buffer, &meta, &pair->value, level);
!         len = meta & JENTRY_POSMASK;
          totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

!         meta = (meta & ~JENTRY_POSMASK) | totallen;
!         copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
!         metaoffset += sizeof(JEntry);
      }

!     totallen = buffer->len - offset;

      *pheader = JENTRY_ISCONTAINER | totallen;
  }

--- 1421,1620 ----
  static void
  convertJsonbArray(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
!     int            base_offset;
!     int            jentry_offset;
      int            i;
      int            totallen;
      uint32        header;
+     int            nElems = val->val.array.nElems;

!     /* Remember where in the buffer this array starts. */
!     base_offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
       */
!     header = nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
      {
!         Assert(nElems == 1);
          Assert(level == 0);
          header |= JB_FSCALAR;
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!
!     /* Reserve space for the JEntries of the elements. */
!     jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nElems);

      totallen = 0;
!     for (i = 0; i < nElems; i++)
      {
          JsonbValue *elem = &val->val.array.elems[i];
          int            len;
          JEntry        meta;

+         /*
+          * Convert element, producing a JEntry and appending its
+          * variable-length data to buffer
+          */
          convertJsonbValue(buffer, &meta, elem, level + 1);
!
!         len = JBE_OFFLENFLD(meta);
          totallen += len;

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         if (totallen > JENTRY_OFFLENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_OFFLENMASK)));

!         /*
!          * Convert each JB_OFFSET_STRIDE'th length to an offset.
!          */
!         if ((i % JB_OFFSET_STRIDE) == 0)
!             meta = (meta & JENTRY_TYPEMASK) | totallen | JENTRY_HAS_OFF;
!
!         copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
!         jentry_offset += sizeof(JEntry);
      }

!     /* Total data size is everything we've appended to buffer */
!     totallen = buffer->len - base_offset;

!     /* Check length again, since we didn't include the metadata above */
!     if (totallen > JENTRY_OFFLENMASK)
!         ereport(ERROR,
!                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                         JENTRY_OFFLENMASK)));
!
!     /* Initialize the header of this node in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
!     int            base_offset;
!     int            jentry_offset;
      int            i;
      int            totallen;
+     uint32        header;
+     int            nPairs = val->val.object.nPairs;

!     /* Remember where in the buffer this object starts. */
!     base_offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

!     /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
!      */
!     header = nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* Reserve space for the JEntries of the keys and values. */
!     jentry_offset = reserveFromBuffer(buffer, sizeof(JEntry) * nPairs * 2);

+     /*
+      * Iterate over the keys, then over the values, since that is the ordering
+      * we want in the on-disk representation.
+      */
      totallen = 0;
!     for (i = 0; i < nPairs; i++)
      {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;

!         /*
!          * Convert key, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = JBE_OFFLENFLD(meta);
          totallen += len;

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         if (totallen > JENTRY_OFFLENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
!                             JENTRY_OFFLENMASK)));

!         /*
!          * Convert each JB_OFFSET_STRIDE'th length to an offset.
!          */
!         if ((i % JB_OFFSET_STRIDE) == 0)
!             meta = (meta & JENTRY_TYPEMASK) | totallen | JENTRY_HAS_OFF;

!         copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
!         jentry_offset += sizeof(JEntry);
!     }
!     for (i = 0; i < nPairs; i++)
!     {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;
!
!         /*
!          * Convert value, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
!         convertJsonbValue(buffer, &meta, &pair->value, level + 1);
!
!         len = JBE_OFFLENFLD(meta);
          totallen += len;

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         if (totallen > JENTRY_OFFLENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
!                             JENTRY_OFFLENMASK)));

!         /*
!          * Convert each JB_OFFSET_STRIDE'th length to an offset.
!          */
!         if (((i + nPairs) % JB_OFFSET_STRIDE) == 0)
!             meta = (meta & JENTRY_TYPEMASK) | totallen | JENTRY_HAS_OFF;
!
!         copyToBuffer(buffer, jentry_offset, (char *) &meta, sizeof(JEntry));
!         jentry_offset += sizeof(JEntry);
      }

!     /* Total data size is everything we've appended to buffer */
!     totallen = buffer->len - base_offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_OFFLENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                         JENTRY_OFFLENMASK)));
+
+     /* Initialize the header of this node in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..b89e4cb 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 83,91 ****
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header, and a variable-length content.  The JEntry header indicates what
!  * kind of a node it is, e.g. a string or an array, and the offset and length
!  * of its variable-length portion within the container.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
--- 83,91 ----
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header and a variable-length content (possibly of zero size).  The JEntry
!  * header indicates what kind of a node it is, e.g. a string or an array,
!  * and provides the length of its variable-length portion.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
*************** typedef struct JsonbValue JsonbValue;
*** 95,134 ****
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begins with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
-  * To encode the length and offset of the variable-length portion of each
-  * node in a compact way, the JEntry stores only the end offset within the
-  * variable-length portion of the container node. For the first JEntry in the
-  * container's JEntry array, that equals to the length of the node data.  The
-  * begin offset and length of the rest of the entries can be calculated using
-  * the end offset of the previous JEntry in the array.
-  *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte.
   */

  /*
!  * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
  #define JENTRY_ISSTRING            0x00000000
--- 95,146 ----
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begin with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte so that the actual numeric data is 4-byte aligned.
   */

  /*
!  * JEntry format.
   *
!  * The least significant 28 bits store either the data length of the entry,
!  * or its end+1 offset from the start of the variable-length portion of the
!  * containing object.  The next three bits store the type of the entry, and
!  * the high-order bit tells whether the least significant bits store a length
!  * or an offset.
!  *
!  * The reason for the offset-or-length complication is to compromise between
!  * access speed and data compressibility.  In the initial design each JEntry
!  * always stored an offset, but this resulted in JEntry arrays with horrible
!  * compressibility properties, so that TOAST compression of a JSONB did not
!  * work well.  Storing only lengths would greatly improve compressibility,
!  * but it makes random access into large arrays expensive (O(N) not O(1)).
!  * So what we do is store an offset in every JB_OFFSET_STRIDE'th JEntry and
!  * a length in the rest.  This results in reasonably compressible data (as
!  * long as the stride isn't too small).  We may have to examine as many as
!  * JB_OFFSET_STRIDE JEntrys in order to find out the offset or length of any
!  * given item, but that's still O(1) no matter how large the container is.
!  *
!  * We could avoid eating a flag bit for this purpose if we were to store
!  * the stride in the container header, or if we were willing to treat the
!  * stride as an unchangeable constant.  Neither of those options is very
!  * attractive though.
   */
  typedef uint32 JEntry;

! #define JENTRY_OFFLENMASK        0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000
+ #define JENTRY_HAS_OFF            0x80000000

  /* values stored in the type bits */
  #define JENTRY_ISSTRING            0x00000000
*************** typedef uint32 JEntry;
*** 138,144 ****
  #define JENTRY_ISNULL            0x40000000
  #define JENTRY_ISCONTAINER        0x50000000        /* array or object */

! /* Note possible multiple evaluations */
  #define JBE_ISSTRING(je_)        (((je_) & JENTRY_TYPEMASK) == JENTRY_ISSTRING)
  #define JBE_ISNUMERIC(je_)        (((je_) & JENTRY_TYPEMASK) == JENTRY_ISNUMERIC)
  #define JBE_ISCONTAINER(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISCONTAINER)
--- 150,158 ----
  #define JENTRY_ISNULL            0x40000000
  #define JENTRY_ISCONTAINER        0x50000000        /* array or object */

! /* Access macros.  Note possible multiple evaluations */
! #define JBE_OFFLENFLD(je_)        ((je_) & JENTRY_OFFLENMASK)
! #define JBE_HAS_OFF(je_)        (((je_) & JENTRY_HAS_OFF) != 0)
  #define JBE_ISSTRING(je_)        (((je_) & JENTRY_TYPEMASK) == JENTRY_ISSTRING)
  #define JBE_ISNUMERIC(je_)        (((je_) & JENTRY_TYPEMASK) == JENTRY_ISNUMERIC)
  #define JBE_ISCONTAINER(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISCONTAINER)
*************** typedef uint32 JEntry;
*** 147,166 ****
  #define JBE_ISBOOL_FALSE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_FALSE)
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element. An object has two children for
!  * each key/value pair.
   */
  typedef struct JsonbContainer
  {
--- 161,194 ----
  #define JBE_ISBOOL_FALSE(je_)    (((je_) & JENTRY_TYPEMASK) == JENTRY_ISBOOL_FALSE)
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

+ /* Macro for advancing an offset variable to the next JEntry */
+ #define JBE_ADVANCE_OFFSET(offset, je) \
+     do { \
+         JEntry    je_ = (je); \
+         if (JBE_HAS_OFF(je_)) \
+             (offset) = JBE_OFFLENFLD(je_); \
+         else \
+             (offset) += JBE_OFFLENFLD(je_); \
+     } while(0)
+
  /*
!  * We store an offset, not a length, every JB_OFFSET_STRIDE children.
!  * Caution: this macro should only be referenced when creating a JSONB
!  * value.  When examining an existing value, pay attention to the HAS_OFF
!  * bits instead.  This allows changes in the offset-placement heuristic
!  * without breaking on-disk compatibility.
   */
! #define JB_OFFSET_STRIDE        32

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element, stored in array order.
!  *
!  * An object has two children for each key/value pair.  The keys all appear
!  * first, in key sort order; then the values appear, in an order matching the
!  * key order.  This arrangement keeps the keys compact in memory, making a
!  * search for a particular key more cache-friendly.
   */
  typedef struct JsonbContainer
  {
*************** typedef struct JsonbContainer
*** 172,179 ****
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF
! #define JB_FSCALAR                0x10000000
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

--- 200,207 ----
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF        /* mask for count field */
! #define JB_FSCALAR                0x10000000        /* flag bits */
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

*************** struct JsonbValue
*** 248,265 ****
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Pair within an Object.
   *
!  * Pairs with duplicate keys are de-duplicated.  We store the order for the
!  * benefit of doing so in a well-defined way with respect to the original
!  * observed order (which is "last observed wins").  This is only used briefly
!  * when originally constructing a Jsonb.
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* preserves order of pairs with equal keys */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
--- 276,295 ----
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Key/value pair within an Object.
   *
!  * This struct type is only used briefly while constructing a Jsonb; it is
!  * *not* the on-disk representation.
!  *
!  * Pairs with duplicate keys are de-duplicated.  We store the originally
!  * observed pair ordering for the purpose of removing duplicates in a
!  * well-defined way (which is "last observed wins").
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* Pair's index in original sequence */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
*************** typedef struct JsonbIterator
*** 287,306 ****
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will be
!                                  * nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;

!     /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            i;

      /*
!      * Data proper.  This points just past end of children array.
!      * We use the JBE_OFF() macro on the Jentrys to find offsets of each
!      * child in this area.
       */
!     char       *dataProper;

      /* Private state */
      JsonbIterState state;
--- 317,341 ----
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will
!                                  * be nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;        /* JEntrys for child nodes */
!     /* Data proper.  This points to the beginning of the variable-length data */
!     char       *dataProper;

!     /* Current item in buffer (up to nElems) */
!     int            curIndex;
!
!     /* Data offset corresponding to current item */
!     uint32        curDataOffset;

      /*
!      * If the container is an object, we want to return keys and values
!      * alternately; so curDataOffset points to the current key, and
!      * curValueOffset points to the current value.
       */
!     uint32        curValueOffset;

      /* Private state */
      JsonbIterState state;
*************** extern Datum gin_consistent_jsonb_path(P
*** 344,349 ****
--- 379,386 ----
  extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);

  /* Support functions */
+ extern uint32 getJsonbOffset(const JsonbContainer *jc, int index);
+ extern uint32 getJsonbLength(const JsonbContainer *jc, int index);
  extern int    compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
  extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
                              uint32 flags,

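To make the encoding rule concrete outside the patch context, here is a
minimal standalone sketch of the write side: most entries store just a
length, and every STRIDE'th entry stores the running end offset with a flag
bit set.  The names and constants below are illustrative stand-ins for the
JENTRY_* macros (and the stride is tiny for readability); the real logic is
in convertJsonbArray() and convertJsonbObject() above.

#include <stdint.h>
#include <stdio.h>

#define HAS_OFF     0x80000000u     /* stand-in for JENTRY_HAS_OFF */
#define OFFLENMASK  0x0FFFFFFFu     /* stand-in for JENTRY_OFFLENMASK */
#define STRIDE      4               /* tiny stride for illustration; the patch uses 32 */

/*
 * Build the offset/length words for n children whose data lengths are given
 * in lens[].  Every STRIDE'th entry stores the running end offset and sets
 * HAS_OFF; the rest store only the child's own length.  A mostly-lengths
 * array repeats small values, which pglz can actually compress.
 */
static void
encode_entries(const uint32_t *lens, int n, uint32_t *entries)
{
    uint32_t    totallen = 0;
    int         i;

    for (i = 0; i < n; i++)
    {
        totallen += lens[i];
        if ((i % STRIDE) == 0)
            entries[i] = (totallen & OFFLENMASK) | HAS_OFF;
        else
            entries[i] = lens[i] & OFFLENMASK;
    }
}

int
main(void)
{
    uint32_t    lens[8] = {5, 5, 5, 5, 5, 5, 5, 5};
    uint32_t    entries[8];
    int         i;

    encode_entries(lens, 8, entries);
    for (i = 0; i < 8; i++)
        printf("entry %d: 0x%08x\n", i, (unsigned) entries[i]);
    return 0;
}

For eight 5-byte elements this yields 0x80000005, 5, 5, 5, 0x80000019, 5, 5, 5:
mostly repeated small values rather than the strictly increasing series that
defeated pglz_compress() in the original format.
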
Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/25/2014 08:10 PM, Tom Lane wrote:
> I wrote:
>> The "offsets-and-lengths" patch seems like the approach we ought to
>> compare to my patch, but it looks pretty unfinished to me: AFAICS it
>> includes logic to understand offsets sprinkled into a mostly-lengths
>> array, but no logic that would actually *store* any such offsets,
>> which means it's going to act just like my patch for performance
>> purposes.
> 
>> In the interests of pushing this forward, I will work today on
>> trying to finish and review Heikki's offsets-and-lengths patch
>> so that we have something we can do performance testing on.
>> I doubt that the performance testing will tell us anything we
>> don't expect, but we should do it anyway.
> 
> I've now done that, and attached is what I think would be a committable
> version.  Having done this work, I no longer think that this approach
> is significantly messier code-wise than the all-lengths version, and
> it does have the merit of not degrading on very large objects/arrays.
> So at the moment I'm leaning to this solution not the all-lengths one.
> 
> To get a sense of the compression effects of varying the stride distance,
> I repeated the compression measurements I'd done on 14 August with Pavel's
> geometry data (<24077.1408052877@sss.pgh.pa.us>).  The upshot of that was
> 
>                     min    max    avg
> 
> external text representation        220    172685    880.3
> JSON representation (compressed text)    224    78565    541.3
> pg_column_size, JSONB HEAD repr.    225    82540    639.0
> pg_column_size, all-lengths repr.    225    66794    531.1
> 
> Here's what I get with this patch and different stride distances:
> 
> JB_OFFSET_STRIDE = 8            225    68551    559.7
> JB_OFFSET_STRIDE = 16            225    67601    552.3
> JB_OFFSET_STRIDE = 32            225    67120    547.4
> JB_OFFSET_STRIDE = 64            225    66886    546.9
> JB_OFFSET_STRIDE = 128            225    66879    546.9
> JB_OFFSET_STRIDE = 256            225    66846    546.8
> 
> So at least for that test data, 32 seems like the sweet spot.
> We are giving up a couple percent of space in comparison to the
> all-lengths version, but this is probably an acceptable tradeoff
> for not degrading on very large arrays.
> 
> I've not done any speed testing.

I'll do some tomorrow.  I should have some different DBs to test on, too.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
All,

These results have become a bit complex, so: spreadsheet time.

https://docs.google.com/spreadsheets/d/1Mokpx3EqlbWlFDIkF9qzpM7NneN9z-QOXWSzws3E-R4

Some details:

The Length-and-Offset test was performed using a more recent 9.4
checkout than the other two tests.  This was regrettable, and due to a
mistake with git, since the results tell me that there have been some
other changes.

I added two new datasets:

errlog2 is a simple, 4-column error log in JSON format, with 2 small
values and 2 large values in each datum.  It was there to check if any
of our changes affected the performance or size of such simple
structures (answer: no).

processed_b is a synthetic version of Mozilla Socorro's crash dumps,
about 900,000 of them, with nearly identical JSON on each row. These are
large json values (around 4KB each) with a broad mix of values and 5
levels of nesting.  However, no single level has very many keys; the
widest is the top level, at up to 40 keys.  Unlike the other data sets,
I can provide a copy of processed_b for the asking.

So, some observations:

* Data sizes with lengths-and-offsets are slightly (3%) larger than
all-lengths for the pathological case (jsonbish) and unaffected for
other cases.

* Even large, complex JSON (processed_b) gets better compression with
the two patches than with head, although only slightly better (16%)

* This better compression for processed_b leads to slightly slower
extraction (6-7%), and surprisingly slower extraction for
length-and-offset than for all-lengths (about 2%).

* In the pathological case, length-and-offset was notably faster on Q1
than all-lengths (24%), and somewhat slower on Q2 (8%).  I think this
shows me that I don't understand which JSON keys are "at the end".

* Notably, length-and-offset when uncompressed (EXTERNAL) was faster on
Q1 than head!  This was surprising enough that I retested it.

Overall, I'm satisfied with the performance of the length-and-offset
patch.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/26/2014 06:20 PM, Josh Berkus wrote:
> Overall, I'm satisfied with the performance of the length-and-offset
> patch.

Oh, also ... no bugs found.

So, can we get Beta3 out now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> So, can we get Beta3 out now?

If nobody else steps up and says they want to do some performance
testing, I'll push the latest lengths+offsets patch tomorrow.

Are any of the other open items listed at
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
things that we must-fix-before-beta3?
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
>> BTW, it seems like there is consensus that we ought to reorder the items
>> in a jsonb object to have keys first and then values, independently of the
>> other issues under discussion.  This means we *will* be breaking on-disk
>> compatibility with 9.4beta2, which means pg_upgrade will need to be taught
>> to refuse an upgrade if the database contains any jsonb columns.  Bruce,
>> do you have time to crank out a patch for that?

> Yes, I can do that easily.  Tell me when you want it --- I just need a
> catalog version number to trigger on.

Done --- 201409291 is the cutover point.
        regards, tom lane



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
> >> BTW, it seems like there is consensus that we ought to reorder the items
> >> in a jsonb object to have keys first and then values, independently of the
> >> other issues under discussion.  This means we *will* be breaking on-disk
> >> compatibility with 9.4beta2, which means pg_upgrade will need to be taught
> >> to refuse an upgrade if the database contains any jsonb columns.  Bruce,
> >> do you have time to crank out a patch for that?
>
> > Yes, I can do that easily.  Tell me when you want it --- I just need a
> > catalog version number to trigger on.
>
> Done --- 201409291 is the cutover point.

Just to clarify- the commit bumped the catversion to 201409292, so
version <= 201409291 has the old format while version > 201409291 has
the new format.  There was no 201409291, so I suppose it doesn't matter
too much, but technically 'version >= 201409291' wouldn't be accurate.

I'm guessing this all makes sense for how pg_upgrade works, but I found
it a bit surprising that the version mentioned as the cutover point
wasn't the catversion committed.
Thanks,
    Stephen

Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Done --- 201409291 is the cutover point.

> Just to clarify- the commit bumped the catversion to 201409292, so
> version <= 201409291 has the old format while version > 201409291 has
> the new format.  There was no 201409291, so I suppose it doesn't matter
> too much, but technically 'version >= 201409291' wouldn't be accurate.

Nope.  See my response to Andrew: ...1 is the cutover commit Bruce
should use, because that's what it is in 9.4.
        regards, tom lane
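
(For reference, the shape of check this implies on the pg_upgrade side is a
simple catversion comparison against that 9.4 value.  The sketch below is
illustrative only; the names and the example catversions in main() are not
necessarily what Bruce's committed patch uses.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Catalog version nominated as the 9.4 jsonb-format cutover.  The format
 * change actually stamped 201409292, and nothing was ever stamped exactly
 * 201409291, so "<=" and "<" behave identically here.
 */
#define JSONB_FORMAT_CHANGE_CAT_VER     201409291

/*
 * True if the old cluster could hold jsonb in the pre-change on-disk format,
 * i.e. it is a 9.4 cluster initdb'd before the cutover.  In that case
 * pg_upgrade must refuse the upgrade if any jsonb columns exist.
 */
static bool
old_cluster_may_have_old_jsonb(int old_major_version, uint32_t old_cat_ver)
{
    /* jsonb first appeared in 9.4, reported here as 904 */
    return old_major_version == 904 &&
           old_cat_ver <= JSONB_FORMAT_CHANGE_CAT_VER;
}

int
main(void)
{
    /* hypothetical pre-cutover 9.4 catversion vs. a post-cutover one */
    printf("%d %d\n",
           old_cluster_may_have_old_jsonb(904, 201409200),
           old_cluster_may_have_old_jsonb(904, 201409292));
    return 0;
}

A check along these lines also has to stay in later pg_upgrade releases, so
that a 9.4 beta cluster can't jump straight to 9.5 with old-format jsonb.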



Re: jsonb format is pessimal for toast compression

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> Done --- 201409291 is the cutover point.
>
> > Just to clarify- the commit bumped the catversion to 201409292, so
> > version <= 201409291 has the old format while version > 201409291 has
> > the new format.  There was no 201409291, so I suppose it doesn't matter
> > too much, but technically 'version >= 201409291' wouldn't be accurate.
>
> Nope.  See my response to Andrew: ...1 is the cutover commit Bruce
> should use, because that's what it is in 9.4.

Yup, makes sense.
Thanks!
    Stephen

Re: jsonb format is pessimal for toast compression

From
Arthur Silva
Date:
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Mon, Sep 29, 2014 at 12:19 AM, Josh
Berkus<span dir="ltr"><<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>></span> wrote:<br
/><blockquoteclass="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><spanclass="">On 09/26/2014 06:20 PM, Josh Berkus wrote:<br /> > Overall, I'm
satisfiedwith the performance of the length-and-offset<br /> > patch.<br /><br /></span>Oh, also ... no bugs
found.<br/><br /> So, can we get Beta3 out now?<br /><div class=""><div class="h5"><br /> --<br /> Josh Berkus<br />
PostgreSQLExperts Inc.<br /><a href="http://pgexperts.com" target="_blank">http://pgexperts.com</a><br /><br /><br />
--<br/> Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your
subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br />What's
thecall on the stride length? Are we going to keep it hardcoded?<br /></div></div> 

Re: jsonb format is pessimal for toast compression

From
Josh Berkus
Date:
On 09/29/2014 11:49 AM, Arthur Silva wrote:
> What's the call on the stride length? Are we going to keep it hardcoded?

Please, yes.  The complications caused by a variable stride length would
be horrible.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: jsonb format is pessimal for toast compression

From
Tom Lane
Date:
Arthur Silva <arthurprs@gmail.com> writes:
> What's the call on the stride length? Are we going to keep it hardcoded?

At the moment it's 32, but we could change it without forcing a new
initdb.  I ran a simple test that seemed to show 32 was a good choice,
but if anyone else wants to try other cases, go for it.
        regards, tom lane
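
To expand on why the stride is tunable without initdb: readers never consult
JB_OFFSET_STRIDE at all, only the per-entry HAS_OFF bit, so values written
with one stride stay readable under any other.  A simplified read-side sketch
(illustrative names; the real code is getJsonbOffset() and getJsonbLength()
in the committed patch):

#include <stdint.h>
#include <stdio.h>

#define HAS_OFF     0x80000000u     /* stand-in for JENTRY_HAS_OFF */
#define OFFLENMASK  0x0FFFFFFFu     /* stand-in for JENTRY_OFFLENMASK */

/*
 * Start offset of child 'index': walk backwards, summing lengths, until an
 * entry that stores an end offset is found.  The walk is bounded by the
 * stride used at write time, whatever it was, so access stays O(1).
 */
static uint32_t
child_offset(const uint32_t *entries, int index)
{
    uint32_t    offset = 0;
    int         i;

    for (i = index - 1; i >= 0; i--)
    {
        offset += entries[i] & OFFLENMASK;
        if (entries[i] & HAS_OFF)
            break;
    }
    return offset;
}

/* Length of child 'index': stored directly, or end offset minus start offset */
static uint32_t
child_length(const uint32_t *entries, int index)
{
    if (entries[index] & HAS_OFF)
        return (entries[index] & OFFLENMASK) - child_offset(entries, index);
    return entries[index] & OFFLENMASK;
}

int
main(void)
{
    /* four 5-byte children written with stride 2: end offsets stored at 0 and 2 */
    uint32_t    entries[4] = {5 | HAS_OFF, 5, 15 | HAS_OFF, 5};

    printf("child 3: offset %u, length %u\n",
           (unsigned) child_offset(entries, 3),
           (unsigned) child_length(entries, 3));
    return 0;
}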



Re: jsonb format is pessimal for toast compression

From
Bruce Momjian
Date:
On Mon, Sep 29, 2014 at 12:30:40PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Thu, Sep 25, 2014 at 02:39:37PM -0400, Tom Lane wrote:
> >> BTW, it seems like there is consensus that we ought to reorder the items
> >> in a jsonb object to have keys first and then values, independently of the
> >> other issues under discussion.  This means we *will* be breaking on-disk
> >> compatibility with 9.4beta2, which means pg_upgrade will need to be taught
> >> to refuse an upgrade if the database contains any jsonb columns.  Bruce,
> >> do you have time to crank out a patch for that?
>
> > Yes, I can do that easily.  Tell me when you want it --- I just need a
> > catalog version number to trigger on.
>
> Done --- 201409291 is the cutover point.

Attached patch applied to head, and backpatched to 9.4.  I think we need
to keep this in all future pg_upgrade versions in case someone from the
beta tries to jump versions, e.g. 9.4 beta1 to 9.5.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment