Thread: BUG #11109: No Toast compression on JSON, JSONB columns

BUG #11109: No Toast compression on JSON, JSONB columns

From
ljw1001@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      11109
Logged by:          Larry White
Email address:      ljw1001@gmail.com
PostgreSQL version: 9.4beta2
Operating system:   OSX Mavericks
Description:

If PG can, after compression, put an entire JSON or JSONB 'document' into
one row/page in the TOAST table, it does. However, if the document is too big
to fit in one row after compression, it does no compression at all.

I created a test Json file (in part) with this loop:

        for (int j = 0; j < 110; j++) {
            mediumPayload.getJunk().add("124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas");
            mediumPayload.getJunk().add("q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs sdfsd");
        }

If the loop runs 110 times as shown, it compresses.
If the loop runs 111 times, it does not:

With 110 iterations:
EXTENDED   8192 bytes (one page)
EXTERNAL   66 MB

With 111 iterations:
EXTENDED   69 MB
EXTERNAL   69 MB

This is not the behavior observed with TEXT compression in TOAST. For very
large files, TEXT compression works correctly. I got these results running
similar tests on larger JSON files:

Column Type - Storage  - TOAST table size
JSONB       - EXTERNAL - 2448 MB
JSONB       - EXTENDED - 2448 MB
JSON        - EXTENDED - 2504 MB
TEXT        - EXTERNAL - 2409 MB
TEXT        - EXTENDED -   40 MB

To Recreate:
Take a JSON file that is too large to compress into a single page. Insert it
into a table and check the size of the TOAST table. Repeat with a file that
is small enough to fit into a single page when compressed (but bigger than
the TOAST threshold). The first file will show no compression and the
second file will compress correctly.
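
A minimal sketch of such a check in psql; the table name and file path are
placeholders, not taken from the original report:

create table toast_check (payload jsonb);

-- load the candidate file as a single value (psql shown here; any client works)
\set content `cat /path/to/big.json`
insert into toast_check values (:'content'::jsonb);

-- pg_table_size includes the table's TOAST relation; pg_column_size shows the
-- (possibly compressed) stored size of the value itself
select pg_size_pretty(pg_table_size('toast_check'));
select pg_size_pretty(pg_column_size(payload)::bigint) from toast_check;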

Re: BUG #11109: No Toast compression on JSON, JSONB columns

From
Larry White
Date:
I looked into this a bit more.  I confirmed that in addition to the Text
type, the XML type also compresses large files.  With a 2MB XML file, I get
the following results:

Storage      TOAST table size
EXTERNAL     2392 KB
EXTENDED      312 KB

However, I withdraw the earlier speculation about when the issue occurs. It
clearly stops compressing files that are 'too large', but I'm not certain
what the trigger is. It may be related to the PGLZ_Strategy struct, which
contains a max_input_size value, or to some other factor.

Best regards,

Larry

Re: BUG #11109: No Toast compression on JSON, JSONB columns

From
Tom Lane
Date:
Thu, Aug 7, 2014 at 5:42 PM
ljw1001@gmail.com writes:
> if PG can, after compression, put an entire JSON or JSONB 'document' into
> one row/page in the toast table it does. However, if the document is too big
> to fit in one row after compression, it does no compression at all.

I can't reproduce this.

It does seem that JSONB is noticeably less compressible than the
equivalent plain-text representation, which is disappointing,
but I don't see it failing to compress at all.

I experimented with this SQL script:


create table bigjson(f1 jsonb);

-- alter table bigjson alter column f1 set storage external;

insert into bigjson values(
'{"junk":["124245etweetwet345gwtretwt43 qwrqwq qwre qwrsdflkas",
"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrssdfsd",
"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",
... 50000 repetitions of above two lines ...
"q4535wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrssdfsd",
"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",
"foo"]}'
);

select pg_size_pretty(pg_table_size('bigjson'));

drop table bigjson;


I get about 568kB table size, versus 5640kB with the "external" option
uncommented, so it is compressing about 10-to-1.  If I change the column
type to text (or json) then I get a table size of 120kB, which is better.
But on the other hand we should not put a lot of stock in the compression
rate achievable on such bogus data, so I'd not panic until I saw numbers
for more realistic test data.
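
For a side-by-side look, the same payload can be stored in all three types at
once; a sketch along the lines of the script above (table and column names are
made up, and pg_column_size reports the compressed on-disk size of each value):

create table comp_cmp(as_text text, as_json json, as_jsonb jsonb);

insert into comp_cmp
select payload, payload::json, payload::jsonb
from (select '{"junk":["'
             || repeat('124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas", "', 50000)
             || 'foo"]}' as payload) s;

select pg_column_size(as_text)  as text_bytes,
       pg_column_size(as_json)  as json_bytes,
       pg_column_size(as_jsonb) as jsonb_bytes
from comp_cmp;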

What might be happening for you is that your actual test case triggers the
"first_success_by" filter in pglz_compress() because the first kilobyte of
input data isn't compressible.  But since you didn't supply a fully
reproducible test case, that's just speculation.
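
One way to probe that hypothesis from SQL, as a sketch (the table name and the
noise construction are made up, and whether the first row really escapes
compression depends on pglz finding no match at all in its first kilobyte, so
treat it as a probe rather than a guaranteed reproduction):

create table pglz_probe(label text, doc jsonb);

-- ~2 kB of hex noise followed by ~100 kB of highly repetitive text
insert into pglz_probe
select 'noisy_prefix',
       ('{"junk": "'
        || (select string_agg(md5(i::text), '') from generate_series(1, 64) i)
        || repeat('124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas', 2000)
        || '"}')::jsonb;

-- the same repetitive text without the noisy prefix
insert into pglz_probe
select 'repetitive',
       ('{"junk": "'
        || repeat('124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas', 2000)
        || '"}')::jsonb;

select label, pg_size_pretty(pg_column_size(doc)::bigint) as stored_size
from pglz_probe;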

> This is not the behavior observed with Text compression in Toast.  For very
> large files Text compression works correctly.  I got these results running
> similar tests on larger json files:

> Column Type  - Storage       - TOAST table size
> JSONB           - EXTERNAL -  2448 MB
> JSONB           - EXTENDED - 2448 MB
> JSON             - EXTENDED - 2504 MB
> TEXT             - EXTERNAL  - 2409 MB
> TEXT             - EXTENDED -      40 MB

json and text should (and do, for me) store exactly the same.  So there's
something fishy going on here.  Maybe you've changed the typstorage
values for json and jsonb in pg_type?
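
The typstorage in question can be checked directly; a quick sketch (on a stock
install the expected value for these types is 'x', i.e. EXTENDED):

select typname, typstorage
from pg_type
where typname in ('text', 'json', 'jsonb');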

            regards, tom lane

Re: BUG #11109: No Toast compression on JSON, JSONB columns

From
Larry White
Date:
Hi Tom,

Thanks again for your help.

I think I have a test case that reproduces the failure. It seems possibly
related (in part) to the encoding, but I'm not sure.

In any case, here is the table definition. The 'bad.json' file is attached;
it's a 13K file that produces a 32K TOAST table.

DROP TABLE document;

CREATE TABLE document
(
  owner character varying(36) NOT NULL,
  document_type character varying(36) NOT NULL,
  guid character varying(36) NOT NULL,
  schema_version numeric(7,3) NOT NULL,
  payload jsonb NOT NULL,
  last_update timestamp with time zone NOT NULL,
  payload_class character varying NOT NULL,
  instance_version character varying(36) NOT NULL,
  acl_read character varying(36)[],
  deleted boolean NOT NULL DEFAULT false,
  fts_text tsvector,
  CONSTRAINT guid_idx PRIMARY KEY (guid)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE document
  OWNER TO postgres;

ALTER TABLE document ALTER COLUMN payload SET STORAGE EXTENDED;
-- Index: document_payload_idx

-- DROP INDEX document_payload_idx;

CREATE INDEX document_payload_idx
  ON document
  USING gin
  (payload jsonb_path_ops);

-- Index: document_type_idx

-- DROP INDEX document_type_idx;

CREATE INDEX document_type_idx
  ON document
  USING btree
  (document_type COLLATE pg_catalog."default");

-- Index: fts_idx

-- DROP INDEX fts_idx;

CREATE INDEX fts_idx
  ON document
  USING gin
  (fts_text);

-- Index: owner_idx

-- DROP INDEX owner_idx;

CREATE INDEX owner_idx
  ON document
  USING btree
  (owner COLLATE pg_catalog."default");
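
After loading bad.json into document.payload, a quick check along these lines
shows whether the value was compressed at all (a sketch, assuming a single row
in the table; not part of the original message):

-- logical size of the value vs. what is actually stored for it
SELECT pg_size_pretty(octet_length(payload::text)::bigint) AS logical_size,
       pg_size_pretty(pg_column_size(payload)::bigint)     AS stored_size
  FROM document;

-- size of the table's TOAST relation
SELECT pg_size_pretty(pg_table_size(reltoastrelid::regclass)) AS toast_size
  FROM pg_class
 WHERE relname = 'document' AND reltoastrelid <> 0;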




On Thu, Aug 7, 2014 at 8:21 PM, Larry White <ljw1001@gmail.com> wrote:
Thank you for looking into this. This is hard to replicate. Some (most) files work fine. It seems that I just hit on a peculiar case.  

Actually, I can replicate it at will running through my app, but I have not come up with an easily separable test case. Even with the same data, it seems to compress OK if I insert one row at a time, but not if I do the inserts in batches via JDBC batch updates.

Anyway, I'm relieved to see that this isn't a general problem, but some kind of weird corner case I lucked into on my first test :). I will still try to create something reproducible and portable and send it in.



