Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id 53ED6228.2060507@agliodbs.com
Whole thread Raw
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
> Before changing to to INT_MAX:
> 
>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1741,1767,1854,1904,2292}
>  jsonb   | {3551,5866,5910,5958,6168}
> 
> After:
> 
>  thetype |    colsize_distribution
> ---------+----------------------------
>  json    | {1741,1767,1854,1904,2292}
>  jsonb   | {3515,3543,3636,3690,4038}
> 
> So that did improve things, just not as much as we'd like.

And with Tom's test patch:

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
pg_size_pretty
----------------394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));pg_size_pretty
----------------541 MB
(1 row)
thetype |    colsize_distribution
---------+----------------------------json    | {1741,1767,1854,1904,2292}jsonb   | {2037,2114,2288,2348,2746}

Since that improved things a *lot*, just +40% instead of +200%, I
thought I'd test some select queries.  I decided to test a GIN lookup
and value extraction, since indexed lookup is really what I care about.

9.4b2 no patches:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
      QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on jsonbish  (cost=29.55..582.92 rows=200 width=18)
 
(actual time=20.814..2845.454 rows=100423 loops=1)  Recheck Cond: (row_to_json @> '{"rpt_per_dt":
"2003-06-30"}'::jsonb) Heap Blocks: exact=1471  ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50
 
rows=200 width=0) (actual time=20.551..20.551 rows=100423 loops=1)        Index Cond: (row_to_json @> '{"rpt_per_dt":
"2003-06-30"}'::jsonb)Planningtime: 0.102 msExecution time: 2856.179 ms
 


9.4b2 TL patch:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
      QUERY
 
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on jsonbish  (cost=29.55..582.92 rows=200 width=18)
 
(actual time=24.071..5201.687 rows=100423 loops=1)  Recheck Cond: (row_to_json @> '{"rpt_per_dt":
"2003-06-30"}'::jsonb) Heap Blocks: exact=1471  ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50
 
rows=200 width=0) (actual time=23.779..23.779 rows=100423 loops=1)        Index Cond: (row_to_json @> '{"rpt_per_dt":
"2003-06-30"}'::jsonb)Planningtime: 0.098 msExecution time: 5214.212 ms
 

... so, an 80% increase in lookup and extraction time for swapping
offsets for lengths.  That's actually all extraction time; I tried
removing the extraction from the query, and without it both queries are
close enough to be statstically insignificant.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Reporting the commit LSN at commit time
Next
From: Peter Eisentraut
Date:
Subject: run xmllint during build (was Re: need xmllint on borka)