Re: Add jsonb_compact(...) for whitespace-free jsonb to text - Mailing list pgsql-hackers

From Sehrope Sarkuni
Subject Re: Add jsonb_compact(...) for whitespace-free jsonb to text
Date
Msg-id CAH7T-ap83PwQjtUzFUPBUa6Qd6SvzRegOjMx5GTTuFuEZBWZJQ@mail.gmail.com
Whole thread Raw
In response to Re: Add jsonb_compact(...) for whitespace-free jsonb to text  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Sun, Apr 24, 2016 at 9:27 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
> >AFAIK, there's also no guarantee on the specific order of the
> >resulting properties in the text representation either. I would
> >suppose it's fixed for a given jsonb value within a database major
> >version but across major versions it could change (if the
> >underlying representation changes).
>
> The order is fixed and very unlikely to change, as it was chosen
> quite deliberately to help ensure efficient processing. Any change
> in on-disk representation of data types is something we work very
> hard to avoid, as it makes it impossible to run pg_upgrade.

We do, from time-to-time, change on-disk formats in a
backwards-compatible way though.  In any case, it's my understanding
that we don't *guarantee* any ordering currently and therefore we should
discourage users from depending on it.  If we *are* going to guarantee
ordering, then we should document what that ordering is.

Yes that's the idea, namely to have a fixed text format that will not change across releases. If the on-disk representation is already supports that then this could just be a doc change (assuming there's agreement that it's a good idea and said guarantee will be maintained).

Separately, I think the compact (i.e. whitespace free) output is useful on it's own. It adds up to two bytes per key/value pair (one after the colon and one after the comma) so the more keys you have the more the savings.

Here's a (contrived) example to show the size difference when serializing information_schema.columns. The row_to_json(...) function returns whitespace free output (as json, not jsonb) so it's a proxy for json_compact(..). It comes out to 7.5% smaller than the default jsonb text format:

app=> SELECT  
  MAX((SELECT COUNT(*) FROM json_object_keys(x))) AS num_keys,
  AVG(length(x::text)) AS json_text,
  AVG(length(x::jsonb::text)) AS jsonb_text,
  AVG(length(x::text)) / AVG(length(x::jsonb::text)) AS ratio
FROM (SELECT row_to_json(z.*) AS x
      FROM information_schema.columns z) t;
 num_keys |       json_text       |      jsonb_text       |         ratio          
----------+-----------------------+-----------------------+------------------------
       44 | 1077.0748522652659225 | 1164.0748522652659225 | 0.92526253803121012857
(1 row)

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Is there a way around function search_path killing SQL function inlining?
Next
From: Michael Paquier
Date:
Subject: Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.