Re: Add jsonb_compact(...) for whitespace-free jsonb to text - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Add jsonb_compact(...) for whitespace-free jsonb to text
Msg-id CAHyXU0y=viUPBXit6mqASrJkZos+VBjfsms8iOHMpiebGisiCQ@mail.gmail.com
In response to Re: Add jsonb_compact(...) for whitespace-free jsonb to text  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Add jsonb_compact(...) for whitespace-free jsonb to text  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On Wed, Apr 27, 2016 at 4:05 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Merlin Moncure (mmoncure@gmail.com) wrote:
>> On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
>> > compact JSON format to reduce the amount of traffic over the wire or to
>> > do things with on the client side, we should probably come up with a
>> > binary format, rather than just hack out the whitespace.  It's not like
>> > representing numbers using ASCII characters is terribly efficient
>> > either.
>>
>> -1
>>
>> This will benefit pretty much nobody unless you are writing a hand
>> crafted C application that consumes and processes the data directly.
>
> That's not accurate.  All that's needed is for the libraries which
> either wrap libpq or re-implement it to be updated to understand the
> format and then convert the data into whatever structure makes sense for
> the language (or library that the language includes for working with
> JSON data).

Sure, that's pretty easy.  Note, I co-wrote the only libpq wrapper
that demystifies pg binary formats, libpqtypes.  I can tell you that
binary formats are much faster than text formats in any case where
parsing is nontrivial -- geo types, timestamp types, containers, etc.
However, I would be very surprised if a postgres binary format for json
would replace language parsing of json in any popular language like
java for common usage.
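
To illustrate the parsing difference for a timestamp, here is a rough Python sketch. It assumes the binary timestamp payload is an 8-byte big-endian count of microseconds since 2000-01-01 (integer datetimes) and that the text form uses the ISO-style layout shown; the function names are made up for the example and are not part of libpq or libpqtypes.

```python
import struct
from datetime import datetime, timedelta

# Assumed epoch for the binary timestamp representation.
PG_EPOCH = datetime(2000, 1, 1)


def decode_binary_timestamp(payload: bytes) -> datetime:
    """Hypothetical helper: one fixed-width integer read plus an addition."""
    (micros,) = struct.unpack("!q", payload)
    return PG_EPOCH + timedelta(microseconds=micros)


def decode_text_timestamp(text: str) -> datetime:
    """Hypothetical helper: the text form has to be scanned field by field."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


expected = datetime(2016, 4, 27, 16, 5, 0, 250000)

# Build a payload the way the assumed wire format would carry it.
payload = struct.pack("!q", (expected - PG_EPOCH) // timedelta(microseconds=1))

print(decode_binary_timestamp(payload))
print(decode_text_timestamp("2016-04-27 16:05:00.250000"))
```

Both paths end at the same value; the binary one just skips the character-level scan, which is where the savings for types like this would come from.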

I'll go further.   Postgres json support has pretty much made our
binary formats obsolete.  The main problem with text format data was
sending complex structured data to the client over our overlapping
escape mechanisms; client side parsing was slow and in certain
scenarios backslashes would proliferate exponentially.    json support
eliminates all of those problems and the performance advantages of
binary support (mainly parsing of complex types)  rarely justify the
development headaches.  These days, for the vast majority of data
traffic to the application it's a single row, single column json
coming in and out of the database.
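
A small Python sketch of the backslash problem, for anyone who has not run into it. The quote() function below is a made-up stand-in for a text-format quoting rule (escape backslashes and double quotes, then wrap in double quotes); it is not the actual server code. It only shows how re-quoting a value once per nesting level makes the backslash count at least double each time.

```python
def quote(s: str) -> str:
    """Hypothetical quoting rule: escape backslashes and double quotes,
    then wrap the result in double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def backslash_counts(value: str, levels: int) -> list:
    """Quote the value once per nesting level and record how many
    backslashes the result contains after each level."""
    counts = []
    for _ in range(levels):
        value = quote(value)
        counts.append(value.count("\\"))
    return counts


counts = backslash_counts('say "hi"', 5)
print(counts)
```

With a single json document there is one escaping layer no matter how deeply the data is nested, so this growth does not happen.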

>> I'd venture to guess this is a tiny fraction of pg users these days.
>> I do not understand at all the objection to removing whitespace.
>> Extra whitespace does nothing but pad the document as humans will
>> always run the document through a prettifier tuned to their specific
>> requirements (generally starting with, intelligent placement of
>> newlines) if reading directly.
>
> The objection is that it's a terribly poor solution as it simply makes
> things ugly for a pretty small amount of improvement.  Looking at it
> from the perspective of just "whitespace is bad!"

Whitespace is bad, because it simply pads documents at every stage of
processing.  You simply can't please everyone in terms of where it
should go, so you don't try, and you reserve that functionality for
prettification functions.  json is for *data serialization*.  We
should not inject extra characters for aesthetics in the general case;
reducing memory consumption by 10% on both the client and server
during parse is a feature.
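
A quick way to see the padding, sketched in Python rather than SQL. The sample document is invented, and Python's default json.dumps separators (", " and ": ") are used here only as a stand-in for output that puts a space after each comma and colon; the actual saving depends entirely on the shape of the document.

```python
import json

# Invented sample document, for illustration only.
doc = {"id": 1, "tags": ["a", "b", "c"], "point": {"x": 1.5, "y": -2.0}}

# Default separators add a space after every comma and colon.
padded = json.dumps(doc)

# Compact separators emit no optional whitespace at all.
compact = json.dumps(doc, separators=(",", ":"))

# Both texts parse back to the same value; only the byte count differs.
assert json.loads(padded) == json.loads(compact)

saved = len(padded) - len(compact)
print(len(padded), len(compact), saved)
```

Documents made of many short keys and values carry proportionally more separator whitespace than documents dominated by long strings.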

Andrew mentions several solutions.  I like them all except I would
prefer not to introduce a GUC for controlling the output format.  I do
not think it's a good idea to set the expectation that clients can
rely on text out byte for byte for any type including json.

merlin


