Thread: JSONB spaces in text presentation

JSONB spaces in text presentation

From
"Ilya I. Ashchepkov"
Date:
Hi.

Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.

I'm developing web application, and want to get json-string from pg and send it to browser without repacking.

--
С уважением,
Ащепков Илья koctep@gmail.com

Re: JSONB spaces in text presentation

From
John R Pierce
Date:
On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>
> Is spaces is necessary in text presentation of JSONB?
> In my data resulting text contains ~12% of spaces.

can you show us an example of this?


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: JSONB spaces in text presentation

From
"Ilya I. Ashchepkov"
Date:
I'm sorry about sending email several times. I haven't understand, was it sent by gmail or not.


On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:

Is spaces is necessary in text presentation of JSONB?
In my data resulting text contains ~12% of spaces.

can you show us an example of this?

One record
# select data from events.data limit 1;
{"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74, "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1, "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed": 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital": {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0, "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265, "altitude": 143, "latitude": 55.127888997395836, "longitude": 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal", "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}

Whitespacis percents in this record:
# select array_length(regexp_split_to_array(data::text, text ' '), 1)*100./length(data::text) from events.data limit 1;
      ?column?
---------------------
 12.3417721518987342

Whitespace in test data
 # select count(*),avg(array_length(regexp_split_to_array(data::text, text ' '), 1)*100./length(data::text)) from events.data ;
 count  |         avg
--------+---------------------
 242222 | 12.3649234646118312




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
С уважением,
Ащепков Илья koctep@gmail.com

Re: JSONB spaces in text presentation

From
Merlin Moncure
Date:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


Re: JSONB spaces in text presentation

From
Adrian Klaver
Date:
On 09/24/2014 12:44 AM, Ilya I. Ashchepkov wrote:
> I'm sorry about sending email several times. I haven't understand, was
> it sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com
> <mailto:pierce@hogranch.com>> wrote:
>
>     On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>
>
>         Is spaces is necessary in text presentation of JSONB?
>         In my data resulting text contains ~12% of spaces.
>
>
>     can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed":
> 74, "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084,
> "gps": 1, "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used":
> 19, "speed": 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no":
> 256, "digital": {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0},
> "out": {"1": 0, "2": 0}}, "visible": 20, "ignition": 1, "location":
> {"course": 265, "altitude": 143, "latitude": 55.127888997395836,
> "longitude": 80.8046142578125}, "protocol": 4, "coldstart": 1,
> "timesource": "terminal", "receiver_on": 1, "external_power": 28.07,
> "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>        ?column?
> ---------------------
>   12.3417721518987342
>
> Whitespace in test data
>   # select count(*),avg(array_length(regexp_split_to_array(data::text,
> text ' '), 1)*100./length(data::text)) from events.data ;
>   count  |         avg
> --------+---------------------
>   242222 | 12.3649234646118312
>

The only thing I can of is to use json not jsonb. Modified example taken
from docs:

http://www.postgresql.org/docs/9.4/static/datatype-json.html

test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::jsonb;
                       jsonb
--------------------------------------------------
  {"bar": "baz", "active": false, "balance": 7.77}



test=# SELECT '{"bar":"baz","balance":7.77,"active":false}'::json;
                     json
---------------------------------------------
  {"bar":"baz","balance":7.77,"active":false}


json will return exactly what was put in.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: JSONB spaces in text presentation

From
Seref Arikan
Date:
This is interesting. Most binary encoding methods I use produce smaller files than the text files for the same content. 
Having read your mail, I've realized that I have no reason to accept the same from the jsonb. I did a quick google search to see if it is wrong to expect binary encoding to decrease size and saw that I'm not alone (which still does not mean I'm being reasonable). 
This project: http://ubjson.org/#size is one of the hits which mentions some nice space gains thanks to binary encoding. 

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref


On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: JSONB spaces in text presentation

From
Adrian Klaver
Date:
On 09/24/2014 07:22 AM, Seref Arikan wrote:
> This is interesting. Most binary encoding methods I use produce smaller
> files than the text files for the same content.
> Having read your mail, I've realized that I have no reason to accept the
> same from the jsonb. I did a quick google search to see if it is wrong
> to expect binary encoding to decrease size and saw that I'm not alone
> (which still does not mean I'm being reasonable).
> This project: http://ubjson.org/#size is one of the hits which mentions
> some nice space gains thanks to binary encoding.
>
> The "much larger" part is a bit scary. Is this documented somewhere?

I believe Merlin is referring to the issue in this thread:

http://www.postgresql.org/message-id/27839.1407467863@sss.pgh.pa.us

>
> Best regards
> Seref
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: JSONB spaces in text presentation

From
"Ilya I. Ashchepkov"
Date:
IMHO, prettification is useful only for debugging.
It would be nice to have a session variable for the debug output with spaces, new lines and indentation.

On Wed, Sep 24, 2014 at 8:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin



--
С уважением,
Ащепков Илья koctep@gmail.com

Re: JSONB spaces in text presentation

From
"Ilya I. Ashchepkov"
Date:
With the same data:

# create cast (jsonb as bytea) without function;
# select sum(length(data::text))::float/sum(octet_length((data::jsonb)::bytea)) from data.packets;
     ?column?
-------------------
 0.630663654967513

and 0.554666142734544 without spaces

On Wed, Sep 24, 2014 at 9:22 PM, Seref Arikan <serefarikan@gmail.com> wrote:
This is interesting. Most binary encoding methods I use produce smaller files than the text files for the same content. 
Having read your mail, I've realized that I have no reason to accept the same from the jsonb. I did a quick google search to see if it is wrong to expect binary encoding to decrease size and saw that I'm not alone (which still does not mean I'm being reasonable). 
This project: http://ubjson.org/#size is one of the hits which mentions some nice space gains thanks to binary encoding. 

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref


On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




--
С уважением,
Ащепков Илья koctep@gmail.com

Re: JSONB spaces in text presentation

From
Oleg Bartunov
Date:
Check slides 17-20 of http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to understand, what 'binary format' means. The slides describes binary storage for nested hstore, not jsonb, but you'll get the idea.

On Wed, Sep 24, 2014 at 6:22 PM, Seref Arikan <serefarikan@gmail.com> wrote:
This is interesting. Most binary encoding methods I use produce smaller files than the text files for the same content. 
Having read your mail, I've realized that I have no reason to accept the same from the jsonb. I did a quick google search to see if it is wrong to expect binary encoding to decrease size and saw that I'm not alone (which still does not mean I'm being reasonable). 
This project: http://ubjson.org/#size is one of the hits which mentions some nice space gains thanks to binary encoding. 

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref


On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: JSONB spaces in text presentation

From
John R Pierce
Date:
On 9/24/2014 7:22 AM, Seref Arikan wrote:
> This is interesting. Most binary encoding methods I use produce
> smaller files than the text files for the same content.

'1'   vs INTEGER 1 ... 1 byte vs 4 bytes.

now add metadata necessary to represent the original json structure.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



Re: JSONB spaces in text presentation

From
Seref Arikan
Date:
Thanks Oleg, I'll check the slides.


On Wed, Sep 24, 2014 at 8:07 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
Check slides 17-20 of http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf to understand, what 'binary format' means. The slides describes binary storage for nested hstore, not jsonb, but you'll get the idea.

On Wed, Sep 24, 2014 at 6:22 PM, Seref Arikan <serefarikan@gmail.com> wrote:
This is interesting. Most binary encoding methods I use produce smaller files than the text files for the same content. 
Having read your mail, I've realized that I have no reason to accept the same from the jsonb. I did a quick google search to see if it is wrong to expect binary encoding to decrease size and saw that I'm not alone (which still does not mean I'm being reasonable). 
This project: http://ubjson.org/#size is one of the hits which mentions some nice space gains thanks to binary encoding. 

The "much larger" part is a bit scary. Is this documented somewhere?

Best regards
Seref


On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koctep@gmail.com> wrote:
> I'm sorry about sending email several times. I haven't understand, was it
> sent by gmail or not.
>
>
> On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>>>
>>>
>>> Is spaces is necessary in text presentation of JSONB?
>>> In my data resulting text contains ~12% of spaces.
>>
>>
>> can you show us an example of this?
>
>
> One record
> # select data from events.data limit 1;
> {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
> "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
> "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
> 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
> {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
> "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
> "altitude": 143, "latitude": 55.127888997395836, "longitude":
> 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
> "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>
> Whitespacis percents in this record:
> # select array_length(regexp_split_to_array(data::text, text ' '),
> 1)*100./length(data::text) from events.data limit 1;
>       ?column?
> ---------------------
>  12.3417721518987342
>
> Whitespace in test data
>  # select count(*),avg(array_length(regexp_split_to_array(data::text, text '
> '), 1)*100./length(data::text)) from events.data ;
>  count  |         avg
> --------+---------------------
>  242222 | 12.3649234646118312


For jsonb (unlike json), data is not actually stored as json but in a
binary format.  It will generally be much larger than the text
representation in fact but in exchange for that many operations will
be faster.  The spaces you see are generated when the jsonb type is
converted to text for output.  I actually think it's pretty reasonable
to want to redact all spaces from such objects in all cases where
converstion to text happens (output functions, xxxto_json, etc)
because ~12% savings are nothing to sneeze at when moving large
documents in and out of the database.

On the flip side, a more verbose prettification would be pretty nice
too.  I wonder if a hypothetical GUC is the best way to control this
behavior...

merlin


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general