WIP Patch: Add a function that returns binary JSONB as a bytea - Mailing list pgsql-hackers

From Kevin Van
Subject WIP Patch: Add a function that returns binary JSONB as a bytea
Date
Msg-id CAJArn6QFs2EZsX0NyTkBPpdmPc28o-8ze_ZvjwBLGNv+WmrgHw@mail.gmail.com
Whole thread Raw
Responses Re: WIP Patch: Add a function that returns binary JSONB as a bytea  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
This patch adds a new function that allows callers to receive binary jsonb. 
This change was proposed in the discussion here:
and the primary motivation is to reduce database load by skipping jsonb to string conversion (with the expectation that the application can parse binary jsonb).

This patch is still a WIP and I am looking for any guidance on the approach. If it is valid, I'd also appreciate any guidance on what kind of test coverage would be appropriate for a change like this.

Note: I prepended the version number "1" to the byte array -- this is similar to what jsonb_recv does in the same file. 

I have been testing on a x86-64 processor running MacOSX and do not know what the behavior is on other platforms. 

This patch is off of master and compiles successfully. The following is some example output using psql:

database=# SELECT jsonb_raw_bytes('{"b":12345}'::jsonb);

                     jsonb_raw_bytes                      

----------------------------------------------------------

 \x0101000020010000800d0000106200000028000000018001002909

(1 row)


database=# SELECT jsonb_raw_bytes('{"a":{"b":12345}}'::jsonb->'a');

                     jsonb_raw_bytes                      

----------------------------------------------------------

 \x0101000020010000800d0000106200000028000000018001002909

(1 row)



Some preliminary testing on my own machine shows me that this change has a significant impact on performance.


I used psql to select a jsonb column from all the rows in a table (about 1 million rows) where the json data was roughly 400-500 bytes per record.


database=# \timing

Timing is on.

database=# \o /tmp/raw_bytes_out.txt;

database=# SELECT jsonb_raw_bytes(data) FROM datatable;

Time: 2582.545 ms (00:02.583)

database=# \o /tmp/json_out.txt;

database=# SELECT data FROM datatable;

Time: 5653.235 ms (00:05.653)


Of note is that the size of raw_bytes_out.txt in the example is roughly twice that of json_out.txt so the timing difference is not due to less data being transferred.


Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: FDW Parallel Append
Next
From: Pavel Stehule
Date:
Subject: Re: ToDo: show size of partitioned table