Thread: Bypassing NULL elements in row_to_json function

Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:
I'm looking at the possibility of using JSON as a data exchange format
with some apps running on both PCs and Macs.

The table I would be exporting has a lot of NULL values in it.  Is
there any way to skip the NULL values in the row_to_json function and
include only the fields that are non-null?
--
Mike Nolan
nolan@tssi.com


Re: Bypassing NULL elements in row_to_json function

From
Raymond O'Donnell
Date:
On 08/04/2016 16:31, Michael Nolan wrote:
> I'm looking at the possibility of using JSON as a data exchange format
> with some apps running on both PCs and Macs.
>
> The table I would be exporting has a lot of NULL values in it.  Is
> there any way to skip the NULL values in the row_to_json function and
> include only the fields that are non-null?

You could use a CTE to filter out the nulls (not tested - I haven't used
JSON in PG (yet!)):

with no_nulls as (
  select ... from my_table
  where whatever is not null
)
select row_to_json(....) from no_nulls;

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: Bypassing NULL elements in row_to_json function

From
"David G. Johnston"
Date:
On Fri, Apr 8, 2016 at 8:53 AM, Raymond O'Donnell <rod@iol.ie> wrote:
On 08/04/2016 16:31, Michael Nolan wrote:
> I'm looking at the possibility of using JSON as a data exchange format
> with some apps running on both PCs and Macs.
>
> The table I would be exporting has a lot of NULL values in it.  Is
> there any way to skip the NULL values in the row_to_json function and
> include only the fields that are non-null?

You could use a CTE to filter out the nulls (not tested - I haven't used
JSON in PG (yet!)):

with no_nulls as (
  select ... from my_table
  where whatever is not null
)
select row_to_json(....) from no_nulls;

One of us is confused.  I'm reading this as "I want the row_to_json call to generate objects with different keys depending on whether a given key would have a null - in which case exclude the key."

I think one would have to simply allow row_to_json to populate the keys with null values and then post-process them away:

json_strip_nulls(row_to_json(...))
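For example, a minimal sketch (untested; "my_table" is just a stand-in for your table, and json_strip_nulls() needs 9.5 or later):

select json_strip_nulls(row_to_json(t))
from my_table t;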


David J.

Re: Bypassing NULL elements in row_to_json function

From
Adrian Klaver
Date:
On 04/08/2016 08:31 AM, Michael Nolan wrote:
> I'm looking at the possibility of using JSON as a data exchange format
> with some apps running on both PCs and Macs.
>
> The table I would be exporting has a lot of NULL values in it.  Is
> there any way to skip the NULL values in the row_to_json function and
> include only the fields that are non-null?

I guess it depends on your data.

Are the NULLs all in one field or scattered across fields?

Imagining this scenario:

fld_1     fld_2     fld_3
'val1_1'  NULL      'val1_3'
NULL      'val2_2'  'val2_3'
'val3_1'  'val3_2'  NULL

How do you deal with the holes (NULLs) on the receiving end?


> --
> Mike Nolan
> nolan@tssi.com
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:
It looks like json_strip_nulls() may be what I need.  I'm currently on 9.3, which doesn't have that function, but I may be in a position to upgrade to 9.5 this summer.  I think the apps that would be receiving the data can deal with any resulting 'holes' in the data set by just setting them to null.
--
Mike Nolan


Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:
I was able to try it on a test server; the combination of row_to_json() and json_strip_nulls() worked exactly as I had hoped.  Stripping nulls reduced the JSON file by over 50%.  (The data I needed to export has around 900,000 rows, so it gets quite large.)

I've got a test file I can make available to app developers.

My next task is to find out if validating and importing a JSON file into a table is as easy as exporting a table in JSON turned out to be. Thanks for the help.
--
Mike Nolan



Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:

2nd Followup:  It turns out that loading a table from a JSON string is more complicated than going from a table to JSON, perhaps for good reason.  There does not appear to be a direct inverse to the row_to_json() function, but it wasn't difficult for me to write a PHP program that takes the JSON file I created the other day and converts it back to a series of inserts, recreating the original table. 

Of course this simple program does NO validation (not that this file needed any), so if the JSON string is not well-formed for any of a number of reasons, or if it is not properly mapped to the table into which the inserts are made, an insert could fail or result in incorrect data. 
--
Mike Nolan

Re: Bypassing NULL elements in row_to_json function

From
"David G. Johnston"
Date:
On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup:  It turns out that loading a table from a JSON string is more complicated than going from a table to JSON, perhaps for good reason.  There does not appear to be a direct inverse to the row_to_json() function, but it wasn't difficult for me to write a PHP program that takes the JSON file I created the other day and converts it back to a series of inserts, recreating the original table. 

Of course this simple program does NO validation (not that this file needed any), so if the JSON string is not well-formed for any of a number of reasons, or if it is not properly mapped to the table into which the inserts are made, an insert could fail or result in incorrect data. 
--
Mike Nolan


See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html

json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest doing whatever you can to keep up with the recent releases.
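A minimal usage sketch (untested; "myrowtype" is a stand-in for any table or composite type whose shape you want back):

select * from json_populate_recordset(null::myrowtype, '[{"a":1,"b":2},{"a":3,"b":4}]');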

David J.

Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:


On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup:  It turns out that loading a table from a JSON string is more complicated than going from a table to JSON, perhaps for good reason.  There does not appear to be a direct inverse to the row_to_json() function, but it wasn't difficult for me to write a PHP program that takes the JSON file I created the other day and converts it back to a series of inserts, recreating the original table. 

Of course this simple program does NO validation (not that this file needed any), so if the JSON string is not well-formed for any of a number of reasons, or if it is not properly mapped to the table into which the inserts are made, an insert could fail or result in incorrect data. 
--
Mike Nolan


json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest doing whatever you can to keep up with the recent releases.

David J.


If there's a way to use the json_populate_record() or json_populate_recordset() functions to load a table from a JSON file (e.g., using COPY), it would be nice if it were better documented.  I did find a tool that loads a JSON file into a table (pgfutter), and even loaded one row from that table into another table using json_populate_record(), but the 'subquery returned multiple rows' issue wouldn't let me do the entire table.

But that still doesn't deal with validating individual fields or checking that the JSON is complete and consistent with the table to be loaded. 
--
Mike Nolan

Re: Bypassing NULL elements in row_to_json function

From
"David G. Johnston"
Date:
On Sun, Apr 10, 2016 at 7:49 AM, Michael Nolan <htfoot@gmail.com> wrote:


On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup:  It turns out that loading a table from a JSON string is more complicated than going from a table to JSON, perhaps for good reason.  There does not appear to be a direct inverse to the row_to_json() function, but it wasn't difficult for me to write a PHP program that takes the JSON file I created the other day and converts it back to a series of inserts, recreating the original table. 

Of course this simple program does NO validation (not that this file needed any), so if the JSON string is not well-formed for any of a number of reasons, or if it is not properly mapped to the table into which the inserts are made, an insert could fail or result in incorrect data. 
--
Mike Nolan


json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest doing whatever you can to keep up with the recent releases.

David J.


If there's a way to use the json_populate_record() or json_populate_recordset() functions to load a table from a JSON file (eg, using copy), it would be nice if it was better documented.  I did find a tool that loads a JSON file into a table (pgfutter), and even loaded one row from that table into another table using json_populate_record(), but the 'subquery returned multiple rows' issue wouldn't let me do the entire table.

But that still doesn't deal with validating individual fields or checking that the JSON is complete and consistent with the table to be loaded. 


It isn't that involved once you've learned generally how to call normal record functions and also set-returning functions (the latter must be in the FROM clause of the query).  If you provide what you attempted, it becomes easier to explain away your misunderstanding.

It doesn't work with COPY.  You have to write an explicit INSERT+SELECT query where the text of the JSON is a parameter.  Your client library should let you do this.  If you are using "psql", which doesn't support parameters, you end up having to store the JSON in a psql variable and reference that in the function call.
INSERT INTO %I SELECT * FROM json_populate_recordset(null::%I, $1)

The function ensures that column order is consistent so "INSERT INTO %I" is all you need to write.
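A minimal psql sketch of the variable approach (untested; the table name and JSON here are made up):

\set rows '[{"id":"10000001"},{"id":"10000002"}]'
insert into my_table select * from json_populate_recordset(null::my_table, :'rows'::json);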

Data validation is why we invented CHECK constraints - if you need more functionality than the simple mechanical conversion from a json object to a table row, you will need to write code somewhere to do the additional work.  All json_populate_record(set) promises is that the above command will work.
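For example, a hypothetical table whose columns carry their own validation (the names and rules are invented here):

create table member_import (
    id  varchar(8) check (id ~ '^[0-9]+$'),        -- ids must be all digits
    exp date       check (exp >= date '1939-01-01') -- reject implausible dates
);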


I suppose the way you'd write your attempt that failed would be similar to:

INSERT INTO %I
SELECT rec.*
FROM src_table_with_json,
LATERAL json_populate_record(null::%I, src_table_with_json.json_column) AS rec;

Again, seeing what you actually did would be helpful - I'm having trouble imagining what you did to provoke that particular error.

David J.

Re: Bypassing NULL elements in row_to_json function

From
Adrian Klaver
Date:
On 04/10/2016 07:49 AM, Michael Nolan wrote:
>
>
> On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
>
>     On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com
>     <mailto:htfoot@gmail.com>>wrote:
>
>
>         2nd Followup:  It turns out that loading a table from a JSON
>         string is more complicated than going from a table to JSON,
>         perhaps for good reason.  There does not appear to be a direct
>         inverse to the row_to_json() function, but it wasn't difficult
>         for me to write a PHP program that takes the JSON file I created
>         the other day and converts it back to a series of inserts,
>         recreating the original table.
>
>         Of course this simple program does NO validation (not that this
>         file needed any), so if the JSON string is not well-formed for
>         any of a number of reasons, or if it is not properly mapped to
>         the table into which the inserts are made, an insert could fail
>         or result in incorrect data.
>         --
>         Mike Nolan
>
>
>     See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html
>
>     json_populate_record(base anyelement, from_json json)
>     json_populate_recordset(base anyelement, from_json json)
>
>     Exists in 9.3 too...though if you are going heavy json I'd suggest
>     doing whatever you can to keep up with the recent releases.
>
>     David J.
>
>
> If there's a way to use the json_populate_record() or
> json_populate_recordset() functions to load a table from a JSON file
> (eg, using copy), it would be nice if it was better documented.  I did
> find a tool that loads a JSON file into a table (pgfutter), and even
> loaded one row from that table into another table using
> json_populate_record(), but the 'subquery returned multiple rows' issue
> wouldn't let me do the entire table.

Does the receiving table have the same structure as the sending table?

Is the receiving table already populated with data?

>
> But that still doesn't deal with validating individual fields or
> checking that the JSON is complete and consistent with the table to be
> loaded.

Well, you know the JSON is not complete, as you dropped all the fields in
each row that had NULL values.
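A quick untested sketch of what that means on re-import ("myrowtype" stands in for the target row type) - a stripped key simply comes back as NULL:

select * from json_populate_record(null::myrowtype, '{"id":"10000001"}');
-- every column of myrowtype other than "id" is returned as NULL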
Validation is a more complex subject and honestly something I do not
think could be accomplished in straight SQL. In other words, it would
need to be run through some ETL tool. I use Python, so as an example:

https://petl.readthedocs.org/en/latest/

In particular:

https://petl.readthedocs.org/en/latest/io.html#json-files

https://petl.readthedocs.org/en/latest/transform.html#validation

> --
> Mike Nolan


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:
Here's what I did:

 \d gold1604_test
Table "uscf.gold1604_test"
 Column | Type | Modifiers
--------+------+-----------
 data   | json |

Some sample data:
 {"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":
"A"} +

  {"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
     +

  {"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts"
:"A"}+


uscf-> \d goldmast_test
         Table "uscf.goldmast_test"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | character varying(8)  |
 name   | character varying(40) |
 st     | character varying(2)  |
 exp    | date                  |
 sts    | character(1)          |
 supp   | date                  |
 rrtg   | character varying(8)  |
 qrtg   | character varying(8)  |
 brtg   | character varying(8)  |
 oqrtg  | character varying(8)  |
 obrtg  | character varying(8)  |
 fid    | character varying(12) |




insert into goldmast_test select * from json_populate_record(NULL::"goldmast_test", (select * from gold1604_test limit 1) )
 produces:
uscf=> select * from goldmast_test;
    id    |       name       | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN | NY | 2012-05-31 | A   |      |      |      |      |       |       |
(1 row)

The fact that the null values were stripped out is not an issue here.

But,
uscf=> insert into goldmast_test select * from json_populate_record(NULL::"goldmast_test", (select * from gold1604_test limit 2) )
uscf-> \g
ERROR:  more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?
--
Mike Nolan

Re: Bypassing NULL elements in row_to_json function

From
Adrian Klaver
Date:
On 04/10/2016 08:39 AM, Michael Nolan wrote:
> Here's what I did:
>
>   \d gold1604_test
> Table "uscf.gold1604_test"
>   Column | Type | Modifiers
> --------+------+-----------
>   data   | json |
>
> Some sample data:
>   {"id":"10000001","name":"MISNER, J
> NATHAN","st":"NY","exp":"2012-05-31","sts":
> "A"} +
>
>    {"id":"10000002","name":"MISNER,
> JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
>       +
>
>    {"id":"10000003","name":"MISNER, J
> AMSCHEL","st":"NY","exp":"2007-05-31","sts"
> :"A"}+
>
>
> uscf-> \d goldmast_test
>           Table "uscf.goldmast_test"
>   Column |         Type          | Modifiers
> --------+-----------------------+-----------
>   id     | character varying(8)  |
>   name   | character varying(40) |
>   st     | character varying(2)  |
>   exp    | date                  |
>   sts    | character(1)          |
>   supp   | date                  |
>   rrtg   | character varying(8)  |
>   qrtg   | character varying(8)  |
>   brtg   | character varying(8)  |
>   oqrtg  | character varying(8)  |
>   obrtg  | character varying(8)  |
>   fid    | character varying(12) |
>
>
>
>
> insert into goldmast_test select * from
> json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
> limit 1) )
>   produces:
> uscf=> select * from goldmast_test;
>     id    |       name       | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
> ----------+------------------+----+------------+-----+------+------+------+------+-------+-------+-----
>  10000001 | MISNER, J NATHAN | NY | 2012-05-31 | A   |      |      |      |      |       |       |
> (1 row)
>
> The fact that the null values were stripped out is not an issue here.
>
> But,
> uscf=> insert into goldmast_test select * from
> json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
> limit 2) )
> uscf-> \g
> ERROR:  more than one row returned by a subquery used as an expression
>
> Is there a way to get around the one row per subquery issue?

Per David's post:

http://www.postgresql.org/docs/9.5/interactive/functions-json.html

json_populate_recordset(base anyelement, from_json json)

Expands the outermost array of objects in from_json to a set of rows
whose columns match the record type defined by base (see note below).
select * from json_populate_recordset(null::myrowtype,
'[{"a":1,"b":2},{"a":3,"b":4}]')

> --
> Mike Nolan


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Bypassing NULL elements in row_to_json function

From
"David G. Johnston"
Date:
On Sun, Apr 10, 2016 at 8:39 AM, Michael Nolan <htfoot@gmail.com> wrote:
Here's what I did:

 \d gold1604_test
Table "uscf.gold1604_test"
 Column | Type | Modifiers
--------+------+-----------
 data   | json |

Some sample data:
 {"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":
"A"} +

  {"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
     +

  {"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts"
:"A"}+


(I think) PostgreSQL assumes that there is only a single top-level json element, whether it be an array or an object.  The first thing you'd have to do is split on the newline and create a PostgreSQL text array.

But,
uscf=> insert into goldmast_test select * from json_populate_record(NULL::"goldmast_test", (select * from gold1604_test limit 2) )
uscf-> \g
ERROR:  more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?

Yes, use LATERAL.

Something like the following should work (not tested):

INSERT INTO goldmast_test
SELECT jpr.* 
FROM gold1604_test 
LATERAL json_populate_record(null::goldmast_test", data) AS jpr

ideally you could just do (not tested):

INSERT INTO goldmast_test
SELECT jpr.* 
FROM (SELECT t::json AS j FROM unnest(string_to_array(?::text, E'\n')) AS un(t)) src,
LATERAL json_populate_record(null::"goldmast_test", src.j) AS jpr

Where the "?::text" is placeholder for the textual JSON being handed to the query thus avoiding the temporary "gold1604_test" table.

David J.

Re: Bypassing NULL elements in row_to_json function

From
Michael Nolan
Date:

In case it wasn't clear, the sample data was 3 rows of data.  (There are actually around 890K rows in the table pgfutter built from the JSON file.)
--
Mike Nolan

Re: Bypassing NULL elements in row_to_json function

From
Adrian Klaver
Date:
On 04/10/2016 09:24 AM, David G. Johnston wrote:
> On Sun, Apr 10, 2016 at 8:39 AM, Michael Nolan <htfoot@gmail.com
> <mailto:htfoot@gmail.com>>wrote:
>
>     Here's what I did:
>
>       \d gold1604_test
>     Table "uscf.gold1604_test"
>       Column | Type | Modifiers
>     --------+------+-----------
>       data   | json |
>
>     Some sample data:
>       {"id":"10000001","name":"MISNER, J
>     NATHAN","st":"NY","exp":"2012-05-31","sts":
>     "A"} +
>
>        {"id":"10000002","name":"MISNER,
>     JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
>           +
>
>        {"id":"10000003","name":"MISNER, J
>     AMSCHEL","st":"NY","exp":"2007-05-31","sts"
>     :"A"}+
>
>
> (I think) PostgreSQL assumes that there is only a single top-level json
> element, whether it be an array or an object.  The first thing you'd
> have to do is split on the newline and create a PostgreSQL text array.
>
>
>     But,
>     uscf=> insert into goldmast_test select * from
>     json_populate_record(NULL::"goldmast_test", (select * from
>     gold1604_test limit 2) )
>     uscf-> \g
>     ERROR:  more than one row returned by a subquery used as an expression
>
>     Is there a way to get around the one row per subquery issue?
>
>
> Yes, use LATERAL.
>
> Something like the following should work (not tested):
>
> INSERT INTO goldmast_test
> SELECT jpr.*
> FROM gold1604_test
> LATERAL json_populate_record(null::"goldmast_test", data) AS jpr

I can confirm this works after a little clean up:

test=> INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test,
LATERAL json_populate_record(null::"goldmast_test", data) AS jpr;
INSERT 0 3

test=> select * from goldmast_test ;
    id    |       name        | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+-------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
(3 rows)

While trying to figure out how it works, I discovered the LATERAL is not
necessary:

test=> INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test,
json_populate_record(null::"goldmast_test", data) AS jpr;
INSERT 0 3

test=> select * from goldmast_test ;
    id    |       name        | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+-------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
(6 rows)


>
> ideally you could just do (not tested):
>
> INSERT INTO goldmast_test
> SELECT jpr.*
> FROM (SELECT t::json AS j FROM unnest(string_to_array(?::text, E'\n')) AS un(t)) src,
> LATERAL json_populate_record(null::"goldmast_test", src.j) AS jpr
>
> Where the "?::text" is a placeholder for the textual JSON being handed to
> the query, thus avoiding the temporary "gold1604_test" table.
>
> David J.
>


--
Adrian Klaver
adrian.klaver@aklaver.com