Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really? - Mailing list pgsql-general

From Bryn Llewellyn
Subject Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?
Date
Msg-id 5933E023-4B74-4401-9B9D-8434DF1A6E21@yugabyte.com
Whole thread Raw
In response to Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
haramrae@gmail.com wrote

bryn@yugabyte.com wrote:

I implemented two complementary functions:

—"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some key": null »

—"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" value

The code checks with "no_null_keys()" that, as expected, no ingested JSON document has an occurrence of « "some key": null ».

And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as appropriate, any other built-in JSON function that produces a "jsonb" value.

It was straightforward to implement these two functions by using REGEXP built-in functionality on the canonically formatted "text" value produced by the "jsonb::text" typecast.

In my experience, using regular expressions applied to document formats tends to get you false positives. I’d be worried about books with titles similar to:

«
How we wrote a regular expression to detect occurrences of "some key": null in our JSON documents
»

For stripping those null occurrences, you are aware of the json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?

For detecting them on a recent PG, the @? operator or json_path_exists(json, jsonpath) functions would probably do the trick.
I am not too familiar with JSONPATH expressions, but I expect (it passed some preliminary testing) this would detect your nulls just fine, while taking JSON semantics into account:

jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)

For PG-specifics on JSONPATH, see section 9.16.2: 
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE

A recursive query is another possible solution. It would probably perform far worse...

Thank you so much, Alban, for taking an interest in my « "some key": null » saga.

The "G", "E", "L", and "O" keys on my keyboard were broken. They're fixed now, and so "postgres strip keys with null value from jsonb object" got me to this (but, strangely, not to the PG docs):

This is embarrassing. I wrote that doc, along with the rest of the content in the "JSON data types and functionality" section just before COVID hit us. Apparently, life under lockdown has damaged my corpus callosum. I typed up my proof-of-concept code and emails with one half of my brain—and it was the other half that wrote that documentation.

So thanks for the memory jog. My excuse is that (as was the case with my malformed ISBN that Peter Holzer pointed out in a separate thread) I wanted just to show myself, as a proof-of-concept, that stripping nulls was feasible—so I gave it no more thought once I'd done that. But, I suppose, that's not excuse...

Anyway, my "strip_null_keys()" is already on the scrapheap. And the body of my "no_null_keys()" reduced to a single line:

create function no_null_keys(j in jsonb)
  returns boolean
  immutable
  language sql
as $body$
  select j = jsonb_strip_nulls(j);
$body$;


You might argue that I don't need to bother with the encapsulation. But it makes testing easier—and I'm trusting that inlining works as advertised.

Your point about false positives is well taken. So, just for sport:

create type             t1 as (k int, v text);
create type             t2 as (a int, b int, c t1, d t1, e text[]);

create function j()
  returns jsonb
  language plpgsql
as $body$
declare
  t    constant text    not null := 'How we wrote a regular expression to detect occurrences of « "some key": null » in our JSON documents!';
  c1   constant t1      not null := (17, t);
  c2   constant t1      not null := (29, null);
  arr  constant text[]  not null := array['x', null::text, t];
  r    constant t2      not null := (42, null, c1, c2, arr);
begin
  return to_jsonb(r);
end;
$body$;
select jsonb_pretty(j());

The output includes two occurrences of this:

"How we wrote a regular expression to detect occurrences of « \"some key\": null » in our JSON documents!"

I believe that the "jsonb" to "text" conversion never produces an isolated double-quote within the representation of an object key's value. I checked that my "strip_null_keys()" handled your example before consigning it to the scrapheap—and it didn't let me down. But it would be foolish to argue that there isn't some way to provoke a false positive.

pgsql-general by date:

Previous
From: Bryn Llewellyn
Date:
Subject: Re: ISBN (was: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?)
Next
From: jian he
Date:
Subject: Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?