Thread: Re: SQL:2023 JSON simplified accessor support

Re: SQL:2023 JSON simplified accessor support

From
Peter Eisentraut
Date:
On 29.08.24 18:33, Alexandra Wang wrote:
> I’ve implemented the member and array accessors and attached two
> alternative patches:
> 
> 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch
> enables dot access to JSON object fields and subscript access to
> indexed JSON array elements by converting "." and "[]" indirection
> into a JSON_QUERY JsonFuncExpr node.
> 
> 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This
> alternative patch implements dot access to JSON object fields by
> transforming the "." indirection into a "->" operator.
> 
> The upside of the v1 patch is that it strictly aligns with the SQL
> standard, which specifies that the simplified access is equivalent to:
> 
> JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
> EMPTY NULL ON ERROR)
> 
> However, the performance of JSON_QUERY might be suboptimal due to
> function call overhead. Therefore, I implemented the v2 alternative
> using the "->" operator.

Using the operator approach would also allow taking advantage of 
optimizations such as 

<https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn%2BBCbFaBMt3z-2%3Di5fKO3bW%3D6HPhbid2Dw%40mail.gmail.com>.

> There is some uncertainty about the semantics of conditional array
> wrappers. Currently, there is at least one subtle difference between
> the "->" operator and JSON_QUERY, as shown:

That JSON_QUERY bug has been fixed.

I suggest you rebase both of your patches over this, just to double 
check everything.  But then I think you can drop the v1 patch and just 
submit a new version of v2.

The patch should eventually contain some documentation.  It might be 
good starting to look for a good spot where to put that documentation. 
It might be either near the json types documentation or near the general 
qualified identifier syntax, not sure.




Re: SQL:2023 JSON simplified accessor support

From
Andrew Dunstan
Date:
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote:
> Hi,
>
> I didn’t run pgindent earlier, so here’s the updated version with the
> correct indentation. Hope this helps!


This is a really nice feature, and provides a lot of expressive power 
for such a small piece of code.

I notice this doesn't seem to work for domains over json and jsonb.


andrew@~=# create domain json_d as json;
CREATE DOMAIN
andrew@~=# create table test_json_dot(id int, test_json json_d);
CREATE TABLE
andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
INSERT 0 1      |          |
andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 
'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as 
expected from test_json_dot;
ERROR:  column notation .b applied to type json_d, which is not a 
composite type
LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l...


I'm not sure that's a terribly important use case, but we should 
probably make it work. If it's a domain we should get the basetype of 
the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: SQL:2023 JSON simplified accessor support

From
"David E. Wheeler"
Date:
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:

> I didn’t run pgindent earlier, so here’s the updated version with the
> correct indentation. Hope this helps!

Oh,  nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want
thetext output far more often than a JSON scalar. 

Best,

David




Re: SQL:2023 JSON simplified accessor support

From
Andrew Dunstan
Date:
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote:
> On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
>
>> I didn’t run pgindent earlier, so here’s the updated version with the
>> correct indentation. Hope this helps!
> Oh,  nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to
wantthe text output far more often than a JSON scalar.
 
>

That would defeat being able to chain these.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: SQL:2023 JSON simplified accessor support

From
"David E. Wheeler"
Date:
On Sep 27, 2024, at 12:07, Andrew Dunstan <andrew@dunslane.net> wrote:

> That would defeat being able to chain these.

Not if it’s a different operator. But I’m fine to just keep using ->> at the end of a chain.

D




Re: SQL:2023 JSON simplified accessor support

From
Peter Eisentraut
Date:
On 07.11.24 22:57, Alexandra Wang wrote:
> The v5 patch includes the following updates:
> 
> - Fixed the aforementioned issue and added more tests covering composite
> types with domains, nested domains, and arrays of domains over
> JSON/JSONB.
> 
> - Refactored the logic for parsing JSON/JSONB object fields by moving it
> from ParseFuncOrColumn() to transformIndirection() for improved
> readability. The ParseFuncOrColumn() function is already handling both
> single-argument function calls and composite types, and it has other
> callers besides transformIndirection().

This patch implements array subscripting support for the json type, but 
it does it in a non-standard way, using 
ParseJsonSimplifiedAccessorArrayElement().  This would be better done by 
providing a typsubscript function for the json type.  This is what jsonb 
already has, which is why your patch doesn't need to provide the array 
support for jsonb.  I suggest you implement the typsubscript support for 
the json type (make it a separate patch but you can keep it in this 
thread IMO) and remove the custom code from this patch.

A few comments on the tests:  The tests look good to me.  Good coverage 
of weirdly nested types.  Results look correct.

+drop table if exists test_json_dot;

This can be omitted, since we know that the table doesn't exist yet.

This code could be written in the more conventional insert ... values 
syntax:

+insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json;
+insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, 
"d":[11, 12]}'::json;
+insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, 
"d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json;

Then the ::json casts can also go away.

Also, using a different value for "id" for each row would be more
useful, so that the subsequent tests could then be written like

     select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot;

so we can see which result corresponds to which input row.  Also make
id the primary key in this table.

Also, let's keep the json and the jsonb variants aligned.  There are
some small differences, like the test_json_dot table having 4 rows but
the test_jsonb_dot having 3 rows.  And the array and wildcard tests in
the opposite order.  Not a big deal, but keeping these the same helps
eyeballing the test files.

Maybe add a comment somewhere in this file that you are running the
json_query equivalents to cross-check the semantics of the dot syntax.

Some documentation should be written.  This looks like this right place 
to start:

https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION

and them maybe some cross-linking between there and the sections on JSON 
types and operators.




Re: SQL:2023 JSON simplified accessor support

From
Alexandra Wang
Date:
Hi,

On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote:
>
> Hi, hackers.
>
> I have implemented dot notation for jsonb using type subscripting back
> in April 2023, but failed post it because I left Postgres Professional
> company soon after and have not worked anywhere since, not even had
> any interest in programming.
>
> But yesterday I accidentally decided to look what is going on at
> commitfests and found this thread.  I immediately started to rebase
> code from PG16, fixed some bugs, and now I'm ready to present my
> version of the patches which is much more complex.
>
> Unfortunately, I probably won't be able to devote that much time to
> the patches as before.

Thank you so much, Nikita, for revisiting this topic and sharing your
v6 patches!

Now that we have two solutions, I’d like to summarize our current
options.

In Postgres, there are currently three ways to access json/jsonb
object fields and array elements:

1. '->' operator (Postgres-specific, predates SQL standard):

postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
-> 0; -- returns 1

2. jsonb subscripting (not available for the plain json type):

postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
3]}'::jsonb)['d'][0];  --returns 1

3. json_query() function:

postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
3]}', 'lax $.d[0]');  --returns 1

A few weeks ago, I did the following performance benchmarking of the
three approaches:

-- setup:
create table tbl(id int, col1 jsonb);
insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
{"j":"vj"}]}]}' from generate_series(1, 100000)i;

-- jsonb_operator.sql
SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;

-- jsonb_subscripting.sql
SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;

-- jsonb_path_query.sql
SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;

# pgbench on my local MacOS machine, using -O3 optimization:
pgbench -n -f XXX.sql postgres -T100

Results (Latency | tps):

"->" operator: 14ms | 68
jsonb subscripting: 17ms | 58
jsonb_path_query() function: 23ms | 43

So performance from best to worst:
"->" operator > jsonb subscripting >> jsonb_path_query() function.

I’m excited to see your implementation of dot notation for jsonb using
type subscripting! This approach rounds out the three possible ways to
implement JSON simplified accessors:

## v1: json_query() implementation

Pros:
- Fully adheres to the SQL standard.

According to the SQL standard, if the JSON simplified accessor <JA> is
not a JSON item method, it is equivalent to a <JSON query>:

JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)

(I’m skipping <JA> that includes a JSON item method, as it is
currently outside the scope of both sets of patches.)

- Easiest to implement

Cons:
- Slow due to function call overhead.

## v2-v5: "->" operator implementation

We initially chose this approach for its performance benefits.
However, while addressing Peter’s feedback on v5, I encountered the
following issue:

-- setup
create table test_json_dot(id serial primary key, test_json json);
insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
"b": {"c": 42}}]');

-- problematic query:
test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
test_json_dot;
 id | b |    expected
----+---+-----------------
  5 |   | [42, {"c": 42}]
(1 row)

This issue arises from the semantic differences between the "->"
operator and json_query’s "lax" mode. One possible workaround is to
redefine the "->" operator and modify its implementation. However, since
the "->" operator has been in use for a long time, such changes would
break backward compatibility.

## v6: jsonb subscription implementation

Nikita's patches pass all my functional test cases, including those
that failed with the previous approach.

Supported formats:
- JSON member accessor
- JSON wildcard member accessor (Not available in v5, so this is also a plus)
- JSON array accessor

Questions:

1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?

2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?

Best,
Alex



Re: SQL:2023 JSON simplified accessor support

From
Andrew Dunstan
Date:
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote:
> Hi,
>
> On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote:
>> Hi, hackers.
>>
>> I have implemented dot notation for jsonb using type subscripting back
>> in April 2023, but failed post it because I left Postgres Professional
>> company soon after and have not worked anywhere since, not even had
>> any interest in programming.
>>
>> But yesterday I accidentally decided to look what is going on at
>> commitfests and found this thread.  I immediately started to rebase
>> code from PG16, fixed some bugs, and now I'm ready to present my
>> version of the patches which is much more complex.
>>
>> Unfortunately, I probably won't be able to devote that much time to
>> the patches as before.
> Thank you so much, Nikita, for revisiting this topic and sharing your
> v6 patches!
>
> Now that we have two solutions, I’d like to summarize our current
> options.
>
> In Postgres, there are currently three ways to access json/jsonb
> object fields and array elements:
>
> 1. '->' operator (Postgres-specific, predates SQL standard):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
> -> 0; -- returns 1
>
> 2. jsonb subscripting (not available for the plain json type):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
> 3]}'::jsonb)['d'][0];  --returns 1
>
> 3. json_query() function:
>
> postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
> 3]}', 'lax $.d[0]');  --returns 1
>
> A few weeks ago, I did the following performance benchmarking of the
> three approaches:
>
> -- setup:
> create table tbl(id int, col1 jsonb);
> insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
> {"j":"vj"}]}]}' from generate_series(1, 100000)i;
>
> -- jsonb_operator.sql
> SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;
>
> -- jsonb_subscripting.sql
> SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;
>
> -- jsonb_path_query.sql
> SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;
>
> # pgbench on my local MacOS machine, using -O3 optimization:
> pgbench -n -f XXX.sql postgres -T100
>
> Results (Latency | tps):
>
> "->" operator: 14ms | 68
> jsonb subscripting: 17ms | 58
> jsonb_path_query() function: 23ms | 43
>
> So performance from best to worst:
> "->" operator > jsonb subscripting >> jsonb_path_query() function.
>
> I’m excited to see your implementation of dot notation for jsonb using
> type subscripting! This approach rounds out the three possible ways to
> implement JSON simplified accessors:
>
> ## v1: json_query() implementation
>
> Pros:
> - Fully adheres to the SQL standard.
>
> According to the SQL standard, if the JSON simplified accessor <JA> is
> not a JSON item method, it is equivalent to a <JSON query>:
>
> JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
> EMPTY NULL ON ERROR)
>
> (I’m skipping <JA> that includes a JSON item method, as it is
> currently outside the scope of both sets of patches.)
>
> - Easiest to implement
>
> Cons:
> - Slow due to function call overhead.
>
> ## v2-v5: "->" operator implementation
>
> We initially chose this approach for its performance benefits.
> However, while addressing Peter’s feedback on v5, I encountered the
> following issue:
>
> -- setup
> create table test_json_dot(id serial primary key, test_json json);
> insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
> "b": {"c": 42}}]');
>
> -- problematic query:
> test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
> CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
> test_json_dot;
>   id | b |    expected
> ----+---+-----------------
>    5 |   | [42, {"c": 42}]
> (1 row)
>
> This issue arises from the semantic differences between the "->"
> operator and json_query’s "lax" mode. One possible workaround is to
> redefine the "->" operator and modify its implementation. However, since
> the "->" operator has been in use for a long time, such changes would
> break backward compatibility.
>
> ## v6: jsonb subscription implementation
>
> Nikita's patches pass all my functional test cases, including those
> that failed with the previous approach.
>
> Supported formats:
> - JSON member accessor
> - JSON wildcard member accessor (Not available in v5, so this is also a plus)
> - JSON array accessor
>
> Questions:
>
> 1. Since Nikita’s patches did not address the JSON data type, and JSON
> currently does not support subscripting, should we limit the initial
> feature set to JSONB dot-notation for now? In other words, if we aim
> to fully support JSON simplified accessors for the plain JSON type,
> should we handle support for plain JSON subscripting as a follow-up
> effort?
>
> 2. I have yet to have a more thorough review of Nikita’s patches.
> One area I am not familiar with is the hstore-related changes. How
> relevant is hstore to the JSON simplified accessor?
>

We can't change the way the "->" operator works, as there could well be 
uses of it in the field that rely on its current behaviour. But maybe we 
could invent a new operator which is compliant with the standard 
semantics for dot access, and call that. Then we'd get the best 
performance, and also we might be able to implement it for the plain 
JSON type. If that proves not possible we can think about not 
implementing for plain JSON, but I'd rather not go there until we have to.

I don't think we should be including hstore changes here - we should 
just be aiming at implementing the standard for JSON access. hstore 
changes if any should be a separate feature. The aren't relevant to JSON 
access, although they might use some of the same infrastructure, 
depending on implementation.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com