Re: SQL:2023 JSON simplified accessor support - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: SQL:2023 JSON simplified accessor support
Date
Msg-id b24dc997-d78f-47bd-b323-14e68232d127@dunslane.net
Whole thread Raw
In response to Re: SQL:2023 JSON simplified accessor support  (Alexandra Wang <alexandra.wang.oss@gmail.com>)
List pgsql-hackers
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote:
> Hi,
>
> On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote:
>> Hi, hackers.
>>
>> I have implemented dot notation for jsonb using type subscripting back
>> in April 2023, but failed post it because I left Postgres Professional
>> company soon after and have not worked anywhere since, not even had
>> any interest in programming.
>>
>> But yesterday I accidentally decided to look what is going on at
>> commitfests and found this thread.  I immediately started to rebase
>> code from PG16, fixed some bugs, and now I'm ready to present my
>> version of the patches which is much more complex.
>>
>> Unfortunately, I probably won't be able to devote that much time to
>> the patches as before.
> Thank you so much, Nikita, for revisiting this topic and sharing your
> v6 patches!
>
> Now that we have two solutions, I’d like to summarize our current
> options.
>
> In Postgres, there are currently three ways to access json/jsonb
> object fields and array elements:
>
> 1. '->' operator (Postgres-specific, predates SQL standard):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
> -> 0; -- returns 1
>
> 2. jsonb subscripting (not available for the plain json type):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
> 3]}'::jsonb)['d'][0];  --returns 1
>
> 3. json_query() function:
>
> postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
> 3]}', 'lax $.d[0]');  --returns 1
>
> A few weeks ago, I did the following performance benchmarking of the
> three approaches:
>
> -- setup:
> create table tbl(id int, col1 jsonb);
> insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
> {"j":"vj"}]}]}' from generate_series(1, 100000)i;
>
> -- jsonb_operator.sql
> SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;
>
> -- jsonb_subscripting.sql
> SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;
>
> -- jsonb_path_query.sql
> SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;
>
> # pgbench on my local MacOS machine, using -O3 optimization:
> pgbench -n -f XXX.sql postgres -T100
>
> Results (Latency | tps):
>
> "->" operator: 14ms | 68
> jsonb subscripting: 17ms | 58
> jsonb_path_query() function: 23ms | 43
>
> So performance from best to worst:
> "->" operator > jsonb subscripting >> jsonb_path_query() function.
>
> I’m excited to see your implementation of dot notation for jsonb using
> type subscripting! This approach rounds out the three possible ways to
> implement JSON simplified accessors:
>
> ## v1: json_query() implementation
>
> Pros:
> - Fully adheres to the SQL standard.
>
> According to the SQL standard, if the JSON simplified accessor <JA> is
> not a JSON item method, it is equivalent to a <JSON query>:
>
> JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
> EMPTY NULL ON ERROR)
>
> (I’m skipping <JA> that includes a JSON item method, as it is
> currently outside the scope of both sets of patches.)
>
> - Easiest to implement
>
> Cons:
> - Slow due to function call overhead.
>
> ## v2-v5: "->" operator implementation
>
> We initially chose this approach for its performance benefits.
> However, while addressing Peter’s feedback on v5, I encountered the
> following issue:
>
> -- setup
> create table test_json_dot(id serial primary key, test_json json);
> insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
> "b": {"c": 42}}]');
>
> -- problematic query:
> test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
> CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
> test_json_dot;
>   id | b |    expected
> ----+---+-----------------
>    5 |   | [42, {"c": 42}]
> (1 row)
>
> This issue arises from the semantic differences between the "->"
> operator and json_query’s "lax" mode. One possible workaround is to
> redefine the "->" operator and modify its implementation. However, since
> the "->" operator has been in use for a long time, such changes would
> break backward compatibility.
>
> ## v6: jsonb subscription implementation
>
> Nikita's patches pass all my functional test cases, including those
> that failed with the previous approach.
>
> Supported formats:
> - JSON member accessor
> - JSON wildcard member accessor (Not available in v5, so this is also a plus)
> - JSON array accessor
>
> Questions:
>
> 1. Since Nikita’s patches did not address the JSON data type, and JSON
> currently does not support subscripting, should we limit the initial
> feature set to JSONB dot-notation for now? In other words, if we aim
> to fully support JSON simplified accessors for the plain JSON type,
> should we handle support for plain JSON subscripting as a follow-up
> effort?
>
> 2. I have yet to have a more thorough review of Nikita’s patches.
> One area I am not familiar with is the hstore-related changes. How
> relevant is hstore to the JSON simplified accessor?
>

We can't change the way the "->" operator works, as there could well be 
uses of it in the field that rely on its current behaviour. But maybe we 
could invent a new operator which is compliant with the standard 
semantics for dot access, and call that. Then we'd get the best 
performance, and also we might be able to implement it for the plain 
JSON type. If that proves not possible we can think about not 
implementing for plain JSON, but I'd rather not go there until we have to.

I don't think we should be including hstore changes here - we should 
just be aiming at implementing the standard for JSON access. hstore 
changes if any should be a separate feature. The aren't relevant to JSON 
access, although they might use some of the same infrastructure, 
depending on implementation.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Count and log pages set all-frozen by vacuum
Next
From: "Giacchino, Luca"
Date:
Subject: SIMD optimization for list_sort