Re: SQL:2023 JSON simplified accessor support - Mailing list pgsql-hackers
From | Alexandra Wang |
---|---|
Subject | Re: SQL:2023 JSON simplified accessor support |
Date | |
Msg-id | CAK98qZ2QGcyJrJAFv9wjY6S8yP9dUVnmG9Gb4OXuzuMMuM1Z5Q@mail.gmail.com Whole thread Raw |
In response to | Re: SQL:2023 JSON simplified accessor support (Peter Eisentraut <peter@eisentraut.org>) |
Responses |
Re: SQL:2023 JSON simplified accessor support
|
List | pgsql-hackers |
Hi, On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: > > Hi, hackers. > > I have implemented dot notation for jsonb using type subscripting back > in April 2023, but failed post it because I left Postgres Professional > company soon after and have not worked anywhere since, not even had > any interest in programming. > > But yesterday I accidentally decided to look what is going on at > commitfests and found this thread. I immediately started to rebase > code from PG16, fixed some bugs, and now I'm ready to present my > version of the patches which is much more complex. > > Unfortunately, I probably won't be able to devote that much time to > the patches as before. Thank you so much, Nikita, for revisiting this topic and sharing your v6 patches! Now that we have two solutions, I’d like to summarize our current options. In Postgres, there are currently three ways to access json/jsonb object fields and array elements: 1. '->' operator (Postgres-specific, predates SQL standard): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' -> 0; -- returns 1 2. jsonb subscripting (not available for the plain json type): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['d'][0]; --returns 1 3. json_query() function: postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, 3]}', 'lax $.d[0]'); --returns 1 A few weeks ago, I did the following performance benchmarking of the three approaches: -- setup: create table tbl(id int, col1 jsonb); insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, {"j":"vj"}]}]}' from generate_series(1, 100000)i; -- jsonb_operator.sql SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; -- jsonb_subscripting.sql SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; -- jsonb_path_query.sql SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; # pgbench on my local MacOS machine, using -O3 optimization: pgbench -n -f XXX.sql postgres -T100 Results (Latency | tps): "->" operator: 14ms | 68 jsonb subscripting: 17ms | 58 jsonb_path_query() function: 23ms | 43 So performance from best to worst: "->" operator > jsonb subscripting >> jsonb_path_query() function. I’m excited to see your implementation of dot notation for jsonb using type subscripting! This approach rounds out the three possible ways to implement JSON simplified accessors: ## v1: json_query() implementation Pros: - Fully adheres to the SQL standard. According to the SQL standard, if the JSON simplified accessor <JA> is not a JSON item method, it is equivalent to a <JSON query>: JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) (I’m skipping <JA> that includes a JSON item method, as it is currently outside the scope of both sets of patches.) - Easiest to implement Cons: - Slow due to function call overhead. ## v2-v5: "->" operator implementation We initially chose this approach for its performance benefits. However, while addressing Peter’s feedback on v5, I encountered the following issue: -- setup create table test_json_dot(id serial primary key, test_json json); insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, "b": {"c": 42}}]'); -- problematic query: test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; id | b | expected ----+---+----------------- 5 | | [42, {"c": 42}] (1 row) This issue arises from the semantic differences between the "->" operator and json_query’s "lax" mode. One possible workaround is to redefine the "->" operator and modify its implementation. However, since the "->" operator has been in use for a long time, such changes would break backward compatibility. ## v6: jsonb subscription implementation Nikita's patches pass all my functional test cases, including those that failed with the previous approach. Supported formats: - JSON member accessor - JSON wildcard member accessor (Not available in v5, so this is also a plus) - JSON array accessor Questions: 1. Since Nikita’s patches did not address the JSON data type, and JSON currently does not support subscripting, should we limit the initial feature set to JSONB dot-notation for now? In other words, if we aim to fully support JSON simplified accessors for the plain JSON type, should we handle support for plain JSON subscripting as a follow-up effort? 2. I have yet to have a more thorough review of Nikita’s patches. One area I am not familiar with is the hstore-related changes. How relevant is hstore to the JSON simplified accessor? Best, Alex
pgsql-hackers by date: