Thread: Re: SQL:2023 JSON simplified accessor support
On 29.08.24 18:33, Alexandra Wang wrote: > I’ve implemented the member and array accessors and attached two > alternative patches: > > 1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch > enables dot access to JSON object fields and subscript access to > indexed JSON array elements by converting "." and "[]" indirection > into a JSON_QUERY JsonFuncExpr node. > > 2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This > alternative patch implements dot access to JSON object fields by > transforming the "." indirection into a "->" operator. > > The upside of the v1 patch is that it strictly aligns with the SQL > standard, which specifies that the simplified access is equivalent to: > > JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > However, the performance of JSON_QUERY might be suboptimal due to > function call overhead. Therefore, I implemented the v2 alternative > using the "->" operator. Using the operator approach would also allow taking advantage of optimizations such as <https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn%2BBCbFaBMt3z-2%3Di5fKO3bW%3D6HPhbid2Dw%40mail.gmail.com>. > There is some uncertainty about the semantics of conditional array > wrappers. Currently, there is at least one subtle difference between > the "->" operator and JSON_QUERY, as shown: That JSON_QUERY bug has been fixed. I suggest you rebase both of your patches over this, just to double check everything. But then I think you can drop the v1 patch and just submit a new version of v2. The patch should eventually contain some documentation. It might be good starting to look for a good spot where to put that documentation. It might be either near the json types documentation or near the general qualified identifier syntax, not sure.
On 2024-09-26 Th 11:45 AM, Alexandra Wang wrote: > Hi, > > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! This is a really nice feature, and provides a lot of expressive power for such a small piece of code. I notice this doesn't seem to work for domains over json and jsonb. andrew@~=# create domain json_d as json; CREATE DOMAIN andrew@~=# create table test_json_dot(id int, test_json json_d); CREATE TABLE andrew@~=# insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; INSERT 0 1 | | andrew@~=# select (test_json_dot.test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; ERROR: column notation .b applied to type json_d, which is not a composite type LINE 1: select (test_json_dot.test_json).b, json_query(test_json, 'l... I'm not sure that's a terribly important use case, but we should probably make it work. If it's a domain we should get the basetype of the domain. There's some example code in src/backend/utils/adt/jsonfuncs.c cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > I didn’t run pgindent earlier, so here’s the updated version with the > correct indentation. Hope this helps! Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to want thetext output far more often than a JSON scalar. Best, David
On 2024-09-27 Fr 5:49 AM, David E. Wheeler wrote: > On Sep 26, 2024, at 16:45, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > >> I didn’t run pgindent earlier, so here’s the updated version with the >> correct indentation. Hope this helps! > Oh, nice! I don’t suppose the standard also has defined an operator equivalent to ->>, though, has it? I tend to wantthe text output far more often than a JSON scalar. > That would defeat being able to chain these. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sep 27, 2024, at 12:07, Andrew Dunstan <andrew@dunslane.net> wrote: > That would defeat being able to chain these. Not if it’s a different operator. But I’m fine to just keep using ->> at the end of a chain. D
On 07.11.24 22:57, Alexandra Wang wrote: > The v5 patch includes the following updates: > > - Fixed the aforementioned issue and added more tests covering composite > types with domains, nested domains, and arrays of domains over > JSON/JSONB. > > - Refactored the logic for parsing JSON/JSONB object fields by moving it > from ParseFuncOrColumn() to transformIndirection() for improved > readability. The ParseFuncOrColumn() function is already handling both > single-argument function calls and composite types, and it has other > callers besides transformIndirection(). This patch implements array subscripting support for the json type, but it does it in a non-standard way, using ParseJsonSimplifiedAccessorArrayElement(). This would be better done by providing a typsubscript function for the json type. This is what jsonb already has, which is why your patch doesn't need to provide the array support for jsonb. I suggest you implement the typsubscript support for the json type (make it a separate patch but you can keep it in this thread IMO) and remove the custom code from this patch. A few comments on the tests: The tests look good to me. Good coverage of weirdly nested types. Results look correct. +drop table if exists test_json_dot; This can be omitted, since we know that the table doesn't exist yet. This code could be written in the more conventional insert ... values syntax: +insert into test_json_dot select 1, '{"a": 1, "b": 42}'::json; +insert into test_json_dot select 1, '{"a": 2, "b": {"c": 42}}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[11, 12]}'::json; +insert into test_json_dot select 1, '{"a": 3, "b": {"c": "42"}, "d":[{"x": [11, 12]}, {"y": [21, 22]}]}'::json; Then the ::json casts can also go away. Also, using a different value for "id" for each row would be more useful, so that the subsequent tests could then be written like select id, (test_jsonb_dot.test_jsonb).b from test_jsonb_dot; so we can see which result corresponds to which input row. Also make id the primary key in this table. Also, let's keep the json and the jsonb variants aligned. There are some small differences, like the test_json_dot table having 4 rows but the test_jsonb_dot having 3 rows. And the array and wildcard tests in the opposite order. Not a big deal, but keeping these the same helps eyeballing the test files. Maybe add a comment somewhere in this file that you are running the json_query equivalents to cross-check the semantics of the dot syntax. Some documentation should be written. This looks like this right place to start: https://www.postgresql.org/docs/devel/sql-expressions.html#FIELD-SELECTION and them maybe some cross-linking between there and the sections on JSON types and operators.
Hi, On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: > > Hi, hackers. > > I have implemented dot notation for jsonb using type subscripting back > in April 2023, but failed post it because I left Postgres Professional > company soon after and have not worked anywhere since, not even had > any interest in programming. > > But yesterday I accidentally decided to look what is going on at > commitfests and found this thread. I immediately started to rebase > code from PG16, fixed some bugs, and now I'm ready to present my > version of the patches which is much more complex. > > Unfortunately, I probably won't be able to devote that much time to > the patches as before. Thank you so much, Nikita, for revisiting this topic and sharing your v6 patches! Now that we have two solutions, I’d like to summarize our current options. In Postgres, there are currently three ways to access json/jsonb object fields and array elements: 1. '->' operator (Postgres-specific, predates SQL standard): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' -> 0; -- returns 1 2. jsonb subscripting (not available for the plain json type): postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['d'][0]; --returns 1 3. json_query() function: postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, 3]}', 'lax $.d[0]'); --returns 1 A few weeks ago, I did the following performance benchmarking of the three approaches: -- setup: create table tbl(id int, col1 jsonb); insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, {"j":"vj"}]}]}' from generate_series(1, 100000)i; -- jsonb_operator.sql SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; -- jsonb_subscripting.sql SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; -- jsonb_path_query.sql SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; # pgbench on my local MacOS machine, using -O3 optimization: pgbench -n -f XXX.sql postgres -T100 Results (Latency | tps): "->" operator: 14ms | 68 jsonb subscripting: 17ms | 58 jsonb_path_query() function: 23ms | 43 So performance from best to worst: "->" operator > jsonb subscripting >> jsonb_path_query() function. I’m excited to see your implementation of dot notation for jsonb using type subscripting! This approach rounds out the three possible ways to implement JSON simplified accessors: ## v1: json_query() implementation Pros: - Fully adheres to the SQL standard. According to the SQL standard, if the JSON simplified accessor <JA> is not a JSON item method, it is equivalent to a <JSON query>: JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) (I’m skipping <JA> that includes a JSON item method, as it is currently outside the scope of both sets of patches.) - Easiest to implement Cons: - Slow due to function call overhead. ## v2-v5: "->" operator implementation We initially chose this approach for its performance benefits. However, while addressing Peter’s feedback on v5, I encountered the following issue: -- setup create table test_json_dot(id serial primary key, test_json json); insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, "b": {"c": 42}}]'); -- problematic query: test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from test_json_dot; id | b | expected ----+---+----------------- 5 | | [42, {"c": 42}] (1 row) This issue arises from the semantic differences between the "->" operator and json_query’s "lax" mode. One possible workaround is to redefine the "->" operator and modify its implementation. However, since the "->" operator has been in use for a long time, such changes would break backward compatibility. ## v6: jsonb subscription implementation Nikita's patches pass all my functional test cases, including those that failed with the previous approach. Supported formats: - JSON member accessor - JSON wildcard member accessor (Not available in v5, so this is also a plus) - JSON array accessor Questions: 1. Since Nikita’s patches did not address the JSON data type, and JSON currently does not support subscripting, should we limit the initial feature set to JSONB dot-notation for now? In other words, if we aim to fully support JSON simplified accessors for the plain JSON type, should we handle support for plain JSON subscripting as a follow-up effort? 2. I have yet to have a more thorough review of Nikita’s patches. One area I am not familiar with is the hstore-related changes. How relevant is hstore to the JSON simplified accessor? Best, Alex
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote: > Hi, > > On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov.n.a@gmail.com> wrote: >> Hi, hackers. >> >> I have implemented dot notation for jsonb using type subscripting back >> in April 2023, but failed post it because I left Postgres Professional >> company soon after and have not worked anywhere since, not even had >> any interest in programming. >> >> But yesterday I accidentally decided to look what is going on at >> commitfests and found this thread. I immediately started to rebase >> code from PG16, fixed some bugs, and now I'm ready to present my >> version of the patches which is much more complex. >> >> Unfortunately, I probably won't be able to devote that much time to >> the patches as before. > Thank you so much, Nikita, for revisiting this topic and sharing your > v6 patches! > > Now that we have two solutions, I’d like to summarize our current > options. > > In Postgres, there are currently three ways to access json/jsonb > object fields and array elements: > > 1. '->' operator (Postgres-specific, predates SQL standard): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd' > -> 0; -- returns 1 > > 2. jsonb subscripting (not available for the plain json type): > > postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, > 3]}'::jsonb)['d'][0]; --returns 1 > > 3. json_query() function: > > postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2, > 3]}', 'lax $.d[0]'); --returns 1 > > A few weeks ago, I did the following performance benchmarking of the > three approaches: > > -- setup: > create table tbl(id int, col1 jsonb); > insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2, > {"j":"vj"}]}]}' from generate_series(1, 100000)i; > > -- jsonb_operator.sql > SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl; > > -- jsonb_subscripting.sql > SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl; > > -- jsonb_path_query.sql > SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl; > > # pgbench on my local MacOS machine, using -O3 optimization: > pgbench -n -f XXX.sql postgres -T100 > > Results (Latency | tps): > > "->" operator: 14ms | 68 > jsonb subscripting: 17ms | 58 > jsonb_path_query() function: 23ms | 43 > > So performance from best to worst: > "->" operator > jsonb subscripting >> jsonb_path_query() function. > > I’m excited to see your implementation of dot notation for jsonb using > type subscripting! This approach rounds out the three possible ways to > implement JSON simplified accessors: > > ## v1: json_query() implementation > > Pros: > - Fully adheres to the SQL standard. > > According to the SQL standard, if the JSON simplified accessor <JA> is > not a JSON item method, it is equivalent to a <JSON query>: > > JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON > EMPTY NULL ON ERROR) > > (I’m skipping <JA> that includes a JSON item method, as it is > currently outside the scope of both sets of patches.) > > - Easiest to implement > > Cons: > - Slow due to function call overhead. > > ## v2-v5: "->" operator implementation > > We initially chose this approach for its performance benefits. > However, while addressing Peter’s feedback on v5, I encountered the > following issue: > > -- setup > create table test_json_dot(id serial primary key, test_json json); > insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2, > "b": {"c": 42}}]'); > > -- problematic query: > test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH > CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from > test_json_dot; > id | b | expected > ----+---+----------------- > 5 | | [42, {"c": 42}] > (1 row) > > This issue arises from the semantic differences between the "->" > operator and json_query’s "lax" mode. One possible workaround is to > redefine the "->" operator and modify its implementation. However, since > the "->" operator has been in use for a long time, such changes would > break backward compatibility. > > ## v6: jsonb subscription implementation > > Nikita's patches pass all my functional test cases, including those > that failed with the previous approach. > > Supported formats: > - JSON member accessor > - JSON wildcard member accessor (Not available in v5, so this is also a plus) > - JSON array accessor > > Questions: > > 1. Since Nikita’s patches did not address the JSON data type, and JSON > currently does not support subscripting, should we limit the initial > feature set to JSONB dot-notation for now? In other words, if we aim > to fully support JSON simplified accessors for the plain JSON type, > should we handle support for plain JSON subscripting as a follow-up > effort? > > 2. I have yet to have a more thorough review of Nikita’s patches. > One area I am not familiar with is the hstore-related changes. How > relevant is hstore to the JSON simplified accessor? > We can't change the way the "->" operator works, as there could well be uses of it in the field that rely on its current behaviour. But maybe we could invent a new operator which is compliant with the standard semantics for dot access, and call that. Then we'd get the best performance, and also we might be able to implement it for the plain JSON type. If that proves not possible we can think about not implementing for plain JSON, but I'd rather not go there until we have to. I don't think we should be including hstore changes here - we should just be aiming at implementing the standard for JSON access. hstore changes if any should be a separate feature. The aren't relevant to JSON access, although they might use some of the same infrastructure, depending on implementation. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com