Re: SQL:2023 JSON simplified accessor support - Mailing list pgsql-hackers
From | Alexandra Wang |
---|---|
Subject | Re: SQL:2023 JSON simplified accessor support |
Date | |
Msg-id | CAK98qZ2E-uc10nf4SR9K7Z8VeDPLeTW-+xtdh5Zej54X=mgZcA@mail.gmail.com Whole thread Raw |
In response to | Re: SQL:2023 JSON simplified accessor support (Chao Li <li.evan.chao@gmail.com>) |
List | pgsql-hackers |
Hi Chao,
Thanks for reviewing!
On Thu, Aug 28, 2025 at 8:29 PM Chao Li <li.evan.chao@gmail.com> wrote:
On Aug 26, 2025, at 11:52, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:<v14-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-No.patch><v14-0003-Export-jsonPathFromParseResult.patch><v14-0001-Allow-transformation-of-only-a-sublist-of-subscr.patch><v14-0005-Implement-read-only-dot-notation-for-jsonb.patch><v14-0007-Implement-jsonb-wildcard-member-accessor.patch><v14-0006-Implement-Jsonb-subscripting-with-slicing.patch><v14-0004-Extract-coerce_jsonpath_subscript.patch>Best,AlexI found a bug.
```INSERT INTO test_jsonb_types (data) VALUES('[1, 2, "three"]'),('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}’);```If I use a index following a slice, it doesn’t work:```evantest=# select data[0] from test_jsonb_types;data------1(2 rows)evantest=# select data[0:2][1] from test_jsonb_types; # This should return “2"data------(2 rows)evantest=# select (t.data)['con']['a'][0:1] from test_jsonb_types t; # returned the slice properlydata-----------------------------------------------------[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}](2 rows)evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # also returned the slice, which is wrongdata-----------------------------------------------------[{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}](2 rows)```We should consider a slice as a container, so the fix is simple. My quick unpolished fix is:```chaol@ChaodeMacBook-Air postgresql % git diffdiff --git a/src/backend/utils/adt/jsonbsubs.c b/src/backend/utils/adt/jsonbsubs.cindex cb72d12ca3f..8845dcf239a 100644--- a/src/backend/utils/adt/jsonbsubs.c+++ b/src/backend/utils/adt/jsonbsubs.c@@ -247,6 +247,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, SubscriptiListCell *lc;Datum jsp;int pathlen = 0;+ bool isSlice = false;sbsref->refupperindexpr = NIL;sbsref->reflowerindexpr = NIL;@@ -285,6 +286,7 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscriptiif (ai->is_slice){+ isSlice = true;while (list_length(sbsref->reflowerindexpr) < list_length(sbsref->refupperindexpr))sbsref->reflowerindexpr = lappend(sbsref->reflowerindexpr, NULL);@@ -369,6 +371,9 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscriptipath->next = jpi;path = jpi;pathlen++;++ if (isSlice)+ break;}if (pathlen == 0)```After the fix, let’s test again:```evantest=# select data[0:2][1] from test_jsonb_types; # good resultdata------2(2 rows)evantest=# select (t.data)['con']['a'][0:1][0] from test_jsonb_types t; # good resultdata-------------------------{"b": {"c": {"d": 99}}}(2 rows)```
TL;DR: It is a feature, not a bug.
See longer explanation below:
This behavior aligns with the SQL:2023 standard. While the result you
expected is more intuitive in my opinion, it is incorrect according to
the spec.
expected is more intuitive in my opinion, it is incorrect according to
the spec.
As I mentioned in the commit message of patch v14-0005:
The SQL standard states that simplified access is equivalent to:
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
where:
VEP = <value expression primary>
JC = <JSON simplified accessor op chain>
The most relevant detail that I want to highlight is "WITH CONDITIONAL
ARRAY WRAPPER". The documentation[1] says:
ARRAY WRAPPER". The documentation[1] says:
> If the path expression may return multiple values, it might be
> necessary to wrap those values using the WITH WRAPPER clause to make
> it a valid JSON string, because the default behavior is to not wrap
> them, as if WITHOUT WRAPPER were specified. The WITH WRAPPER clause is
> by default taken to mean WITH UNCONDITIONAL WRAPPER, which means that
> even a single result value will be wrapped. To apply the wrapper only
> when multiple values are present, specify WITH CONDITIONAL WRAPPER.
> Getting multiple values in result will be treated as an error if
> WITHOUT WRAPPER is specified.
> necessary to wrap those values using the WITH WRAPPER clause to make
> it a valid JSON string, because the default behavior is to not wrap
> them, as if WITHOUT WRAPPER were specified. The WITH WRAPPER clause is
> by default taken to mean WITH UNCONDITIONAL WRAPPER, which means that
> even a single result value will be wrapped. To apply the wrapper only
> when multiple values are present, specify WITH CONDITIONAL WRAPPER.
> Getting multiple values in result will be treated as an error if
> WITHOUT WRAPPER is specified.
So, for your test queries:
select data[0:2] from test_jsonb_types;
select data[0:2][1] from test_jsonb_types;
select (t.data)['con']['a'][0:1] from test_jsonb_types t;
select (t.data)['con']['a'][0:1][0] from test_jsonb_types t;
We have these equivalents using json_query():
select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
select json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**
select json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types; -- **[NOTE]**
**[NOTE]**: .a and ['a'] (as well as .con and ['con']) are not
syntactically equivalent, as the dot-notation .a is in "lax" mode,
whereas the pre-standard subscript ['a'] is in "strict" mode. I will
discuss this more in a separate reply to your other comment. However,
for the specific data we inserted in your example table, they happen
to return the same results. Since our focus here is not dot-notation,
syntactically equivalent, as the dot-notation .a is in "lax" mode,
whereas the pre-standard subscript ['a'] is in "strict" mode. I will
discuss this more in a separate reply to your other comment. However,
for the specific data we inserted in your example table, they happen
to return the same results. Since our focus here is not dot-notation,
we won’t go further into it here.
You can verify correctness with:
test=# select data[0:2] = json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (data[0:2][1] is NULL) AND (json_query(data, 'lax $[0 to 2][1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) is NULL) from test_jsonb_types;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (((t.data)['con']['a'][0:1] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1] = json_query(data, 'lax $.con.a[0 to 1]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
test=# select (((t.data)['con']['a'][0:1][0] IS NULL) AND (json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) IS NULL)) OR ((t.data)['con']['a'][0:1][0] = json_query(data, 'lax $.con.a[0 to 1][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)) from test_jsonb_types t;
?column?
----------
t
t
(2 rows)
?column?
----------
t
t
(2 rows)
All tests show "t", confirming the current results are all correct.
I think the root of your confusion is the meaning of CONDITIONAL ARRAY
WRAPPER. So let’s try more examples:
-- keep your previous setup
drop table test_jsonb_types;
create table test_jsonb_types (data jsonb);
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}');
Let's start with:
select data[0:2] from test_jsonb_types;
It is equivalent to:
select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
They all have the following output:
test=# select json_query(data, 'lax $[0 to 2]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------
[1, 2, "three"]
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
json_query
---------------------------------------------------------------------
[1, 2, "three"]
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
To find out what WITH CONDITIONAL ARRAY WRAPPER does, let's toggle it
to WITHOUT ARRAY WRAPPER:
to WITHOUT ARRAY WRAPPER:
test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
---------------------------------------------------------------------
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
json_query
---------------------------------------------------------------------
{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}
(2 rows)
The first row return NULL, because we've specified NULL ON ERROR, so
let's toggle that as well to ERROR ON ERROR:
let's toggle that as well to ERROR ON ERROR:
test=# select json_query(data, 'lax $[0 to 2]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types;
ERROR: 22034: JSON path expression in JSON_QUERY must return single item when no wrapper is requested
HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array.
LOCATION: JsonPathQuery, jsonpath_exec.c:3987
ERROR: 22034: JSON path expression in JSON_QUERY must return single item when no wrapper is requested
HINT: Use the WITH WRAPPER clause to wrap SQL/JSON items into an array.
LOCATION: JsonPathQuery, jsonpath_exec.c:3987
As shown, without ARRAY WRAPPER, the query produces a sequence of JSON
value items, not a JSON array.
value items, not a JSON array.
Note that array wrapping is only applied to the final result of a
jsonpath, not to each intermediate result in the chain. See the
following example:
jsonpath, not to each intermediate result in the chain. See the
following example:
-- new setup
truncate test_jsonb_types;
INSERT INTO test_jsonb_types (data) VALUES ('[1, 2, "three"]'), ('[1, [2, 22], "three"]');
test=# select json_query(data, 'lax $[0 to 2][1]' WITHOUT ARRAY WRAPPER NULL ON EMPTY ERROR ON ERROR) from test_jsonb_types;
json_query
------------
22
(2 rows)
json_query
------------
22
(2 rows)
In this case, you can see more clearly that "[0 to 2]" fetches three
individual jsonb array elements, and "[1]" treats each of the three
jsonb values as independent jsonb arrays, reading the first element of
each. This is different from wrapping the intermediate result into an
array and accessing the first element of that wrapped array.
individual jsonb array elements, and "[1]" treats each of the three
jsonb values as independent jsonb arrays, reading the first element of
each. This is different from wrapping the intermediate result into an
array and accessing the first element of that wrapped array.
Another thing I want to point out is that there is a trivial case for
"lax" mode when accessing a jsonb object (not a jsonb array) using a
JSON array accessor "[0]". For example:
"lax" mode when accessing a jsonb object (not a jsonb array) using a
JSON array accessor "[0]". For example:
-- another setup, still use your data
truncate test_jsonb_types;
insert into test_jsonb_types VALUES ('{"con": {"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}}');
test=# select (data).con[0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
This is equivalent to:
test=# select json_query(data, 'lax $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
which is different from the result if we use "strict" mode:
test=# select json_query(data, 'strict $.con[0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------
(1 row)
json_query
------------
(1 row)
According to SQL:2023:
— If an operation requires an SQL/JSON array but the operand is not an SQL/JSON array, then the operand is first “wrapped” in an SQL/JSON array prior to performing the operation.
— If an operation requires something other than an SQL/JSON array, but the operand is an SQL/JSON array, then the operand is “unwrapped” by converting its elements into an SQL/JSON sequence prior to performing the operation.
— After applying the preceding resolutions to structural errors, if there is still a structural error , the result is an empty SQL/JSON sequence.
Because "lax" mode wraps a jsonb object into an array of a single
element, accessing it with [0] will always return the same jsonb
object. In fact, you can access it with a chain of [0]s and still get
the same jsonb object:
test=# select json_query(data, 'lax $.con[0][0][0][0][0][0]' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR) from test_jsonb_types;
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
json_query
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
test=# select (data).con[0][0][0][0][0][0] from test_jsonb_types;
con
------------------------------------------------------------
{"a": [{"b": {"c": {"d": 99}}}, {"b": {"c": {"d": 100}}}]}
(1 row)
I hope this long explanation helps!
Best,
Alex
pgsql-hackers by date: