Thread: ON ERROR in json_query and the like
Hi! I’ve noticed two “surprising” (to me) behaviors related to the “ON ERROR” clause of the new JSON query functions in 17beta1. 1. JSON parsing errors are not subject to ON ERROR Apparently, the functions expect JSONB so that a cast is implied when providing TEXT. However, the errors during that cast are not subject to the ON ERROR clause. 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); ERROR: invalid input syntax for type json DETAIL: Token "invalid" is invalid. CONTEXT: JSON data, line 1: invalid Oracle DB and Db2 (LUW) both return NULL in that case. I had a look on the list archive to see if that is intentional but frankly speaking these functions came a long way. In case it is intentional it might be worth adding a note to the docs. 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; a ---- [] (1 row) As NULL ON EMPTY is implied, it should give the same result as explicitly adding NULL ON EMPTY: 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; a --- (1 row) Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) on the other hand returns NULL for both queries. I don’t think that PostgreSQL should follow Oracle DB's suit here but again, in case this is intentional it should be made explicit in the docs. -markus
út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal:
Hi!
I’ve noticed two “surprising” (to me) behaviors related to
the “ON ERROR” clause of the new JSON query functions in 17beta1.
1. JSON parsing errors are not subject to ON ERROR
Apparently, the functions expect JSONB so that a cast is implied
when providing TEXT. However, the errors during that cast are
not subject to the ON ERROR clause.
17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
ERROR: invalid input syntax for type json
DETAIL: Token "invalid" is invalid.
CONTEXT: JSON data, line 1: invalid
Oracle DB and Db2 (LUW) both return NULL in that case.
I had a look on the list archive to see if that is intentional but
frankly speaking these functions came a long way. In case it is
intentional it might be worth adding a note to the docs.
I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and casting to jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and it was based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types. And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then the clause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions.
I think so this difference should be mentioned in documentation.
Regards
Pavel
2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)
As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:
17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---
(1 row)
Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.
I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.
-markus
On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote: > > Hi! > > I’ve noticed two “surprising” (to me) behaviors related to > the “ON ERROR” clause of the new JSON query functions in 17beta1. > > 1. JSON parsing errors are not subject to ON ERROR > Apparently, the functions expect JSONB so that a cast is implied > when providing TEXT. However, the errors during that cast are > not subject to the ON ERROR clause. > > 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); > ERROR: invalid input syntax for type json > DETAIL: Token "invalid" is invalid. > CONTEXT: JSON data, line 1: invalid > > Oracle DB and Db2 (LUW) both return NULL in that case. > > I had a look on the list archive to see if that is intentional but > frankly speaking these functions came a long way. In case it is > intentional it might be worth adding a note to the docs. previous versions require SQL/JSON query function's context_item to explicitly cast to jsonb, if it is not it will error out. previous version the following query will have a error select json_value(text '"1"' , 'strict $[*]' DEFAULT 9 ON ERROR); now it only requires that (context_item) casting to jsonb successfully. I raise this issue separately at [1] [1] https://www.postgresql.org/message-id/CACJufxGWJTa-b0WjNH15otih42PA7SF%2Be7LbkAb0gThs7ojT5Q%40mail.gmail.com > 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY > > 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; > a > ---- > [] > (1 row) > > As NULL ON EMPTY is implied, it should give the same result as > explicitly adding NULL ON EMPTY: > I vaguely remember, we stumbled on ON ERROR, ON EMPTY several times. i don't have a standard, but the doc seems not explicit enough for the above example. in json_query, maybe we can rephrase like: -------------------- The ON EMPTY clause specifies the behavior if evaluating path_expression yields no value at all. The default when ON EMPTY is not specified and ON ERROR not specified is to return a null value. The ON ERROR clause specifies the behavior if an error occurs when evaluating path_expression, including evaluation yields no value at all and ON EMPTY is not specified, the operation to coerce the result value to the output type, or during the execution of ON EMPTY behavior (that is caused by empty result of path_expression evaluation). The default when ON ERROR is not specified is to return a null value. > 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; > a > --- > > (1 row) > > Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) > on the other hand returns NULL for both queries. > > I don’t think that PostgreSQL should follow Oracle DB's suit here > but again, in case this is intentional it should be made explicit > in the docs. ` SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; ` I think these sentences addressed the above query. <<< or during the execution of ON EMPTY behavior (that is caused by empty result of path_expression evaluation). <<< As you can see, in this context, "execution of ON EMPTY behavior" works fine, successfully returned null, so `EMPTY ARRAY ON ERROR` part was ignored.
On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote: > > Hi! > > I’ve noticed two “surprising” (to me) behaviors related to > the “ON ERROR” clause of the new JSON query functions in 17beta1. > > 1. JSON parsing errors are not subject to ON ERROR > Apparently, the functions expect JSONB so that a cast is implied > when providing TEXT. However, the errors during that cast are > not subject to the ON ERROR clause. > > 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); > ERROR: invalid input syntax for type json > DETAIL: Token "invalid" is invalid. > CONTEXT: JSON data, line 1: invalid > > Oracle DB and Db2 (LUW) both return NULL in that case. > > I had a look on the list archive to see if that is intentional but > frankly speaking these functions came a long way. In case it is > intentional it might be worth adding a note to the docs. > json_query ( context_item, path_expression); `SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);` to make this return NULL, that means to catch all the errors that happened while context_item evaluation. otherwise, it would not be consistent? Currently context_item expressions can be quite arbitrary. considering the following examples. create or replace function test(jsonb) returns jsonb as $$ begin raise exception 'abort'; end $$ language plpgsql; create or replace function test1(jsonb) returns jsonb as $$ begin return $1; end $$ language plpgsql; SELECT JSON_VALUE(test('1'), '$'); SELECT JSON_VALUE(test1('1'), '$'); SELECT JSON_VALUE((select '1'::jsonb), '$'); SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from cte), '$'); SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from cte union all select s::jsonb from cte limit 1), '$'); Currently, I don't think we can make SELECT JSON_VALUE(test('1'), '$' null on error); return NULL.
> On 11.06.2024, at 03:58, jian he <jian.universality@gmail.com> wrote: > > On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote: >> >> Hi! >> >> I’ve noticed two “surprising” (to me) behaviors related to >> the “ON ERROR” clause of the new JSON query functions in 17beta1. >> >> 1. JSON parsing errors are not subject to ON ERROR >> Apparently, the functions expect JSONB so that a cast is implied >> when providing TEXT. However, the errors during that cast are >> not subject to the ON ERROR clause. >> >> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); >> ERROR: invalid input syntax for type json >> DETAIL: Token "invalid" is invalid. >> CONTEXT: JSON data, line 1: invalid >> >> Oracle DB and Db2 (LUW) both return NULL in that case. >> >> I had a look on the list archive to see if that is intentional but >> frankly speaking these functions came a long way. In case it is >> intentional it might be worth adding a note to the docs. >> > > json_query ( context_item, path_expression); > > `SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);` > to make this return NULL, that means to catch all the errors that > happened while context_item evaluation. > otherwise, it would not be consistent? > > Currently context_item expressions can be quite arbitrary. > considering the following examples. > > create or replace function test(jsonb) returns jsonb as $$ begin raise > exception 'abort'; end $$ language plpgsql; > create or replace function test1(jsonb) returns jsonb as $$ begin > return $1; end $$ language plpgsql; > SELECT JSON_VALUE(test('1'), '$'); > SELECT JSON_VALUE(test1('1'), '$'); > SELECT JSON_VALUE((select '1'::jsonb), '$'); > SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from cte), '$'); > SELECT JSON_VALUE((with cte(s) as (select '1') select s::jsonb from > cte union all select s::jsonb from cte limit 1), '$'); > > Currently, I don't think we can make > SELECT JSON_VALUE(test('1'), '$' null on error); > return NULL. This is not how it is meant. Your example is not subject to the ON ERROR clause because the error happens in a sub-expression. My point is that ON ERROR includes the String to JSON conversion (the JSON parsing) that — in the way the standard describes these functions — inside of them. In the standard, JSON_VALUE & co accept string types as well as the type JSON: 10.14 SR 1: The declared type of the <value expression> simply contained in the <JSON input expression> immediately containedin the <JSON context item> shall be a string type or a JSON type. It might be best to think of it as two separate functions, overloaded: JSON_VALUE(context_item JSONB, path_expression …) JSON_VALUE(context_item TEXT, path_expression …) Now if you do this: create function test2(text) returns text as $$ begin return $1; end $$ language plpgsql; create function test3(text) returns jsonb as $$ begin return $1::jsonb; end $$ language plpgsql; SELECT JSON_VALUE(test2('invalid'), '$' null on error); SELECT JSON_VALUE(test3('invalid'), '$' null on error); The first query should return NULL, while the second should (and does) fail. This is how I understand it. -markus
On Wednesday, June 12, 2024, Markus Winand <markus.winand@winand.at> wrote:
10.14 SR 1: The declared type of the <value expression> simply contained in the <JSON input expression> immediately contained in the <JSON context item> shall be a string type or a JSON type.
It might be best to think of it as two separate functions, overloaded:
JSON_VALUE(context_item JSONB, path_expression …)
JSON_VALUE(context_item TEXT, path_expression …)
Yes, we need to document that we deviate from (fail to fully implement) the standard here in that we only provide jsonb parameter functions, not text ones.
David J.
> On 04.06.2024, at 07:00, jian he <jian.universality@gmail.com> wrote: > > On Tue, May 28, 2024 at 5:29 PM Markus Winand <markus.winand@winand.at> wrote: > >> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY >> >> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; >> a >> ---- >> [] >> (1 row) >> >> As NULL ON EMPTY is implied, it should give the same result as >> explicitly adding NULL ON EMPTY: >> > > I vaguely remember, we stumbled on ON ERROR, ON EMPTY several times. > i don't have a standard, In my understanding of the standard is that there is no distinction between an explicit and implicit ON EMPTY clause. E.g. clause 6.35 (json_query) Syntax Rule 4: • If <JSON query empty behavior> is not specified, then NULL ON EMPTY is implicit. General Rule 5ai then covers the NULL ON EMPTY case: • i) If the length of SEQ2 is 0 (zero) and ONEMPTY is NULL, then let JV be the null value. Neither of these make the ON EMPTY handling dependent on the presence of ON ERROR. -markus
On Tuesday, May 28, 2024, Markus Winand <markus.winand@winand.at> wrote:
2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
a
----
[]
(1 row)
As NULL ON EMPTY is implied, it should give the same result as
explicitly adding NULL ON EMPTY:
17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
a
---
(1 row)
Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
on the other hand returns NULL for both queries.
I don’t think that PostgreSQL should follow Oracle DB's suit here
but again, in case this is intentional it should be made explicit
in the docs.
The docs here don’t seem to cover the on empty clause at all nor fully cover all options.
Where do you find the claim that the one implies the other? Is it a typo that your examples says “implies null on empty” but the subject line says “implies error on empty”?
Without those clauses a result is either empty or an error - they are mutually exclusive (ignoring matches). I would not expect one clause to imply or affect the behavior of the other. There is no chaining. The original result is transformed to the new result specified by the clause.
I’d need to figure out whether the example you show is actually producing empty or error; but it seems correct if the result is empty. The first query ignores the error clause - the empty array row seems to be the representation of empty here; the second one matches the empty clause and outputs null instead of the empty array.
David J.
> On 12.06.2024, at 15:31, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tuesday, May 28, 2024, Markus Winand <markus.winand@winand.at> wrote: > > 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY > > 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; > a > ---- > [] > (1 row) > > As NULL ON EMPTY is implied, it should give the same result as > explicitly adding NULL ON EMPTY: > > 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; > a > --- > > (1 row) > > Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) > on the other hand returns NULL for both queries. > > I don’t think that PostgreSQL should follow Oracle DB's suit here > but again, in case this is intentional it should be made explicit > in the docs. > > The docs here don’t seem to cover the on empty clause at all nor fully cover all options. > > Where do you find the claim that the one implies the other? Is it a typo that your examples says “implies null on empty”but the subject line says “implies error on empty”? I see the confusion caused — sorry. The headline was meant to describe the observed behaviour in 17beta1, while the contentrefers to how the standard defines it. > Without those clauses a result is either empty or an error - they are mutually exclusive (ignoring matches). I would notexpect one clause to imply or affect the behavior of the other. There is no chaining. The original result is transformedto the new result specified by the clause. Agreed, that’s why I found the 17beta1 behaviour surprising. > I’d need to figure out whether the example you show is actually producing empty or error; but it seems correct if the resultis empty. As I understand the standard, an empty result is not an error. > The first query ignores the error clause - the empty array row seems to be the representation of empty here; the secondone matches the empty clause and outputs null instead of the empty array. But the first should behave the same, as the standard implies NULL ON EMPTY if there is no explicit ON EMPTY clause. OracleDB behaving differently here makes me wonder if there is something in the standard that I haven’t noticed yet... -markus
Hi, (apologies for not replying to this thread sooner) On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal: >> >> Hi! >> >> I’ve noticed two “surprising” (to me) behaviors related to >> the “ON ERROR” clause of the new JSON query functions in 17beta1. >> >> 1. JSON parsing errors are not subject to ON ERROR >> Apparently, the functions expect JSONB so that a cast is implied >> when providing TEXT. However, the errors during that cast are >> not subject to the ON ERROR clause. >> >> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); >> ERROR: invalid input syntax for type json >> DETAIL: Token "invalid" is invalid. >> CONTEXT: JSON data, line 1: invalid >> >> Oracle DB and Db2 (LUW) both return NULL in that case. >> >> I had a look on the list archive to see if that is intentional but >> frankly speaking these functions came a long way. In case it is >> intentional it might be worth adding a note to the docs. > > > I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and castingto jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and itwas based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types.And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then theclause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions. > > I think so this difference should be mentioned in documentation. Agree that the documentation needs to be clear about this. I'll update my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query Functions. >> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY >> >> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; >> a >> ---- >> [] >> (1 row) >> >> As NULL ON EMPTY is implied, it should give the same result as >> explicitly adding NULL ON EMPTY: >> >> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; >> a >> --- >> >> (1 row) >> >> Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) >> on the other hand returns NULL for both queries. >> >> I don’t think that PostgreSQL should follow Oracle DB's suit here >> but again, in case this is intentional it should be made explicit >> in the docs. This behavior is a bug and result of an unintentional change that I made at some point after getting involved with this patch set. So I'm going to fix this so that the empty results of jsonpath evaluation use NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present. Attached a patch to do so. -- Thanks, Amit Langote [1] https://www.postgresql.org/message-id/CA%2BHiwqGdineyHfcTEe0%3D8jjXonH3qXi4vFB%2BgRxf1L%2BxR2v_Pw%40mail.gmail.com
Attachment
Hi, On 06/17/24 02:20, Amit Langote wrote: >>> Apparently, the functions expect JSONB so that a cast is implied >>> when providing TEXT. However, the errors during that cast are >>> not subject to the ON ERROR clause. >>> >>> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); >>> ERROR: invalid input syntax for type json >>> DETAIL: Token "invalid" is invalid. >>> CONTEXT: JSON data, line 1: invalid >>> >>> Oracle DB and Db2 (LUW) both return NULL in that case. I wonder, could prosupport rewriting be used to detect that the first argument is supplied by a cast, and rewrite the expression to apply the cast 'softly'? Or would that behavior be too magical? Regards, -Chap
> On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote: > > Hi, > > (apologies for not replying to this thread sooner) > > On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal: >>> >>> Hi! >>> >>> I’ve noticed two “surprising” (to me) behaviors related to >>> the “ON ERROR” clause of the new JSON query functions in 17beta1. >>> >>> 1. JSON parsing errors are not subject to ON ERROR >>> Apparently, the functions expect JSONB so that a cast is implied >>> when providing TEXT. However, the errors during that cast are >>> not subject to the ON ERROR clause. >>> >>> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); >>> ERROR: invalid input syntax for type json >>> DETAIL: Token "invalid" is invalid. >>> CONTEXT: JSON data, line 1: invalid >>> >>> Oracle DB and Db2 (LUW) both return NULL in that case. >>> >>> I had a look on the list archive to see if that is intentional but >>> frankly speaking these functions came a long way. In case it is >>> intentional it might be worth adding a note to the docs. >> >> >> I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and castingto jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and itwas based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types.And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then theclause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions. >> >> I think so this difference should be mentioned in documentation. > > Agree that the documentation needs to be clear about this. I'll update > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query > Functions. Considering another branch of this thread [1] I think the "Supported Features” appendix of the docs should mention that as well. The way I see it is that the standards defines two overloaded JSON_QUERY functions, of which PostgreSQL will support only one. In case of valid JSON, the implied CAST makes it look as though the second variant of these function was supported as well but that illusion totally falls apart once the JSON is not valid anymore. I think it affects the following feature IDs: - T821, Basic SQL/JSON query operators For JSON_VALUE, JSON_TABLE and JSON_EXISTS - T828, JSON_QUERY Also, how hard would it be to add the functions that accept character strings? Is there, besides the effort, any thing else against it? I’m asking because I believe once released it might never be changed — for backward compatibility. [1] https://www.postgresql.org/message-id/CAKFQuwb50BAaj83Np%2B1O6xe3_T6DO8w2mxtFbgSbbUng%2BabrqA%40mail.gmail.com > >>> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY >>> >>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; >>> a >>> ---- >>> [] >>> (1 row) >>> >>> As NULL ON EMPTY is implied, it should give the same result as >>> explicitly adding NULL ON EMPTY: >>> >>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; >>> a >>> --- >>> >>> (1 row) >>> >>> Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) >>> on the other hand returns NULL for both queries. >>> >>> I don’t think that PostgreSQL should follow Oracle DB's suit here >>> but again, in case this is intentional it should be made explicit >>> in the docs. > > This behavior is a bug and result of an unintentional change that I > made at some point after getting involved with this patch set. So I'm > going to fix this so that the empty results of jsonpath evaluation use > NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present. > Attached a patch to do so. > Tested: works. Thanks :) -markus
po 17. 6. 2024 v 15:07 odesílatel Markus Winand <markus.winand@winand.at> napsal:
> On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi,
>
> (apologies for not replying to this thread sooner)
>
> On Tue, May 28, 2024 at 6:57 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> út 28. 5. 2024 v 11:29 odesílatel Markus Winand <markus.winand@winand.at> napsal:
>>>
>>> Hi!
>>>
>>> I’ve noticed two “surprising” (to me) behaviors related to
>>> the “ON ERROR” clause of the new JSON query functions in 17beta1.
>>>
>>> 1. JSON parsing errors are not subject to ON ERROR
>>> Apparently, the functions expect JSONB so that a cast is implied
>>> when providing TEXT. However, the errors during that cast are
>>> not subject to the ON ERROR clause.
>>>
>>> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
>>> ERROR: invalid input syntax for type json
>>> DETAIL: Token "invalid" is invalid.
>>> CONTEXT: JSON data, line 1: invalid
>>>
>>> Oracle DB and Db2 (LUW) both return NULL in that case.
>>>
>>> I had a look on the list archive to see if that is intentional but
>>> frankly speaking these functions came a long way. In case it is
>>> intentional it might be worth adding a note to the docs.
>>
>>
>> I remember a talk about this subject years ago. Originally the JSON_QUERY was designed in similar like Oracle, and casting to jsonb was done inside. If I remember this behave depends on the fact, so old SQL/JSON has not json type and it was based just on processing of plain text. But Postgres has JSON, and JSONB and then was more logical to use these types. And because the JSON_QUERY uses these types, and the casting is done before the execution of the function, then the clause ON ERROR cannot be handled. Moreover, until soft errors Postgres didn't allow handling input errors in common functions.
>>
>> I think so this difference should be mentioned in documentation.
>
> Agree that the documentation needs to be clear about this. I'll update
> my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
> Functions.
Considering another branch of this thread [1] I think the
"Supported Features” appendix of the docs should mention that as well.
The way I see it is that the standards defines two overloaded
JSON_QUERY functions, of which PostgreSQL will support only one.
In case of valid JSON, the implied CAST makes it look as though
the second variant of these function was supported as well but that
illusion totally falls apart once the JSON is not valid anymore.
I think it affects the following feature IDs:
- T821, Basic SQL/JSON query operators
For JSON_VALUE, JSON_TABLE and JSON_EXISTS
- T828, JSON_QUERY
Also, how hard would it be to add the functions that accept
character strings? Is there, besides the effort, any thing else
against it? I’m asking because I believe once released it might
never be changed — for backward compatibility.
It is easy to add the function that accepts text, but when you have overloaded functions, then varchar or text is the preferred type, and then
the arguments will be casted to text by default instead of json. You can have one function with argument of type "any", but then the
execution is a little bit slower (outer cast is faster than cast inside function), and again the Postgres cannot deduce used argument types from function's argument types.
Probably this can be solved if we introduce a new kind of type, where the preferred type will be json, or jsonb.
So the problem is in the definition of implementation details about the mechanism of type deduction (when you use string literal or when you use string expression).
So now, when you will write json_table(x1 || x2), then and x1 and x2 are of unknown type, then Postgres can know, so x1 and x2 will be jsonb, but when there
will be secondary function json_table(text), then Postgres detects problem, and use preferred type (that is text).
Generally, Postgres supports function overloading and it is working well between text, int and numeric where constants have different syntax, but when the constant
literal has the same syntax, there can be problems with hidden casts to unwanted type, so not overloaded function is ideal. These issues can be solved in the analysis stage by special code, but again it increases code complexity.
[1] https://www.postgresql.org/message-id/CAKFQuwb50BAaj83Np%2B1O6xe3_T6DO8w2mxtFbgSbbUng%2BabrqA%40mail.gmail.com
>
>>> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY
>>>
>>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a;
>>> a
>>> ----
>>> []
>>> (1 row)
>>>
>>> As NULL ON EMPTY is implied, it should give the same result as
>>> explicitly adding NULL ON EMPTY:
>>>
>>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a;
>>> a
>>> ---
>>>
>>> (1 row)
>>>
>>> Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW)
>>> on the other hand returns NULL for both queries.
>>>
>>> I don’t think that PostgreSQL should follow Oracle DB's suit here
>>> but again, in case this is intentional it should be made explicit
>>> in the docs.
>
> This behavior is a bug and result of an unintentional change that I
> made at some point after getting involved with this patch set. So I'm
> going to fix this so that the empty results of jsonpath evaluation use
> NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present.
> Attached a patch to do so.
>
Tested: works.
Thanks :)
-markus
On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote: > > On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote: > >>> 2. EMPTY [ARRAY|OBJECT] ON ERROR implies ERROR ON EMPTY > >>> > >>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' EMPTY ARRAY ON ERROR) a; > >>> a > >>> ---- > >>> [] > >>> (1 row) > >>> > >>> As NULL ON EMPTY is implied, it should give the same result as > >>> explicitly adding NULL ON EMPTY: > >>> > >>> 17beta1=# SELECT JSON_QUERY('[]', '$[*]' NULL ON EMPTY EMPTY ARRAY ON ERROR) a; > >>> a > >>> --- > >>> > >>> (1 row) > >>> > >>> Interestingly, Oracle DB gives the same (wrong) results. Db2 (LUW) > >>> on the other hand returns NULL for both queries. > >>> > >>> I don’t think that PostgreSQL should follow Oracle DB's suit here > >>> but again, in case this is intentional it should be made explicit > >>> in the docs. > > > > This behavior is a bug and result of an unintentional change that I > > made at some point after getting involved with this patch set. So I'm > > going to fix this so that the empty results of jsonpath evaluation use > > NULL ON EMPTY by default, ie, when the ON EMPTY clause is not present. > > Attached a patch to do so. > > > > Tested: works. Pushed, thanks for testing. I'll work on the documentation updates that may be needed based on this and nearby discussion(s). -- Thanks, Amit Langote
On Mon, Jun 17, 2024 at 9:07 PM Markus Winand <markus.winand@winand.at> wrote: > > > I think it affects the following feature IDs: > > - T821, Basic SQL/JSON query operators > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > - T828, JSON_QUERY > > Also, how hard would it be to add the functions that accept > character strings? Is there, besides the effort, any thing else > against it? I’m asking because I believe once released it might > never be changed — for backward compatibility. > we have ExecEvalCoerceViaIOSafe, so it's doable. I tried, but other things break. so it's not super easy i think. because of eval_const_expressions_mutator, postgres will constantly evaluate simple const expressions to simplify some expressions. `SELECT JSON_QUERY('a', '$');` postgres will try to do the cast coercion from text 'a' to jsonb. but it will fail, but it's hard to send the cast failed information to later code, in ExecInterpExpr. in ExecEvalCoerceViaIOSafe, if we cast coercion failed, then this function value is zero, isnull is set to true. `SELECT JSON_QUERY('a', '$');` will be equivalent to `SELECT JSON_QUERY(NULL, '$');` so making one of the next 2 examples to return jsonb 1 would be hard to implement. SELECT JSON_QUERY('a', '$' default '1' on empty); SELECT JSON_QUERY('a', '$' default '1' on error); -------------------------------------------------------------------------- If the standard says the context_item can be text string (cannot cast to json successfully). future version we make it happen, then it should be fine? because it's like the previous version we are not fully compliant with standard, now the new version is in full compliance with standard.
Hi, On Mon, Jun 17, 2024 at 9:47 PM Chapman Flack <jcflack@acm.org> wrote: > On 06/17/24 02:20, Amit Langote wrote: > >>> Apparently, the functions expect JSONB so that a cast is implied > >>> when providing TEXT. However, the errors during that cast are > >>> not subject to the ON ERROR clause. > >>> > >>> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR); > >>> ERROR: invalid input syntax for type json > >>> DETAIL: Token "invalid" is invalid. > >>> CONTEXT: JSON data, line 1: invalid > >>> > >>> Oracle DB and Db2 (LUW) both return NULL in that case. > > I wonder, could prosupport rewriting be used to detect that the first > argument is supplied by a cast, and rewrite the expression to apply the > cast 'softly'? Or would that behavior be too magical? I don't think prosupport rewriting can be used, because JSON_QUERY(). We could possibly use "runtime coercion" for context_item so that the coercion errors can be "caught", which is how we coerce the jsonpath result to the RETURNING type. For now, I'm inclined to simply document the limitation that errors when coercing string arguments to json are always thrown. -- Thanks, Amit Langote
Hi, On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote: > > On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote: > > Agree that the documentation needs to be clear about this. I'll update > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query > > Functions. > > Considering another branch of this thread [1] I think the > "Supported Features” appendix of the docs should mention that as well. > > The way I see it is that the standards defines two overloaded > JSON_QUERY functions, of which PostgreSQL will support only one. > In case of valid JSON, the implied CAST makes it look as though > the second variant of these function was supported as well but that > illusion totally falls apart once the JSON is not valid anymore. > > I think it affects the following feature IDs: > > - T821, Basic SQL/JSON query operators > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > - T828, JSON_QUERY > > Also, how hard would it be to add the functions that accept > character strings? Is there, besides the effort, any thing else > against it? I’m asking because I believe once released it might > never be changed — for backward compatibility. Hmm, I'm starting to think that adding the implied cast to json wasn't such a great idea after all, because it might mislead the users to think that JSON parsing is transparent (respects ON ERROR), which is what you are saying, IIUC. I'm inclined to push the attached patch which puts back the restriction to allow only jsonb arguments, asking users to add an explicit cast if necessary. -- Thanks, Amit Langote
Attachment
On Thu, Jun 20, 2024 at 2:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
Hi,
On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote:
> > On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
> > Agree that the documentation needs to be clear about this. I'll update
> > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
> > Functions.
>
> Considering another branch of this thread [1] I think the
> "Supported Features” appendix of the docs should mention that as well.
>
> The way I see it is that the standards defines two overloaded
> JSON_QUERY functions, of which PostgreSQL will support only one.
> In case of valid JSON, the implied CAST makes it look as though
> the second variant of these function was supported as well but that
> illusion totally falls apart once the JSON is not valid anymore.
>
> I think it affects the following feature IDs:
>
> - T821, Basic SQL/JSON query operators
> For JSON_VALUE, JSON_TABLE and JSON_EXISTS
> - T828, JSON_QUERY
>
> Also, how hard would it be to add the functions that accept
> character strings? Is there, besides the effort, any thing else
> against it? I’m asking because I believe once released it might
> never be changed — for backward compatibility.
Hmm, I'm starting to think that adding the implied cast to json wasn't
such a great idea after all, because it might mislead the users to
think that JSON parsing is transparent (respects ON ERROR), which is
what you are saying, IIUC.
Actually, the implied cast is exactly the correct thing to do here - the issue is that we aren't using the newly added soft errors infrastructure [1] to catch the result of that cast and instead produce whatever output on error tells us to produce. This seems to be in the realm of doability so we should try in the interest of being standard conforming. I'd even argue to make this an open item unless and until the attempt is agreed upon to have failed (or it succeeds of course).
David J.
[1] d9f7f5d32f201bec61fef8104aafcb77cecb4dcb
On Fri, Jun 21, 2024 at 1:11 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > On Thu, Jun 20, 2024 at 2:19 AM Amit Langote <amitlangote09@gmail.com> wrote: >> On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote: >> > > On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote: >> > > Agree that the documentation needs to be clear about this. I'll update >> > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query >> > > Functions. >> > >> > Considering another branch of this thread [1] I think the >> > "Supported Features” appendix of the docs should mention that as well. >> > >> > The way I see it is that the standards defines two overloaded >> > JSON_QUERY functions, of which PostgreSQL will support only one. >> > In case of valid JSON, the implied CAST makes it look as though >> > the second variant of these function was supported as well but that >> > illusion totally falls apart once the JSON is not valid anymore. >> > >> > I think it affects the following feature IDs: >> > >> > - T821, Basic SQL/JSON query operators >> > For JSON_VALUE, JSON_TABLE and JSON_EXISTS >> > - T828, JSON_QUERY >> > >> > Also, how hard would it be to add the functions that accept >> > character strings? Is there, besides the effort, any thing else >> > against it? I’m asking because I believe once released it might >> > never be changed — for backward compatibility. >> >> Hmm, I'm starting to think that adding the implied cast to json wasn't >> such a great idea after all, because it might mislead the users to >> think that JSON parsing is transparent (respects ON ERROR), which is >> what you are saying, IIUC. >> > > Actually, the implied cast is exactly the correct thing to do here - the issue is that we aren't using the newly addedsoft errors infrastructure [1] to catch the result of that cast and instead produce whatever output on error tells usto produce. This seems to be in the realm of doability so we should try in the interest of being standard conforming. Soft error handling *was* used for catching cast errors in the very early versions of this patch (long before I got involved and the infrastructure you mention got added). It was taken out after Pavel said [1] that he didn't like producing NULL instead of throwing an error. Not sure if Pavel's around but it would be good to know why he didn't like it at the time. I can look into making that work again, but that is not going to make beta2. > I'd even argue to make this an open item unless and until the attempt is agreed upon to have failed (or it succeeds ofcourse). OK, adding an open item. -- Thanks, Amit Langote [1] https://www.postgresql.org/message-id/CAFj8pRCnzO2cnHi5ebXciV%3DtuGVvAQOW9uPU%2BDQV1GkL31R%3D-g%40mail.gmail.com
On Thu, Jun 20, 2024 at 5:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
Soft error handling *was* used for catching cast errors in the very
early versions of this patch (long before I got involved and the
infrastructure you mention got added). It was taken out after Pavel
said [1] that he didn't like producing NULL instead of throwing an
error. Not sure if Pavel's around but it would be good to know why he
didn't like it at the time.
I'm personally in the "make it error" camp but "make it conform to the standard" is a stronger membership (in general).
I see this note in your linked thread:
> By the standard, it is implementation-defined whether JSON parsing errors
> should be caught by ON ERROR clause.
> should be caught by ON ERROR clause.
Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are supplied with something that cannot be cast to json. I'd document them like functions that accept json with the implications that any casting to json happens before the function is called and thus its arguments do not apply to that step.
Given that we have "expression IS JSON" the ability to hack together a case expression to get non-erroring behavior exists.
David J.
On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston <david.g.johnston@gmail.com> wrote: > On Thu, Jun 20, 2024 at 5:22 PM Amit Langote <amitlangote09@gmail.com> wrote: >> >> >> Soft error handling *was* used for catching cast errors in the very >> early versions of this patch (long before I got involved and the >> infrastructure you mention got added). It was taken out after Pavel >> said [1] that he didn't like producing NULL instead of throwing an >> error. Not sure if Pavel's around but it would be good to know why he >> didn't like it at the time. >> > > I'm personally in the "make it error" camp but "make it conform to the standard" is a stronger membership (in general). > > I see this note in your linked thread: > > > By the standard, it is implementation-defined whether JSON parsing errors > > should be caught by ON ERROR clause. > > Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are suppliedwith something that cannot be cast to json. I'd document them like functions that accept json with the implicationsthat any casting to json happens before the function is called and thus its arguments do not apply to that step. Thanks for that clarification. So, there are the following options: 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) 2. Continue allowing context_item to be non-json character or utf-8 encoded bytea strings, but document that any parsing errors do not respect the ON ERROR clause. 3. Go ahead and fix implicit casts to jsonb so that any parsing errors respect ON ERROR (no patch written yet). David's vote seems to be 2, which is my inclination too. Markus' vote seems to be either 1 or 3. Anyone else? -- Thanks, Amit Langote
pá 21. 6. 2024 v 2:22 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Fri, Jun 21, 2024 at 1:11 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Thu, Jun 20, 2024 at 2:19 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Mon, Jun 17, 2024 at 10:07 PM Markus Winand <markus.winand@winand.at> wrote:
>> > > On 17.06.2024, at 08:20, Amit Langote <amitlangote09@gmail.com> wrote:
>> > > Agree that the documentation needs to be clear about this. I'll update
>> > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query
>> > > Functions.
>> >
>> > Considering another branch of this thread [1] I think the
>> > "Supported Features” appendix of the docs should mention that as well.
>> >
>> > The way I see it is that the standards defines two overloaded
>> > JSON_QUERY functions, of which PostgreSQL will support only one.
>> > In case of valid JSON, the implied CAST makes it look as though
>> > the second variant of these function was supported as well but that
>> > illusion totally falls apart once the JSON is not valid anymore.
>> >
>> > I think it affects the following feature IDs:
>> >
>> > - T821, Basic SQL/JSON query operators
>> > For JSON_VALUE, JSON_TABLE and JSON_EXISTS
>> > - T828, JSON_QUERY
>> >
>> > Also, how hard would it be to add the functions that accept
>> > character strings? Is there, besides the effort, any thing else
>> > against it? I’m asking because I believe once released it might
>> > never be changed — for backward compatibility.
>>
>> Hmm, I'm starting to think that adding the implied cast to json wasn't
>> such a great idea after all, because it might mislead the users to
>> think that JSON parsing is transparent (respects ON ERROR), which is
>> what you are saying, IIUC.
>>
>
> Actually, the implied cast is exactly the correct thing to do here - the issue is that we aren't using the newly added soft errors infrastructure [1] to catch the result of that cast and instead produce whatever output on error tells us to produce. This seems to be in the realm of doability so we should try in the interest of being standard conforming.
Soft error handling *was* used for catching cast errors in the very
early versions of this patch (long before I got involved and the
infrastructure you mention got added). It was taken out after Pavel
said [1] that he didn't like producing NULL instead of throwing an
error. Not sure if Pavel's around but it would be good to know why he
didn't like it at the time.
I can look into making that work again, but that is not going to make beta2.
> I'd even argue to make this an open item unless and until the attempt is agreed upon to have failed (or it succeeds of course).
OK, adding an open item.
At this time, when I wrote this mail, I didn't exactly notice the standard, so broken format should be handled there too. In this time, there was no support for soft errors ever in Postgres, so handling broken formats was inconsistent.
Standard describes format errors, but exactly doesn't describe if this is error like missing key or broken json format. Maybe wrongly, but intuitively for me, these errors are of different kinds and broken input data is a different case than missing key (but fully valid json). I didn't find the exact sentence in standard when I searched it (but it was four years ago).
My position in this case is not extra strong. The original patch was written and tested to be compatible with Oracle (what is a strong argument and feature). On second hand, some things required subtransactioning what was wrong (soft errors were introduced later). The compatibility with Oracle is a strong argument, but Oracle by itself is not fully compatible with standard, and some cases are special (in Oracle) because empty string in Oracle is NULL, and then it is handled differently. In this time I had motivation to reduce the patch to "safe" minimum to be possible to accept it by committers. The patch was written in 2017 (I think). Handling broken format (input format) was one issue that I thought could be solved later.
The main reason for my mail is fact, so Postgres and Oracle have DIFFERENT correct format of JSON!
'{a:10}' is correct on Oracle, but not correct on Postgres. And with default ON ERROR NULL (what is default), then the Oracle returns 10, and Postgres NULL. I thought this can be very messy and better to just raise an exception.
Regards
Pavel
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CAFj8pRCnzO2cnHi5ebXciV%3DtuGVvAQOW9uPU%2BDQV1GkL31R%3D-g%40mail.gmail.com
pá 21. 6. 2024 v 6:01 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:
On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Thu, Jun 20, 2024 at 5:22 PM Amit Langote <amitlangote09@gmail.com> wrote:
>>
>>
>> Soft error handling *was* used for catching cast errors in the very
>> early versions of this patch (long before I got involved and the
>> infrastructure you mention got added). It was taken out after Pavel
>> said [1] that he didn't like producing NULL instead of throwing an
>> error. Not sure if Pavel's around but it would be good to know why he
>> didn't like it at the time.
>>
>
> I'm personally in the "make it error" camp but "make it conform to the standard" is a stronger membership (in general).
>
> I see this note in your linked thread:
>
> > By the standard, it is implementation-defined whether JSON parsing errors
> > should be caught by ON ERROR clause.
>
> Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are supplied with something that cannot be cast to json. I'd document them like functions that accept json with the implications that any casting to json happens before the function is called and thus its arguments do not apply to that step.
Thanks for that clarification.
So, there are the following options:
1. Disallow anything but jsonb for context_item (the patch I posted yesterday)
2. Continue allowing context_item to be non-json character or utf-8
encoded bytea strings, but document that any parsing errors do not
respect the ON ERROR clause.
3. Go ahead and fix implicit casts to jsonb so that any parsing errors
respect ON ERROR (no patch written yet).
David's vote seems to be 2, which is my inclination too. Markus' vote
seems to be either 1 or 3. Anyone else?
@3 can be possibly messy (although be near Oracle or standard). I don't think it is safe - one example '{a:10}' is valid for Oracle, but not for Postgres, and using @3 impacts different results (better to raise an exception).
The effect of @1 and @2 is similar - @1 is better so the user needs to explicitly cast, so maybe it is cleaner, so the cast should not be handled, @2 is more user friendly, because it accepts unknown string literal. From a developer perspective I prefer @1, from a user perspective I prefer @2. Maybe @2 is a good compromise.
Regards
Pavel
--
Thanks, Amit Langote
On Thursday, June 20, 2024, Pavel Stehule <pavel.stehule@gmail.com> wrote:
pá 21. 6. 2024 v 6:01 odesílatel Amit Langote <amitlangote09@gmail.com> napsal:On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> > By the standard, it is implementation-defined whether JSON parsing errors
> > should be caught by ON ERROR clause.
>
> Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are supplied with something that cannot be cast to json. I'd document them like functions that accept json with the implications that any casting to json happens before the function is called and thus its arguments do not apply to that step.
Thanks for that clarification.
So, there are the following options:
1. Disallow anything but jsonb for context_item (the patch I posted yesterday)
2. Continue allowing context_item to be non-json character or utf-8
encoded bytea strings, but document that any parsing errors do not
respect the ON ERROR clause.
3. Go ahead and fix implicit casts to jsonb so that any parsing errors
respect ON ERROR (no patch written yet).
David's vote seems to be 2, which is my inclination too. Markus' vote
seems to be either 1 or 3. Anyone else?@3 can be possibly messy (although be near Oracle or standard). I don't think it is safe - one example '{a:10}' is valid for Oracle, but not for Postgres, and using @3 impacts different results (better to raise an exception).The effect of @1 and @2 is similar - @1 is better so the user needs to explicitly cast, so maybe it is cleaner, so the cast should not be handled, @2 is more user friendly, because it accepts unknown string literal. From a developer perspective I prefer @1, from a user perspective I prefer @2. Maybe @2 is a good compromise.
2 also has the benefit of being standard conforming while 1 does not.
3 is also conforming and I wouldn’t object to it had we already done it that way.
But since 2 is conforming too and implemented, and we are in beta, I'm thinking we need to go with this option.
David J.
> On 21.06.2024, at 03:00, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Thu, Jun 20, 2024 at 5:22 PM Amit Langote <amitlangote09@gmail.com> wrote: > > Soft error handling *was* used for catching cast errors in the very > early versions of this patch (long before I got involved and the > infrastructure you mention got added). It was taken out after Pavel > said [1] that he didn't like producing NULL instead of throwing an > error. Not sure if Pavel's around but it would be good to know why he > didn't like it at the time. > > > I'm personally in the "make it error" camp but "make it conform to the standard" is a stronger membership (in general). > > I see this note in your linked thread: > > > By the standard, it is implementation-defined whether JSON parsing errors > > should be caught by ON ERROR clause. > > Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are suppliedwith something that cannot be cast to json. I'd document them like functions that accept json with the implicationsthat any casting to json happens before the function is called and thus its arguments do not apply to that step. That claim was also made in 2020, before the current (2023) SQL standard was released — yet it might have been the same. My understanding of the 2023 standard is that ON ERROR covers invalid JSON because the conversion from a character string to JSON is done deeply nested inside the JSON_QUERY & Co functions. 9.47 Processing <JSON API common syntax> Function GR 3 triggers 9.46, “SQL/JSON path language: syntax and semantics” Where GR 11 says: ———— GR 11) The result of evaluating a <JSON path wff> is a completion condition, and, if that completion condition is successfulcompletion (00000), then an SQL/JSON sequence. For conciseness, the result will be stated either as an exceptioncondition or as an SQL/JSON sequence (in the latter case, the completion condition successful completion (00000)is implicit). Unsuccessful completion conditions are not automatically raised and do not terminate application ofthe General Rules in this Subclause. a) If <JSON path context variable> JPCV is specified, then Case: • i) If PARSED is True, then the result of evaluating JPCV is JT. • ii) If the declared type of JT is JSON, then the result of evaluating JPCV is JT. • iii) Otherwise: • 1) The General Rules of Subclause 9.42, “Parsing JSON text”, are applied with JT as JSON TEXT, an implementation-defined(IV185) <JSON key uniqueness constraint> as UNIQUENESS CONSTRAINT, and FO as FORMAT OPTION; let STbe the STATUS and let CISJI be the SQL/JSON ITEM returned from the application of those General Rules. • 2) Case: • A) If ST is not successful completion (00000), then the result of evaluating JPCV is ST. • B) Otherwise, the result of evaluating JPCV is CISJI. ———— In case of an exception, it is passed along to clause 9.44 Converting an SQL/JSON sequence to an SQL/JSON item where GR 5bultimately says (the exception is in TEMPST in the meanwhile): —— • b) If TEMPST is an exception condition, then Case: i) If ONERROR is ERROR, then let OUTST be TEMPST. ii) Otherwise, let OUTST be successful completion (00000). Case: • 1) If ONERROR is NULL, then let JV be the null value. • 2) If ONERROR is EMPTY ARRAY, then let JV be an SQL/JSON array that has no SQL/JSON elements. • 3) If ONERROR is EMPTY OBJECT, then let JV be an SQL/JSON object that has no SQL/JSON members. —— Let me know if I’m missing something here. The whole idea that a cast is implied outside of JSON_QUERY & co might be covered by a clause that generally allows implementations to cast as they like (don’t have the ref at hand, but I think such a clause is somewhere). On the other hand, the 2023 standard doesn’t even cover an **explicit** cast from character strings to JSON as per 6.13 SR 7 (that’ where the matrix of source- and destination types is given for cast). So my bet is this: * I’m pretty sure JSON parsing errors being subject to ON ERROR is conforming. That’s also “backed” by the Oracle and Db2 (LUW) implementations. * Implying a CAST might be ok, but I have doubts. * I don’t see how failing without being subject to ON ERRROR (as it is now in 17beta1) could possibly covered by the standard. But as we all know: the standard is confusing. If somebody thinks differently, references would be greatly appreciated. -markus
> On 21.06.2024, at 06:46, David G. Johnston <david.g.johnston@gmail.com> wrote: >> >> On Thursday, June 20, 2024, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> >> pá 21. 6. 2024 v 6:01 odesílatel Amit Langote <amitlangote09@gmail.com> napsal: >> On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston >> <david.g.johnston@gmail.com> wrote: >> >> > > By the standard, it is implementation-defined whether JSON parsing errors >> > > should be caught by ON ERROR clause. >> > >> > Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" aresupplied with something that cannot be cast to json. I'd document them like functions that accept json with the implicationsthat any casting to json happens before the function is called and thus its arguments do not apply to that step. >> >> Thanks for that clarification. >> >> So, there are the following options: >> >> 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) >> >> 2. Continue allowing context_item to be non-json character or utf-8 >> encoded bytea strings, but document that any parsing errors do not >> respect the ON ERROR clause. >> >> 3. Go ahead and fix implicit casts to jsonb so that any parsing errors >> respect ON ERROR (no patch written yet). >> >> David's vote seems to be 2, which is my inclination too. Markus' vote >> seems to be either 1 or 3. Anyone else? With a very strong preference of 3. >> >> @3 can be possibly messy (although be near Oracle or standard). I don't think it is safe - one example '{a:10}' is validfor Oracle, but not for Postgres, and using @3 impacts different results (better to raise an exception). The question of what is valid JSON is a different question, I guess. My original report is about something that is invalideverywhere. Having that in line would be a start. Also I believe Oracle’s habit to accept unquoted object keys isnot covered by the standard (unless defined as a JSON format and also explicitly using the corresponding FORMAT clause). >> The effect of @1 and @2 is similar - @1 is better so the user needs to explicitly cast, so maybe it is cleaner, so thecast should not be handled, @2 is more user friendly, because it accepts unknown string literal. From a developer perspectiveI prefer @1, from a user perspective I prefer @2. Maybe @2 is a good compromise. > > 2 also has the benefit of being standard conforming while 1 does not. Why do you think so? Do you have any references or is this just based on previous statements in this discussion? -markus
On Thursday, June 20, 2024, Markus Winand <markus.winand@winand.at> wrote:
> On 21.06.2024, at 06:46, David G. Johnston <david.g.johnston@gmail.com> wrote:
>>
>
> 2 also has the benefit of being standard conforming while 1 does not.
Why do you think so? Do you have any references or is this just based on previous statements in this discussion?
Hearsay.
> 4) If ALREADY PARSED is False, then it is implementation-defined whether the
> following rules are applied:
> a) The General Rules of Subclause 9.36, "Parsing JSON text", are applied with
> JT as JSON TEXT, an implementation-defined <JSON key uniqueness constraint>
> as UNIQUENESS CONSTRAINT, and FO as FORMAT OPTION; let ST be the STATUS and
> let CISJI be the SQL/JSON ITEM returned from the application of those
> General Rules.
> b) If ST is not successful completion, then ST is returned as the STATUS of
> this application of these General Rules, and no further General Rules of
> this Subclause are applied.
> following rules are applied:
> a) The General Rules of Subclause 9.36, "Parsing JSON text", are applied with
> JT as JSON TEXT, an implementation-defined <JSON key uniqueness constraint>
> as UNIQUENESS CONSTRAINT, and FO as FORMAT OPTION; let ST be the STATUS and
> let CISJI be the SQL/JSON ITEM returned from the application of those
> General Rules.
> b) If ST is not successful completion, then ST is returned as the STATUS of
> this application of these General Rules, and no further General Rules of
> this Subclause are applied.
But maybe I’m mis-interpreting that snippet and Nikita’s related commentary regarding have chosen between options for this implementation-defined feature.
David j.
> On 21.06.2024, at 07:38, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Thursday, June 20, 2024, Markus Winand <markus.winand@winand.at> wrote: > > > > On 21.06.2024, at 06:46, David G. Johnston <david.g.johnston@gmail.com> wrote: > >> > > > > > 2 also has the benefit of being standard conforming while 1 does not. > > Why do you think so? Do you have any references or is this just based on previous statements in this discussion? > > > Hearsay. > > https://www.postgresql.org/message-id/CAFj8pRCnzO2cnHi5ebXciV%3DtuGVvAQOW9uPU%2BDQV1GkL31R%3D-g%40mail.gmail.com > > > 4) If ALREADY PARSED is False, then it is implementation-defined whether the > > following rules are applied: > > a) The General Rules of Subclause 9.36, "Parsing JSON text", are applied with > > JT as JSON TEXT, an implementation-defined <JSON key uniqueness constraint> > > as UNIQUENESS CONSTRAINT, and FO as FORMAT OPTION; let ST be the STATUS and > > let CISJI be the SQL/JSON ITEM returned from the application of those > > General Rules. > > b) If ST is not successful completion, then ST is returned as the STATUS of > > this application of these General Rules, and no further General Rules of > > this Subclause are applied. > > But maybe I’m mis-interpreting that snippet and Nikita’s related commentary regarding have chosen between options for thisimplementation-defined feature. Ah, here we go. Nowadays this is called IA050, “Whether a JSON context item that is not of the JSON data type is parsed.”(Likewise IA054 “Whether a JSON parameter is parsed.”) So updating the three options: > 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) * Non-conforming * patch available > 2. Continue allowing context_item to be non-json character or utf-8 > encoded bytea strings, but document that any parsing errors do not > respect the ON ERROR clause. * Conforming by choosing IA050 to implement GR4: raise errors independent of the ON ERROR clause. * currently committed. > 3. Go ahead and fix implicit casts to jsonb so that any parsing errors > respect ON ERROR (no patch written yet). * Conforming by choosing IA050 not to implement GR4: Parsing happens later, considering the ON ERROR clause. * no patch available, not trivial I guess I’m the only one in favour of 3 ;) My remaining arguments are that Oracle and Db2 (LUW) do it that way and also thatit is IMHO what users would expect. However, as 2 is also conforming (how could I miss that?), proper documentation isa very tempting option. -markus ps: Does anyone know a dialect that implements GR4?
Hi, Thanks all for chiming in. On Fri, Jun 21, 2024 at 8:00 PM Markus Winand <markus.winand@winand.at> wrote: > So updating the three options: > > 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) > > * Non-conforming > * patch available > > > 2. Continue allowing context_item to be non-json character or utf-8 > > encoded bytea strings, but document that any parsing errors do not > > respect the ON ERROR clause. > > * Conforming by choosing IA050 to implement GR4: raise errors independent of the ON ERROR clause. > * currently committed. > > > 3. Go ahead and fix implicit casts to jsonb so that any parsing errors > > respect ON ERROR (no patch written yet). > > * Conforming by choosing IA050 not to implement GR4: Parsing happens later, considering the ON ERROR clause. > * no patch available, not trivial > > I guess I’m the only one in favour of 3 ;) My remaining arguments are that Oracle and Db2 (LUW) do it that way and alsothat it is IMHO what users would expect. However, as 2 is also conforming (how could I miss that?), proper documentationis a very tempting option. So, we should go with 2 for v17, because while 3 may be very appealing, there's a risk that it might not get done in the time remaining for v17. I'll post the documentation patch on Monday. -- Thanks, Amit Langote
Hi, On Sat, Jun 22, 2024 at 5:43 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Fri, Jun 21, 2024 at 8:00 PM Markus Winand <markus.winand@winand.at> wrote: > > So updating the three options: > > > 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) > > > > * Non-conforming > > * patch available > > > > > 2. Continue allowing context_item to be non-json character or utf-8 > > > encoded bytea strings, but document that any parsing errors do not > > > respect the ON ERROR clause. > > > > * Conforming by choosing IA050 to implement GR4: raise errors independent of the ON ERROR clause. > > * currently committed. > > > > > 3. Go ahead and fix implicit casts to jsonb so that any parsing errors > > > respect ON ERROR (no patch written yet). > > > > * Conforming by choosing IA050 not to implement GR4: Parsing happens later, considering the ON ERROR clause. > > * no patch available, not trivial > > > > I guess I’m the only one in favour of 3 ;) My remaining arguments are that Oracle and Db2 (LUW) do it that way and alsothat it is IMHO what users would expect. However, as 2 is also conforming (how could I miss that?), proper documentationis a very tempting option. > > So, we should go with 2 for v17, because while 3 may be very > appealing, there's a risk that it might not get done in the time > remaining for v17. > > I'll post the documentation patch on Monday. Here's that patch, which adds this note after table 9.16.3. SQL/JSON Query Functions: + <note> + <para> + The <replaceable>context_item</replaceable> expression is converted to + <type>jsonb</type> by an implicit cast if the expression is not already of + type <type>jsonb</type>. Note, however, that any parsing errors that occur + during that conversion are thrown unconditionally, that is, are not + handled according to the (specified or implicit) <literal>ON ERROR</literal> + clause. + </para> + </note> Peter, sorry about the last-minute ask, but do you have any thoughts/advice on conformance as discussed above? -- Thanks, Amit Langote
Attachment
On Wed, Jun 26, 2024 at 9:10 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Sat, Jun 22, 2024 at 5:43 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Fri, Jun 21, 2024 at 8:00 PM Markus Winand <markus.winand@winand.at> wrote: > > > So updating the three options: > > > > 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) > > > > > > * Non-conforming > > > * patch available > > > > > > > 2. Continue allowing context_item to be non-json character or utf-8 > > > > encoded bytea strings, but document that any parsing errors do not > > > > respect the ON ERROR clause. > > > > > > * Conforming by choosing IA050 to implement GR4: raise errors independent of the ON ERROR clause. > > > * currently committed. > > > > > > > 3. Go ahead and fix implicit casts to jsonb so that any parsing errors > > > > respect ON ERROR (no patch written yet). > > > > > > * Conforming by choosing IA050 not to implement GR4: Parsing happens later, considering the ON ERROR clause. > > > * no patch available, not trivial > > > > > > I guess I’m the only one in favour of 3 ;) My remaining arguments are that Oracle and Db2 (LUW) do it that way andalso that it is IMHO what users would expect. However, as 2 is also conforming (how could I miss that?), proper documentationis a very tempting option. > > > > So, we should go with 2 for v17, because while 3 may be very > > appealing, there's a risk that it might not get done in the time > > remaining for v17. > > > > I'll post the documentation patch on Monday. > > Here's that patch, which adds this note after table 9.16.3. SQL/JSON > Query Functions: > > + <note> > + <para> > + The <replaceable>context_item</replaceable> expression is converted to > + <type>jsonb</type> by an implicit cast if the expression is not already of > + type <type>jsonb</type>. Note, however, that any parsing errors that occur > + during that conversion are thrown unconditionally, that is, are not > + handled according to the (specified or implicit) <literal>ON > ERROR</literal> > + clause. > + </para> > + </note> I have pushed this. -- Thanks, Amit Langote