Re: SQL:2023 JSON simplified accessor support - Mailing list pgsql-hackers

From Alexandra Wang
Subject Re: SQL:2023 JSON simplified accessor support
Date
Msg-id CAK98qZ19bC=Qw9rWGOFKyX4B-fg1XQWEbV2OWAawqWC62fx79A@mail.gmail.com
Whole thread Raw
In response to Re: SQL:2023 JSON simplified accessor support  (jian he <jian.universality@gmail.com>)
Responses Re: SQL:2023 JSON simplified accessor support
List pgsql-hackers
Hi Jian,

Thanks for reviewing! I’ve attached v13, which addresses your
feedback.

See my individual replies below, and let me know if you have any
questions!

On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
in v12-0001 and v12-0002.
in transformIndirection
        if (!newresult)
        {
            /*
             * generic subscripting failed; falling back to function call or
             * field selection for a composite type.
             */
            Node       *n;
            /* try to find function for field selection */
            newresult = ParseFuncOrColumn(pstate,
                                          list_make1(n),
                                          list_make1(result),
                                          last_srf,
                                          NULL,
                                          false,
                                          location);
}
the above comments mentioning "function call" is wrong?
you passed NULL for (FuncCall *fn) in ParseFuncOrColumn.
and ParseFuncOrColumn comments says
```If fn is null, we're dealing with column syntax not function syntax.``
 
You are right. Fixed.
 
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
I think coerce_jsonpath_subscript can be further simplified.
we already have message like:
errhint("jsonb subscript must be coercible to either integer or text."),
no need to pass the third argument a constant (INT4OID).
also
``Oid            targetType = UNKNOWNOID;``
set it as InvalidOid would be better.
attached is a minor refactoring of coerce_jsonpath_subscript
based on (v12-0001 to v12-0004).

Thanks for the patch! I squashed it with v13-0004 and renamed the
function to coerce_jsonpath_subscript_to_int4_or_text() for clarity.
 
On Thu, Jul 10, 2025 at 1:54 AM jian he <jian.universality@gmail.com> wrote:
after applied v12-0001 to v12-0006
+ /* emit warning conditionally to minimize duplicate warnings */
+ if (list_length(*indirection) > 0)
+ ereport(WARNING,
+ errcode(ERRCODE_WARNING),
+ errmsg("mixed usage of jsonb simplified accessor syntax and jsonb
subscripting."),
+ errhint("use dot-notation for member access, or use non-null integer
constants subscripting for array access."),
+ parser_errposition(pstate, warning_location));

src7=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8];
WARNING:  mixed usage of jsonb simplified accessor syntax and jsonb
subscripting.
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
                                                             ^
HINT:  use dot-notation for member access, or use non-null integer
constants subscripting for array access.
ERROR:  subscript type bigint is not supported
LINE 1: ...t ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb).d['1'::int8]...
                                                             ^
HINT:  jsonb subscript must be coercible to either integer or text.

The above example looks very bad. location printed twice, hint message
is different.
two messages level (ERROR, WARNING).

also "or use non-null integer constants subscripting for array
access." seems wrong?
as you can see the below hint message saying it could be text or integer.

select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8];
ERROR:  subscript type bigint is not supported
LINE 1: ...ect ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)['1'::int8]...
                                                             ^
HINT:  jsonb subscript must be coercible to either integer or text.

also select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::jsonb)[NULL::int4];
return NULL,  so "use non-null integer constants" is wrong.

I see your confusion about the WARNING/ERROR messages. My intent was
to encourage users to use consistent syntax flavors rather than mixing
the SQL standard JSON simplified accessor with the pre-standard
PostgreSQL jsonb subscripting. In the meantime, I want to support
mixed syntax flavors as much as possible. However, your feedback makes
it clear that users don’t need that level of details. So, in v13 I’ve
removed the WARNING message and instead added a comment explaining how
we support mixed syntaxes.

Here's the relevant diff between v12 and v13, hope it helps:

--- a/src/backend/utils/adt/jsonbsubs.c
+++ b/src/backend/utils/adt/jsonbsubs.c
@@ -247,7 +247,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
        ListCell   *lc;
        Datum           jsp;
        int                     pathlen = 0;
-       int                     warning_location = -1;

        sbsref->refupperindexpr = NIL;
        sbsref->reflowerindexpr = NIL;
@@ -311,10 +310,27 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti
                                if (jpi_from == NULL)
                                {
                                        /*
-                                        * postpone emitting the warning until the end of this
-                                        * function
+                                        * Break out of the loop if the subscript is not a
+                                        * non-null integer constant, so that we can fall back to
+                                        * jsonb subscripting logic.
+                                        *
+                                        * This is needed to handle cases with mixed usage of SQL
+                                        * standard json simplified accessor syntax and PostgreSQL
+                                        * jsonb subscripting syntax, e.g:
+                                        *
+                                        * select (jb).a['b'].c from jsonb_table;
+                                        *
+                                        * where dot-notation (.a and .c) is the SQL standard json
+                                        * simplified accessor syntax, and the ['b'] subscript is
+                                        * the PostgreSQL jsonb subscripting syntax, because 'b'
+                                        * is not a non-null constant integer and cannot be used
+                                        * for json array access.
+                                        *
+                                        * In this case, we cannot create a JsonPath item, so we
+                                        * break out of the loop and let
+                                        * jsonb_subscript_transform() handle this indirection as
+                                        * a PostgreSQL jsonb subscript.
                                         */
-                                       warning_location = exprLocation(ai->uidx);
                                        pfree(jpi->value.array.elems);
                                        pfree(jpi);
                                        break;
@@ -360,14 +376,6 @@ jsonb_subscript_make_jsonpath(ParseState *pstate, List **indirection, Subscripti

        *indirection = list_delete_first_n(*indirection, pathlen);

-       /* emit warning conditionally to minimize duplicate warnings */
-       if (list_length(*indirection) > 0)
-               ereport(WARNING,
-                               errcode(ERRCODE_WARNING),
-                               errmsg("mixed usage of jsonb simplified accessor syntax and jsonb subscripting."),
-                               errhint("use dot-notation for member access, or use non-null integer constants subscripting for array access."),
-                               parser_errposition(pstate, warning_location));
-
        jsp = jsonPathFromParseResult(&jpres, 0, NULL);

On Thu, Jul 10, 2025 at 6:34 AM jian he <jian.universality@gmail.com> wrote:
typedef struct FieldAccessorExpr
{
    Expr        xpr;
    char       *fieldname;        /* name of the JSONB object field accessed via
                                 * dot notation */
    Oid            faecollid pg_node_attr(query_jumble_ignore);
    int            location;
}            FieldAccessorExpr;

first field as NodeTag should be just fine?
I am not sure the field "location" is needed now, if it is needed, it should be
type as ParseLoc.
we should add it to src/tools/pgindent/typedefs.list

Good catch! I changed the first field from Expr to NodeTag, removed
the location field, and added the Node to typedefs.list.

On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
On Thu, Jul 10, 2025 at 9:34 PM jian he <jian.universality@gmail.com> wrote:
>
> ------------------------------------------
> in jsonb_subscript_make_jsonpath we have
>     foreach(lc, *indirection)
> {
>         if (IsA(accessor, String))
>             ....
>         else if (IsA(accessor, A_Indices))
>         else
>             /*
>              * Unsupported node type for creating jsonpath. Instead of
>              * throwing an ERROR, break here so that we create a jsonpath from
>              * as many indirection elements as we can and let
>              * transformIndirection() fallback to alternative logic to handle
>              * the remaining indirection elements.
>              */
>               break;
> }
> the above ELSE branch comments look suspicious to me.

I kept the "else" branch, but added an Assert with updated comments.

Here's the relevant diff between v12 and v13:

In jsonb_subscript_make_jsonpath():
else
{
- * Unsupported node type for creating jsonpath. Instead of
- * throwing an ERROR, break here so that we create a jsonpath from
- * as many indirection elements as we can and let
- * transformIndirection() fallback to alternative logic to handle
- * the remaining indirection elements.
+ * Unexpected node type in indirection list. This should not
+ * happen with current grammar, but we handle it defensively by
+ * breaking out of the loop rather than crashing. In case of
+ * future grammar changes that might introduce new node types,
+ * this allows us to create a jsonpath from as many indirection
+ * elements as we can and let transformIndirection() fallback to
+ * alternative logic to handle the remaining indirection elements.
+ Assert(false); /* not reachable */
break;
}

Hope the updated comment clarifies why I kept it.
 
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
> transformIndirection->transformContainerSubscripts->jsonb_subscript_transform->jsonb_subscript_make_jsonpath
> As you can see, transformIndirection have a long distance from
> jsonb_subscript_make_jsonpath,
> let transformIndirection handle remaining indirection elements seems not good.

In 0001's commit message, we are given the following promise:

Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Date:   Tue Jul 8 22:18:07 2025 -0700

    Allow transformation of only a sublist of subscripts

    This is a preparation step for allowing subscripting containers to
    transform only a prefix of an indirection list and modify the list
    in-place by removing the processed elements. Currently, all elements
    are consumed, and the list is set to NIL after transformation.

    In the following commit, subscripting containers will gain the
    flexibility to stop transformation when encountering an unsupported
    indirection and return the remaining indirections to the caller. 

With this contract provided by the subscripting interface to the
container's transform function, I don't feel terribly inappropriate to
let jsonb's transform function: jsonb_subscript_transform(), to
transform indirections as much as it can, and leave the rest to
transformIndirection().

0002 is a good example that shows why the else branch is useful: it
adds support for String as a subscript node of jsonb, while String is
still unsupported in other in-core container subscripting.

For example, array_subscript_transform() has:
if (!IsA(lfirst(idx), A_Indices))
    break;

The else branch in jsonb_subscript_make_jsonpath() just follows that
same convention.
 
context v12-0001 to v12-0006.
this ELSE branch comments is wrong, because
+ if (jsonb_check_jsonpath_needed(*indirection))
+ {
+ jsonb_subscript_make_jsonpath(pstate, indirection, sbsref);
+ if (sbsref->refjsonbpath)
+ return;
+ }
in jsonb_check_jsonpath_needed we already use Assert to confirm that
list "indirection"
is either String or A_Indices Node.

The Assert I added in v13, as well as the one you mentioned in
jsonb_check_jsonpath_needed(), are both only applicable in development
build and won’t help in production. So I’d prefer to break here and
let the callee handle any unexpected Node with an ERROR.

Let me know if this explanation makes sense. If not, and if you think
the current logic is overly complicated, we can discuss further.
 
On Thu, Jul 10, 2025 at 10:54 PM jian he <jian.universality@gmail.com> wrote:
in transformContainerSubscripts we have
sbsroutines->transform(sbsref, indirection, pstate,
                           isSlice, isAssignment);
    /*
     * Error out, if datatype failed to consume any indirection elements.
     */
    if (list_length(*indirection) == indirection_length)
    {
        Node       *ind = linitial(*indirection);
        if (noError)
            return NULL;
        if (IsA(ind, String))
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("type %s does not support dot notation",
                            format_type_be(containerType)),
                     parser_errposition(pstate, exprLocation(containerBase))));
        else if (IsA(ind, A_Indices))
            ereport(ERROR,
                    (errcode(ERRCODE_DATATYPE_MISMATCH),
                     errmsg("type %s does not support array subscripting",
                            format_type_be(containerType)),
                     parser_errposition(pstate, exprLocation(containerBase))));
        else
            elog(ERROR, "invalid indirection operation: %d", nodeTag(ind));
    }
sbsroutines->transform currently will call
array_subscript_transform, hstore_subscript_transform, jsonb_subscript_transform

in jsonb_subscript_transform callee we unconditionally do:
    *indirection = list_delete_first_n(*indirection, pathlen);
   *indirection = list_delete_first_n(*indirection, list_length(upperIndexpr));
in array_subscript_transform, we do
*indirection = list_delete_first_n(*indirection, ndim);

That means, if sbsroutines->transform not error out and indirection is
not NIL (which is unlikely)
then sbsroutines->transform will consume some induction elements.
instead of the above verbose ereport(ERROR, error handling, we can use Assert
    Assert(indirection_length > list_length(*indirection));

for the above comments, i did a refactoring based on v12 (0001 to 0006).
 
Thanks for the patch! I didn’t apply it for the same reason I
mentioned in the previous quote reply. In the case of an unexpected
Node type, I prefer raising an explicit ERROR rather than using an
Assert() that crashes in a dev build but silently continues in
production. One possible way to hit these ERRORs is by creating a
custom data type that supports subscripting, but only for a subset of
Node types that transformIndirection() allows.

Best,
Alex
 
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Remove traces of long in dynahash.c
Next
From: Thomas Munro
Date:
Subject: Redesigning postmaster death handling