Re: SQL/JSON revisited - Mailing list pgsql-hackers

From Amit Langote
Subject Re: SQL/JSON revisited
Date
Msg-id CA+HiwqGemKf_CjExy+sfNYv8eu53h_zX=F_ia63DNOvAJrGiGg@mail.gmail.com
Whole thread Raw
In response to Re: SQL/JSON revisited  (Andres Freund <andres@anarazel.de>)
Responses Re: SQL/JSON revisited
Re: SQL/JSON revisited
List pgsql-hackers
Thanks a lot for taking a look at this and sorry about the delay in response.

On Tue, Feb 21, 2023 at 2:25 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-02-20 16:35:52 +0900, Amit Langote wrote:
> > Subject: [PATCH v4 03/10] SQL/JSON query functions
> > +/*
> > + * Evaluate a JSON error/empty behavior result.
> > + */
> > +static Datum
> > +ExecEvalJsonBehavior(JsonBehavior *behavior, bool *is_null)
> > +{
> > +     *is_null = false;
> > +
> > +     switch (behavior->btype)
> > +     {
> > +             case JSON_BEHAVIOR_EMPTY_ARRAY:
> > +                     return JsonbPGetDatum(JsonbMakeEmptyArray());
> > +
> > +             case JSON_BEHAVIOR_EMPTY_OBJECT:
> > +                     return JsonbPGetDatum(JsonbMakeEmptyObject());
> > +
> > +             case JSON_BEHAVIOR_TRUE:
> > +                     return BoolGetDatum(true);
> > +
> > +             case JSON_BEHAVIOR_FALSE:
> > +                     return BoolGetDatum(false);
> > +
> > +             case JSON_BEHAVIOR_NULL:
> > +             case JSON_BEHAVIOR_UNKNOWN:
> > +                     *is_null = true;
> > +                     return (Datum) 0;
> > +
> > +             case JSON_BEHAVIOR_DEFAULT:
> > +                     /* Always handled in the caller. */
> > +                     Assert(false);
> > +                     return (Datum) 0;
> > +
> > +             default:
> > +                     elog(ERROR, "unrecognized SQL/JSON behavior %d", behavior->btype);
> > +                     return (Datum) 0;
> > +     }
> > +}
>
> Does this actually need to be evaluated at expression eavluation time?
> Couldn't we just emit the proper constants in execExpr.c?

Yes, done that way in the updated patch.

> > +/* ----------------------------------------------------------------
> > + *           ExecEvalJson
> > + * ----------------------------------------------------------------
> > + */
> > +void
> > +ExecEvalJson(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
>
> Pointless comment.

Removed this function altogether in favor of merging the body with
ExecEvalJsonExpr(), which does have a more sensible comment

> > +{
> > +     JsonExprState *jsestate = op->d.jsonexpr.jsestate;
> > +     JsonExprPreEvalState *pre_eval = &jsestate->pre_eval;
> > +     JsonExprPostEvalState *post_eval = &jsestate->post_eval;
> > +     JsonExpr   *jexpr = jsestate->jsexpr;
> > +     Datum           item;
> > +     Datum           res = (Datum) 0;
> > +     JsonPath   *path;
> > +     bool            throwErrors = jexpr->on_error->btype == JSON_BEHAVIOR_ERROR;
> > +
> > +     *op->resnull = true;            /* until we get a result */
> > +     *op->resvalue = (Datum) 0;
> > +
> > +     item = pre_eval->formatted_expr.value;
> > +     path = DatumGetJsonPathP(pre_eval->pathspec.value);
> > +
> > +     /* Reset JsonExprPostEvalState for this evaluation. */
> > +     memset(post_eval, 0, sizeof(*post_eval));
> > +
> > +     res = ExecEvalJsonExpr(op, econtext, path, item, op->resnull,
> > +                                                !throwErrors ? &post_eval->error : NULL);
> > +
> > +     *op->resvalue = res;
> > +}
>
> I really don't like having both ExecEvalJson() and ExecEvalJsonExpr(). There's
> really no way to know what which version does, just based on the name.

Yes, having two functions is no longer necessary.

> > --- a/src/backend/parser/gram.y
> > +++ b/src/backend/parser/gram.y
>
> This stuff adds quite a bit of complexity to the parser. Do we realy need like
> a dozen new rules here?
>>
> > +json_behavior_empty_array:
> > +                     EMPTY_P ARRAY   { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> > +                     /* non-standard, for Oracle compatibility only */
> > +                     | EMPTY_P               { $$ = makeJsonBehavior(JSON_BEHAVIOR_EMPTY_ARRAY, NULL); }
> > +             ;
>
> Do we really want to add random oracle compat crud here?

Hmm, sorry, but I haven't familiarized myself with the grammar side of
things as much as I perhaps should have, so I am not sure whether a
more simplified grammar would suffice for offering a
standard-compliant functionality.

Maybe we could take out the oracle-compatibility bit, but I'd
appreciate it if someone who has been involved with SQL/JSON from the
beginning can comment on the above 2 points.

> > +/*
> > + * Evaluate a JSON path variable caching computed value.
> > + */
> > +int
> > +EvalJsonPathVar(void *cxt, char *varName, int varNameLen,
> > +                             JsonbValue *val, JsonbValue *baseObject)
>
> Missing static?

Fixed.

> > +{
> > +     JsonPathVariableEvalContext *var = NULL;
> > +     List       *vars = cxt;
> > +     ListCell   *lc;
> > +     int                     id = 1;
> > +
> > +     if (!varName)
> > +             return list_length(vars);
> > +
> > +     foreach(lc, vars)
> > +     {
> > +             var = lfirst(lc);
> > +
> > +             if (!strncmp(var->name, varName, varNameLen))
> > +                     break;
> > +
> > +             var = NULL;
> > +             id++;
> > +     }
> > +
> > +     if (!var)
> > +             return -1;
> > +
> > +     /*
> > +      * When belonging to a JsonExpr, path variables are computed with the
> > +      * JsonExpr's ExprState (var->estate is NULL), so don't need to be computed
> > +      * here.  In some other cases, such as when the path variables belonging
> > +      * to a JsonTable instead, those variables must be evaluated on their own,
> > +      * without the enclosing JsonExpr itself needing to be evaluated, so must
> > +      * be handled here.
> > +      */
> > +     if (var->estate && !var->evaluated)
> > +     {
> > +             Assert(var->econtext != NULL);
> > +             var->value = ExecEvalExpr(var->estate, var->econtext, &var->isnull);
> > +             var->evaluated = true;
>
> Uh, so this continues to do recursive expression evaluation, as
> ExecEvalJsonExpr()->JsonPathQuery()->executeJsonPath(EvalJsonPathVar)
>
> I'm getting grumpy here. This is wrong, has been pointed out many times. The
> only thing that changes is that the point of recursion is moved around.

Actually, these JSON path vars, along with other sub-expressions of
JsonExpr, *are* computed non-recursively as ExprEvalSteps of the
JsonExprState, at least in the cases where the vars are to be computed
as part of evaluating the JsonExpr itself.  So, the code path you've
shown above perhaps as a hypothetical doesn't really exist, though
there *is* an instance where these path vars are computed *outside*
the context of evaluating the parent JsonExpr, such as in
JsonTableResetContextItem().  Maybe there's a cleaner way of doing
that though...

> > +
> > +/*
> > + * ExecEvalExprSafe
> > + *
> > + * Like ExecEvalExpr(), though this allows the caller to pass an
> > + * ErrorSaveContext to declare its intenion to catch any errors that occur when
> > + * executing the expression, such as when calling type input functions that may
> > + * be present in it.
> > + */
> > +static inline Datum
> > +ExecEvalExprSafe(ExprState *state,
> > +                              ExprContext *econtext,
> > +                              bool *isNull,
> > +                              Node *escontext,
> > +                              bool *error)
>
> Afaict there's no caller of this?

Oops, removed.  This was used in a previous version of the patch that
still had nested ExprStates inside JsonExprState.

> > +/*
> > + * ExecInitExprWithCaseValue
> > + *
> > + * This is the same as ExecInitExpr, except the caller passes the Datum and
> > + * bool pointers that it would like the ExprState.innermost_caseval
> > + * and ExprState.innermost_casenull, respectively, to be set to.  That way,
> > + * it can pass an input value to evaluate the expression via a CaseTestExpr.
> > + */
> > +ExprState *
> > +ExecInitExprWithCaseValue(Expr *node, PlanState *parent,
> > +                                               Datum *caseval, bool *casenull)
> > +{
> > +     ExprState  *state;
> > +     ExprEvalStep scratch = {0};
> > +
> > +     /* Special case: NULL expression produces a NULL ExprState pointer */
> > +     if (node == NULL)
> > +             return NULL;
> > +
> > +     /* Initialize ExprState with empty step list */
> > +     state = makeNode(ExprState);
> > +     state->expr = node;
> > +     state->parent = parent;
> > +     state->ext_params = NULL;
> > +     state->innermost_caseval = caseval;
> > +     state->innermost_casenull = casenull;
> > +
> > +     /* Insert EEOP_*_FETCHSOME steps as needed */
> > +     ExecInitExprSlots(state, (Node *) node);
> > +
> > +     /* Compile the expression proper */
> > +     ExecInitExprRec(node, state, &state->resvalue, &state->resnull);
> > +
> > +     /* Finally, append a DONE step */
> > +     scratch.opcode = EEOP_DONE;
> > +     ExprEvalPushStep(state, &scratch);
> > +
> > +     ExecReadyExpr(state);
> > +
> > +     return state;
>
> > +struct JsonTableJoinState
> > +{
> > +     union
> > +     {
> > +             struct
> > +             {
> > +                     JsonTableJoinState *left;
> > +                     JsonTableJoinState *right;
> > +                     bool            advanceRight;
> > +             }                       join;
> > +             JsonTableScanState scan;
> > +     }                       u;
> > +     bool            is_join;
> > +};
>
> A join state that unions the join member with a scan, and has a is_join field?

Yeah, I agree that's not the best form for what it is.  I've replaced
that with the following:

+/* Structures for JSON_TABLE execution  */
+
+typedef enum JsonTablePlanStateType
+{
+   JSON_TABLE_SCAN_STATE = 0,
+   JSON_TABLE_JOIN_STATE
+} JsonTablePlanStateType;
+
+typedef struct JsonTablePlanState
+{
+   JsonTablePlanStateType  type;
+
+   struct JsonTablePlanState *parent;
+   struct JsonTablePlanState *nested;
+} JsonTablePlanState;
+
+typedef struct JsonTableScanState
+{
+   JsonTablePlanState  plan;
+
+   MemoryContext mcxt;
+   JsonPath   *path;
+   List       *args;
+   JsonValueList found;
+   JsonValueListIterator iter;
+   Datum       current;
+   int         ordinal;
+   bool        currentIsNull;
+   bool        outerJoin;
+   bool        errorOnError;
+   bool        advanceNested;
+   bool        reset;
+} JsonTableScanState;
+
+typedef struct JsonTableJoinState
+{
+   JsonTablePlanState  plan;
+
+   JsonTablePlanState *left;
+   JsonTablePlanState *right;
+   bool        advanceRight;
+} JsonTableJoinState;

I considered using NodeTag but decided not to, because this stuff is
local to jsonpath_exec.c.

> > + * JsonTableInitOpaque
> > + *           Fill in TableFuncScanState->opaque for JsonTable processor
> > + */
> > +static void
> > +JsonTableInitOpaque(TableFuncScanState *state, int natts)
> > +{
> > +     JsonTableContext *cxt;
> > +     PlanState  *ps = &state->ss.ps;
> > +     TableFuncScan *tfs = castNode(TableFuncScan, ps->plan);
> > +     TableFunc  *tf = tfs->tablefunc;
> > +     JsonExpr   *ci = castNode(JsonExpr, tf->docexpr);
> > +     JsonTableParent *root = castNode(JsonTableParent, tf->plan);
> > +     List       *args = NIL;
> > +     ListCell   *lc;
> > +     int                     i;
> > +
> > +     cxt = palloc0(sizeof(JsonTableContext));
> > +     cxt->magic = JSON_TABLE_CONTEXT_MAGIC;
> > +
> > +     if (ci->passing_values)
> > +     {
> > +             ListCell   *exprlc;
> > +             ListCell   *namelc;
> > +
> > +             forboth(exprlc, ci->passing_values,
> > +                             namelc, ci->passing_names)
> > +             {
> > +                     Expr       *expr = (Expr *) lfirst(exprlc);
> > +                     String     *name = lfirst_node(String, namelc);
> > +                     JsonPathVariableEvalContext *var = palloc(sizeof(*var));
> > +
> > +                     var->name = pstrdup(name->sval);
> > +                     var->typid = exprType((Node *) expr);
> > +                     var->typmod = exprTypmod((Node *) expr);
> > +                     var->estate = ExecInitExpr(expr, ps);
> > +                     var->econtext = ps->ps_ExprContext;
> > +                     var->mcxt = CurrentMemoryContext;
> > +                     var->evaluated = false;
> > +                     var->value = (Datum) 0;
> > +                     var->isnull = true;
> > +
> > +                     args = lappend(args, var);
> > +             }
> > +     }
> > +
> > +     cxt->colexprs = palloc(sizeof(*cxt->colexprs) *
> > +                                                list_length(tf->colvalexprs));
> > +
> > +     JsonTableInitScanState(cxt, &cxt->root, root, NULL, args,
> > +                                                CurrentMemoryContext);
> > +
> > +     i = 0;
> > +
> > +     foreach(lc, tf->colvalexprs)
> > +     {
> > +             Expr       *expr = lfirst(lc);
> > +
> > +             cxt->colexprs[i].expr =
> > +                     ExecInitExprWithCaseValue(expr, ps,
> > +                                                                       &cxt->colexprs[i].scan->current,
> > +                                                                       &cxt->colexprs[i].scan->currentIsNull);
> > +
> > +             i++;
> > +     }
> > +
> > +     state->opaque = cxt;
> > +}
>
> Why don't you just emit the proper expression directly, insted of the
> CaseTestExpr stuff, that you then separately evaluate?

I suppose you mean emitting the expression that supplies the value
given by scan->current and scan->currentIsNull into the same ExprState
that holds the steps for a given colvalexpr.  If so, I don't really
see a way of doing that given the current model of JSON_TABLE
execution.  The former is computed as part of
TableFuncRoutine.FetchRow(scan), which sets scan.current (and
currentIsNull) and the letter is computer as part of
TableFuncRoutine.GetValue(scan, colnum).

> Evaluating N expressions for a json table isn't a good approach, both memory
> and CPU efficiency wise.

Are you referring to JsonTableInitOpaque() initializing all these
sub-expressions of JsonTableParent, especially colvalexprs, using N
*independent* ExprStates?  That could perhaps be made to work by
making JsonTableParent be an expression recognized by execExpr.c, so
that a single ExprState can store the steps for all its
sub-expressions, much like JsonExpr is.  I'll give that a try, though
I wonder if the semantics of making this work in a single
ExecEvalExpr() call will mismatch that of the current way, because
different sub-expressions are currently evaluated under different APIs
of TableFuncRoutine.

In the meantime, I'm attaching a version of the patchset with a few
things fixed as mentioned above.


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: gamefunc
Date:
Subject: RE: [PATCH] fix msvc build libpq error LNK2019 when link openssl;
Next
From: Heikki Linnakangas
Date:
Subject: Re: PGDOCS - sgml linkend using single-quotes