Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size |
Date | |
Msg-id | CA+HiwqEe3n-v86zy4gL2g+BWZDxwEtVeGX8YuOqRh8y7R8QPPw@mail.gmail.com Whole thread Raw |
In response to | Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size (Andres Freund <andres@anarazel.de>) |
Responses |
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
|
List | pgsql-hackers |
Hi, On Wed, Aug 3, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote: > On 2022-08-02 12:05:55 +0900, Amit Langote wrote: > > On Tue, Aug 2, 2022 at 9:39 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2022-07-27 17:01:13 +0900, Amit Langote wrote: > > > > Here's an updated version of the patch, with mostly cosmetic changes. > > > > In particular, I added comments describing the new llvm_compile_expr() > > > > blobs. > > > > > > - I've asked a couple times before: Why do we need space for every possible > > > datatype at once in JsonItemCoercions? Can there be multiple "concurrent" > > > coercions in process? > > > > This topic has been a head-scratcher for me too from the beginning, > > but I've since come to understand (convince myself) that we do need > > the coercions for all possible types, because we don't know the type > > of the JSON item that's going to pop out of the main JSON path > > expression until we've run it through the JSON path executor that > > ExecEvalJson() invokes. So, it's not possible to statically assign > > the coercion. > > Sure. But that doesn't mean we have to have memory for every possible type *at > the same time*. > > > I am not really sure if different coercions may be used > > in the same query over multiple evaluations of the same JSON path > > expression, but maybe that's also possible. > > Even if the type can change, I don't think that means we need to have space > for multiple types at the same time - there can't be multiple coercions > happening at the same time, otherwise there could be two coercions of the same > type as well. So we don't need memory for every coercion type. Do you find it unnecessary to statically allocate memory for JsonItemCoercionState for each possible coercion, as in the following struct definition: typedef struct JsonItemCoercionsState { JsonItemCoercionState null; JsonItemCoercionState string; JsonItemCoercionState numeric; JsonItemCoercionState boolean; JsonItemCoercionState date; JsonItemCoercionState time; JsonItemCoercionState timetz; JsonItemCoercionState timestamp; JsonItemCoercionState timestamptz; JsonItemCoercionState composite; } JsonItemCoercionsState; A given JsonItemCoercionState (note singular Coercion) contains: typedef struct JsonItemCoercionState { /* Expression used to evaluate the coercion */ JsonCoercion *coercion; /* ExprEvalStep to compute this coercion's expression */ int jump_eval_expr; } JsonItemCoercionState; jump_eval_expr above is the address in JsonItemCoercions' ExprState.steps of the 1st ExprEvalStep corresponding to coercion->expr. IIUC, all ExprEvalSteps needed to evaluate an expression and its children must be allocated statically in ExecInitExprRec(), and none on-the-fly as needed. So, this considers all coercions and allocates states of all statically. > > > The whole coercion stuff just seems incredibly clunky (in a slightly > > > different shape before this patch). ExecEvalJsonExprItemCoercion() calls > > > ExecPrepareJsonItemCoercion(), which gets a pointer to one of the per-type > > > elements in JsonItemCoercionsState, dispatching on the type of the json > > > object. Then we later call ExecGetJsonItemCoercion() (via a convoluted > > > path), which again will dispatch on the type (extracting the json object > > > again afaics!), to then somehow eventually get the coerced value. > > > > I think it might be possible to make this a bit simpler, by not > > leaving anything coercion-related in ExecEvalJsonExpr(). > > Honestly, this code seems like it should just be rewritten from scratch. Based on what I wrote above, please let me know if I've misunderstood your concerns about over-allocation of coercion state. I can try to rewrite one more time if I know what this should look like instead. > > I left some pieces there, because I thought the error of not finding an > > appropriate coercion must be thrown right away as the code in > > ExecEvalJsonExpr() does after calling ExecGetJsonItemCoercion(). > > > > ExecPrepareJsonItemCoercion() is called later when it's time to > > actually evaluate the coercion. If we move the error path to > > ExecPrepareJsonItemCoercion(), both ExecGetJsonItemCoercion() and the > > error path code in ExecEvalJsonExpr() will be unnecessary. I will > > give that a try. > > Why do we need the separation of prepare and then evaluation? They're executed > straight after each other? ExecPrepareJsonItemCoercion() is a helper routine to choose the coercion and extract the Datum out of the JsonbValue produced by the EEOP_JSONEXPR_PATH step to feed to the coercion expression's ExprEvalStep. The coercion evaluation will be done by jumping to said step in ExecInterpExpr(). > > > - Looks like there's still some recursive expression states, namely > > > JsonExprState->{result_coercion, coercions}? > > > > So, the problem with inlining coercion evaluation into the main parent > > JsonExpr's is that it needs to be wrapped in a sub-transaction to > > catch any errors and return NULL instead. I don't know a way to wrap > > ExprEvalStep evaluation in a sub-transaction to achieve that effect. > > But we don't need to wrap arbitrary evaluation in a subtransaction - afaics > the coercion calls a single function, not an arbitrary expression? It can do EEOP_IOCOERCE for example, and the input/output function may cause an error depending on what comes out of the JSON blob. IIUC, those errors need to be caught to satisfy some SQL/JSON spec. > > Ooh, thanks for letting me know. So maybe I am missing some > > llvmjist_emit.h/type.c infrastructure to read an int32 value > > (jumpdone) out of an int32 pointer (&jumpdone)? > > No, you just need to replace l_ptr(TypeSizeT) with l_ptr(LLVMInt32Type()). OK, thanks. > > > I first was confused why the code tries to load the jump target > > > dynamically. But then I saw that the interpreted code sets it dynamically - > > > why? That's completely unnecessary overhead afaics? There's just two > > > possible jump targets, no? > > > > Hmm, I looked at the code for other expressions that jump, especially > > CASE WHEN, but they use ready-made EEOP_JUMP_IF_* steps, which can be > > added statically. I thought we can't use them in this case, because > > the conditions are very ad-hoc, like if the JSON path computation > > returned an "empty" item or if the "error" flag was set during that > > computation, etc. > > The minimal fix would be to return the jump target from the function, and then > jump to that. That at least avoids the roundtrip to memory you have right now. You mean like this: LLVMValueRef v_args[2]; LLVMValueRef v_ret; /* * Call ExecEvalJsonExprSkip() to decide if JSON path * evaluation can be skipped. This returns the step * address to jump to. */ v_args[0] = v_state; v_args[1] = l_ptr_const(op, l_ptr(StructExprEvalStep)); v_ret = LLVMBuildCall(b, llvm_pg_func(mod, "ExecEvalJsonExprSkip"), params, lengthof(params), ""); Actually, this is how I had started, but never figured out how to jump to the address in v_ret. As in, how to extract the plain C int32 value that is the jump address from v_ret, an LLVMValueRef, to use in the following statement: LLVMBuildBr(b, opblocks[<int32-in-v_ret>]); -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
pgsql-hackers by date: