Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Date
Msg-id 20220805203656.mdgmajvsahi5srcy@awork3.anarazel.de
Whole thread Raw
In response to Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
List pgsql-hackers
Hi,

I tried to look into some of the questions from Amit, but I have e.g. no idea
what exactly the use of subtransactions tries to achieve - afaics 1a36bc9dba8
is the first patch to introduce needing to evaluate parts expressions in a
subtransaction - but there's not a single comment explaining why.

It's really hard to understand the new json code. It's a substantial amount of
new infrastructure, without any design documentation that I can find. And it's
not like it's just code that's equivalent to nearby stuff. To me this falls
way below our standards and I think it's *months* of work to fix.

Even just the expresion evaluation code: EvalJsonPathVar(),
ExecEvalJsonConstructor(), ExecEvalJsonExpr(), ExecEvalJson(). There's one
layer of subtransactions in one of the paths in ExecEvalJsonExpr(), another in
ExecEvalJson(). Some paths of ExecEvalJsonExpr() go through subtransactions,
others don't.

It's one thing for a small set of changes to be of this quality. But this is
pretty large - a quick summing of diffstat ends up with about 17k lines added,
of which ~2.5k are docs, ~4.8k are tests.


On 2022-08-04 17:01:48 +0900, Amit Langote wrote:
> On Wed, Aug 3, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
> > > >   The whole coercion stuff just seems incredibly clunky (in a slightly
> > > >   different shape before this patch). ExecEvalJsonExprItemCoercion() calls
> > > >   ExecPrepareJsonItemCoercion(), which gets a pointer to one of the per-type
> > > >   elements in JsonItemCoercionsState, dispatching on the type of the json
> > > >   object. Then we later call ExecGetJsonItemCoercion() (via a convoluted
> > > >   path), which again will dispatch on the type (extracting the json object
> > > >   again afaics!), to then somehow eventually get the coerced value.
> > >
> > > I think it might be possible to make this a bit simpler, by not
> > > leaving anything coercion-related in ExecEvalJsonExpr().
> >
> > Honestly, this code seems like it should just be rewritten from scratch.
>
> Based on what I wrote above, please let me know if I've misunderstood
> your concerns about over-allocation of coercion state.  I can try to
> rewrite one more time if I know what this should look like instead.

I don't know. I don't understand the design of what needs to have error
handling, and what not.


> > > >   I first was confused why the code tries to load the jump target
> > > >   dynamically. But then I saw that the interpreted code sets it dynamically -
> > > >   why? That's completely unnecessary overhead afaics? There's just two
> > > >   possible jump targets, no?
> > >
> > > Hmm, I looked at the code for other expressions that jump, especially
> > > CASE WHEN, but they use ready-made EEOP_JUMP_IF_* steps, which can be
> > > added statically.  I thought we can't use them in this case, because
> > > the conditions are very ad-hoc, like if the JSON path computation
> > > returned an "empty" item or if the "error" flag was set during that
> > > computation, etc.
> >
> > The minimal fix would be to return the jump target from the function, and then
> > jump to that. That at least avoids the roundtrip to memory you have right now.
>
> You mean like this:
>
>                     LLVMValueRef v_args[2];
>                     LLVMValueRef v_ret;
>
>                     /*
>                      * Call ExecEvalJsonExprSkip() to decide if JSON path
>                      * evaluation can be skipped.  This returns the step
>                      * address to jump to.
>                      */
>                     v_args[0] = v_state;
>                     v_args[1] = l_ptr_const(op, l_ptr(StructExprEvalStep));
>                     v_ret = LLVMBuildCall(b,
>                                           llvm_pg_func(mod,
> "ExecEvalJsonExprSkip"),
>                                           params, lengthof(params), "");
>
> Actually, this is how I had started, but never figured out how to jump
> to the address in v_ret.  As in, how to extract the plain C int32
> value that is the jump address from v_ret, an LLVMValueRef, to use in
> the following statement:
>
>                     LLVMBuildBr(b, opblocks[<int32-in-v_ret>]);

We could make that work, but even keeping it more similar to your current
code, you're already dealing with a variable jump target. Only that you load
it from a variable, instead of the function return type. So you could just
v_ret instead of v_jumpaddr, and your code would be simpler and faster.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Next
From: Andres Freund
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints