Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size - Mailing list pgsql-hackers

From Andres Freund
Subject Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Date
Msg-id 20220802150022.jqomxwum5eazybfv@awork3.anarazel.de
Whole thread Raw
In response to Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
List pgsql-hackers
Hi,

On 2022-08-02 12:05:55 +0900, Amit Langote wrote:
> On Tue, Aug 2, 2022 at 9:39 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-07-27 17:01:13 +0900, Amit Langote wrote:
> > > Here's an updated version of the patch, with mostly cosmetic changes.
> > > In particular, I added comments describing the new llvm_compile_expr()
> > > blobs.
> >
> > - I've asked a couple times before: Why do we need space for every possible
> >   datatype at once in JsonItemCoercions? Can there be multiple "concurrent"
> >   coercions in process?
> 
> This topic has been a head-scratcher for me too from the beginning,
> but I've since come to understand (convince myself) that we do need
> the coercions for all possible types, because we don't know the type
> of the JSON item that's going to pop out of the main JSON path
> expression until we've run it through the JSON path executor that
> ExecEvalJson() invokes.  So, it's not possible to statically assign
> the coercion.

Sure. But that doesn't mean we have to have memory for every possible type *at
the same time*.


> I am not really sure if different coercions may be used
> in the same query over multiple evaluations of the same JSON path
> expression, but maybe that's also possible.

Even if the type can change, I don't think that means we need to have space
for multiple types at the same time - there can't be multiple coercions
happening at the same time, otherwise there could be two coercions of the same
type as well. So we don't need memory for every coercion type.

> 
> >   The whole coercion stuff just seems incredibly clunky (in a slightly
> >   different shape before this patch). ExecEvalJsonExprItemCoercion() calls
> >   ExecPrepareJsonItemCoercion(), which gets a pointer to one of the per-type
> >   elements in JsonItemCoercionsState, dispatching on the type of the json
> >   object. Then we later call ExecGetJsonItemCoercion() (via a convoluted
> >   path), which again will dispatch on the type (extracting the json object
> >   again afaics!), to then somehow eventually get the coerced value.
> 
> I think it might be possible to make this a bit simpler, by not
> leaving anything coercion-related in ExecEvalJsonExpr().

Honestly, this code seems like it should just be rewritten from scratch.


> I left some pieces there, because I thought the error of not finding an
> appropriate coercion must be thrown right away as the code in
> ExecEvalJsonExpr() does after calling ExecGetJsonItemCoercion().
> 
> ExecPrepareJsonItemCoercion() is called later when it's time to
> actually evaluate the coercion.  If we move the error path to
> ExecPrepareJsonItemCoercion(), both ExecGetJsonItemCoercion() and the
> error path code in ExecEvalJsonExpr() will be unnecessary.  I will
> give that a try.

Why do we need the separation of prepare and then evaluation? They're executed
straight after each other?


> > - Looks like there's still some recursive expression states, namely
> >   JsonExprState->{result_coercion, coercions}?
> 
> So, the problem with inlining coercion evaluation into the main parent
> JsonExpr's is that it needs to be wrapped in a sub-transaction to
> catch any errors and return NULL instead.  I don't know a way to wrap
> ExprEvalStep evaluation in a sub-transaction to achieve that effect.

But we don't need to wrap arbitrary evaluation in a subtransaction - afaics
the coercion calls a single function, not an arbitrary expression?


> Ooh, thanks for letting me know.  So maybe I am missing some
> llvmjist_emit.h/type.c infrastructure to read an int32 value
> (jumpdone) out of an int32 pointer (&jumpdone)?

No, you just need to replace l_ptr(TypeSizeT) with l_ptr(LLVMInt32Type()).


> >   I first was confused why the code tries to load the jump target
> >   dynamically. But then I saw that the interpreted code sets it dynamically -
> >   why? That's completely unnecessary overhead afaics? There's just two
> >   possible jump targets, no?
> 
> Hmm, I looked at the code for other expressions that jump, especially
> CASE WHEN, but they use ready-made EEOP_JUMP_IF_* steps, which can be
> added statically.  I thought we can't use them in this case, because
> the conditions are very ad-hoc, like if the JSON path computation
> returned an "empty" item or if the "error" flag was set during that
> computation, etc.

The minimal fix would be to return the jump target from the function, and then
jump to that. That at least avoids the roundtrip to memory you have right now.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Next
From: David Steele
Date:
Subject: Re: Race between KeepFileRestoredFromArchive() and restartpoint