Re: SQL/JSON features for v15 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: SQL/JSON features for v15
Date
Msg-id 20220816021425.5sa6gx67zmpou3bc@awork3.anarazel.de
Whole thread Raw
In response to Re: SQL/JSON features for v15  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: SQL/JSON features for v15
List pgsql-hackers
Hi,

On 2022-08-16 04:02:17 +0300, Nikita Glukhov wrote:
> Hi,
>
>
> On 16.08.2022 01:38, Andres Freund wrote:
> > Continuation from the thread at
> > https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de
> >
> >
> > I started hacking on this Friday.  I think there's some relatively easy
> > improvements that make the code substantially more understandable, at least
> > for me, without even addressing the structural stuff.
>
> I also started hacking Friday, hacked all weekend, and now have a new
> version of the patch.

Cool.


> > I don't understand the design of what needs to have error handling,
> > and what not.
>
> > I don't think subtransactions per-se are a fundamental problem.
> > I think the error handling implementation is ridiculously complicated,
> > and while I started to hack on improving it, I stopped when I really
> > couldn't understand what errors it actually needs to handle when and
> > why.
>
> Here is the diagram that may help to understand error handling in
> SQL/JSON functions (I hope it will be displayed correctly):

I think that is helpful.


>  JSON path  -------
>  expression        \
>                     ->+-----------+    SQL/JSON    +----------+  Result
>  PASSING args ------->| JSON path |--> item or --->|  Output  |-> SQL
>                     ->|  executor |     JSONB   .->| Coercion |  value
>                    /  +-----------+     datum   |  +----------+
>   JSON     + - - - -+   |      |                |       |
>  Context ->: FORMAT :   v      v                |       v
>   item     : JSON   : error? empty?             |     error?
>            + - - - -+   |      |                |       |
>                |        | +----------+          |      /
>                v        | | ON EMPTY |--> SQL --'     /
>              error?     | +----------+   value       /
>                |        |      |                    /
>                 \       |      v                   /
>                  \       \   error?               /
>                   \       \    |                 /
>                    \______ \   |   _____________/
>                           \ \  |  /
>                            v v v v       +----------+
>                          +----------+    |  Output  |   Result
>                          | ON ERROR |--->| Coercion |--> SQL
>                          +----------+    +----------+   value
>                                |              |
>                                V              V
>                            EXCEPTION      EXCEPTION
>
>
> The first dashed box "FORMAT JSON" used for parsing JSON is absent in
> our implementation, because we support only jsonb type which is
> pre-parsed.  This could be used in queries like that:
> JSON_VALUE('invalid json', '$' DEFAULT 'error' ON ERROR) => 'error'


> On Aug 3, 2022 at 12:00 AM Andres Freund<andres@anarazel.de>  wrote:
> > But we don't need to wrap arbitrary evaluation in a subtransaction -
> > afaics the coercion calls a single function, not an arbitrary
> > expression?
>
> SQL standard says that scalar SQL/JSON items are converted to SQL type
> through  CAST(corresponding_SQL_type_for_item AS returning_type).
> Our JSON_VALUE implementation supports arbitrary output types that can
> have specific CASTs from numeric, bool, datetime, which we can't
> emulate with simple I/O coercion.  But supporting of arbitrary types
> may be dangerous, because SQL standard denotes only a limited set of
> types:
>
>   The <data type> contained in the explicit or implicit
>   <JSON returning clause> shall be a <predefined type> that identifies
>   a character string data type, numeric data type, boolean data type,
>   or datetime data type.
>
> I/O coercion will not even work in the following simple case:
>  JSON_VALUE('1.23', '$' RETURNING int)
> It is expected to return 1::int, like ordinary cast 1.23::numeric::int.

Whether it's just IO coercions or also coercions through function calls
doesn't matter terribly, as long as both can be wrapped as a single
interpretation step.  You can have a EEOP_JSON_COERCE_IO,
EEOP_JSON_COERCE_FUNC that respectively call input/output function and the
transformation routine within a subtransaction. On error they can jump to some
on_error execution step.

The difficulty is likely just dealing with the intermediary nodes like
RelabelType.


> Exceptions may come not only from coercions.  Expressions in DEFAULT ON
> EMPTY can also throw exceptions, which also must be handled.

Are there other cases?


> Only the one item coercion is used in execution of JSON_VALUE().
> Multiple coercions could be executed, if we supported quite useful SRF
> JSON_QUERY() using "RETURNING SETOF type" (I had this idea for a long
> time, but I didn't dare to implement it).
>
> I don't understand what "memory" you mean.

I'm not entirely sure what I meant at that time either. Understanding this
code involves a lot of guessing since there's practically no explanatory
comments.


> If we will not emit all possible expressions statically, we will need to
> generate them dynamically at run-time, and this could be hardly acceptable.

I'm not convinced that that's true. We spend a fair amount of memory
generating expression paths for the per-type elements in JsonItemCoercions,
most of which will never be used.  Even trivial stuff ends up with ~2kB.

Then there's of course the executor side, where the various ExprStates really
add up:
MemoryContextStats(CurrentMemoryContext) in ExecInitExprRec(), just before
if (jext->coercions)

ExecutorState: 8192 total in 1 blocks; 4464 free (0 chunks); 3728 used
  ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Grand total: 16384 bytes in 2 blocks; 12392 free (0 chunks); 3992 used

just after:

ExecutorState: 32768 total in 3 blocks; 15032 free (2 chunks); 17736 used
  ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
Grand total: 40960 bytes in 4 blocks; 22960 free (2 chunks); 18000 used

for SELECT JSON_VALUE(NULL::jsonb, '$');


> In the last version of the fix there is only 4 bytes (int jump) of
> additional state space per coercion.

That's certainly a *lot* better.



> On Aug 6, 2022 at 5:37 Andres Freund<andres@anarazel.de>  wrote:
> > There's one layer of subtransactions in one of the paths in
> > ExecEvalJsonExpr(), another in ExecEvalJson(). Some paths of
> > ExecEvalJsonExpr() go through subtransactions, others don't.
>
> Really, there is only one level of subtransactions. Outer subtransaction
> may be used for FORMAT JSON handling which always requires
> subtransaction at the beginning of expression execution.
> Inner subtransactions are conditional, they are started only and when
> there is no outer subtransaction.

Yea, I realized that by now as well. But the code doesn't make that
understandable. E.g.:

> Now, outer subtransactions are not used at all,
> ExecEvalJsonNeedsSubtransaction(NULL) always returns false. (AFAIR,
> FORMAT JSON was present in older version of SQL/JSON patches, then it
> was removed, but outer subtransactions were not).

is very misleading.


> 0002 - Add EEOP_SUBTRANS executor step
>
> On 2022-08-02 12:05:55 +0900, Amit Langote wrote:
> > So, the problem with inlining coercion evaluation into the main parent
> > JsonExpr's is that it needs to be wrapped in a sub-transaction to
> > catch any errors and return NULL instead.  I don't know a way to wrap
> > ExprEvalStep evaluation in a sub-transaction to achieve that effect.
>
> I also don't know way to run subtransactions without recursion in
> executor, but I still managed to elimiate subsidary ExprStates.
>
> I have introduced new EEOP_SUBTRANS step which executes its subsequent
> steps in a subtransaction.  It recursively calls a new variant of
> ExecInterpExpr() in which starting stepno is passed.  The return from
> subtransaction is done with EEOP_DONE step that emitted after
> subexpression.  This step can be reused for other future expressions,
> that's why it has no JSON prefix in its name (you could see recent
> message in the thread about casts with default values, which are
> missing in PostgreSQL).

I've wondered about this as well, but I think it'd require quite careful work
to be safe. And certainly isn't something we can do at this point in the cycle
- it'll potentially impact every query, not just ones with json in, if we
screw up something (or introduce overhead).


> But for JIT I still had to construct additional ExprState with a
> function compiled from subexpression steps.

JIT is one of the reason *not* want to construct subsidiary ExprState's, since
they will trigger separate code generation (and thus overhead).

Why did you have to do this?



> 0003 - Simplify JsonExpr execution:
>
>  - New EEOP_SUBTRANS was used to wrap individual coercion expressions:
>      after execution it jumps to "done" or "onerror" step
>  - JSONEXPR_ITEM_COERCE step was removed
>  - JSONEXPR_COERCE split into JSONEXPR_IOCOERCE and JSONEXPR_POPULATE
>  - Removed all JsonExprPostEvalState
>  - JSONEXPR step simply returns jump address to one of its possible
>    continuations: done, onempty, onerror, coercion, coercion_subtrans,
>    io_coercion or one of item_coercions
>  - Fixed JsonExprNeedsSubTransaction(): considired more cases
>  - Eliminated transactions on Const expressions


I pushed a few cleanups to https://github.com/anarazel/postgres/commits/json
while I was hacking on this (ignore that it's based on the meson tree, that's
just faster for me). Some of them might not be applicable anymore, but it
might still make sense for you to look at.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: fix typos
Next
From: Bharath Rupireddy
Date:
Subject: Add find_in_log() and advance_wal() perl functions to core test framework (?)