Re: SQL/JSON features for v15 - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: SQL/JSON features for v15
Date
Msg-id 225d341b-f310-eed0-ceff-b8ec61669d9f@postgrespro.ru
Whole thread Raw
In response to Re: SQL/JSON features for v15  (Andres Freund <andres@anarazel.de>)
Responses Re: SQL/JSON features for v15
List pgsql-hackers
Hi,

On 16.08.2022 01:38, Andres Freund wrote:
Continuation from the thread at
https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de


I started hacking on this Friday.  I think there's some relatively easy
improvements that make the code substantially more understandable, at least
for me, without even addressing the structural stuff.
I also started hacking Friday, hacked all weekend, and now have a new
version of the patch.

I received your message when I finished writing of mine, so I will
try answer your new questions only in next message.  But in short, I
can say that some things like ExecEvalJsonExprSubtrans() were fixed.


I took Amit's patch and tried to simplify execution further.
Explanation of the patches is at the very end of message.

Next, I try to answer some of previous questions.

On Aug 2, 2022 at 9:39 AM Andres Freund <andres@anarazel.de> wrote:
> The whole coercion stuff just seems incredibly clunky (in a
> slightly different shape before this patch).
> ExecEvalJsonExprItemCoercion() calls ExecPrepareJsonItemCoercion(),
> which gets a pointer to one of the per-type elements in
> JsonItemCoercionsState, dispatching on the type of the json
> object.  Then we later call ExecGetJsonItemCoercion() (via a
> convoluted path), which again will dispatch on the type
> (extracting the json object again afaics!), to then somehow
> eventually get the coerced value.  I think it might be possible
> to make this a bit simpler, by not leaving anything
> coercion-related in ExecEvalJsonExpr().

On 2022-08-02 12:05:55 +0900, Amit Langote wrote:
> I left some pieces there, because I thought the error of not finding an
> appropriate coercion must be thrown right away as the code in
> ExecEvalJsonExpr() does after calling ExecGetJsonItemCoercion().

> ExecPrepareJsonItemCoercion() is called later when it's time to
> actually evaluate the coercion.  If we move the error path to
> ExecPrepareJsonItemCoercion(), both ExecGetJsonItemCoercion() and the
> error path code in ExecEvalJsonExpr() will be unnecessary.  I will
> give that a try.

The first dispatch is done only for throwing error about missing cast
without starting subtransaction in which second dispatch is executed.
I agree, this is bad that result of first dispatch is not used later,
and I have removed second dispatch.


> I don't understand the design of what needs to have error handling,
> and what not.

> I don't think subtransactions per-se are a fundamental problem.
> I think the error handling implementation is ridiculously complicated,
> and while I started to hack on improving it, I stopped when I really
> couldn't understand what errors it actually needs to handle when and
> why.

Here is the diagram that may help to understand error handling in
SQL/JSON functions (I hope it will be displayed correctly):

 JSON path  ------- expression        \                    ->+-----------+    SQL/JSON    +----------+  Result PASSING args ------->| JSON path |--> item or --->|  Output  |-> SQL                    ->|  executor |     JSONB   .->| Coercion |  value                   /  +-----------+     datum   |  +----------+  JSON     + - - - -+   |      |                |       | Context ->: FORMAT :   v      v                |       v  item     : JSON   : error? empty?             |     error?           + - - - -+   |      |                |       |               |        | +----------+          |      /               v        | | ON EMPTY |--> SQL --'     /             error?     | +----------+   value       /               |        |      |                    /                \       |      v                   /                 \       \   error?               /                  \       \    |                 /                   \______ \   |   _____________/                          \ \  |  /                           v v v v       +----------+                         +----------+    |  Output  |   Result                         | ON ERROR |--->| Coercion |--> SQL                         +----------+    +----------+   value                               |              |                               V              V                           EXCEPTION      EXCEPTION


The first dashed box "FORMAT JSON" used for parsing JSON is absent in
our implementation, because we support only jsonb type which is
pre-parsed.  This could be used in queries like that:
JSON_VALUE('invalid json', '$' DEFAULT 'error' ON ERROR) => 'error'

JSON path executor already has error handling and does not need
subtransactions.  We had to add functions like numeric_add_opt_error()
which return error flag instead of throwing exceptions.


On Aug 10, 2022 at 3:57 AM Andres Freund <andres@anarazel.de> wrote:
>>> One way this code could be drastically simplified is to force all
>>> type-coercions to go through the "io coercion" path, which could be
>>> implemented as a single execution step (which thus could trivially
>>> start/finish a subtransaction) and would remove a lot of the
>>> complicated code around coercions.

>> Could you please clarify how you think we might do the io coercion
>> wrapped with a subtransaction all as a single execution step?  I
>> would've thought that we couldn't do the sub-transaction without
>> leaving ExecInterpExpr() anyway, so maybe you meant the io coercion
>> itself was done using some code outside ExecInterpExpr()?

>> The current JsonExpr code does it by recursively calling
>> ExecInterpExpr() using the nested ExprState expressly for the
>> coercion.

> The basic idea is to rip out all the type-dependent stuff out and
> replace it with a single JSON_IOCERCE step, which has a parameter
> about whether to wrap things in a subtransaction or not. That step
> would always perform the coercion by calling the text output function
> of the input and the text input function of the output.

On Aug 3, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
> But we don't need to wrap arbitrary evaluation in a subtransaction -
> afaics the coercion calls a single function, not an arbitrary
> expression?

SQL standard says that scalar SQL/JSON items are converted to SQL type
through  CAST(corresponding_SQL_type_for_item AS returning_type).
Our JSON_VALUE implementation supports arbitrary output types that can
have specific CASTs from numeric, bool, datetime, which we can't
emulate with simple I/O coercion.  But supporting of arbitrary types
may be dangerous, because SQL standard denotes only a limited set of
types:
  The <data type> contained in the explicit or implicit  <JSON returning clause> shall be a <predefined type> that identifies  a character string data type, numeric data type, boolean data type,  or datetime data type.

I/O coercion will not even work in the following simple case: JSON_VALUE('1.23', '$' RETURNING int)
It is expected to return 1::int, like ordinary cast 1.23::numeric::int.

Exceptions may come not only from coercions.  Expressions in DEFAULT ON
EMPTY can also throw exceptions, which also must be handled.

Here is excerpt from ISO/IEC 19075-6:2021(E) "Part 6: Support for JSON",
which explains SQL standard features in human-readable manner:
  6.4.3 JSON_VALUE:
  <JSON value error behavior> specifies what to do if there is an  unhandled error.  Unhandled errors can arise if there is an input  conversion error (for example, if the context item cannot be parsed),  an error returned by the SQL/JSON path engine, or an output  conversion error.  The choices are the same as for  <JSON value empty behavior>.
  When using DEFAULT <value expression> for either the empty or error  behavior, what happens if the <value expression> raises an exception?  The answer is that an error during empty behavior "falls through"  to the error behavior.  If the error behavior itself has an error,  there is no further recourse but to raise the exception.

So, we need to support execution of arbitrary expressions inside a
subtransaction, and do not try to somehow simplify coercions.


In Amit's fix, wrapping DEFAULT ON EMPTY into subtransactions was
lost, mainly because there were no tests for this case.  The following
test should not fall on the second row:
  SELECT JSON_VALUE(jsonb '1', '$.a' RETURNING int                    DEFAULT 1 / x ON EMPTY                    DEFAULT 2 ON ERROR)  FROM (VALUES (1::int), (0)) x(x);
   json_value  ------------   1   2

I have added this test in 0003.


On Aug 3, 2022 at 12:00 AM Andres Freund <andres@anarazel.de> wrote:
> On 2022-08-02 12:05:55 +0900, Amit Langote wrote:
>> I am not really sure if different coercions may be used
>> in the same query over multiple evaluations of the same JSON path
>> expression, but maybe that's also possible.

> Even if the type can change, I don't think that means we need to have
> space for multiple types at the same time - there can't be multiple
> coercions happening at the same time, otherwise there could be two
> coercions of the same type as well.  So we don't need memory for
> every coercion type.

Only the one item coercion is used in execution of JSON_VALUE().
Multiple coercions could be executed, if we supported quite useful SRF
JSON_QUERY() using "RETURNING SETOF type" (I had this idea for a long
time, but I didn't dare to implement it).

I don't understand what "memory" you mean.  If we will not emit all
possible expressions statically, we will need to generate them
dynamically at run-time, and this could be hardly acceptable.  In the
last version of the fix there is only 4 bytes (int jump) of additional
state space per coercion.


On Aug 6, 2022 at 5:37 Andres Freund <andres@anarazel.de> wrote:
> There's one layer of subtransactions in one of the paths in
> ExecEvalJsonExpr(), another in ExecEvalJson(). Some paths of
> ExecEvalJsonExpr() go through subtransactions, others don't.

Really, there is only one level of subtransactions. Outer subtransaction
may be used for FORMAT JSON handling which always requires
subtransaction at the beginning of expression execution.
Inner subtransactions are conditional, they are started only and when
there is no outer subtransaction.

Now, outer subtransactions are not used at all,
ExecEvalJsonNeedsSubtransaction(NULL) always returns false. (AFAIR,
FORMAT JSON was present in older version of SQL/JSON patches, then it
was removed, but outer subtransactions were not).  In the last version
of the fix I have removed them completely and moved inner
subtransactions into a separate executor step (see below).



The description of the patches:

0001 - Fix returning of json[b] domains in JSON_VALUE()

(This may require a separate thread.)

I found a bug in returning json[b] domains in JSON_VALUE(). json[b]
has special processing in JSON_VALUE, bypassing oridinary
SQL/JSON item type => SQL type coercions.  But json[b] domains miss
this processing:
  CREATE DOMAIN jsonb_not_null AS jsonb NOT NULL;
  SELECT JSON_VALUE('"123"', '$' RETURNING jsonb);    "123"
  SELECT JSON_VALUE( '123',  '$' RETURNING jsonb);    123
  SELECT JSON_VALUE('"123"', '$' RETURNING jsonb_not_null);    123
  SELECT JSON_VALUE( '123',  '$' RETURNING jsonb_not_null ERROR ON ERROR);    ERROR: SQL/JSON item cannot be cast to target type

Fixed by examinating output base type in parse_expr.c and skipping
allocation of item coercions, what later will be a signal for special
processing in ExecEvalJsonExpr().


0002 - Add EEOP_SUBTRANS executor step

On 2022-08-02 12:05:55 +0900, Amit Langote wrote:
> So, the problem with inlining coercion evaluation into the main parent
> JsonExpr's is that it needs to be wrapped in a sub-transaction to
> catch any errors and return NULL instead.  I don't know a way to wrap
> ExprEvalStep evaluation in a sub-transaction to achieve that effect.

I also don't know way to run subtransactions without recursion in
executor, but I still managed to elimiate subsidary ExprStates.

I have introduced new EEOP_SUBTRANS step which executes its subsequent
steps in a subtransaction.  It recursively calls a new variant of
ExecInterpExpr() in which starting stepno is passed.  The return from
subtransaction is done with EEOP_DONE step that emitted after
subexpression.  This step can be reused for other future expressions,
that's why it has no JSON prefix in its name (you could see recent
message in the thread about casts with default values, which are
missing in PostgreSQL).

But for JIT I still had to construct additional ExprState with a
function compiled from subexpression steps.


0003 - Simplify JsonExpr execution:
 - New EEOP_SUBTRANS was used to wrap individual coercion expressions:     after execution it jumps to "done" or "onerror" step - JSONEXPR_ITEM_COERCE step was removed - JSONEXPR_COERCE split into JSONEXPR_IOCOERCE and JSONEXPR_POPULATE - Removed all JsonExprPostEvalState - JSONEXPR step simply returns jump address to one of its possible   continuations: done, onempty, onerror, coercion, coercion_subtrans,   io_coercion or one of item_coercions - Fixed JsonExprNeedsSubTransaction(): considired more cases - Eliminated transactions on Const expressions

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Wrong comment in statscmds.c/CreateStatistics?
Next
From: Thomas Munro
Date:
Subject: Re: Cleaning up historical portability baggage