Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers

From Chapman Flack
Subject Re: Extract numeric filed in JSONB more effectively
Date
Msg-id 5955e93347a7e3b1612cf7e129ae6d04@anastigmatix.net
Whole thread Raw
In response to Re: Extract numeric filed in JSONB more effectively  (Chapman Flack <chap@anastigmatix.net>)
Responses Re: Extract numeric filed in JSONB more effectively
List pgsql-hackers
On 2023-08-22 08:16, Chapman Flack wrote:
> On 2023-08-22 01:54, Andy Fan wrote:
>> After we label it, we will get error like this:
>> 
>> select (a->'a')::int4 from m;
>> ERROR:  cannot display a value of type internal
> 
> Without looking in depth right now, I would double-check
> what relabel node is being applied at the result. The idea,
> of course, was to relabel the result as the expected result

as I suspected, looking at v10-0003, there's this:

+   fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
+                                       0, InvalidOid, 
COERCE_IMPLICIT_CAST);

compared to the example I had sent earlier:

On 2023-08-18 17:02, Chapman Flack wrote:
>     expr = (Expr *)makeRelabelType((Expr *)fexpr,
>       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);

The key difference: this is the label going on the result type
of the function we are swapping in. The function already has
return type declared internal; we want to relabel it as
returning the type identified by targetOid. A relabel node
to type internal is the reverse of what's needed (and also
superfluous, as the function's return type is internal already).

Two more minor differences: (1) the node you get from
makeRelabelType is an Expr, but not really a FuncExpr. Casting
it to FuncExpr is a bit bogus. Also, the third argument to
makeRelabelType is a typmod, and I believe the "not-modified"
typmod is -1, not 0.

In the example I had sent earlier, there were two relabel nodes,
serving different purposes. In one, we have a function that wants
internal for an argument, but we've made a Const with type OIDOID,
so we want to say "this is internal, the thing the function wants."

The situation is reversed at the return type of the function: the
function declares its return type internal, but the surrounding
query is expecting an int4 (or whatever targetOid identifies),
so any relabel node there needs to say it's an int4 (or whatever).

So, that approach involves two relabelings. In the one idea for
restructuring that I suggested earlier, that's reduced: the
..._int function produces a JsonbValue (typed internal) and
the selected ..._finish function expects that, so those types
match and no relabeling is called for. And with the correct
..._finish function selected at rewrite time, it already has
the same return type the surrounding query expects, so no
relabeling is called for there either.

However, simply to ensure the ..._int function cannot be
casually called, it needs an internal parameter, even an
unused one, and the rewriter must supply a value for that,
which may call for one relabel node.

On 2023-08-21 06:50, Andy Fan wrote:
> I'm not very excited with this manner, reasons are: a).  It will have
> to emit more steps in ExprState->steps which will be harmful for
> execution. The overhead  is something I'm not willing to afford.

I would be open to a performance comparison, but offhand I am not
sure whether the overhead of another step or two in an ExprState
is appreciably more than some of the overhead in the present patch,
such as the every-time-through fcinfo initialization buried in
DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
new argument values slammed into it each time through.

(Also, I know very little about how the JIT compiler is used in PG,
but I suspect that a step you bury inside your function is a step
it may not get to see.)

> b). this manner requires more *internal*, which is kind of similar
> to "void *"  in C.

I'm not sure in what sense you mean "more". The present patch
has to deal with two places where some other type must be
relabeled internal or something internal relabeled to another
type. The approach I suggested does involve two families of
function, one returning internal (a JsonbValue) and one
expecting internal (a JsonbValue), where the rewriter would
compose one over the other, no relabeling needed. There's
also an internal parameter needed for whatever returns
internal, and that's just the protocol for how such things
are done. To me, that seems like a fairly principled use of
the type.

I would not underestimate the benefit of reducing the code
duplication and keeping the patch as clear as possible.
The key contributions of the patch are getting a numeric or
boolean efficiently out of the JSON operation. Getting from
numeric to int or float are things the system already does
well. A patch that focuses on what it contributes, and avoids
redoing things the system already can do--unless the duplication
can be shown to have a strong performance benefit--is easier to
review and probably to get integrated.

Regards,
-Chap



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: broken master regress tests
Next
From: Pavel Stehule
Date:
Subject: Re: broken master regress tests