Thread: Extract numeric filed in JSONB more effectively

Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi:

Currently if we want to extract a numeric field in jsonb, we need to use
the following expression:  cast (a->>'a' as numeric). It will turn a numeric
to text first and then turn the text to numeric again. See 
jsonb_object_field_text and JsonbValueAsText.  However the binary format
of numeric in JSONB is compatible with the numeric in SQL, so I think we
can have an operator to extract the numeric directly. If the value of a given
field is not a numeric data type, an error will be raised, this can be 
documented.

In this patch, I added a new operator for this purpose, here is the
performance gain because of this.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;

current method:
select count(*) from tb where cast (a->>'a' as numeric) = 2;
167ms.

new method:
select count(*) from tb where a@->'a' = 2;
65ms.

Is this the right way to go? Testcase, document and catalog version are
updated.


--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Matthias van de Meent
Date:
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi:
>
> Currently if we want to extract a numeric field in jsonb, we need to use
> the following expression:  cast (a->>'a' as numeric). It will turn a numeric
> to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
We have a cast from jsonb to numeric (jsonb_numeric in jsonb.c) that
does not require this additional (de)serialization through text.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi:
>
> Currently if we want to extract a numeric field in jsonb, we need to use
> the following expression:  cast (a->>'a' as numeric). It will turn a numeric
> to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?

Thanks for this information! I didn't realize we have this function
already at [1].


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Matthias:

On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:


On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi:
>
> Currently if we want to extract a numeric field in jsonb, we need to use
> the following expression:  cast (a->>'a' as numeric). It will turn a numeric
> to text first and then turn the text to numeric again.

Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?

Thanks for this information! I didn't realize we have this function
already at [1].


Hi:

I just found ((a->'a')::numeric) is not as effective as I expected.

First in the above expression we used jsonb_object_field which
returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
looks like a wastage.

Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
which may detoast a value so we need to free it with PG_FREE_IF_COPY.
then this looks like another potential wastage.

Thirdly, I am not sure we need to do the NumericCopy automatically
in jsonb_numeric. an option in my mind is maybe we can leave this
to the caller?  At least in the normal case (a->'a')::numeric, we don't
need this copy IIUC.

/*
 * v.val.numeric points into jsonb body, so we need to make a copy to
 * return
 */
retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));

At last this method needs 1 extra FuncExpr than my method, this would
cost some expression execution effort. I'm not saying we need to avoid
expression execution generally, but extracting numeric fields from jsonb
looks a reasonable case. As a comparison, cast to other data types like
int2/int4 may be not needed since they are not binary compatible.


Here is the performance comparison (with -O3, my previous post is -O0).

select 1 from tb where (a->'a')::numeric = 2;  31ms.
select 1 from tb where (a@->'a') = 2;  15ms 


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Tue, Aug 1, 2023 at 12:39 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi:
>
> Currently if we want to extract a numeric field in jsonb, we need to use
> the following expression:  cast (a->>'a' as numeric). It will turn a numeric
> to text first and then turn the text to numeric again. See
> jsonb_object_field_text and JsonbValueAsText.  However the binary format
> of numeric in JSONB is compatible with the numeric in SQL, so I think we
> can have an operator to extract the numeric directly. If the value of a given
> field is not a numeric data type, an error will be raised, this can be
> documented.
>
> In this patch, I added a new operator for this purpose, here is the
> performance gain because of this.
>
> create table tb (a jsonb);
> insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;
>
> current method:
> select count(*) from tb where cast (a->>'a' as numeric) = 2;
> 167ms.
>
> new method:
> select count(*) from tb where a@->'a' = 2;
> 65ms.
>
> Is this the right way to go? Testcase, document and catalog version are
> updated.
>
>
> --
> Best Regards
> Andy Fan


return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Jian:
 
return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?

Thanks for this reminder, a new patch is attached.  and commitfest
entry is added as well[1]. For recording purposes,  I compared the
new operator with all the existing operators.

select 1 from tb where (a->'a')::numeric = 2;   30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2;              14.80ms 


--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:
Hi

čt 3. 8. 2023 v 2:51 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi Jian:
 
return PointerGetDatum(v->val.numeric);
should be something like
PG_RETURN_NUMERIC(v->val.numeric);
?

Thanks for this reminder, a new patch is attached.  and commitfest
entry is added as well[1]. For recording purposes,  I compared the
new operator with all the existing operators.

select 1 from tb where (a->'a')::numeric = 2;   30.56ms
select 1 from tb where (a->>'a')::numeric = 2; 29.43ms
select 1 from tb where (a@->'a') = 2;              14.80ms 



I don't like this solution because it is bloating  operators and it is not extra readable. For completeness you should implement cast for date, int, boolean too. Next, the same problem is with XML or hstore type (probably with any types that are containers).

It is strange so only casting is 2x slower. I don't like the idea so using a special operator is 2x faster than common syntax for casting. It is a signal, so there is a space for optimization. Black magic with special operators is not user friendly for relatively common problems.

Maybe we can introduce some *internal operator* "extract to type", and in rewrite stage we can the pattern (x->'field')::type transform to OP(x, 'field', typid)

Regards

Pavel


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Pavel:

Thanks for the feedback. 

I don't like this solution because it is bloating  operators and it is not extra readable.
 
If we support it with cast, could we say we are bloating CAST?  It is true
that it is not extra readable, if so how about  a->>'a'  return text?  Actually
I can't guess any meaning of the existing jsonb operations without
documentation.

For completeness you should implement cast for date, int, boolean too. Next, the same problem is with XML or hstore type (probably with any types that are containers).

I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time,  `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.

I didn't realize timetime types are binary compatible with SQL,  
so maybe we can have some similar optimization as well. 
(It is a pity that timestamp(tz) are not binary, or else we may 
just need one operator). 
 

I don't like the idea so using a special operator is 2x faster than common syntax for casting. It is a signal, so there is a space for optimization. Black magic with special operators is not user friendly for relatively common problems.

I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text.  If you argue text can be cast to 
most-of-types,  that would be a reason, but I doubt this difference
should generate a "black magic". 
 

Maybe we can introduce some *internal operator* "extract to type", and in rewrite stage we can the pattern (x->'field')::type transform to OP(x, 'field', typid)

Not sure what the OP should be?  If it is a function, what is the
return value?  It looks to me like it is hard to do in c language?

After all,  if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname') 
jsonb_field_as_timestamp(jsonb, 'filedname'); 
jsonb_field_as_timestamptz(jsonb, 'filedname'); 
jsonb_field_as_date(jsonb, 'filedname'); 

it can save an operator and sloves the readable issue. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:
Hi

čt 3. 8. 2023 v 9:53 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi Pavel:

Thanks for the feedback. 

I don't like this solution because it is bloating  operators and it is not extra readable.
 
If we support it with cast, could we say we are bloating CAST?  It is true
that it is not extra readable, if so how about  a->>'a'  return text?  Actually
I can't guess any meaning of the existing jsonb operations without
documentation.

yes, it can bloat CAST, but for usage we have already used syntax, and these casts are cooked already:

(2023-08-03 11:04:51) postgres=# select castfunc::regprocedure from pg_cast where castsource = 'jsonb'::regtype;
┌──────────────────┐
│     castfunc     │
╞══════════════════╡
│ -                │
│ bool(jsonb)      │
│ "numeric"(jsonb) │
│ int2(jsonb)      │
│ int4(jsonb)      │
│ int8(jsonb)      │
│ float4(jsonb)    │
│ float8(jsonb)    │
└──────────────────┘
(8 rows)

 
the operator ->> was a special case, the text type is special in postgres as the most convertible type. And when you want to visualise a value or display the value, you should convert value to text.

I can live with that because it is just one, but with your proposal opening the doors for implementing tens of similar operators, I think it is bad. Using ::target_type is common syntax and doesn't require reading documentation.

More, I believe so lot of people uses more common syntax, and then this syntax should to have good performance - for jsonb - (val->'op')::numeric works, and then there should not be performance penalty, because this syntax will be used in 99%.

Usage of cast is self documented.


For completeness you should implement cast for date, int, boolean too. Next, the same problem is with XML or hstore type (probably with any types that are containers).

I am not sure completeness is a gold rule we should obey anytime,
like we have some function like int24le to avoid the unnecessary
cast, but we just avoid casting for special types for performance
reason, but not for all. At the same time,  `int2/int4/int8` doesn't
have a binary compatibility type in jsonb. and the serialization
/deserialization for boolean is pretty cheap.

I didn't realize timetime types are binary compatible with SQL,  
so maybe we can have some similar optimization as well. 
(It is a pity that timestamp(tz) are not binary, or else we may 
just need one operator). 
 

I don't like the idea so using a special operator is 2x faster than common syntax for casting. It is a signal, so there is a space for optimization. Black magic with special operators is not user friendly for relatively common problems.

I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text.  If you argue text can be cast to 
most-of-types,  that would be a reason, but I doubt this difference
should generate a "black magic". 

I used the term black magic, because nobody without reading documentation can find this operator. It is used just for this special case, and the functionality is the same as using cast (only with different performance).

The operator ->> is more widely used. But if we have some possibility to work without it, then the usage for a lot of users will be more simple. More if the target types can be based on context

Can be nice to use some like `EXTRACT(YEAR FROM val->'field')` instead `EXTRACT(YEAR FROM (val->>'field')::date)`

 

Maybe we can introduce some *internal operator* "extract to type", and in rewrite stage we can the pattern (x->'field')::type transform to OP(x, 'field', typid)

Not sure what the OP should be?  If it is a function, what is the
return value?  It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS operator
 

After all,  if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname') 
jsonb_field_as_timestamp(jsonb, 'filedname'); 
jsonb_field_as_timestamptz(jsonb, 'filedname'); 
jsonb_field_as_date(jsonb, 'filedname'); 

it can save an operator and sloves the readable issue. 

I don't like it too much, but it is better than introduction new operator 

We already have the jsonb_extract_path and jsonb_extract_path_text function.

I can imagine to usage "anyelement" type too. some like `jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`






--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Matthias van de Meent
Date:
On Wed, 2 Aug 2023 at 03:05, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi Matthias:
>
> On Wed, Aug 2, 2023 at 7:33 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>
>>
>>
>> On Tue, Aug 1, 2023 at 7:03 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>>
>>> On Tue, 1 Aug 2023 at 06:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>>> >
>>> > Hi:
>>> >
>>> > Currently if we want to extract a numeric field in jsonb, we need to use
>>> > the following expression:  cast (a->>'a' as numeric). It will turn a numeric
>>> > to text first and then turn the text to numeric again.
>>>
>>> Why wouldn't you use cast(a->'a' as numeric), or ((a->'a')::numeric)?
>>
>>
>> Thanks for this information! I didn't realize we have this function
>> already at [1].
>>
>> https://www.postgresql.org/docs/15/functions-json.html
>
>
> Hi:
>
> I just found ((a->'a')::numeric) is not as effective as I expected.
>
> First in the above expression we used jsonb_object_field which
> returns a jsonb (see JsonbValueToJsonb), and then we convert jsonb
> to jsonbValue in jsonb_numeric (see JsonbExtractScalar). This
> looks like a wastage.

Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.

> Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
> which may detoast a value so we need to free it with PG_FREE_IF_COPY.
> then this looks like another potential wastage.

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

> /*
>  * v.val.numeric points into jsonb body, so we need to make a copy to
>  * return
>  */
> retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));
>
> At last this method needs 1 extra FuncExpr than my method, this would
> cost some expression execution effort. I'm not saying we need to avoid
> expression execution generally, but extracting numeric fields from jsonb
> looks a reasonable case.

But we don't have special cases for the other jsonb types  - the one
that is available (text) is lossy and doesn't work reliably without
making sure the field we're accessing is actually a string, and not
any other type of value.

> As a comparison, cast to other data types like
> int2/int4 may be not needed since they are not binary compatible.

Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.

*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.

> Here is the performance comparison (with -O3, my previous post is -O0).
>
> select 1 from tb where (a->'a')::numeric = 2;  31ms.
> select 1 from tb where (a@->'a') = 2;  15ms

What's tb here?


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi: 
 

Yes, it's not great, but that's just how this works. We can't
pre-specialize all possible operations that one might want to do in
PostgreSQL - that'd be absurdly expensive for binary and initial
database sizes.

Are any people saying we would  pre-specialize all possible operators?
I would say anything if adding operators will be expensive for binary and
initial database sizes.  If so,  how many per operator and how many
operators would be in your expectation?   


> Secondly, because of the same reason above, we use PG_GETARG_JSONB_P(0),
> which may detoast a value so we need to free it with PG_FREE_IF_COPY.
> then this looks like another potential wastage.

Is it? Detoasting only happens if the argument was toasted, and I have
serious doubts that the result of (a->'a') will be toasted in our
current system. Sure, we do need to allocate an intermediate result,
but that's in a temporary memory context that should be trivially
cheap to free.

If you take care about my context, I put this as a second factor for the
current strategy.  and it is the side effects of factor 1.  FWIW,  that cost
is paid for every jsonb object, not something during the initial database. 


> As a comparison, cast to other data types like
> int2/int4 may be not needed since they are not binary compatible.

Yet there are casts from jsonb to and back from int2, int4 and int8. I
don't see a very good reason to add this, for the same reasons
mentioned by Pavel.
 
Who is insisting on adding such an operator in your opinion?  


*If* we were to add this operator, I would want this patch to also
include a #-variant for text[]-based deep access (c.q. #> / #>>), and
equivalent operators for the json type to keep the current access
operator parity.

> Here is the performance comparison (with -O3, my previous post is -O0).
>
> select 1 from tb where (a->'a')::numeric = 2;  31ms.
> select 1 from tb where (a@->'a') = 2;  15ms

What's tb here?

This is my first post.  Copy it here again. 

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-03 03:53, Andy Fan wrote:
> I didn't realize timetime types are binary compatible with SQL,
> so maybe we can have some similar optimization as well.
> (It is a pity that timestamp(tz) are not binary, or else we may
> just need one operator).

Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.

>> Maybe we can introduce some *internal operator* "extract to type", and 
>> in
>> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
>> 'field', typid)
> 
> Not sure what the OP should be?  If it is a function, what is the
> return value?  It looks to me like it is hard to do in c language?

Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something more efficient?

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi:
 
More, I believe so lot of people uses more common syntax, and then this syntax should to have good performance - for jsonb - (val->'op')::numeric works, and then there should not be performance penalty, because this syntax will be used in 99%.

This looks like a valid opinion IMO,  but to rescue it, we have to do
something like "internal structure" and remove the existing cast. 
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast.  It is an "internal structure"!  


I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text.  If you argue text can be cast to 
most-of-types,  that would be a reason, but I doubt this difference
should generate a "black magic". 

I used the term black magic, because nobody without reading documentation can find this operator.

I think this is what document is used for..   
 
It is used just for this special case, and the functionality is the same as using cast (only with different performance).

This is not good, but I didn't see a better choice so far,  see my first graph.
 

The operator ->> is more widely used. But if we have some possibility to work without it, then the usage for a lot of users will be more simple. More if the target types can be based on context

It would be cool but still I didn't see a way to do that without making
something else complex. 


Maybe we can introduce some *internal operator* "extract to type", and in rewrite stage we can the pattern (x->'field')::type transform to OP(x, 'field', typid)

Not sure what the OP should be?  If it is a function, what is the
return value?  It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS operator

It may work, but see my answer in the first graph. 
 
 

After all,  if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname') 
jsonb_field_as_timestamp(jsonb, 'filedname'); 
jsonb_field_as_timestamptz(jsonb, 'filedname'); 
jsonb_field_as_date(jsonb, 'filedname'); 

it can save an operator and sloves the readable issue. 

I don't like it too much, but it is better than introduction new operator 

Good to know it.  Naming operators is a complex task  if we add four. 


We already have the jsonb_extract_path and jsonb_extract_path_text function.

I can't follow this.  jsonb_extract_path returns a jsonb, which is  far away from
our goal: return a numeric effectively?

I can imagine to usage "anyelement" type too. some like `jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`
 
Can you elaborate this please?   

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi: 

On Thu, Aug 3, 2023 at 8:34 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-03 03:53, Andy Fan wrote:
> I didn't realize timetime types are binary compatible with SQL,
> so maybe we can have some similar optimization as well.
> (It is a pity that timestamp(tz) are not binary, or else we may
> just need one operator).

Not to veer from the thread, but something about that paragraph
has been hard for me to parse/follow.

I don't think this is a key conflict so far. but I'd explain this in more
detail. If timestamp -> timestamptz or timestamptz -> timestamp is
binary compatible,  we can only have 1 operator to return a timestamp.
then when we cast it to timestamptz, it will be a no-op during runtime.
however cast between timestamp and timestamptz is not binary
compatible. whose castmethod is 'f'; 

 

>> Maybe we can introduce some *internal operator* "extract to type", and
>> in
>> rewrite stage we can the pattern (x->'field')::type transform to OP(x,
>> 'field', typid)
>
> Not sure what the OP should be?  If it is a function, what is the
> return value?  It looks to me like it is hard to do in c language?

Now I am wondering about the 'planner support function' available
in CREATE FUNCTION since PG 12. I've never played with that yet.
Would that make it possible to have some, rather generic, extract
from JSON operator that can look at the surrounding expression
and replace itself sometimes with something  efficient?

I didn't realize this before,  'planner support function' looks
amazing and SupportRequestSimplify looks promising, I will check it 
more. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:
Hi

čt 3. 8. 2023 v 15:23 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi:
 
More, I believe so lot of people uses more common syntax, and then this syntax should to have good performance - for jsonb - (val->'op')::numeric works, and then there should not be performance penalty, because this syntax will be used in 99%.

This looks like a valid opinion IMO,  but to rescue it, we have to do
something like "internal structure" and remove the existing cast. 
But even we pay the effort, it still breaks some common knowledge,
since xx:numeric is not a cast.  It is an "internal structure"!  

I didn't study jsonb function, but there is an xml function that extracts value and next casts it to some target type. It does what is expected - for known types use hard coded casts, for other ask system catalog for cast function or does IO cast. This code is used for the XMLTABLE function. The JSON_TABLE function is not implemented yet, but there should be similar code. If you use explicit cast, then the code should not be hard, in the rewrite stage all information should be known.




I don't think "Black magic" is a proper word here, since it is not much
different from ->> return a text.  If you argue text can be cast to 
most-of-types,  that would be a reason, but I doubt this difference
should generate a "black magic". 

I used the term black magic, because nobody without reading documentation can find this operator.

I think this is what document is used for..   
 
It is used just for this special case, and the functionality is the same as using cast (only with different performance).

This is not good, but I didn't see a better choice so far,  see my first graph.
 

The operator ->> is more widely used. But if we have some possibility to work without it, then the usage for a lot of users will be more simple. More if the target types can be based on context

It would be cool but still I didn't see a way to do that without making
something else complex. 

sure - it is significantly more work, but it should be usable for all types and just use common syntax. The custom @-> operator you can implement in your own custom extension. Builtin solutions should be generic as it is possible.

The things should be as simple as possible - mainly for users, that missing knowledge, and any other possibility of how to do some task just increases their confusion. Can be nice if users find one solution on stack overflow and this solution should be great for performance too. It is worse if users find more solutions, but it is not too bad, if these solutions have similar performance. It is too bad if any solution has great performance and others not too much. Users has not internal knowledge, and then don't understand why sometimes should to use special operator and not common syntax.
 


Maybe we can introduce some *internal operator* "extract to type", and in rewrite stage we can the pattern (x->'field')::type transform to OP(x, 'field', typid)

Not sure what the OP should be?  If it is a function, what is the
return value?  It looks to me like it is hard to do in c language?

It should be internal structure - it can be similar like COALESCE or IS operator

It may work, but see my answer in the first graph. 
 
 

After all,  if we really care about the number of operators, I'm OK
with just let users use the function directly, like

jsonb_field_as_numeric(jsonb, 'filedname') 
jsonb_field_as_timestamp(jsonb, 'filedname'); 
jsonb_field_as_timestamptz(jsonb, 'filedname'); 
jsonb_field_as_date(jsonb, 'filedname'); 

it can save an operator and sloves the readable issue. 

I don't like it too much, but it is better than introduction new operator 

Good to know it.  Naming operators is a complex task  if we add four. 


We already have the jsonb_extract_path and jsonb_extract_path_text function.

I can't follow this.  jsonb_extract_path returns a jsonb, which is  far away from
our goal: return a numeric effectively?

I proposed `jsonb_extract_path_type` that is of anyelement type.

Regards

Pavel



I can imagine to usage "anyelement" type too. some like `jsonb_extract_path_type(jsonb, anyelement, variadic text[] )`
 
Can you elaborate this please?   

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi: 


 If you use explicit cast, then the code should not be hard, in the rewrite stage all information should be known.

Can you point to me where the code is for the XML stuff?  I thought
this is a bad idea but I may accept it if some existing code does
such a thing already.   "such thing"  is  typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB. 
 
It would be cool but still I didn't see a way to do that without making
something else complex. 

 The custom @-> operator you can implement in your own custom extension. Builtin solutions should be generic as it is possible.

I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it.  Are you saying something like this 
or I misunderstood you? 


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
>> If you use explicit cast, then the code should not be hard, in the
>> rewrite stage all information should be known.

> Can you point to me where the code is for the XML stuff?

I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
syntax --- but count on the SQL committee to do it that way :-(.

As far as the current discussion goes, I'm strongly against
introducing new functions or operators to do the same things that
we already have perfectly good syntax for.  "There's more than one
way to do it" isn't necessarily a virtue, and for sure it isn't a
virtue if people have to rewrite their existing queries to make use
of your optimization.

Also, why stop at optimizing "(jsonbval->'fld')::sometype"?  There are
many other extraction cases that people might wish were faster, such
as json array indexing, nested fields, etc.  It certainly won't make
sense to introduce yet another set of functions for each pattern you
want to optimize --- or at least, we won't want to ask users to change
their queries to invoke those functions explicitly.

I do like the idea of attaching a Simplify planner support function
to jsonb_numeric (and any other ones that seem worth optimizing)
that can convert a stack of jsonb transformations into a bundled
operation that avoids unnecessary conversions.  Then you get the
speedup without any need for people to change their existing queries.
We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.
(Alternatively, the output of this Simplify could be a new kind of
expression node that bundles one or more jsonb extractions with a
type conversion.  I don't have an opinion yet on which way is better.)

            regards, tom lane



Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Thu, Aug 3, 2023 at 6:04 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
>
> Is it? Detoasting only happens if the argument was toasted, and I have
> serious doubts that the result of (a->'a') will be toasted in our
> current system. Sure, we do need to allocate an intermediate result,
> but that's in a temporary memory context that should be trivially
> cheap to free.
>
> > /*
> >  * v.val.numeric points into jsonb body, so we need to make a copy to
> >  * return
> >  */
> > retValue = DatumGetNumericCopy(NumericGetDatum(v.val.numeric));
> >
> > At last this method needs 1 extra FuncExpr than my method, this would
> > cost some expression execution effort. I'm not saying we need to avoid
> > expression execution generally, but extracting numeric fields from jsonb
> > looks a reasonable case.
>
> What's tb here?
>
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)
>
>

can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).
also it works accurately either jsonb is in the page or in toast schema chunks.
I don't understand why we need to allocate an intermediate result
part. since you cannot directly update a jsonb value field.

This function is not easy to find out...
select numeric('{"a":11}'->'a'); --fail.
select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail
select "numeric"('{"a":11}'::jsonb->'a'); --ok



Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


čt 3. 8. 2023 v 16:27 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi: 


 If you use explicit cast, then the code should not be hard, in the rewrite stage all information should be known.

Can you point to me where the code is for the XML stuff?  I thought
this is a bad idea but I may accept it if some existing code does
such a thing already.   "such thing"  is  typeA:typeB is
converted something else but user can't find out an entry in
pg_cast for typeA to typeB. 

in XML there is src/backend/utils/adt/xml.c, the XmlTableGetValue routine. It is not an internal transformation - and from XML type to some else.

you can look at parser - parse_expr, parse_func. You can watch the lifecycle of :: operator. There are transformations of nodes to different nodes

you can look to patches related to SQL/JSON (not fully committed yet) and json_table


 
 
It would be cool but still I didn't see a way to do that without making
something else complex. 

 The custom @-> operator you can implement in your own custom extension. Builtin solutions should be generic as it is possible.

I agree, but actually I think there is no clean way to do it, at least I
dislike the conversion of typeA to typeB in a cast syntax but there
is no entry in pg_cast for it.  Are you saying something like this 
or I misunderstood you? 

There is not any possibility of user level space.  The conversions should be supported by cast from pg_cast, where it is possible. When it is impossible, then you can raise an exception in some strict mode, or you can do IO cast. But this is not hard part

You should to teach parser to push type info deeper to some nodes about expected result

(2023-08-04 05:28:36) postgres=# select ('{"a":2, "b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│       2 │
└─────────┘
(1 row)

(2023-08-04 05:28:36) postgres=# select ('{"a":2, "b":"nazdar"}'::jsonb)['a']::numeric;
┌─────────┐
│ numeric │
╞═════════╡
│       2 │
└─────────┘
(1 row)

(2023-08-04 05:28:41) postgres=# select ('{"a":2, "b":"nazdar"}'::jsonb)['a']::int;
┌──────┐
│ int4 │
╞══════╡
│    2 │
└──────┘
(1 row)

when the parser iterates over the expression, it crosses ::type node first, so you have information about the target type. Currently this information is used when the parser is going back and when the source type is the same as the target type, the cast can be ignored. Probably it needs to add some flag to the operator if they are able to use this. Maybe it can be a new third argument with an expected type. So new kinds of op functions can look like opfx("any", "any", anyelement) returns anyelement. Maybe you find another possibility. It can be invisible for me (or for you) now.

It is much more work, but the benefits will be generic. I think this is an important part for container types, so partial fix is not good, and it requires a system solution. The performance is important, but without generic solutions, the complexity increases, and this is a much bigger problem.

Regards

Pavel





--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi:  

can confirm the patch's jsonb_object_field_numeric is faster than
pg_catalog."numeric"(jsonb).

Thanks for the confirmation. 
 

This function is not easy to find out...

select jsonb_numeric(jsonb'{"a":11}'->'a'); --fail

jsonb_numeric is a prosrc rather than a proname,  that's why you
can't call them directly. 

select * from pg_proc where prosrc = 'jsonb_numeric';
select * from pg_proc where proname = 'jsonb_numeric';

It is bound to "numeric"(jsonb) cast, so we can call it with
a->'a'::numeric. 

    select numeric('{"a":11}'->'a'); --fail.
select "numeric"('{"a":11}'::jsonb->'a'); --ok

The double quotes look weird to me.  but it looks like  a common situation.

select numeric('1'::int); -- failed.
select "numeric"('1'::int); -- ok.

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Tom:

On Fri, Aug 4, 2023 at 3:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
>> If you use explicit cast, then the code should not be hard, in the
>> rewrite stage all information should be known.

> Can you point to me where the code is for the XML stuff?

I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
syntax --- but count on the SQL committee to do it that way :-(.

Thanks for this input! 
 

As far as the current discussion goes, I'm strongly against
introducing new functions or operators to do the same things that
we already have perfectly good syntax for.  "There's more than one
way to do it" isn't necessarily a virtue, and for sure it isn't a
virtue if people have to rewrite their existing queries to make use
of your optimization.

I agree, this is always the best/only reason I'd like to accept. 
 


I do like the idea of attaching a Simplify planner support function
to jsonb_numeric (and any other ones that seem worth optimizing)
 
I have a study planner support function today,  that looks great and
I don't think we need much work to do to get our goal, that's amzing.

For all the people who are interested in this topic, I will post a 
planner support function soon,  you can check that then.

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-03 23:55, Andy Fan wrote:
> The double quotes look weird to me.  but it looks like  a common
> situation.
> 
> select numeric('1'::int); -- failed.
> select "numeric"('1'::int); -- ok.

It arises when you have an object (type, function, cast, whatever) whose
name in the catalog is the same as some SQL-standard keyword that the
parser knows. The same thing happens with the PG type named char, which has
to be spelled "char" in a query because otherwise you get the SQL standard
char type, which is different.

On 2023-08-03 09:50, Andy Fan wrote:
> I don't think this is a key conflict so far. but I'd explain this in more
> detail. If timestamp -> timestamptz or timestamptz -> timestamp is
> binary compatible,  ... however cast between timestamp and timestamptz is
> not binary compatible. whose castmethod is 'f';

This is one of those cases where the incompatibility is a semantic one.
Both types are the same number of bits and they both represent microseconds
since midnight of the "Postgres epoch", but only timestamptz is anchored
to a time zone (UTC), so unless you happen to live in that time zone, they
mean different things. To just copy the binary bits from one to the other
would be lying (unless you know that the person you are copying them for
lives in UTC).

> On Fri, Aug 4, 2023 at 3:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Andy Fan <zhihui.fan1213@gmail.com> writes:
>> > Can you point to me where the code is for the XML stuff?
>>
>> I think Pavel means XMLTABLE, which IMO is an ugly monstrosity of
>> syntax --- but count on the SQL committee to do it that way :-(.

Another interesting thing about XMLTABLE is that ISO defines its behavior
entirely as a big rewriting into another SQL query built out of XMLQUERY
and XMLCAST functions, and a notional XMLITERATE function that isn't
visible to a user but is a factor of the rewriting. And the definition of
XMLCAST itself is something that is sometimes rewritten to plain CAST, and
sometimes rewritten away. Almost as if they had visions of planner support
functions.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi:
 
For all the people who are interested in this topic, I will post a 
planner support function soon,  you can check that then.


The updated patch doesn't need users to change their codes and can get
better performance. Thanks for all the feedback which makes things better.

To verify there is no unexpected stuff happening, here is the performance
comparison between master and patched.

create table tb(a jsonb);
insert into tb select '{"a": true, "b": 23.3333}' from generate_series(1,
100000)i;

Master:
select 1 from tb where  (a->'b')::numeric = 1;
Time: 31.020 ms

select 1 from tb where not (a->'a')::boolean;
Time: 25.888 ms

select 1 from tb where  (a->'b')::int2 = 1;
Time: 30.138 ms

select 1 from tb where  (a->'b')::int4 = 1;
Time: 32.384 ms

select 1 from tb where  (a->'b')::int8 = 1;\
Time: 29.922 ms

select 1 from tb where  (a->'b')::float4 = 1;
Time: 54.139 ms

select 1 from tb where  (a->'b')::float8 = 1;
Time: 66.933 ms

Patched:

select 1 from tb where  (a->'b')::numeric = 1;
Time: 15.203 ms

select 1 from tb where not (a->'a')::boolean;
Time: 12.894 ms

select 1 from tb where  (a->'b')::int2 = 1;
Time: 16.847 ms

select 1 from tb where  (a->'b')::int4 = 1;
Time: 17.105 ms

select 1 from tb where  (a->'b')::int8 = 1;
Time: 16.720 ms

select 1 from tb where  (a->'b')::float4 = 1;
Time: 33.409 ms

select 1 from tb where  (a->'b')::float8 = 1;
Time: 34.660 ms

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:
Hi

po 7. 8. 2023 v 5:04 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:
Hi:
 
For all the people who are interested in this topic, I will post a 
planner support function soon,  you can check that then.


The updated patch doesn't need users to change their codes and can get
better performance. Thanks for all the feedback which makes things better.

To verify there is no unexpected stuff happening, here is the performance
comparison between master and patched.

I am looking on your patch, and the message

+
+ default:
+ elog(ERROR, "cast jsonb field to %d is not supported.", targetOid);

is a little bit messy. This case should not be possible, because it is filtered by jsonb_cast_is_optimized. So the message should be changed or it needs a comment.

Regards

Pavel
 

create table tb(a jsonb);
insert into tb select '{"a": true, "b": 23.3333}' from generate_series(1,
100000)i;

Master:
select 1 from tb where  (a->'b')::numeric = 1;
Time: 31.020 ms

select 1 from tb where not (a->'a')::boolean;
Time: 25.888 ms

select 1 from tb where  (a->'b')::int2 = 1;
Time: 30.138 ms

select 1 from tb where  (a->'b')::int4 = 1;
Time: 32.384 ms

select 1 from tb where  (a->'b')::int8 = 1;\
Time: 29.922 ms

select 1 from tb where  (a->'b')::float4 = 1;
Time: 54.139 ms

select 1 from tb where  (a->'b')::float8 = 1;
Time: 66.933 ms

Patched:

select 1 from tb where  (a->'b')::numeric = 1;
Time: 15.203 ms

select 1 from tb where not (a->'a')::boolean;
Time: 12.894 ms

select 1 from tb where  (a->'b')::int2 = 1;
Time: 16.847 ms

select 1 from tb where  (a->'b')::int4 = 1;
Time: 17.105 ms

select 1 from tb where  (a->'b')::int8 = 1;
Time: 16.720 ms

select 1 from tb where  (a->'b')::float4 = 1;
Time: 33.409 ms

select 1 from tb where  (a->'b')::float8 = 1;
Time: 34.660 ms

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Pavel: 

Thanks for the code level review!
 

I am looking on your patch, and the message

+
+ default:
+ elog(ERROR, "cast jsonb field to %d is not supported.", targetOid);

is a little bit messy. This case should not be possible, because it is filtered by jsonb_cast_is_optimized. So the message should be changed or it needs a comment.

Yes, the double check is not necessary,  that is removed in the attached v4 patch. 


--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
Hi.

+Datum
+jsonb_object_field_type(PG_FUNCTION_ARGS)
+{
+ Jsonb    *jb = PG_GETARG_JSONB_P(0);
+ text    *key = PG_GETARG_TEXT_PP(1);
+ Oid targetOid = PG_GETARG_OID(2);

compared with jsonb_numeric. I am wondering if you need a free *jb.
elog(INFO,"jb=%p arg pointer=%p ", jb, PG_GETARG_POINTER(0));
says there two are not the same.



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Jian:

Thanks for the review! 

compared with jsonb_numeric. I am wondering if you need a free *jb.
elog(INFO,"jb=%p arg pointer=%p ", jb, PG_GETARG_POINTER(0));
says there two are not the same.

Thanks for pointing this out,  I am not sure what to do right now.
Basically the question is that shall we free the memory which
is allocated in a function call.  The proof to do it is obvious, but the
proof to NOT do it may be usually the memory is allocated under
ExprContext  Memorycontext,  it will be reset once the current 
tuple is proceed, and MemoryContextReset will be more effective
than pfrees; 

I checked most of the functions to free its memory, besides the
ones you mentioned,  numeric_gt/ne/xxx function also free them
directly.  But the functions like jsonb_object_field_text, 
jsonb_array_element,  jsonb_array_element_text don't.

I'd like to hear more options from more experienced people,
this issue also confused me before. and I'm neutral to this now. 
after we get an agreement on this,  I will update the patch
accordingly. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi: 

On Mon, Aug 7, 2023 at 7:51 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi Jian:

Thanks for the review! 

compared with jsonb_numeric. I am wondering if you need a free *jb.
elog(INFO,"jb=%p arg pointer=%p ", jb, PG_GETARG_POINTER(0));
says there two are not the same.

Thanks for pointing this out,  I am not sure what to do right now.
Basically the question is that shall we free the memory which
is allocated in a function call.  The proof to do it is obvious, but the
proof to NOT do it may be usually the memory is allocated under
ExprContext  Memorycontext,  it will be reset once the current 
tuple is proceed, and MemoryContextReset will be more effective
than pfrees; 

I just found Andres's opinion on this, it looks like he would suggest
not free it [1], and the reason is similar here [2], so I would like to 
keep it as it is.

 
--
Best Regards
Andy Fan

Re: Extract numeric [field] in JSONB more effectively

From
Chapman Flack
Date:
Hi,

Looking at the most recent patch, so far I have a minor
spelling point, and a question (which I have not personally
explored).

The minor spelling point, the word 'field' has been spelled
'filed' throughout this comment (just as in the email subject):

+        /*
+         * Simplify cast(jsonb_object_filed(jsonb, filedName) as type)
+         * to jsonb_object_field_type(jsonb, filedName, targetTypeOid);
+         */

The question: the simplification is currently being applied
when the underlying operation uses F_JSONB_OBJECT_FIELD.
Are there opportunities for a similar benefit if applied
over F_JSONB_ARRAY_ELEMENT and/or F_JSONB_EXTRACT_PATH?

Regards,
-Chap



Re: Extract numeric [field] in JSONB more effectively

From
jian he
Date:


On Wed, Aug 9, 2023 at 4:30 AM Chapman Flack <chap@anastigmatix.net> wrote:
>
> Hi,
>
> Looking at the most recent patch, so far I have a minor
> spelling point, and a question (which I have not personally
> explored).
>
> The minor spelling point, the word 'field' has been spelled
> 'filed' throughout this comment (just as in the email subject):
>
> +               /*
> +                * Simplify cast(jsonb_object_filed(jsonb, filedName) as type)
> +                * to jsonb_object_field_type(jsonb, filedName, targetTypeOid);
> +                */
>
> The question: the simplification is currently being applied
> when the underlying operation uses F_JSONB_OBJECT_FIELD.
> Are there opportunities for a similar benefit if applied
> over F_JSONB_ARRAY_ELEMENT and/or F_JSONB_EXTRACT_PATH?
>
> Regards,
> -Chap


Based on most recent patch
in jsonb_object_field_type function, I made some changesneed to include  <unistd.h>. just created a C function, but didn't rebuild. then compare it with the "numeric"(jsonb) function. overall it's fast

some changes I made in jsonb_object_field_type.

uint32 i;
char *endptr;
if (JB_ROOT_IS_OBJECT(jb))
v = getKeyJsonValueFromContainer(&jb->root,
VARDATA_ANY(key),
VARSIZE_ANY_EXHDR(key),
&vbuf);
else if (JB_ROOT_IS_ARRAY(jb) && !JB_ROOT_IS_SCALAR(jb)) /* scalar element is  pseudo-array */
{
errno = 0;
char *src = text_to_cstring(key);
i = (uint32) strtoul(src, &endptr, 10);

if (endptr == src || *endptr != '\0' || errno != 0)
{
elog(ERROR,"invalid input syntax when convert to integer:");
}
// i boundary index checked inside.
v = getIthJsonbValueFromContainer(&jb->root,i);
}
else if (JB_ROOT_IS_SCALAR(jb))
{
if (!JsonbExtractScalar(&jb->root, &vbuf) || vbuf.type != jbvNumeric)
cannotCastJsonbValue(vbuf.type, "numeric");
v = &vbuf;
}
else
PG_RETURN_NULL();
---------------------------------------
The following query will return zero rows. but jsonb_object_field_type will be faster.
select jsonb_object_field_type('[1.1,2.2]'::jsonb,'1', 1700),
jsonb_object_field_type('{"1":10.2}'::jsonb,'1', 1700),
jsonb_object_field_type('10.2'::jsonb,'1', 1700)
except
select "numeric"(('[1.1,2.2]'::jsonb)[1]),
"numeric"('{"1":10.2}'::jsonb->'1'),
"numeric"('10.2'::jsonb);

how to glue it as a support function, or make it more generic needs extra thinking.

Re: Extract numeric [field] in JSONB more effectively

From
Andy Fan
Date:
Hi Chap:
 
  Thanks for the review. 
 
The minor spelling point, the word 'field' has been spelled
'filed' throughout this comment (just as in the email subject):

+               /*
+                * Simplify cast(jsonb_object_filed(jsonb, filedName) as type)
+                * to jsonb_object_field_type(jsonb, filedName, targetTypeOid);
+                */

 
Thanks for catching this, fixed in v5. 
 
The question: the simplification is currently being applied
when the underlying operation uses F_JSONB_OBJECT_FIELD.
Are there opportunities for a similar benefit if applied
over F_JSONB_ARRAY_ELEMENT and/or F_JSONB_EXTRACT_PATH?

Yes, we do have similar opportunities for both functions.  v5 attached for this.    

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.

To avoid the lots of functions like jsonb_field_as_int2/int4, I defined
Datum jsonb_object_field_type(.., Oid target_oid) at last,  so the
function must return "internal" or "anyelement".  Then we can see:

select jsonb_object_field_type(tb.a, 'a'::text, 1700) from tb;
ERROR:  cannot display a value of type anyelement.

The reason is clear to me, but  I'm not sure how to fix that or deserves
a fix? Or shall I define jsonb_object_field_int2/int8 to avoid this? 

This is an unresolved issue at the latest patch. 
--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


po 14. 8. 2023 v 9:06 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

We'd still have functions like jsonb_field_as_numeric() under the
hood, but there's not an expectation that users call them explicitly.

To avoid the lots of functions like jsonb_field_as_int2/int4, I defined
Datum jsonb_object_field_type(.., Oid target_oid) at last,  so the
function must return "internal" or "anyelement".  Then we can see:

select jsonb_object_field_type(tb.a, 'a'::text, 1700) from tb;
ERROR:  cannot display a value of type anyelement.

you cannot to use type as parameter. There should be some typed value - like

jsonb_object_field, '{"a":10}', 'a', NULL::int)

and return type should be anyelement.

Another solution should be more deeper change like implementation of "coalesce"

 

The reason is clear to me, but  I'm not sure how to fix that or deserves
a fix? Or shall I define jsonb_object_field_int2/int8 to avoid this? 

This is an unresolved issue at the latest patch. 
--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

you cannot to use type as parameter. There should be some typed value - like

jsonb_object_field, '{"a":10}', 'a', NULL::int)

and return type should be anyelement.


So could we get the inputted type in the body of jsonb_object_field? 
I guess no.  IIUC, our goal will still be missed in this way. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


po 14. 8. 2023 v 11:17 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

you cannot to use type as parameter. There should be some typed value - like

jsonb_object_field, '{"a":10}', 'a', NULL::int)

and return type should be anyelement.


So could we get the inputted type in the body of jsonb_object_field? 
I guess no.  IIUC, our goal will still be missed in this way. 

why not? You can easily build null constant of any type. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-14 03:06, Andy Fan wrote:
>> We'd still have functions like jsonb_field_as_numeric() under the
>> hood, but there's not an expectation that users call them explicitly.
> 
> To avoid the lots of functions like jsonb_field_as_int2/int4, I defined
> Datum jsonb_object_field_type(.., Oid target_oid) at last,  so the
> function must return "internal" or "anyelement".
> ...
> I'm not sure how to fix that or deserves
> a fix? Or shall I define jsonb_object_field_int2/int8 to avoid this?

As far as I'm concerned, if the intent is for this to be a function
that is swapped in by SupportRequestSimplify and not necessarily to
be called by users directly, I don't mind if users can't call it
directly. As long as there is a nice familiar jsonb function the user
can call in a nice familiar way and knows it will be handled
efficiently behind the curtain, that seems to be good enough for
the user--better, even, than having a new oddball function to
remember.

However, I believe the rule is that a function declared to return
internal must also declare at least one parameter as internal.
That way, a user won't be shown errors about displaying its
returned value, because the user won't be able to call it
in the first place, having no values of type 'internal' lying
around to pass to it. It could simply have that trailing oid
parameter declared as internal, and there you have a strictly
internal-use function.

Providing a function with return type declared internal but
with no parameter of that type is not good, because then a
user could, in principle, call it and obtain a value of
'internal' type, and so get around the typing rules that
prevent calling other internal functions.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Tom Lane
Date:
Chapman Flack <chap@anastigmatix.net> writes:
> Providing a function with return type declared internal but
> with no parameter of that type is not good,

Not so much "not good" as "absolutely, positively WILL NOT HAPPEN".

> because then a
> user could, in principle, call it and obtain a value of
> 'internal' type, and so get around the typing rules that
> prevent calling other internal functions.

Right --- it'd completely break the system's type-safety for
other internal-using functions.

You could argue that we should never have abused "internal"
to this extent in the first place, compared to inventing a
plethora of internal-ish types to correspond to each of the
things "internal" is used for.  But here we are so we'd
better be darn careful with it.

            regards, tom lane



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


On Mon, Aug 14, 2023 at 10:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chapman Flack <chap@anastigmatix.net> writes:
> Providing a function with return type declared internal but
> with no parameter of that type is not good,

Not so much "not good" as "absolutely, positively WILL NOT HAPPEN".

Chap is pretty nice to others:).  
 
 
> because then a
> user could, in principle, call it and obtain a value of
> 'internal' type, and so get around the typing rules that
> prevent calling other internal functions.

Right --- it'd completely break the system's type-safety for
other internal-using functions.

 
I do see something bad in opr_sanity.sql.  Pavel suggested 
get_fn_expr_argtype which can resolve this issue pretty well, so
I have changed 

jsonb_extract_xx_type(..,  Oid taget_oid) -> anyelement. 
to
jsonb_extract_xx_type(..,  anyelement) -> anyelement. 

The only bad smell left is since I want to define jsonb_extract_xx_type
as strict so I can't use   jsonb_extract_xx_type(.., NULL::a-type)
since it will be evaluated to NULL directly.  So I hacked it with

/* mock the type. */
            Const   *target =  makeNullConst(fexpr->funcresulttype,
                                             -1,
                                             InvalidOid);

/* hack the NULL attribute */
            /*                                                                                                                                                                                                                                                                
             * Since all the above functions are strict, we can't input                                                                                                                                                                                                        
             * a NULL value.                                                                                                                                                                                                                                                  
             */
            target->constisnull = false;

 jsonb_extract_xx_type just cares about the argtype, but 
'explain select xx'  will still access the const->constvalue.
const->constvalue is 0 which is set by makeNullConst currently, 
and it is ok for the current supported type. but I'm not sure 
about the future or if we still have a better solution. 

v6 is attached.  any feedback is welcome!

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

 jsonb_extract_xx_type just cares about the argtype, but 
'explain select xx'  will still access the const->constvalue.
const->constvalue is 0 which is set by makeNullConst currently, 
and it is ok for the current supported type. 

The exception is numeric data type, the constvalue can't be 0. 
so hack it with the below line.  maybe not good enough,  but I
have no better solution now. 

+                       Const   *target =  makeNullConst(fexpr->funcresulttype,
+                                                                                        -1,
+                                                                                        InvalidOid);
+                       /*
+                        * Since all the above functions are strict, we can't input
+                        * a NULL value.
+                        */
+                       target->constisnull = false;
+      
+                       Assert(target->constbyval || target->consttype == NUMERICOID);
+              
+                       /* Mock a valid datum for !constbyval type. */
+                       if (fexpr->funcresulttype == NUMERICOID)
+                               target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:
Hi

út 15. 8. 2023 v 5:24 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

 jsonb_extract_xx_type just cares about the argtype, but 
'explain select xx'  will still access the const->constvalue.
const->constvalue is 0 which is set by makeNullConst currently, 
and it is ok for the current supported type. 

The exception is numeric data type, the constvalue can't be 0. 
so hack it with the below line.  maybe not good enough,  but I
have no better solution now. 

+                       Const   *target =  makeNullConst(fexpr->funcresulttype,
+                                                                                        -1,
+                                                                                        InvalidOid);
+                       /*
+                        * Since all the above functions are strict, we can't input
+                        * a NULL value.
+                        */
+                       target->constisnull = false;
+      
+                       Assert(target->constbyval || target->consttype == NUMERICOID);
+              
+                       /* Mock a valid datum for !constbyval type. */
+                       if (fexpr->funcresulttype == NUMERICOID)
+                               target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));


Personally I think this workaround is too dirty, and better to use a strict function (I believe so the overhead for NULL values is acceptable), or introduce a different mechanism.

Your design is workable, and I think acceptable, but I don't think it is an ideal or final solution. It is not really generic. It doesn't help with XML or Hstore. You need to touch cast functions, which I think is not best, because cast functions should not cooperate on optimization of execution of another function.

My idea of an ideal solution is the introduction of the possibility to use "any" pseudotype as return type with possibility to set default return type. Now, "any" is allowed only for arguments. The planner can set the expected type when it knows it, or can use the default type.

so for extraction of jsonb field we can use FUNCTION jsonb_extract_field(jsonb, text) RETURNS "any" DEFAULT jsonb

if we call SELECT jsonb_extract_field(..., 'x') -> then it returns jsonb, if we use SELECT jsonb_extract_field('...', 'x')::date, then it returns date

With this possibility we don't need to touch to cast functions, and we can simply implement similar functions for other non atomic types.



--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


út 15. 8. 2023 v 7:23 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

út 15. 8. 2023 v 5:24 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

 jsonb_extract_xx_type just cares about the argtype, but 
'explain select xx'  will still access the const->constvalue.
const->constvalue is 0 which is set by makeNullConst currently, 
and it is ok for the current supported type. 

The exception is numeric data type, the constvalue can't be 0. 
so hack it with the below line.  maybe not good enough,  but I
have no better solution now. 

+                       Const   *target =  makeNullConst(fexpr->funcresulttype,
+                                                                                        -1,
+                                                                                        InvalidOid);
+                       /*
+                        * Since all the above functions are strict, we can't input
+                        * a NULL value.
+                        */
+                       target->constisnull = false;
+      
+                       Assert(target->constbyval || target->consttype == NUMERICOID);
+              
+                       /* Mock a valid datum for !constbyval type. */
+                       if (fexpr->funcresulttype == NUMERICOID)
+                               target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));


Personally I think this workaround is too dirty, and better to use a strict function (I believe so the overhead for NULL values is acceptable), or introduce a different mechanism.

Your design is workable, and I think acceptable, but I don't think it is an ideal or final solution. It is not really generic. It doesn't help with XML or Hstore. You need to touch cast functions, which I think is not best, because cast functions should not cooperate on optimization of execution of another function.

My idea of an ideal solution is the introduction of the possibility to use "any" pseudotype as return type with possibility to set default return type. Now, "any" is allowed only for arguments. The planner can set the expected type when it knows it, or can use the default type.

so for extraction of jsonb field we can use FUNCTION jsonb_extract_field(jsonb, text) RETURNS "any" DEFAULT jsonb

if we call SELECT jsonb_extract_field(..., 'x') -> then it returns jsonb, if we use SELECT jsonb_extract_field('...', 'x')::date, then it returns date

With this possibility we don't need to touch to cast functions, and we can simply implement similar functions for other non atomic types.

this syntax can be used instead NULL::type trick

like

SELECT jsonb_populate_record('{...}')::pg_class;

instead 

SELECT jsonb_populate_record(NULL::pg_class, '{...}')

 



--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

My idea of an ideal solution is the introduction of the possibility to use "any" pseudotype as return type with possibility to set default return type. Now, "any" is allowed only for arguments. The planner can set the expected type when it knows it, or can use the default type.

so for extraction of jsonb field we can use FUNCTION jsonb_extract_field(jsonb, text) RETURNS "any" DEFAULT jsonb

Is this an existing framework or do you want to create something new? 

if we call SELECT jsonb_extract_field(..., 'x') -> then it returns jsonb, if we use SELECT jsonb_extract_field('...', 'x')::date, then it returns date

If so, what is the difference from the current  jsonb->'f'   and (jsonb->'f' )::date?  

With this possibility we don't need to touch to cast functions, and we can simply implement similar functions for other non atomic types.
 
What do you mean by "atomic type" here?   If you want to introduce some new framework,  I think we need a very clear benefit.  

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


út 15. 8. 2023 v 8:04 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

My idea of an ideal solution is the introduction of the possibility to use "any" pseudotype as return type with possibility to set default return type. Now, "any" is allowed only for arguments. The planner can set the expected type when it knows it, or can use the default type.

so for extraction of jsonb field we can use FUNCTION jsonb_extract_field(jsonb, text) RETURNS "any" DEFAULT jsonb
 
Is this an existing framework or do you want to create something new? 

This should be created
 

if we call SELECT jsonb_extract_field(..., 'x') -> then it returns jsonb, if we use SELECT jsonb_extract_field('...', 'x')::date, then it returns date

If so, what is the difference from the current  jsonb->'f'   and (jsonb->'f' )::date?  

a) effectiveness. The ending performance should be similar like your current patch, but without necessity to use planner support API.

b) more generic usage. For example, the expressions in plpgsql are executed a little bit differently than SQL queries. So there the optimization from your patch probably should not work, because you can write only var := j->'f', and plpgsql forces cast function execution, but not via planner.

c) nothing else. It should not to require to modify cast function definitions
 


With this possibility we don't need to touch to cast functions, and we can simply implement similar functions for other non atomic types.
 
What do you mean by "atomic type" here?   If you want to introduce some new framework,  I think we need a very clear benefit.  

Atomic types (skalar types like int, varchar, date), nonatomic types - array, composite, xml, jsonb, hstore or arrays of composite types.

 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


a) effectiveness. The ending performance should be similar like your current patch, but without necessity to use planner support API.
 
So the cost is we need to create a new & different framework.  
 
b) because you can write only var := j->'f', and plpgsql forces cast function execution, but not via planner.
 
var a := 1 needs going with planner,  IIUC,  same with j->'f'.  

c) nothing else. It should not to require to modify cast function definitions
 
If you look at how the planner support function works,  that is
pretty simple,  just modify the prosupport attribute. I'm not sure
this should be called an issue or avoiding it can be described
as a benefit. 

I don't think the current case is as bad as the other ones like
users needing to modify their queries or type-safety system
being broken. So personally I'm not willing to creating some
thing new & heavy. However I'm open to see what others say.  
 
--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Pavel Stehule
Date:


út 15. 8. 2023 v 9:05 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:


a) effectiveness. The ending performance should be similar like your current patch, but without necessity to use planner support API.
 
So the cost is we need to create a new & different framework.  

yes, it can be less work, code than for example introduction of "anycompatible". 
 
 
b) because you can write only var := j->'f', and plpgsql forces cast function execution, but not via planner.
 
var a := 1 needs going with planner,  IIUC,  same with j->'f'.  

i was wrong, the planner is full, but the executor is reduced.

 

c) nothing else. It should not to require to modify cast function definitions
 
If you look at how the planner support function works,  that is
pretty simple,  just modify the prosupport attribute. I'm not sure
this should be called an issue or avoiding it can be described
as a benefit. 

I don't think the current case is as bad as the other ones like
users needing to modify their queries or type-safety system
being broken. So personally I'm not willing to creating some
thing new & heavy. However I'm open to see what others say.  

ok

regards

Pavel
 
 
--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


On Tue, Aug 15, 2023 at 1:24 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

út 15. 8. 2023 v 5:24 odesílatel Andy Fan <zhihui.fan1213@gmail.com> napsal:

 jsonb_extract_xx_type just cares about the argtype, but 
'explain select xx'  will still access the const->constvalue.
const->constvalue is 0 which is set by makeNullConst currently, 
and it is ok for the current supported type. 

The exception is numeric data type, the constvalue can't be 0. 
so hack it with the below line.  maybe not good enough,  but I
have no better solution now. 

+                       Const   *target =  makeNullConst(fexpr->funcresulttype,
+                                                                                        -1,
+                                                                                        InvalidOid);
+                       /*
+                        * Since all the above functions are strict, we can't input
+                        * a NULL value.
+                        */
+                       target->constisnull = false;
+      
+                       Assert(target->constbyval || target->consttype == NUMERICOID);
+              
+                       /* Mock a valid datum for !constbyval type. */
+                       if (fexpr->funcresulttype == NUMERICOID)
+                               target->constvalue = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));


Personally I think this workaround is too dirty, and better to use a strict function (I believe so the overhead for NULL values is acceptable).

In the patch v8,  I created a new routine named makeDummyConst,
which just sits by makeNullConst. It may be helpful to some extent.
a).  The code is self-document for the user/reader.  b).  We have a
central place to maintain this routine. 

Besides the framework,  the troubles for the reviewer may be if the
code has some corner case issue or behavior changes. Especially
I have some code refactor when working on jsonb_extract_path.
so the attached test.sql is designed for this.  I have compared the
result between master and patched version and I think reviewer
can do some extra testing with it.

v8 is the finished version in my mind, so I think it is ready for review now. 

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
update with the correct patch.. 
Attachment

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Wed, Aug 16, 2023 at 2:28 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> update with the correct patch..

regression=# select proname, pg_catalog.pg_get_function_arguments(oid)
from pg_proc
where proname =  'jsonb_extract_path_type';
         proname         |                     pg_get_function_arguments
-------------------------+--------------------------------------------------------------------
 jsonb_extract_path_type | from_json jsonb, VARIADIC path_elems
text[], target_oid anyelement
(1 row)

VARIADIC should be the last argument?

select jsonb_array_element_type(jsonb'[1231]',0, null::int);
now return null.
Should it return 1231?

regression=# select jsonb_array_element_type(jsonb'1231',0, 1::int);
 jsonb_array_element_type
--------------------------
                     1231
(1 row)

not sure if it's ok. if you think it's not ok then:
+ if (!JB_ROOT_IS_ARRAY(jb))
+PG_RETURN_NULL();
change to
+if (JB_ROOT_IS_SCALAR(jb) || !JB_ROOT_IS_ARRAY(jb))
+PG_RETURN_NULL();

select jsonb_array_element_type(jsonb'[1231]',0, '1'::jsonb);
will crash, because jsonb_array_element_type call
cast_jsonbvalue_to_type then in switch case, it will go to
default part. in default part you have Assert(false);
also in cast_jsonbvalue_to_type, PG_RETURN_POINTER(NULL) code won't be reached.

in jsonb_cast_support function. you already have
!jsonb_cast_is_optimized(fexpr->funcresulttype)). then in the default
branch of cast_jsonbvalue_to_type, you can just elog(error, "can only
cast to xxx type"). jsonb_array_element_type, jsonb_object_field_type,
 third argument is anyelement. so targetOid can be any datatype's oid.



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi jian:

On Thu, Aug 17, 2023 at 12:32 AM jian he <jian.universality@gmail.com> wrote:
On Wed, Aug 16, 2023 at 2:28 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> update with the correct patch..

regression=# select proname, pg_catalog.pg_get_function_arguments(oid)
from pg_proc
where proname =  'jsonb_extract_path_type';
         proname         |                     pg_get_function_arguments
-------------------------+--------------------------------------------------------------------
 jsonb_extract_path_type | from_json jsonb, VARIADIC path_elems
text[], target_oid anyelement
(1 row)

VARIADIC should be the last argument?

Currently if users call this function directly(usually I don't  think
so), they will get something wrong.   This issue is fixed in the 
v9 version.  To keep the consistency among all the functions, 
I moved the 'target_type anyelement' to the 1st argument. 
Thanks for the report!
 
select jsonb_array_element_type(jsonb'[1231]',0, null::int);
now return null.
Should it return 1231?

No, this is by design. the function is declared as strict, so
any NULL inputs yield a NULL output.  That's just what we
talked above (the markDummyConst section).  I don't
think this should be addressed.
 
select jsonb_array_element_type(jsonb'[1231]',0, '1'::jsonb);
will crash

OK,  looks I didn't pay enough attention to the 'user directly call
jsonb_xx_type' function, so I changed the code in v9 based on
your suggestion. 

Thanks for the review,  v9 attached!

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-17 05:07, Andy Fan wrote:
> Thanks for the review,  v9 attached!

 From the earliest iterations of this patch, I seem to recall
a couple of designs being considered:

In one, the type-specific cast function would only be internally
usable, would take a type oid as an extra parameter (supplied in
the SupportRequestSimplify rewriting), and would have to be
declared with some nonspecific return type; 'internal' was
mentioned.

The idea of an 'internal' return type with no 'internal' parameter
was quickly and rightly shot down. But it would have seemed to me
enough to address that objection by using 'internal' also in its
parameter list. I could imagine a function declared with two
'internal' parameters, one understood to be a JsonbValue and one
understood to be a type oid, and an 'internal' result, treated in
the rewritten expression tree as binary-coercible to the desired
result.

Admittedly, I have not tried to implement that myself to see
what unexpected roadblocks might exist on that path. Perhaps
there are parts of that rewriting that no existing node type
can represent? Someone more familiar with those corners of
PostgreSQL may immediately see other difficulties I do not.

But I have the sense that that approach was abandoned early, in
favor of the current approach using user-visible polymorphic
types, and supplying typed dummy constants for use in the
resolution of those types, with a new function introduced to create
said dummy constants, including allocation and input conversion
in the case of numeric, just so said dummy constants can be
passed into functions that have no use for them other than to
call get_fn_expr_argtype to recover the type oid, which was the
only thing needed in the first place.

Compared to the initial direction I thought this was going,
none of that strikes me as better.

Nothing makes my opinion authoritative here, and there may
indeed be reasons it is better, known to others more familiar
with that code than I am. But it bugs me.

If the obstacles to the earlier approach came down to needing
a new type of expression node, something like "assertion of
'internal'-to-foo binary coercibility, vouched by a prosupport
function", would that be a bad thing?

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Chapman, 

Thanks for the review!

The idea of an 'internal' return type with no 'internal' parameter
was quickly and rightly shot down.

Yes, it mainly breaks the type-safety system.  Parser need to know
the result type, so PG defines the rule like this:

anyelement fn(anyment in); 

if the exprType(in) in the query tree is X, then PG would think fn
return type X.  that's why we have to have an anyelement in the
input. 
 
But it would have seemed to me
enough to address that objection by using 'internal' also in its
parameter list. I could imagine a function declared with two
'internal' parameters, one understood to be a JsonbValue and one
understood to be a type oid, and an 'internal' result, treated in
the rewritten expression tree as binary-coercible to the desired
result.

I have some trouble understanding this.  are you saying something
like:

internal fn(internal jsonValue,  internal typeOid)?  

If so, would it break the type-safety system?  And I'm not pretty
sure the 'binary-coercible' here.  is it same as the 'binary-coercible'
in "timestamp is not binary coercible with timestamptz since..."?
I have a strong feeling that I think I misunderstood you here. 
 
Perhaps there are parts of that rewriting that no existing node type
can represent? 
 
I didn't understand this as well:(:( 

But I have the sense that that approach was abandoned early, in
favor of the current approach using user-visible polymorphic
types, and supplying typed dummy constants for use in the
resolution of those types, with a new function introduced to create
said dummy constants, including allocation and input conversion
in the case of numeric, just so said dummy constants can be
passed into functions that have no use for them other than to
call get_fn_expr_argtype to recover the type oid, which was the
only thing needed in the first place.

Yes,  but if we follow the type-safety system, we can't simply input
a Oid targetOid, then there are some more considerations here:  
a).  we can't use the makeNullConst because jsonb_xxx_type is
strict,  so if we have NULL constant input here,  the PG system
will return NULL directly.  b).  Not only the type oid is the thing
We are interested in the  const.constvalue is as well since 
'explain select xxxx'  to access it to show it as a string.  
Datum(0) as the constvalue will crash in this sense.  That's why
makeDummyConst was introduced. 
 
something like "assertion of
'internal'-to-foo binary coercibility, vouched by a prosupport
function", would that be a bad thing?

I can't follow this as well.  Could you provide the function prototype
here? 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-17 21:14, Andy Fan wrote:
>> The idea of an 'internal' return type with no 'internal' parameter
>> was quickly and rightly shot down.
> 
> Yes, it mainly breaks the type-safety system.  Parser need to know
> the result type, so PG defines the rule like this:

Well, the reason "internal return type with no internal parameter type"
was shot down was more specific: if there is such a function in the
catalog, an SQL user can call it, and then its return type is a value
typed 'internal', and with that the SQL user could call other
functions with 'internal' parameters, and that's what breaks type
safety. The specific problem is not having at least one 'internal'
input parameter.

There are lots of functions in the catalog with internal return type
(I count 111). They are not inherently bad; the rule is simply that
each one also needs at least one IN parameter typed internal, to
make sure it can't be directly called from SQL.

> anyelement fn(anyment in);
> 
> if the exprType(in) in the query tree is X, then PG would think fn
> return type X.  that's why we have to have an anyelement in the
> input.

That's a consequence of the choice to have anyelement as the return
type, though. A different choice wouldn't have that consequence.

> I have some trouble understanding this.  are you saying something
> like:
> 
> internal fn(internal jsonValue,  internal typeOid)?
> 
> If so, would it break the type-safety system?

That is what I'm saying, and it doesn't break type safety at the
SQL level, because as long as it has parameters declared internal,
no SQL can ever call it. So it can only appear in an expression
tree because your SupportRequestSimplify put it there properly
typed, after the SQL query was parsed but before evaluation.

The thing about 'internal' is it doesn't represent any specific
type, it doesn't necessarily represent the same type every time it
is mentioned, and it often means something that isn't a cataloged
type at all, such as a pointer to some kind of struct. There must be
documentation explaining what it has to be. For example, your
jsonb_cast_support function has an 'internal' parameter and
'internal' return type. From the specification for support
functions, you know the 'internal' for the parameter type means
"one of the Node structs in supportnodes.h", and the 'internal'
for the return type means "an expression tree semantically
equivalent to the FuncExpr".

So, in addition to declaring
internal fn(internal jsonValue,  internal typeOid), you would
have to write a clear spec that jsonValue has to be a JsonbValue,
typeOid has to be something you can call DatumGetObjectId on,
and the return value should be a Datum in proper form
corresponding to typeOid. And, of course, generate the expression
tree so all of that is true when it's evaluated.

>> Perhaps there are parts of that rewriting that no existing node type
>> can represent?

The description above was in broad strokes. Because I haven't
tried to implement this, I don't know whether some roadblock would
appear, such as, is it hard to make a Const node of type internal
and containing an oid? Or, what sort of node must be inserted to
clarify that the 'internal' return is actually a Datum of the
expected type? By construction, we know that it is, but how to
make that explicit in the expression tree?

> a).  we can't use the makeNullConst because jsonb_xxx_type is
> strict,  so if we have NULL constant input here,  the PG system
> will return NULL directly.  b).  Not only the type oid is the thing
> We are interested in the  const.constvalue is as well since
> 'explain select xxxx'  to access it to show it as a string.
> Datum(0) as the constvalue will crash in this sense.  That's why
> makeDummyConst was introduced.

Again, all of that complication stems from the choice to use the
anyelement return type and rely on polymorphic type resolution
to figure the oid out, when we already have the oid to begin with
and the oid is all we want.

>> something like "assertion of
>> 'internal'-to-foo binary coercibility, vouched by a prosupport
>> function", would that be a bad thing?
> 
> I can't follow this as well.

That was just another way of saying what I was getting at above
about what's needed in the expression tree to indicate that the
'internal' produced by this function is, in fact, really a bool
(or whatever). We know that it is, but perhaps the expression
tree will be considered ill-formed without a node that says so.
A node representing a no-op, binary conversion would suffice,
but is there already a node that's allowed to represent an
internal-to-bool no-op cast?

If there isn't, one might have to be invented. So it might be that
if we go down the "use polymorphic resolution" road, we have to
invent dummy Consts, and down the "internal" road we also have to
invent something, like the "no-op cast considered correct because
a SupportRequestSimplify function put it here" node.

If it came down to having to invent one of those things or the
other, I'd think the latter more directly captures what we really
want to do.

(And I'm not even sure anything has to be invented. If there's an
existing node for no-op binary casts, I think I'd first try
putting that there and see if anything complains.)

If this thread is being followed by others more familiar with
the relevant code or who see obvious problems I'm missing,
please chime in!

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Fri, Aug 18, 2023 at 10:55 AM Chapman Flack <chap@anastigmatix.net> wrote:
>
>
> Again, all of that complication stems from the choice to use the
> anyelement return type and rely on polymorphic type resolution
> to figure the oid out, when we already have the oid to begin with
> and the oid is all we want.
>

you want jsonb_object_field_type(internal, jsonb, text)? because on
sql level, it's safe.

The return data type is determined when we are in jsonb_cast_support.
we just need to pass the {return data type} information to the next
function: jsonb_object_field_type.



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:



because as long as it has parameters declared internal,
no SQL can ever call it.
 
I was confused about the difference between anyelement and
internal, and I want to know a way to create a function which
is disallowed to be called by the user.  Your above words 
resolved two questions of mine!

So it can only appear in an expression
tree because your SupportRequestSimplify put it there properly
typed, after the SQL query was parsed but before evaluation.

The thing about 'internal' is it doesn't represent any specific
type, it doesn't necessarily represent the same type every time it
is mentioned, and it often means something that isn't a cataloged
type at all, such as a pointer to some kind of struct.

I should have noticed this during the study planner support function,
but highlighting this is pretty amazing. 
 
If there isn't, one might have to be invented. So it might be that
if we go down the "use polymorphic resolution" road, we have to
invent dummy Consts, and down the "internal" road we also have to
invent something.

I think you might already feel that putting an internal function
into an expression would cause something wrong.  I just have
a quick hack on this, and crash happens at the simplest case. 
If something already exists to fix this, I am inclined 
to use 'internal', but I didn't find the way.  I'm thinking if we
should clarify "internal" should only be used internally and
should never be used in expression by design? 
 
(And I'm not even sure anything has to be invented. If there's an
existing node for no-op binary casts, I think I'd first try
putting that there and see if anything complains.)

If this thread is being followed by others more familiar with
the relevant code or who see obvious problems I'm missing,
please chime in!

Thank you wise & modest gentleman,  I would really hope Tom can
chime in at this time. 

In general,  the current decision we need to make is shall we use
'internal' or 'anyelement' to present the target OID.  the internal way
would be more straight but have troubles to be in the expression tree.
the 'anyelement'  way is compatible with expression, but it introduces
the makeDummyConst overhead and I'm not pretty sure it is a correct
implementation in makeDummyConst. see the XXX part. 

+/*
+ * makeDummyConst
+ *      create a Const node with the specified type/typmod.
+ *
+ * This is a convenience routine to create a Const which only the
+ * type is interested but make sure the value is accessible.
+ */
+Const *
+makeDummyConst(Oid consttype, int32 consttypmod, Oid constcollid)
+{
+       int16           typLen;
+       bool            typByVal;
+       Const           *c;
+       Datum           val = 0;
+
+
+       get_typlenbyval(consttype, &typLen, &typByVal);
+
+       if (consttype == NUMERICOID)
+               val = DirectFunctionCall1(numeric_in, CStringGetDatum("0"));
+       else if (!typByVal)
+               elog(ERROR, "create dummy const for type %u is not supported.", consttype);
+
+       /* XXX: here I assume constvalue=0 is accessible for const by value type.*/
+       c = makeConst(consttype, consttypmod, 0, (int) typLen, val, false, typByVal);
+
+       return c;
+}

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-18 03:41, Andy Fan wrote:
> I just have
> a quick hack on this, and crash happens at the simplest case.

If I build from this patch, this test:

SELECT (test_json -> 0)::int4, test_json -> 0 FROM test_jsonb WHERE 
json_type = 'scalarint';

fails like this:

Program received signal SIGSEGV, Segmentation fault.
convert_saop_to_hashed_saop_walker (node=0x17, context=0x0)
     at 
/var/tmp/nohome/pgbuildh/../postgresql/src/backend/optimizer/util/clauses.c:2215
2215        if (IsA(node, ScalarArrayOpExpr))

(gdb) p node
$1 = (Node *) 0x17

So the optimizer is looking at some node to see if it is a
ScalarArrayOpExpr, but the node has some rather weird address.

Or maybe it's not that weird. 0x17 is 23, and so is:

select 'int4'::regtype::oid;
  oid
-----
   23

See what happened?

+ int64 target_typ = fexpr->funcresulttype;
...
+ fexpr->args = list_insert_nth(fexpr->args, 0, (void *) target_typ);

This is inserting the desired result type oid directly as the first
thing in the list of fexpr's args.

But at the time your support function is called, nothing is being
evaluated yet. You are just manipulating a tree of expressions to
be evaluated later, and you want fexpr's first arg to be an
expression that will produce this type oid later, when it is
evaluated.

A constant would do nicely:

+ Const    *target  = makeConst(
    INTERNALOID, -1, InvalidOid, SIZEOF_DATUM,
    ObjectIdGetDatum(fexpr->funcresulttype), false, true);
+ fexpr->args = list_insert_nth(fexpr->args, 0, target);

With that change, it doesn't segfault, but it does do this:

ERROR:  cast jsonb to type 0 is not allowed

and that's because of this:

+ Oid            targetOid = DatumGetObjectId(0);

The DatumGetFoo(x) macros are for when you already have the Datum
(it's x) and you know it's a Foo. So this is just setting targetOid
to zero. When you want to get something from function argument 0 and
you know that's a Foo, you use a PG_GETARG_FOO(argno) macro (which
amounts to PG_GETARG_DATUM(argno) followed by DatumGetFoo.

So, with

+ Oid            targetOid = PG_GETARG_OID(0);

SELECT (test_json -> 0)::int4, test_json -> 0 FROM test_jsonb WHERE 
json_type = 'scalarint';
  int4 | ?column?
------+----------
     2 | 2

However, EXPLAIN is sad:

ERROR:  cannot display a value of type internal

and that may be where this idea runs aground.

Now, I was expecting something to complain about the result of
jsonb_array_element_type, and that didn't happen. We rewrote
a function that was supposed to be a cast to int4, and
replaced it with a function returning internal, and evaluation
happily just took that as the int4 that the next node expected.

If something had complained about that, it might have been
necessary to insert some new node above the internal-returning
function to say the result was really int4. Notice there is a
makeRelabelType() for that. (I had figured there probably was,
but didn't know its exact name.)

So it doesn't seem strictly necessary to do that, but it might
make the EXPLAIN result look better (if EXPLAIN were made to work,
of course).

Now, my guess is EXPLAIN is complaining when it sees the Const
of type internal, and doesn't know how to show that value.
Perhaps makeRelabelType is the answer there, too: what if the
Const has Oid type, so EXPLAIN can show it, and what's inserted
as the function argument is a relabel node saying it's internal?

Haven't tried that yet.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-18 14:50, Chapman Flack wrote:
> Now, my guess is EXPLAIN is complaining when it sees the Const
> of type internal, and doesn't know how to show that value.
> Perhaps makeRelabelType is the answer there, too: what if the
> Const has Oid type, so EXPLAIN can show it, and what's inserted
> as the function argument is a relabel node saying it's internal?

Simply changing the Const to be of type Oid makes EXPLAIN happy,
and nothing ever says "hey, why are you passing this oid for an
arg that wants internal?". This is without adding any relabel
nodes anywhere.

  Seq Scan on pg_temp.test_jsonb
    Output: pg_catalog.jsonb_array_element_type('23'::oid, test_json, 0), 
(test_json -> 0)
    Filter: (test_jsonb.json_type = 'scalarint'::text)

Nothing in that EXPLAIN output to make you think anything weird
was going on, unless you went and looked up jsonb_array_element_type
and saw that its arg0 isn't oid and its return type isn't int4.

But I don't know that adding relabel nodes wouldn't still be
the civilized thing to do.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-18 15:08, Chapman Flack wrote:
> But I don't know that adding relabel nodes wouldn't still be
> the civilized thing to do.

Interestingly, when I relabel both places, like this:

     Oid   targetOid = fexpr->funcresulttype;
     Const *target  = makeConst(
       OIDOID, -1, InvalidOid, sizeof(Oid),
       ObjectIdGetDatum(targetOid), false, true);
     RelabelType *rTarget = makeRelabelType((Expr *)target,
       INTERNALOID, -1, InvalidOid, COERCE_IMPLICIT_CAST);
     fexpr->funcid = new_func_id;
     fexpr->args = opexpr->args;
     fexpr->args = list_insert_nth(fexpr->args, 0, rTarget);
     expr = (Expr *)makeRelabelType((Expr *)fexpr,
       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
   }
   PG_RETURN_POINTER(expr);

EXPLAIN looks like this:

  Seq Scan on pg_temp.test_jsonb
    Output: jsonb_array_element_type(('23'::oid)::internal, test_json, 
0), (test_json -> 0)
    Filter: (test_jsonb.json_type = 'scalarint'::text)

With COERCE_IMPLICIT_CAST both places, the relabeling of the
function result is invisible, but the relabeling of the argument
is visible.

With the second one changed to COERCE_EXPLICIT_CAST:

  Seq Scan on pg_temp.test_jsonb
    Output: (jsonb_array_element_type(('23'::oid)::internal, test_json, 
0))::integer, (test_json -> 0)
    Filter: (test_jsonb.json_type = 'scalarint'::text)

then both relabelings are visible.

I'm not sure whether one way is better than the other, or whether
it is even important to add the relabel nodes at all, as nothing
raises an error without them. As a matter of taste, it seems like
a good idea though.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


On Sat, Aug 19, 2023 at 3:09 AM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-18 14:50, Chapman Flack wrote:
> Now, my guess is EXPLAIN is complaining when it sees the Const
> of type internal, and doesn't know how to show that value.
> Perhaps makeRelabelType is the answer there, too: what if the
> Const has Oid type, so EXPLAIN can show it, and what's inserted
> as the function argument is a relabel node saying it's internal?
 
Simply changing the Const to be of type Oid makes EXPLAIN happy,
and nothing ever says "hey, why are you passing this oid for an
arg that wants internal?". This is without adding any relabel
nodes anywhere.


Highlighting the user case of makeRelableType is interesting! But using 
the Oid directly looks more promising for this question IMO, it looks like:
"you said we can put anything in this arg,  so I put an OID const here",
seems nothing is wrong.  Compared with the makeRelableType method,
I think the current method is more straightforward.  Compared with 
anyelement, it avoids the creation of makeDummyConst which I'm not 
sure the implementation is alway correct.  So I am pretty inclined to this way!

v10 attached. 

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-20 21:31, Andy Fan wrote:
> Highlighting the user case of makeRelableType is interesting! But using
> the Oid directly looks more promising for this question IMO, it looks 
> like:
> "you said we can put anything in this arg,  so I put an OID const 
> here",
> seems nothing is wrong.

Perhaps one of the more senior developers will chime in, but to me,
leaving out the relabel nodes looks more like "all of PostgreSQL's
type checking happened before the SupportRequestSimplify, so nothing
has noticed that we rewrote the tree with mismatched types, and as
long as nothing crashes we sort of got away with it."

Suppose somebody writes an extension to double-check that plan
trees are correctly typed. Or improves EXPLAIN to check a little more
carefully than it seems to. Omitting the relabel nodes could spell
trouble then.

Or, someone more familiar with the code than I am might say "oh,
mismatches like that are common in rewritten trees, we live with it."
But unless somebody tells me that, I'm not believing it.

But I would say we have proved the concept of SupportRequestSimplify
for this task. :)

Now, it would make me happy to further reduce some of the code
duplication between the original and the _type versions of these
functions. I see that you noticed the duplication in the case of
jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so
it could be reused. There is also some duplication with object_field
and array_element. (Also, we may have overlooked jsonb_path_query
and jsonb_path_query_first as candidates for the source of the
cast. Two more candidates; five total.)

Here is one way this could be structured. Observe that every one
of those five source candidates operates in two stages:

Start: All of the processing until a JsonbValue has been obtained.
Finish: Converting the JsonbValue to some form for return.

Before this patch, there were two choices for Finish:
JsonbValueToJsonb or JsonbValueAsText.

With this patch, there are four Finish choices: those two, plus
PG_RETURN_BOOL(v->val.boolean), PG_RETURN_NUMERIC(v->val.numeric).

Clearly, with rewriting, we can avoid 5✕4 = 20 distinct
functions. The five candidate functions only differ in Start.
Suppose each of those had a _start version, like
jsonb_object_field_start, that only proceeds as far as
obtaining the JsonbValue, and returns that directly (an
'internal' return type). Naturally, each _start function would
need an 'internal' parameter also, even if it isn't used,
just to make sure it is not SQL-callable.

Now consider four Finish functions: jsonb_finish_jsonb,
jsonb_finish_text, jsonb_finish_boolean, jsonb_finish_numeric.

Each would have one 'internal' parameter (a JsonbValue), and
its return type declared normally. There is no need to pass
a type oid to any of these, and they need not contain any
switch to select a return type. The correct finisher to use
is simply chosen once at the time of rewriting.

So cast(jsonb_array_element(jb, 0) as numeric) would just get
rewritten as jsonb_finish_numeric(jsonb_array_element_start(jb,0)).

The other (int and float) types don't need new code; just have
the rewriter add a cast-from-numeric node on top. That's all
those other switch cases in cast_jsonbvalue_to_type are doing,
anyway.

Notice in this structure, less relabeling is needed. The
final return does not need relabeling, because each finish
function has the expected return type. Each finish function's
parameter is typed 'internal' (a JsonbValue), but that's just
what each start function returns, so no relabeling needed
there either.

The rewriter will have to supply some 'internal' constant
as a start-function parameter (because of the necessary
'internal' parameter). It might still be civilized to relabel
that.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

Interestingly, when I relabel both places, like this:

     Oid   targetOid = fexpr->funcresulttype;
     Const *target  = makeConst(
       OIDOID, -1, InvalidOid, sizeof(Oid),
       ObjectIdGetDatum(targetOid), false, true);
     RelabelType *rTarget = makeRelabelType((Expr *)target,
       INTERNALOID, -1, InvalidOid, COERCE_IMPLICIT_CAST);
     fexpr->funcid = new_func_id;
     fexpr->args = opexpr->args;
     fexpr->args = list_insert_nth(fexpr->args, 0, rTarget);
     expr = (Expr *)makeRelabelType((Expr *)fexpr,
       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
   }
   PG_RETURN_POINTER(expr);

EXPLAIN looks like this:

  Seq Scan on pg_temp.test_jsonb
    Output: jsonb_array_element_type(('23'::oid)::internal, test_json,
0), (test_json -> 0)
    Filter: (test_jsonb.json_type = 'scalarint'::text)

With COERCE_IMPLICIT_CAST both places, the relabeling of the
function result is invisible, but the relabeling of the argument
is visible.


I think this is because get_rule_expr's showimplicit is always
true for args in this case, checking the implementation of 
get_rule_expr, I found PG behavior like this in many places.

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
(Just relalized this was sent to chap in private, resent it again). 

On Mon, Aug 21, 2023 at 6:50 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:


On Mon, Aug 21, 2023 at 11:19 AM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-20 21:31, Andy Fan wrote:
> Highlighting the user case of makeRelableType is interesting! But using
> the Oid directly looks more promising for this question IMO, it looks
> like:
> "you said we can put anything in this arg,  so I put an OID const
> here",
> seems nothing is wrong.

Perhaps one of the more senior developers will chime in, but to me,
leaving out the relabel nodes looks more like "all of PostgreSQL's
type checking happened before the SupportRequestSimplify, so nothing
has noticed that we rewrote the tree with mismatched types, and as
long as nothing crashes we sort of got away with it."

Suppose somebody writes an extension to double-check that plan
trees are correctly typed. Or improves EXPLAIN to check a little more
carefully than it seems to. Omitting the relabel nodes could spell
trouble then.

Or, someone more familiar with the code than I am might say "oh,
mismatches like that are common in rewritten trees, we live with it."
But unless somebody tells me that, I'm not believing it.

Well, this sounds long-lived.  I kind of prefer to label it now.  Adding
the 3rd commit to relabel the arg and return value. 
 
But I would say we have proved the concept of SupportRequestSimplify
for this task. :)

Yes,  this is great! 
 
Now, it would make me happy to further reduce some of the code
duplication between the original and the _type versions of these
functions. I see that you noticed the duplication in the case of
jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so
it could be reused. There is also some duplication with object_field
and array_element.

Yes, compared with jsonb_extract_path,  object_field and array_element
just have much less duplication, which are 2 lines and 6 lines separately. 
 
(Also, we may have overlooked jsonb_path_query
and jsonb_path_query_first as candidates for the source of the
cast. Two more candidates; five total.)

I can try to add them at the same time when we talk about the
infrastruct,  thanks for the hint! 


Here is one way this could be structured. Observe that every one
of those five source candidates operates in two stages:

I'm not very excited with this manner, reasons are: a).  It will have
to emit more steps in ExprState->steps which will be harmful for
execution. The overhead  is something I'm not willing to afford.
b). this manner requires more *internal*, which is kind of similar
to "void *"  in C.  Could you explain more about the benefits of this? 

--
Best Regards
Andy Fan


--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

Perhaps one of the more senior developers will chime in, but to me,
leaving out the relabel nodes looks more like "all of PostgreSQL's
type checking happened before the SupportRequestSimplify, so nothing
has noticed that we rewrote the tree with mismatched types, and as
long as nothing crashes we sort of got away with it."

Suppose somebody writes an extension to double-check that plan
trees are correctly typed. Or improves EXPLAIN to check a little more
carefully than it seems to. Omitting the relabel nodes could spell
trouble then.

Or, someone more familiar with the code than I am might say "oh,
mismatches like that are common in rewritten trees, we live with it."
But unless somebody tells me that, I'm not believing it.

Well, this sounds long-lived.  I kind of prefer to label it now.  Adding
the 3rd commit to relabel the arg and return value. 


After we label it, we will get error like this: 

select (a->'a')::int4 from m;
ERROR:  cannot display a value of type internal

However the following statement can work well.

 select ('{"a": 12345}'::jsonb->'a')::numeric;
 numeric
---------
   12345

That's mainly because the later query doesn't go through the planner
support function. I didn't realize this before so the test case doesn't 
catch it.  Will add the test case  in the next version.  The reason why 
we get the error for the first query is because the query tree says 
we should output  an "internal"  result at last and then pg doesn't
know how to output an internal data type. This is kind of in conflict
with our goal.

So currently the only choices are:  PATCH 001 or PATCH 001 + 002. 


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-22 01:54, Andy Fan wrote:
> After we label it, we will get error like this:
> 
> select (a->'a')::int4 from m;
> ERROR:  cannot display a value of type internal

Without looking in depth right now, I would double-check
what relabel node is being applied at the result. The idea,
of course, was to relabel the result as the expected result
type, not internal.

(Or, as in the restructuring suggested earlier, to use a
finish function whose return type is already as expected,
and needs no relabeling.)

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-22 08:16, Chapman Flack wrote:
> On 2023-08-22 01:54, Andy Fan wrote:
>> After we label it, we will get error like this:
>> 
>> select (a->'a')::int4 from m;
>> ERROR:  cannot display a value of type internal
> 
> Without looking in depth right now, I would double-check
> what relabel node is being applied at the result. The idea,
> of course, was to relabel the result as the expected result

as I suspected, looking at v10-0003, there's this:

+   fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
+                                       0, InvalidOid, 
COERCE_IMPLICIT_CAST);

compared to the example I had sent earlier:

On 2023-08-18 17:02, Chapman Flack wrote:
>     expr = (Expr *)makeRelabelType((Expr *)fexpr,
>       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);

The key difference: this is the label going on the result type
of the function we are swapping in. The function already has
return type declared internal; we want to relabel it as
returning the type identified by targetOid. A relabel node
to type internal is the reverse of what's needed (and also
superfluous, as the function's return type is internal already).

Two more minor differences: (1) the node you get from
makeRelabelType is an Expr, but not really a FuncExpr. Casting
it to FuncExpr is a bit bogus. Also, the third argument to
makeRelabelType is a typmod, and I believe the "not-modified"
typmod is -1, not 0.

In the example I had sent earlier, there were two relabel nodes,
serving different purposes. In one, we have a function that wants
internal for an argument, but we've made a Const with type OIDOID,
so we want to say "this is internal, the thing the function wants."

The situation is reversed at the return type of the function: the
function declares its return type internal, but the surrounding
query is expecting an int4 (or whatever targetOid identifies),
so any relabel node there needs to say it's an int4 (or whatever).

So, that approach involves two relabelings. In the one idea for
restructuring that I suggested earlier, that's reduced: the
..._int function produces a JsonbValue (typed internal) and
the selected ..._finish function expects that, so those types
match and no relabeling is called for. And with the correct
..._finish function selected at rewrite time, it already has
the same return type the surrounding query expects, so no
relabeling is called for there either.

However, simply to ensure the ..._int function cannot be
casually called, it needs an internal parameter, even an
unused one, and the rewriter must supply a value for that,
which may call for one relabel node.

On 2023-08-21 06:50, Andy Fan wrote:
> I'm not very excited with this manner, reasons are: a).  It will have
> to emit more steps in ExprState->steps which will be harmful for
> execution. The overhead  is something I'm not willing to afford.

I would be open to a performance comparison, but offhand I am not
sure whether the overhead of another step or two in an ExprState
is appreciably more than some of the overhead in the present patch,
such as the every-time-through fcinfo initialization buried in
DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
new argument values slammed into it each time through.

(Also, I know very little about how the JIT compiler is used in PG,
but I suspect that a step you bury inside your function is a step
it may not get to see.)

> b). this manner requires more *internal*, which is kind of similar
> to "void *"  in C.

I'm not sure in what sense you mean "more". The present patch
has to deal with two places where some other type must be
relabeled internal or something internal relabeled to another
type. The approach I suggested does involve two families of
function, one returning internal (a JsonbValue) and one
expecting internal (a JsonbValue), where the rewriter would
compose one over the other, no relabeling needed. There's
also an internal parameter needed for whatever returns
internal, and that's just the protocol for how such things
are done. To me, that seems like a fairly principled use of
the type.

I would not underestimate the benefit of reducing the code
duplication and keeping the patch as clear as possible.
The key contributions of the patch are getting a numeric or
boolean efficiently out of the JSON operation. Getting from
numeric to int or float are things the system already does
well. A patch that focuses on what it contributes, and avoids
redoing things the system already can do--unless the duplication
can be shown to have a strong performance benefit--is easier to
review and probably to get integrated.

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
(Sorry for leaving this discussion for such a long time,  how times fly!) 

On Sun, Aug 27, 2023 at 6:28 AM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-08-22 08:16, Chapman Flack wrote:
> On 2023-08-22 01:54, Andy Fan wrote:
>> After we label it, we will get error like this:
>>
>> select (a->'a')::int4 from m;
>> ERROR:  cannot display a value of type internal
>
> Without looking in depth right now, I would double-check
> what relabel node is being applied at the result. The idea,
> of course, was to relabel the result as the expected result

as I suspected, looking at v10-0003, there's this:

+   fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
+                                       0, InvalidOid,
COERCE_IMPLICIT_CAST);

compared to the example I had sent earlier:

On 2023-08-18 17:02, Chapman Flack wrote:
>     expr = (Expr *)makeRelabelType((Expr *)fexpr,
>       targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);

The key difference: this is the label going on the result type
of the function we are swapping in.

I'm feeling we have some understanding gap in this area, let's
see what it is.  Suppose the original query is:

numeric(jsonb_object_field(v_jsonb, text)) -> numeric.

without the patch 003,  the rewritten query is:
jsonb_object_field_type(NUMERICOID,  v_jsonb, text) -> NUMERIC. 

However the declared type of jsonb_object_field_type is:

jsonb_object_field_type(internal, jsonb, text) -> internal. 

So the situation is:  a).  We input a CONST(type=OIDOID, ..) for an
internal argument.  b).  We return a NUMERIC type which matches
the original query c).  result type NUMERIC doesn't match the declared
type  'internal'  d).  it doesn't match the  run-time type of internal 
argument which is OID. 

case a) is fixed by RelableType.  case b) shouldn't be treat as an
issue.  I thought you wanted to address the case c), and patch 
003 tries to fix it, then the ERROR above.  At last I realized case 
c) isn't the one you want to fix.  case d) shouldn't be requirement
at the first place IIUC. 

So your new method is:
1. jsonb_{op}_start() ->  internal  (internal actually JsonbValue). 
2. jsonb_finish_{type}(internal, ..) -> type.   (internal is JsonbValue ). 

This avoids the case a) at the very beginning.  I'd like to provides
patches for both solutions for comparison.  Any other benefits of
this method I am missing? 
 
Two more minor differences: (1) the node you get from
makeRelabelType is an Expr, but not really a FuncExpr. Casting
it to FuncExpr is a bit bogus. Also, the third argument to
makeRelabelType is a typmod, and I believe the "not-modified"
typmod is -1, not 0.

My fault, you are right. 
 

On 2023-08-21 06:50, Andy Fan wrote:
> I'm not very excited with this manner, reasons are: a).  It will have
> to emit more steps in ExprState->steps which will be harmful for
> execution. The overhead  is something I'm not willing to afford.

I would be open to a performance comparison, but offhand I am not
sure whether the overhead of another step or two in an ExprState
is appreciably more than some of the overhead in the present patch,
such as the every-time-through fcinfo initialization buried in
DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
new argument values slammed into it each time through.

fcinfo initialization in DirectFunctionCall1 is an interesting point!
so  I am persuaded the extra steps in  ExprState may not be 
worse than the current way due to the "every-time-through 
fcinfo initialization" (in which case the memory is allocated 
once in heap rather than every time in stack).   I can do a 
comparison at last to see if we can find some other interesting
findings.  

 
I would not underestimate the benefit of reducing the code
duplication and keeping the patch as clear as possible.
The key contributions of the patch are getting a numeric or
boolean efficiently out of the JSON operation. Getting from
numeric to int or float are things the system already does
well.

True, reusing the casting system should be better than hard-code
the casting function manually.  I'd apply this on both methods. 
 
A patch that focuses on what it contributes, and avoids
redoing things the system already can do--unless the duplication
can be shown to have a strong performance benefit--is easier to
review and probably to get integrated.

Agreed.  

At last, thanks for the great insights and patience!

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-08-30 00:47, Andy Fan wrote:
> see what it is.  Suppose the original query is:
> 
> numeric(jsonb_object_field(v_jsonb, text)) -> numeric.
> ...
> However the declared type of jsonb_object_field_type is:
> 
> jsonb_object_field_type(internal, jsonb, text) -> internal.
> 
> So the situation is: b).  We return a NUMERIC type which matches
> the original query ...
>  case b) shouldn't be treat as an issue.

*We* may know we are returning a NUMERIC type which matches the
original query, but nothing else knows that. Anything that
examined the complete tree after our rewriting would see some
expression that wants a numeric type, but supplied with a
subexpression that returns internal. Without a relabel node
there to promise that we know this internal is really numeric,
any type checker would reject the tree.

The fact that it even works at all without a relabel node there
seems to indicate that all of PostgreSQL's type checking was
done before calling the support function, and that there is not
much sanity checking of what the support function returns,
which I guess is efficient, if a little scary. Seems like
writing a support function is a bit like trapeze performing
without a net.

> So your new method is:
> 1. jsonb_{op}_start() ->  internal  (internal actually JsonbValue).
> 2. jsonb_finish_{type}(internal, ..) -> type.   (internal is JsonbValue 
> ).
> 
> This avoids the case a) at the very beginning.  I'd like to provides
> patches for both solutions for comparison.

I think, unavoidably, there is still a case a) at the very beginning,
just because of the rule that if json_{op}_start is going to have an
internal return type, it needs to have at least one internal parameter
to prevent casual calls from SQL, even if that parameter is not used
for anything.

It would be ok to write in a Const for that parameter, just zero or
42 or anything besides null (in case the function is strict), but
again if the Const has type internal then EXPLAIN will be sad, so
it has to be some type that makes EXPLAIN cheerful, and relabeled
internal.

But with this approach there is no longer a type mismatch of the
end result.

> fcinfo initialization in DirectFunctionCall1 is an interesting point!
> so  I am persuaded the extra steps in  ExprState may not be
> worse than the current way due to the "every-time-through
> fcinfo initialization" (in which case the memory is allocated
> once in heap rather than every time in stack).

Stack allocation is super cheap, just by emitting the function
entry to reserve n+m bytes instead of just m, so it there's any
measurable cost to the DirectFunctionCall I would think it more
likely to be in the initialization after allocation ... but I
haven't looked at that code closely to see how much there is.
I just wanted to make the point that another step or two in
ExprState might not be a priori worse. We might be talking
about negligible effects in either direction.

> I can do a
> comparison at last to see if we can find some other interesting
> findings.

That would be the way to find out. I think I would still lean
toward the approach with less code duplication, unless there
is a strong timing benefit the other way.

> True, reusing the casting system should be better than hard-code
> the casting function manually.  I'd apply this on both methods.

I noticed there is another patch registered in this CF: [1]
It adds new operations within jsonpath like .bigint .time
and so on.

I was wondering whether that work would be conflicting or
complementary with this. It looks to be complementary. The
operations being added there are within jsonpath evaluation.
Here we are working on faster ways to get those results out.

It does not seem that [1] will add any new choices in
JsonbValue. All of its (.bigint .integer .number) seem to
verify the requested form and then put the result as a
numeric in ->val.numeric. So that doesn't add any new
cases for this patch to handle. (Too bad, in a way: if that
other patch added ->val.bigint, this patch could add a case
to retrieve that value without going through the work of
making a numeric. But that would complicate other things
touching JsonbValue, and be a matter for that other patch.)

It may be expanding the choices for what we might one day
find in ->val.datetime though.

Regards,
-Chap


[1] https://commitfest.postgresql.org/44/4526/



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Chap,

The v11 attached, mainly changes are:
1.  use the jsonb_xx_start and jsonb_finish_numeric style. 
2.  improve the test case a bit.  

It doesn't include:
1.  the jsonb_finish_text function, since we have a operator ->> for text
already and the performance for it is OK and there is no cast entry for
jsonb to text. 
2.  the jsonb_finish_jsonb since I can't see a clear user case for now. 
Rewriting jsonb_object_field with 2 DirectFunctionCall looks not pretty
reasonable as we paid 2 DirectFunctionCall overhead to reduce ~10 lines
code duplication. 


An incompatible issue at error message level is found during test: 
create table jb(a jsonb);
insert into jb select '{"a": "a"}'::jsonb;
select (a->'a')::int4 from jb;

master:   ERROR:  cannot cast jsonb string to type integer
patch:  ERROR:  cannot cast jsonb string to type numeric

That's mainly because we first extract the field to numeric and
then cast it to int4 and the error raised at the first step and it
doesn't know the final type.  One way to fix it is adding a 2nd
argument for jsonb_finish_numeric for the real type, but
it looks weird and more suggestions on this would be good. 

Performance comparison between v10 and v11. 

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 100000)i;
select 1 from tb where (a->'a')::int2 = 2;   (pgbench 5 times)

v11:  16.273 ms 
v10:  15.986 ms
master: 32.530ms

So I think the performance would not be an issue.  


I noticed there is another patch registered in this CF: [1]
It adds new operations within jsonpath like .bigint .time
and so on.

I was wondering whether that work would be conflicting or
complementary with this. It looks to be complementary. The
operations being added there are within jsonpath evaluation.
Here we are working on faster ways to get those results out.

It does not seem that [1] will add any new choices in
JsonbValue. All of its (.bigint .integer .number) seem to
verify the requested form and then put the result as a
numeric in ->val.numeric. So that doesn't add any new
cases for this patch to handle. (Too bad, in a way: if that
other patch added ->val.bigint, this patch could add a case
to retrieve that value without going through the work of
making a numeric. But that would complicate other things
touching JsonbValue, and be a matter for that other patch.)

It may be expanding the choices for what we might one day
find in ->val.datetime though.

Thanks for this information. I tried the  jsonb_xx_start and
jsonb_finish_numeric style, and it looks like a good experience 
and it may not make things too complicated even if the above 
things happen IMO. 

Any feedback is welcome. 

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
 
An incompatible issue at error message level is found during test: 
create table jb(a jsonb);
insert into jb select '{"a": "a"}'::jsonb;
select (a->'a')::int4 from jb;

master:   ERROR:  cannot cast jsonb string to type integer
patch:  ERROR:  cannot cast jsonb string to type numeric

That's mainly because we first extract the field to numeric and
then cast it to int4 and the error raised at the first step and it
doesn't know the final type.  One way to fix it is adding a 2nd
argument for jsonb_finish_numeric for the real type, but
it looks weird and more suggestions on this would be good. 


v12 is attached to address the above issue, I added a new argument
named target_oid for jsonb_finish_numeric so that it can raise a
correct error message.  I also fixed the issue reported by opr_sanity
in this version. 

Chap, do you still think we should refactor the code for the previous
existing functions like jsonb_object_field for less code duplication
purpose?  I think we should not do it because a). The code duplication
is just ~10 rows.  b).  If we do the refactor, we have to implement
two DirectFunctionCall1.   Point b) is the key reason I am not willing
to do it.  Or do I miss other important reasons? 

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
I think the last patch failed. I am not 100% sure.
https://cirrus-ci.com/task/5464366154252288
says "Created 21 hours ago", I assume the latest patch.

the diff in Artifacts section. you can go to
testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/jsonb.out
2023-09-01 03:34:43.585036700 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/jsonb.out
2023-09-01 03:39:05.800452844 +0000
@@ -528,7 +528,7 @@
 (3 rows)

 SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type
= 'object';
-ERROR:  cannot cast jsonb string to type integer
+ERROR:  unknown jsonb type: 1125096840
 SELECT (test_json -> 'field1')::bool FROM test_jsonb WHERE json_type
= 'object';
 ERROR:  cannot cast jsonb string to type boolean
 \pset null ''



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Jian,

 SELECT (test_json -> 'field1')::int4 FROM test_jsonb WHERE json_type
= 'object';
-ERROR:  cannot cast jsonb string to type integer
+ERROR:  unknown jsonb type: 1125096840

Thanks for the report!  The reason is I return the address of a local variable. 

jsonb_object_field_start(PG_FUNCTION_ARGS)
{

    JsonbValue  *v;
    JsonbValue  vbuf;
    v = getKeyJsonValueFromContainer(&jb->root,
                                     VARDATA_ANY(key),\
                                     VARSIZE_ANY_EXHDR(key),
                                     &vbuf);
    PG_RETURN_POINTER(v);
}

Here the v points to vbuf which is a local variable in stack.  I'm confused
that why it works on my local machine and also works in the most queries
in cfbot, the fix is below

    v = getKeyJsonValueFromContainer(&jb->root,
                                     VARDATA_ANY(key),\
                                     VARSIZE_ANY_EXHDR(key),
                                     NULL);


I will send an updated version soon. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi,

  v13 attached.  Changes includes:

1.  fix the bug Jian provides.  
2.  reduce more code duplication without DirectFunctionCall. 
3.  add the overlooked  jsonb_path_query and jsonb_path_query_first as candidates


--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Mon, Sep 4, 2023 at 10:35 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi,
>
>   v13 attached.  Changes includes:
>
> 1.  fix the bug Jian provides.
> 2.  reduce more code duplication without DirectFunctionCall.
> 3.  add the overlooked  jsonb_path_query and jsonb_path_query_first as candidates
>
>
> --
> Best Regards
> Andy Fan

based on v13.
IMHO, it might be a good idea to write some comments on
jsonb_object_field_internal. especially the second boolean argument.
something like "some case, we just want return JsonbValue rather than
Jsonb. to return JsonbValue, make as_jsonb be false".

I am not sure "jsonb_object_field_start" is a good name, so far I only
come up with "jsonb_object_field_to_jsonbvalues".

linitial(jsonb_start_func->args) =
makeRelabelType(linitial(jsonb_start_func->args),
   INTERNALOID, 0,
   InvalidOid,
   COERCE_IMPLICIT_CAST);

if no need, output typmod (usually -1), so here should be -1 rather than 0?

list_make2(jsonb_start_func, makeConst(.....). you can just combine
two different types then make a list, seems pretty cool...



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

based on v13.
IMHO, it might be a good idea to write some comments on
jsonb_object_field_internal. especially the second boolean argument.
something like "some case, we just want return JsonbValue rather than
Jsonb. to return JsonbValue, make as_jsonb be false".

OK,  I will proposal  "return a JsonbValue when as_jsonb is false". 
 
I am not sure "jsonb_object_field_start" is a good name, so far I only
come up with "jsonb_object_field_to_jsonbvalues".

Yes, I think it is a good idea.  Puting the jsonbvalue in the name can
compensate for the imprecision of "internal" as a return type.  I am thinking
if we should rename jsonb_finish_numeric to jsonbvalue_to_numeric as
well. 

 
linitial(jsonb_start_func->args) =
makeRelabelType(linitial(jsonb_start_func->args),
   INTERNALOID, 0,
   InvalidOid,
   COERCE_IMPLICIT_CAST);

if no need, output typmod (usually -1), so here should be -1 rather than 0?
 
I agree. -1 is better than 0. 

Thanks for the code level review again! I want to wait for some longer time
to gather more feedback.  I'm willing to name it better,  but hope I didn't
rename it to A and rename it back shortly. 

--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
On 2023-09-04 10:35, Andy Fan wrote:
>   v13 attached.  Changes includes:
> 
> 1.  fix the bug Jian provides.
> 2.  reduce more code duplication without DirectFunctionCall.
> 3.  add the overlooked  jsonb_path_query and jsonb_path_query_first as
> candidates

Apologies for the delay. I like the way this is shaping up.

My first comment will be one that may be too large for this patch
(and too large to rest on my opinion alone); that's why I'm making
it first.

It seems at first a minor point: to me it feels like a wart to have
to pass jsonb_finish_numeric (and *only* jsonb_finish_numeric) a type
oid reflecting the target type of a cast that's going to be applied
*after jsonb_finish_numeric has done its work*, and only for the
purpose of generating a message if the jsonb type *isn't numeric*,
but saying "cannot cast to" (that later target type) instead.

I understand this is done to exactly match the existing behavior,
so what makes this a larger issue is I'm not convinced the existing
behavior is good. Therefore I'm not convinced that bending over
backward to preserve it is good.

What's not good: the places a message from cannotCastJsonbValue
are produced, there has been no attempt yet to cast anything.
The message purely tells you about whether you have the kind
of jsonb node you think you have (and array, bool, null, numeric,
object, string are the only kinds of those). If you're wrong
about what kind of jsonb node it is, you get this message.
If you're right about the kind of node, you don't get this
message, regardless of whether its value can be cast to the
later target type. For example, '32768'::jsonb::int2 produces
ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE "smallint out of range"
but that message comes from the actual int2 cast.

IMV, what the "cannot cast jsonb foo to type %s" message really
means is "jsonb foo where jsonb bar is required" and that's what
it should say, and that message depends on nothing about any
future plans for what will be done to the jsonb bar, so it can
be produced without needing any extra information to be passed.

I'm also not convinced ERRCODE_INVALID_PARAMETER_VALUE is a
good errcode for that message (whatever the wording). I do not
see much precedent elsewhere in the code for using
INVALID_PARAMETER_VALUE to signal this kind of "data value
isn't what you think it is" condition. Mostly it is used
when checking the kinds of parameters passed to a function to
indicate what it should do.

There seem to be several more likely choices for an errcode
there in the 2203x range.

But I understand that issue is not with this patch so much
as with preexisting behavior, and because it's preexisting,
there can be sound arguments against changing it.

But if that preexisting message could be changed, it would
eliminate the need for an unpleasing wart here.

Other notes are more minor:

+        else
+            /* not the desired pattern. */
+            PG_RETURN_POINTER(fexpr);
...
+
+        if (!OidIsValid(new_func_id))
+            PG_RETURN_POINTER(fexpr);
...
+            default:
+                PG_RETURN_POINTER(fexpr);

If I am reading supportnodes.h right, returning NULL is
sufficient to say no transformation is needed.

+        FuncExpr    *fexpr = palloc0(sizeof(FuncExpr));
...
+        memcpy(fexpr, req->fcall, sizeof(FuncExpr));

Is the shallow copy necessary? If so, a comment explaining why
might be apropos. Because the copy is shallow, if there is any
concern about the lifespan of req->fcall, would there not be a
concern about its children?

Is there a reason not to transform the _tz flavors of
jsonb_path_query and jsonb_path-query_first?

-    JsonbValue *v;
-    JsonbValue    vbuf;
+    JsonbValue    *v;
...
+    int i;
      JsonbValue *jbvp = NULL;
-    int            i;

Sometimes it's worth looking over a patch for changes like these,
and reverting such whitespace or position changes, if they aren't
things you want a reviewer to be squinting at. :)

Regards,
-Chap



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:


On Thu, Sep 14, 2023 at 5:18 AM Chapman Flack <chap@anastigmatix.net> wrote:
On 2023-09-04 10:35, Andy Fan wrote:
>   v13 attached.  Changes includes:
>
> 1.  fix the bug Jian provides.
> 2.  reduce more code duplication without DirectFunctionCall.
> 3.  add the overlooked  jsonb_path_query and jsonb_path_query_first as
> candidates

Apologies for the delay. I like the way this is shaping up.

This is a great signal:) 
 
 
My first comment will be one that may be too large for this patch
(and too large to rest on my opinion alone); that's why I'm making
it first.

It seems at first a minor point: to me it feels like a wart to have
to pass jsonb_finish_numeric (and *only* jsonb_finish_numeric) a type
oid reflecting the target type of a cast that's going to be applied
*after jsonb_finish_numeric has done its work*, and only for the
purpose of generating a message if the jsonb type *isn't numeric*,
but saying "cannot cast to" (that later target type) instead.

I understand this is done to exactly match the existing behavior,
so what makes this a larger issue is I'm not convinced the existing
behavior is good. Therefore I'm not convinced that bending over
backward to preserve it is good.

I hesitated to do so, but I'm thinking if any postgresql user uses
some code like   if (errMsg.equals('old-error-message')),  and if we
change the error message, the application will be broken. I agree 
with the place for the error message,  IIUC,  you intend to choose
not compatible with the old error message? 

What's not good: the places a message from cannotCastJsonbValue
are produced, there has been no attempt yet to cast anything.
The message purely tells you about whether you have the kind
of jsonb node you think you have (and array, bool, null, numeric,
object, string are the only kinds of those). If you're wrong
about what kind of jsonb node it is, you get this message.
If you're right about the kind of node, you don't get this
message, regardless of whether its value can be cast to the
later target type. For example, '32768'::jsonb::int2 produces
ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE "smallint out of range"
but that message comes from the actual int2 cast.

IMV, what the "cannot cast jsonb foo to type %s" message really
means is "jsonb foo where jsonb bar is required" and that's what
it should say, and that message depends on nothing about any
future plans for what will be done to the jsonb bar, so it can
be produced without needing any extra information to be passed.

I'm also not convinced ERRCODE_INVALID_PARAMETER_VALUE is a
good errcode for that message (whatever the wording). I do not
see much precedent elsewhere in the code for using
INVALID_PARAMETER_VALUE to signal this kind of "data value
isn't what you think it is" condition. Mostly it is used
when checking the kinds of parameters passed to a function to
indicate what it should do.

There seem to be several more likely choices for an errcode
there in the 2203x range.

But I understand that issue is not with this patch so much
as with preexisting behavior, and because it's preexisting,
there can be sound arguments against changing it.  

But if that preexisting message could be changed, it would
eliminate the need for an unpleasing wart here.

Other notes are more minor:

+               else
+                       /* not the desired pattern. */
+                       PG_RETURN_POINTER(fexpr);
...
+
+               if (!OidIsValid(new_func_id))
+                       PG_RETURN_POINTER(fexpr);
...
+                       default:
+                               PG_RETURN_POINTER(fexpr);

If I am reading supportnodes.h right, returning NULL is
sufficient to say no transformation is needed.

I double confirmed you are right here.  
Changed it to PG_RETURN_POINT(null);   here in the next version. 

+               FuncExpr        *fexpr = palloc0(sizeof(FuncExpr));
...
+               memcpy(fexpr, req->fcall, sizeof(FuncExpr));

Is the shallow copy necessary? If so, a comment explaining why
might be apropos. Because the copy is shallow, if there is any
concern about the lifespan of req->fcall, would there not be a
concern about its children?

All the interesting parts in the input FuncExpr are heap based, 
but the FuncExpr itself is stack based (I'm not sure why the fact
works like this),  Also based on my previously understanding, I
need to return a FuncExpr original even if the function can't be 
simplified, so I made a shallow copy.  There will be no copy at 
all in the following version since I returned NULL in such a case. 
 
Is there a reason not to transform the _tz flavors of
jsonb_path_query and jsonb_path-query_first?

I misunderstood the _tz flavors return timestamp,  after some deep
reading of these functions, they just work at the comparisons part.
so I will add them in the following version.  
 

-       JsonbValue *v;
-       JsonbValue      vbuf;
+       JsonbValue      *v;
...
+       int i;
        JsonbValue *jbvp = NULL;
-       int                     i;

Sometimes it's worth looking over a patch for changes like these,
and reverting such whitespace or position changes, if they aren't
things you want a reviewer to be squinting at. :)

Yes, I  look over my patch carefully before sending it out usually,
but this is an oversight. 

Lastly,  do you have any idea about the function naming as Jian & I
discussed at [1]?


--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:

Is there a reason not to transform the _tz flavors of
jsonb_path_query and jsonb_path-query_first?

I misunderstood the _tz flavors return timestamp,  after some deep
reading of these functions, they just work at the comparisons part.
so I will add them in the following version.  

_tz favors did return timestamp..  the reason is stated in the commit
 messge of patch 2. 

try to apply jsonb extraction optimization to  _tz functions.

both jsonb_path_query_tz and jsonb_path_query_tz_first returns
the elements which are timestamp comparable, but such elements
are impossible to be cast to numeric or boolean IIUC. Just provides
this commit for communication purpose only. 

so v14 is attached, changes include:
1. Change the typmod for internal type from 0 to -1.
2. return NULL for non-simplify expressions from the planner 
support function, hence shallow copy is removed as well.

Things are not addressed yet:
1.  the error message handling. 
2.  if we have chances to optimize _tz functions, I guess no.
3.  function naming issue. I think I can get it modified once after
all the other issues are addressed. 

--
Best Regards
Andy Fan
Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi,

I am feeling this topic has been well discussed and the only pending 
issues are below,  it would be great that any committer can have a 
look at these,  so I mark this entry as "Ready for Committer". 

Things are not addressed yet:
1.  the error message handling. 

You can check [1] for more background of this,  I think blocking this
feature at an error message level is not pretty reasonable. 
 
2.  if we have chances to optimize _tz functions, I guess no.
 
patch 002 is dedicated  for this,  I think it should not be committed,
the reason is described in the commit message.

3.  function naming issue. I think I can get it modified once after
all the other issues are addressed. 


 
--
Best Regards
Andy Fan

Re: Extract numeric filed in JSONB more effectively

From
Chapman Flack
Date:
Adding this comment via the CF app so it isn't lost, while an improperly-interpreted-DKIM-headers issue is still
preventingme from mailing directly to -hackers.
 

It was my view that the patch was getting close by the end of the last commitfest, but still contained a bit of a logic
wartmade necessary by a questionable choice of error message wording, such that in my view it would be better to
determinewhether a different error message would better conform to ISO SQL in the first place, and obviate the need for
thelogic wart.
 

There seemed to be some progress possible on that when petere had time to weigh in on the standard shortly after the
lastCF ended.
 

So, it would not have been my choice to assign RfC status before getting to a resolution on that.

Also, it is possible for a JsonbValue to hold a timestamp (as a result of a jsonpath evaluation, I don't think that can
happenany other way), and if such a jsonpath evaluation were to be the source expression of a cast to SQL timestamp,
thatsituation seems exactly analogous to the other situations being optimized here and would require only a few more
linesin the exact pattern here introduced. While that could be called out of scope when this patch's title refers to
"numericfield" specifically, it might be worth considering for completeness. The patch does, after all, handle boolean
already,as well as numeric. 

Re: Extract numeric filed in JSONB more effectively

From
John Naylor
Date:
On Wed, Nov 1, 2023 at 9:18 AM Chapman Flack <chap@anastigmatix.net> wrote:
> So, it would not have been my choice to assign RfC status before getting to a resolution on that.

It's up to the reviewer (here Chapman), not the author, to decide
whether to set it to RfC. I've set the status to "needs review".



Re: Extract numeric filed in JSONB more effectively

From
zhihuifan1213@163.com
Date:
Chapman Flack <chap@anastigmatix.net> writes:

(This is Andy Fan and I just switch to my new email address).

Hi Chap,

Thanks for alway keep an eye on this!

> Adding this comment via the CF app so it isn't lost, while an
> improperly-interpreted-DKIM-headers issue is still preventing me from
> mailing directly to -hackers.
>
> It was my view that the patch was getting close by the end of the last
> commitfest, but still contained a bit of a logic wart made necessary by
> a questionable choice of error message wording, such that in my view it
> would be better to determine whether a different error message would
> better conform to ISO SQL in the first place, and obviate the need for
> the logic wart.
>
> There seemed to be some progress possible on that when petere had time
> to weigh in on the standard shortly after the last CF ended.
>
> So, it would not have been my choice to assign RfC status before
> getting to a resolution on that.

I agree with this.

>
> Also, it is possible for a JsonbValue to hold a timestamp (as a result
> of a jsonpath evaluation, I don't think that can happen any other
> way),

I believe this is where our disagreement lies.

CREATE TABLE employees (
                         
 
   id serial PRIMARY KEY,
   data jsonb
);

INSERT INTO employees (data) VALUES (
   '{
      "employees":[
         {
            "firstName":"John",
            "lastName":"Doe",
            "hireDate":"2022-01-01T09:00:00Z",
            "age": 30
         },
         {
            "firstName":"Jane",
            "lastName":"Smith",
            "hireDate":"2022-02-01T10:00:00Z",
            "age": 25
         }
      ]
   }'
);

select
jsonb_path_query_tz(data, '$.employees[*] ? (@.hireDate >=
"2022-02-01T00:00:00Z" && @.hireDate < "2022-03-01T00:00:00Z")')
from employees;

select jsonb_path_query_tz(data, '$.employees[*].hireDate ? (@ >=
"2022-02-01T00:00:00Z" && @ < "2022-03-01T00:00:00Z")') from employees;
select pg_typeof(jsonb_path_query_tz(data, '$.employees[*].hireDate ? (@
>= "2022-02-01T00:00:00Z" && @ < "2022-03-01T00:00:00Z")')) from
employees;

select jsonb_path_query_tz(data, '$.employees[*].hireDate ? (@
>= "2022-02-01T00:00:00Z" && @ < "2022-03-01T00:00:00Z")')::timestamp
from employees;
select jsonb_path_query_tz(data, '$.employees[*].hireDate ? (@
>= "2022-02-01T00:00:00Z" && @ < "2022-03-01T00:00:00Z")')::timestamptz
from employees;

I tried all of the above queires and can't find a place where this
optimization would apply. am I miss something? 


> and if such a jsonpath evaluation were to be the source expression of a
> cast to SQL timestamp, that situation seems exactly analogous to the
> other situations being optimized here and would require only a few more
> lines in the exact pattern here introduced.

Could you provide an example of this? 

> While that could be called
> out of scope when this patch's title refers to "numeric field"
> specifically, it might be worth considering for completeness. The patch
> does, after all, handle boolean already, as well as numeric.

I'd never arugment for this, at this point at least. 

v15 is provides without any fundamental changes.  Just rebase to the
lastest code and prepared a better commit message.


-- 
Best Regards
Andy Fan

Attachment

Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
hi.
you don't need to change src/include/catalog/catversion.h
as mentioned in https://wiki.postgresql.org/wiki/Committing_checklist
Otherwise, cfbot will fail many times.

+typedef enum JsonbValueTarget
+{
+ JsonbValue_AsJsonbValue,
+ JsonbValue_AsJsonb,
+ JsonbValue_AsText
+} JsonbValueTarget;

change to

+typedef enum JsonbValueTarget
+{
+ JsonbValue_AsJsonbValue,
+ JsonbValue_AsJsonb,
+ JsonbValue_AsText,
+} JsonbValueTarget;

currently cannot do `git apply`.



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi,

> hi.
> you don't need to change src/include/catalog/catversion.h
> as mentioned in https://wiki.postgresql.org/wiki/Committing_checklist
> Otherwise, cfbot will fail many times.

Thanks for the wiki.

I checked the wiki and search "catversion", the only message I got is:

"Consider the need for a catversion bump."

How could this be explained as "no need to change ../catversion.h"? 

>
> +typedef enum JsonbValueTarget
> +{
> + JsonbValue_AsJsonbValue,
> + JsonbValue_AsJsonb,
> + JsonbValue_AsText
> +} JsonbValueTarget;
>
> change to
>
> +typedef enum JsonbValueTarget
> +{
> + JsonbValue_AsJsonbValue,
> + JsonbValue_AsJsonb,
> + JsonbValue_AsText,
> +} JsonbValueTarget;
>
> currently cannot do `git apply`.

OK, I guess it's something about whitespaces, my git-commit hook has
been configured to capture this during commit. After we reach an
agreement about the 'catversion.h' stuff, the next version of patch
should fix this issue. 

-- 
Best Regards
Andy Fan




Re: Extract numeric filed in JSONB more effectively

From
jian he
Date:
On Sun, Jan 7, 2024 at 3:26 PM Andy Fan <zhihuifan1213@163.com> wrote:
>
>
> Hi,
>
> > hi.
> > you don't need to change src/include/catalog/catversion.h
> > as mentioned in https://wiki.postgresql.org/wiki/Committing_checklist
> > Otherwise, cfbot will fail many times.
>
> Thanks for the wiki.
>
> I checked the wiki and search "catversion", the only message I got is:
>
> "Consider the need for a catversion bump."
>
> How could this be explained as "no need to change ../catversion.h"?

that means catversion.h changes is the committer's responsibility, IMHO.

IMHO, main reason is every time the catversion.h change, cfbot
http://cfbot.cputube.org will fail.
one patch took very long time to be committable.
you don't need update your patch for the every catversion.h changes.

> >
> > +typedef enum JsonbValueTarget
> > +{
> > + JsonbValue_AsJsonbValue,
> > + JsonbValue_AsJsonb,
> > + JsonbValue_AsText
> > +} JsonbValueTarget;
> >
> > change to
> >
> > +typedef enum JsonbValueTarget
> > +{
> > + JsonbValue_AsJsonbValue,
> > + JsonbValue_AsJsonb,
> > + JsonbValue_AsText,
> > +} JsonbValueTarget;
> >

reason: https://git.postgresql.org/cgit/postgresql.git/commit/?id=611806cd726fc92989ac918eac48fd8d684869c7

> > currently cannot do `git apply`.
>
> OK, I guess it's something about whitespaces, my git-commit hook has
> been configured to capture this during commit. After we reach an
> agreement about the 'catversion.h' stuff, the next version of patch
> should fix this issue.

Anyway, I made the following change:
remove catversion.h changes.
refactored the tests. Some of the explain(costs off, verbose) output
is very very long.
it's unreadable on the web browser. so I cut them into small pieces.
resolve duplicate OID issues.
slight refactored jsonbvalue_covert function, for the switch
statement, add a default branch.
see file v16-0001-Improve-the-performance-of-Jsonb-extraction.patch

you made a lot of changes, that might not be easy to get committed, i think.
Maybe we can split the patch into several pieces.
The first part is the original idea that:  pattern:  (jsonb(object) ->
'key')::numerica_data_type can be optimized.
The second part:  is other cases where cast jsonb to scalar data type
can also be optimized.

So, I refactor your patch. only have optimized casts for:
(jsonb(object) -> 'key')::numerica_data_type.
We can optimize more cast cases, but IMHO,
make it as minimal as possible, easier to review, easier to understand.
If people think this performance gain is good, then later we can add
more on top of it.

summary: 2 files attached.
v16-0001-Improve-the-performance-of-Jsonb-extraction.patch
refactored of your patch, that covers all the cast optimization cases,
this file will run the CI test.

v1-0001-Improve-performance-of-Jsonb-extract-via-key-and-c.no-cfbot
this one also based on your patch. but as a minimum patch to optimize
(jsonb(object) -> 'key')::numerica_data_type case only. (this one will
not run CI test).

Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi,

Here is the update of this patch.

1. What is it for?

commit f7b93acc24b4a152984048fefc6d71db606e3204 (HEAD -> jsonb_numeric)
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date:   Fri Feb 9 16:54:06 2024 +0800

    Improve the performance of Jsonb numeric/bool extraction.
    
    JSONB object uses a binary compatible numeric format with the numeric
    data type in SQL. However in the past, extracting a numeric value from a
    JSONB field still needs to find the corresponding JsonbValue first,
    then convert the JsonbValue to Jsonb, and finally use the cast system to
    convert the Jsonb to a Numeric data type. This approach was very
    inefficient in terms of performance.
    
    In the current patch, It is handled that the JsonbValue is converted to
    numeric data type directly.  This is done by the planner support
    function which detects the above case and simplify the expression.
    Because the boolean type and numeric type share certain similarities in
    their attributes, we have implemented the same optimization approach for
    both.  In the ideal test case, the performance can be 2x than before.
    
    The optimized functions and operators includes:
    1. jsonb_object_field / ->
    2. jsonb_array_element / ->
    3. jsonb_extract_path / #>
    4. jsonb_path_query
    5. jsonb_path_query_first

example:
create table tb(a jsonb);
insert into tb select '{"a": 1, "b": "a"}'::jsonb;


master:
explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: (((tb.a -> 'a'::text))::numeric > '3'::numeric)
(3 rows)

patched:

postgres=# explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: (jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'a'::text), '1700'::oid) > '3'::numeric)
(3 rows)

The final expression generated by planner support function includes:

1).
jsonb_object_field_start((tb.a)::internal, 'a'::text) first, this
function returns the internal datum which is JsonbValue in fact.
2).
jsonb_finish_numeric(internal (jsonbvalue), '1700::oid) convert the
jsonbvalue to numeric directly without the jsonb as a intermedia result.

the reason why "1700::oid" will be explained later, that's the key issue
right now.

The reason why we need the 2 steps rather than 1 step is because the
code can be better abstracted, the idea comes from Chap, the detailed
explaination is at [1]. You can search "Now, it would make me happy to
further reduce some of the code duplication" and read the following
graph. 


2. Where is the current feature blocked for the past few months?

It's error message compatible issue! Continue with above setup:

master:

select * from tb where (a->'b')::numeric > 3::numeric;
ERROR:  cannot cast jsonb string to type numeric

select * from tb where (a->'b')::int4 > 3::numeric;
ERROR:  cannot cast jsonb string to type integer

You can see the error message is different (numeric vs integer). 


Patched:

We still can get the same error message as master BUT the code
looks odd.

select * from tb where (a->'b')::int4 > 3;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
(3 rows)

You can see "jsonb_finish_numeric(..,  '23::oid)" the '23::oid' is just
for the *"integer"* output in error message:

"cannot cast jsonb string to type *integer*"

Now the sistuation is either we use the odd argument (23::oid) in
jsonb_finish_numeric, or we use a incompatible error message with the
previous version. I'm not sure which way is better, but this is the
place the current feature is blocked.

3. what do I want now?

Since this feature uses the planner support function which needs some
catalog changes, so it is better that we can merge this feature in PG17,
or else, we have to target it in PG18. So if some senior developers can
chime in, for the current blocking issue at least, will be pretty
helpful.

[1]
https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net 

-- 
Best Regards
Andy Fan


Attachment

Re: Extract numeric filed in JSONB more effectively

From
Peter Eisentraut
Date:
On 09.02.24 10:05, Andy Fan wrote:
> 2. Where is the current feature blocked for the past few months?
> 
> It's error message compatible issue! Continue with above setup:
> 
> master:
> 
> select * from tb where (a->'b')::numeric > 3::numeric;
> ERROR:  cannot cast jsonb string to type numeric
> 
> select * from tb where (a->'b')::int4 > 3::numeric;
> ERROR:  cannot cast jsonb string to type integer
> 
> You can see the error message is different (numeric vs integer).
> 
> 
> Patched:
> 
> We still can get the same error message as master BUT the code
> looks odd.
> 
> select * from tb where (a->'b')::int4 > 3;
>                                                      QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>   Seq Scan on public.tb
>     Output: a
>     Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
> (3 rows)
> 
> You can see "jsonb_finish_numeric(..,  '23::oid)" the '23::oid' is just
> for the *"integer"* output in error message:
> 
> "cannot cast jsonb string to type*integer*"
> 
> Now the sistuation is either we use the odd argument (23::oid) in
> jsonb_finish_numeric, or we use a incompatible error message with the
> previous version. I'm not sure which way is better, but this is the
> place the current feature is blocked.

I'm not bothered by that.  It also happens on occasion in the backend C 
code that we pass around extra information to be able to construct 
better error messages.  The functions here are not backend C code, but 
they are internal functions, so similar considerations can apply.


But I have a different question about this patch set.  This has some 
overlap with the JSON_VALUE function that is being discussed at [0][1]. 
For example, if I apply the patch 
v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run

select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;

and I get a noticeable performance boost over

select count(*) from tb where cast (a->'a' as numeric) = 2;

So some questions to think about:

1. Compare performance of base case vs. this patch vs. json_value.

2. Can json_value be optimized further?

3. Is this patch still needed?

3a. If yes, should the internal rewriting make use of json_value or 
share code with it?


[0]: 
https://www.postgresql.org/message-id/flat/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
[1]: https://commitfest.postgresql.org/47/4377/



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Peter Eisentraut <peter@eisentraut.org> writes:

> On 09.02.24 10:05, Andy Fan wrote:
>> 2. Where is the current feature blocked for the past few months?
>> It's error message compatible issue! Continue with above setup:
>> master:
>> select * from tb where (a->'b')::numeric > 3::numeric;
>> ERROR:  cannot cast jsonb string to type numeric
>> select * from tb where (a->'b')::int4 > 3::numeric;
>> ERROR:  cannot cast jsonb string to type integer
>> You can see the error message is different (numeric vs integer).
>> Patched:
>> We still can get the same error message as master BUT the code
>> looks odd.
>> select * from tb where (a->'b')::int4 > 3;
>>                                                      QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------
>>   Seq Scan on public.tb
>>     Output: a
>>     Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
>> (3 rows)
>> You can see "jsonb_finish_numeric(..,  '23::oid)" the '23::oid' is
>> just
>> for the *"integer"* output in error message:
>> "cannot cast jsonb string to type*integer*"
>> Now the sistuation is either we use the odd argument (23::oid) in
>> jsonb_finish_numeric, or we use a incompatible error message with the
>> previous version. I'm not sure which way is better, but this is the
>> place the current feature is blocked.
>
> I'm not bothered by that.  It also happens on occasion in the backend C
> code that we pass around extra information to be able to construct
> better error messages.  The functions here are not backend C code, but
> they are internal functions, so similar considerations can apply.

Thanks for speaking on this!

>
> But I have a different question about this patch set.  This has some
> overlap with the JSON_VALUE function that is being discussed at
> [0][1]. For example, if I apply the patch
> v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run
>
> select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;
>
> and I get a noticeable performance boost over
>
> select count(*) from tb where cast (a->'a' as numeric) = 2;

Here is my test and profile about the above 2 queries.

create table tb(a jsonb);
insert into tb
select jsonb_build_object('a', i) from generate_series(1, 10000)i;

cat a.sql
select count(*) from tb
where json_value(a, '$.a' returning numeric) = 2;

cat b.sql
select count(*) from tb where cast (a->'a' as numeric) = 2;

pgbench -n -f xxx.sql postgres -T100 | grep lat

Then here is the result:

| query | master  | patched (patch here and jsonb_value) |
|-------+---------+-------------------------------------|
| a.sql | /       | 2.59 (ms)                           |
| b.sql | 3.34 ms | 1.75 (ms)                           |

As we can see the patch here has the best performance (this result looks
be different from yours?).

After I check the code, I am sure both patches *don't* have the problem
in master where it get a jsonbvalue first and convert it to jsonb and
then cast to numeric.

Then I perf the result, and find the below stuff:

JSOB_VALUE
------------
-   32.02%     4.30%  postgres  postgres           [.] JsonPathValue
   - 27.72% JsonPathValue
      - 22.63% executeJsonPath (inlined)
         - 19.97% executeItem (inlined)
            - executeItemOptUnwrapTarget
               - 17.79% executeNextItem
                  - 15.49% executeItem (inlined)
                     - executeItemOptUnwrapTarget
                        + 8.50% getKeyJsonValueFromContainer (note here..)
                        + 2.30% executeNextItem
                          0.79% findJsonbValueFromContainer
                        + 0.68% check_stack_depth
                  + 1.51% jspGetNext
               + 0.73% check_stack_depth
           1.27% jspInitByBuffer
           0.85% JsonbExtractScalar
      + 4.91% DatumGetJsonbP (inlined)

Patch here for b.sql:
---------------------

-   19.98%     2.10%  postgres  postgres           [.] jsonb_object_field_start
   - 17.88% jsonb_object_field_start
      - 17.70% jsonb_object_field_internal (inlined)
         + 11.03% getKeyJsonValueFromContainer
         - 6.26% DatumGetJsonbP (inlined)
            + 5.78% detoast_attr
   + 1.21% _start
   + 0.54% 0x55ddb44552a

JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer,
then I think JSON_VALUE probably is designed for some more complex path 
which need to pay extra effort which bring the above performance
difference. 

I added Amit and Alvaro to this thread in case they can have more
insight on this.

> So some questions to think about:
>
> 1. Compare performance of base case vs. this patch vs. json_value.

Done, as above. 
>
> 2. Can json_value be optimized further?

hmm, I have some troubles to understand A's performance boost over B,
then who is better. But in my test above, looks the patch here is better
on the given case and the differece may comes from JSON_VALUE is
designed to handle more generic purpose.

> 3. Is this patch still needed?

I think yes. One reason is the patch here have better performance, the
other reason is the patch here prevent user from changing their existing
queries.

>
> 3a. If yes, should the internal rewriting make use of json_value or
> share code with it?

As for now, looks json_value is designed for more generic case, not sure
if we could share some code. My patch actually doesn't add much code on
the json function part. 

-- 
Best Regards
Andy Fan




Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
>> But I have a different question about this patch set.  This has some
>> overlap with the JSON_VALUE function that is being discussed at
>> [0][1]. For example, if I apply the patch
>> v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run
>>
>> select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;
>>
>> and I get a noticeable performance boost over
>>
>> select count(*) from tb where cast (a->'a' as numeric) = 2;
>
> Here is my test and profile about the above 2 queries.
>
..
> As we can see the patch here has the best performance (this result looks
> be different from yours?).
>
> After I check the code, I am sure both patches *don't* have the problem
> in master where it get a jsonbvalue first and convert it to jsonb and
> then cast to numeric.
>
> Then I perf the result, and find the below stuff:
>
..

> JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer,
> then I think JSON_VALUE probably is designed for some more complex path 
> which need to pay extra effort which bring the above performance
> difference. 


Hello Peter,

Thanks for highlight the JSON_VALUE patch! Here is the sistuation in my
mind now. 

My patch is desigined to *not* introducing any new user-faced functions, 
but let some existing functions run faster. JSON_VALUE patch is designed
to following more on SQL standard so introuduced one new function which
has more flexibility on ERROR handling [1].  

Both patches are helpful on the subject here, but my patch here has a
better performance per my testing, I don't think I did anything better
here, just because JSON_VALUE function is designed for some more generic
purpose which has to pay some extra effort, and even if we have some
chance to improve JSON_VALUE, I don't think it shoud not block the patch
here (I'd like to learn more about this, it may takes some time!)

So I think the my patch here can be go ahead again, what do you think? 

[1]
https://www.postgresql.org/message-id/CACJufxGtetrn34Hwnb9D2if5D_HOPAh235MtEZ1meVYx-BiNtg%40mail.gmail.com 

-- 
Best Regards
Andy Fan




Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Here is latest version, nothing changed besides the rebase to the latest
master. The most recent 3 questions should be addressed.

- The error message compatible issue [1] and the Peter's answer at [2].
- Peter's new question at [2] and my answer at [3].

Any effrot to move this patch ahead is welcome and thanks all the people
who provided high quaility feedback so far, especially chapman!

[1] https://www.postgresql.org/message-id/87r0hmvuvr.fsf@163.com
[2]
https://www.postgresql.org/message-id/8102ff5b-b156-409e-a48f-e53e63a39b36%40eisentraut.org
[3] https://www.postgresql.org/message-id/8734t6c5rh.fsf%40163.com

-- 
Best Regards
Andy Fan


Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Andy Fan <zhihuifan1213@163.com> writes:

> Here is latest version, nothing changed besides the rebase to the latest
> master. The most recent 3 questions should be addressed.
>
> - The error message compatible issue [1] and the Peter's answer at [2].
> - Peter's new question at [2] and my answer at [3].
>
> Any effrot to move this patch ahead is welcome and thanks all the people
> who provided high quaility feedback so far, especially chapman!
>
> [1] https://www.postgresql.org/message-id/87r0hmvuvr.fsf@163.com
> [2]
> https://www.postgresql.org/message-id/8102ff5b-b156-409e-a48f-e53e63a39b36%40eisentraut.org
> [3] https://www.postgresql.org/message-id/8734t6c5rh.fsf%40163.com

rebase to the latest master again.

commit bc990b983136ef658cd3be03cdb07f2eadc4cd5c (HEAD -> jsonb_numeric)
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date:   Mon Apr 1 09:36:08 2024 +0800

    Improve the performance of Jsonb numeric/bool extraction.
    
    JSONB object uses a binary compatible numeric format with the numeric
    data type in SQL. However in the past, extracting a numeric value from a
    JSONB field still needs to find the corresponding JsonbValue first,
    then convert the JsonbValue to Jsonb, and finally use the cast system to
    convert the Jsonb to a Numeric data type. This approach was very
    inefficient in terms of performance.
    
    In the current patch, It is handled that the JsonbValue is converted to
    numeric data type directly.  This is done by the planner support
    function which detects the above case and simplify the expression.
    Because the boolean type and numeric type share certain similarities in
    their attributes, we have implemented the same optimization approach for
    both.  In the ideal test case, the performance can be 2x than before.
    
    The optimized functions and operators includes:
    1. jsonb_object_field / ->
    2. jsonb_array_element / ->
    3. jsonb_extract_path / #>
    4. jsonb_path_query
    5. jsonb_path_query_first

-- 
Best Regards
Andy Fan


Attachment

Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hello David,

  Thanks for checking this!
  
> There's a lot of complexity in the v18 patch that I don't understand
> the need for.
>
> I imagined you'd the patch should create a SupportRequestSimplify
> support function for jsonb_numeric() that checks if the input
> expression is an OpExpr with funcid of jsonb_object_field().  All you
> do then is ditch the cast and change the OpExpr to call a new function
> named jsonb_object_field_numeric() which returns the val.numeric
> directly.  Likely the same support function could handle jsonb casts
> to other types too, in which case you'd just call some other function,
> e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().

Basically yes. The reason complexity comes when we many operators we
want to optimize AND my patch I want to reduce the number of function
created. 

The optimized functions and operators includes:
1. jsonb_object_field / ->
2. jsonb_array_element / ->
3. jsonb_extract_path / #>
4. jsonb_path_query
5. jsonb_path_query_first

   
> ..., in which case you'd just call some other function,
> e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().

This works, but We need to create 2 functions for each operator. In the
patched case, we have 5 operators, so we need to create 10 functions.

op[1,2,3,4,5]_bool
op[1,2,3,4,5]_numeric.

Within the start / finish function, we need to create *7* functions.

op[1,2,3,4,5]_start :  return the "JsonbVaue".

jsonb_finish_numeric:  convert jsonbvalue to numeric
jsonb_finish_bool   :  convert jsonbvalue to bool.

I think the above is the major factor for the additional complexity. 

Some other factors contribute to complexity a bit.

1. we also have jsonb_int{2,4,8}/float{4,8} in pg_proc for '->'
   operator, not only numeric.
2. user may use OpExpr, like (jb->'x')::numeric, user may also use
   FuncExpr, like (jsonb_object_field(a, 'x'))::numeric. 


-- 
Best Regards
Andy Fan




Re: Extract numeric filed in JSONB more effectively

From
Dmitry Dolgov
Date:
> On Thu, Sep 12, 2024 at 03:03:18AM GMT, Andy Fan wrote:
>
> > I imagined you'd the patch should create a SupportRequestSimplify
> > support function for jsonb_numeric() that checks if the input
> > expression is an OpExpr with funcid of jsonb_object_field().  All you
> > do then is ditch the cast and change the OpExpr to call a new function
> > named jsonb_object_field_numeric() which returns the val.numeric
> > directly.  Likely the same support function could handle jsonb casts
> > to other types too, in which case you'd just call some other function,
> > e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>
> Basically yes. The reason complexity comes when we many operators we
> want to optimize AND my patch I want to reduce the number of function
> created.
>
> The optimized functions and operators includes:
> 1. jsonb_object_field / ->
> 2. jsonb_array_element / ->
> 3. jsonb_extract_path / #>
> 4. jsonb_path_query
> 5. jsonb_path_query_first
>
>
> > ..., in which case you'd just call some other function,
> > e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>
> This works, but We need to create 2 functions for each operator. In the
> patched case, we have 5 operators, so we need to create 10 functions.
>
> op[1,2,3,4,5]_bool
> op[1,2,3,4,5]_numeric.
>
> Within the start / finish function, we need to create *7* functions.

Any particular reason you want to keep number of functions minimal? Is
it just to make the patch smaller? I might be missing something without
looking at the implementation in details, but the difference between 10
and 7 functions doesn't seem to be significant.



Re: Extract numeric filed in JSONB more effectively

From
Andy Fan
Date:
Hi Dmitry,

>> On Thu, Sep 12, 2024 at 03:03:18AM GMT, Andy Fan wrote:
>>
>> > I imagined you'd the patch should create a SupportRequestSimplify
>> > support function for jsonb_numeric() that checks if the input
>> > expression is an OpExpr with funcid of jsonb_object_field().  All you
>> > do then is ditch the cast and change the OpExpr to call a new function
>> > named jsonb_object_field_numeric() which returns the val.numeric
>> > directly.  Likely the same support function could handle jsonb casts
>> > to other types too, in which case you'd just call some other function,
>> > e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>>
>> Basically yes. The reason complexity comes when we many operators we
>> want to optimize AND my patch I want to reduce the number of function
>> created.
>>
>> The optimized functions and operators includes:
>> 1. jsonb_object_field / ->
>> 2. jsonb_array_element / ->
>> 3. jsonb_extract_path / #>
>> 4. jsonb_path_query
>> 5. jsonb_path_query_first
>>
>>
>> > ..., in which case you'd just call some other function,
>> > e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>>
>> This works, but We need to create 2 functions for each operator. In the
>> patched case, we have 5 operators, so we need to create 10 functions.
>>
>> op[1,2,3,4,5]_bool
>> op[1,2,3,4,5]_numeric.
>>
>> Within the start / finish function, we need to create *7* functions.
>
> Any particular reason you want to keep number of functions minimal? Is
> it just to make the patch smaller? I might be missing something without
> looking at the implementation in details, but the difference between 10
> and 7 functions doesn't seem to be significant.

Another reason is for reducing code duplication, writting too many
similar function looks not good to me. Chapman expressed this idea
first at [1]. Search "it would make me happy to further reduce some
of the code" in the message.

Acutally this doesn't make the patch complexer too much.

[1]
https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net 


-- 
Best Regards
Andy Fan




Re: Extract numeric filed in JSONB more effectively

From
Dmitry Dolgov
Date:
> On Mon, Nov 18, 2024 at 08:23:52AM GMT, Andy Fan wrote:
>
> >> > I imagined you'd the patch should create a SupportRequestSimplify
> >> > support function for jsonb_numeric() that checks if the input
> >> > expression is an OpExpr with funcid of jsonb_object_field().  All you
> >> > do then is ditch the cast and change the OpExpr to call a new function
> >> > named jsonb_object_field_numeric() which returns the val.numeric
> >> > directly.  Likely the same support function could handle jsonb casts
> >> > to other types too, in which case you'd just call some other function,
> >> > e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
> >>
> >> Basically yes. The reason complexity comes when we many operators we
> >> want to optimize AND my patch I want to reduce the number of function
> >> created.
> >>
> >> [...]
> >>
> >> Within the start / finish function, we need to create *7* functions.
> >
> > Any particular reason you want to keep number of functions minimal? Is
> > it just to make the patch smaller? I might be missing something without
> > looking at the implementation in details, but the difference between 10
> > and 7 functions doesn't seem to be significant.
>
> Another reason is for reducing code duplication, writting too many
> similar function looks not good to me. Chapman expressed this idea
> first at [1]. Search "it would make me happy to further reduce some
> of the code" in the message.
>
> Acutally this doesn't make the patch complexer too much.
>
> [1]
> https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net

It might not make everything too much complex, but e.g. relabeling of
the first argument for a "finish" function into an internal one sounds
strange to me. Maybe there is a way to avoid duplication of the code,
but keep all needed functions in pg_proc?

Btw, sorry to complain about small details, but I find start / finish
naming pattern not quite fitting here. Their main purpose is to extract
/ convert a value, the order in which they are happening is less
relevant.