Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Extract numeric filed in JSONB more effectively
Date
Msg-id 8734t6c5rh.fsf@163.com
Whole thread Raw
In response to Re: Extract numeric filed in JSONB more effectively  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: Extract numeric filed in JSONB more effectively
List pgsql-hackers
Peter Eisentraut <peter@eisentraut.org> writes:

> On 09.02.24 10:05, Andy Fan wrote:
>> 2. Where is the current feature blocked for the past few months?
>> It's error message compatible issue! Continue with above setup:
>> master:
>> select * from tb where (a->'b')::numeric > 3::numeric;
>> ERROR:  cannot cast jsonb string to type numeric
>> select * from tb where (a->'b')::int4 > 3::numeric;
>> ERROR:  cannot cast jsonb string to type integer
>> You can see the error message is different (numeric vs integer).
>> Patched:
>> We still can get the same error message as master BUT the code
>> looks odd.
>> select * from tb where (a->'b')::int4 > 3;
>>                                                      QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------
>>   Seq Scan on public.tb
>>     Output: a
>>     Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
>> (3 rows)
>> You can see "jsonb_finish_numeric(..,  '23::oid)" the '23::oid' is
>> just
>> for the *"integer"* output in error message:
>> "cannot cast jsonb string to type*integer*"
>> Now the sistuation is either we use the odd argument (23::oid) in
>> jsonb_finish_numeric, or we use a incompatible error message with the
>> previous version. I'm not sure which way is better, but this is the
>> place the current feature is blocked.
>
> I'm not bothered by that.  It also happens on occasion in the backend C
> code that we pass around extra information to be able to construct
> better error messages.  The functions here are not backend C code, but
> they are internal functions, so similar considerations can apply.

Thanks for speaking on this!

>
> But I have a different question about this patch set.  This has some
> overlap with the JSON_VALUE function that is being discussed at
> [0][1]. For example, if I apply the patch
> v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run
>
> select count(*) from tb where json_value(a, '$.a' returning numeric) = 2;
>
> and I get a noticeable performance boost over
>
> select count(*) from tb where cast (a->'a' as numeric) = 2;

Here is my test and profile about the above 2 queries.

create table tb(a jsonb);
insert into tb
select jsonb_build_object('a', i) from generate_series(1, 10000)i;

cat a.sql
select count(*) from tb
where json_value(a, '$.a' returning numeric) = 2;

cat b.sql
select count(*) from tb where cast (a->'a' as numeric) = 2;

pgbench -n -f xxx.sql postgres -T100 | grep lat

Then here is the result:

| query | master  | patched (patch here and jsonb_value) |
|-------+---------+-------------------------------------|
| a.sql | /       | 2.59 (ms)                           |
| b.sql | 3.34 ms | 1.75 (ms)                           |

As we can see the patch here has the best performance (this result looks
be different from yours?).

After I check the code, I am sure both patches *don't* have the problem
in master where it get a jsonbvalue first and convert it to jsonb and
then cast to numeric.

Then I perf the result, and find the below stuff:

JSOB_VALUE
------------
-   32.02%     4.30%  postgres  postgres           [.] JsonPathValue
   - 27.72% JsonPathValue
      - 22.63% executeJsonPath (inlined)
         - 19.97% executeItem (inlined)
            - executeItemOptUnwrapTarget
               - 17.79% executeNextItem
                  - 15.49% executeItem (inlined)
                     - executeItemOptUnwrapTarget
                        + 8.50% getKeyJsonValueFromContainer (note here..)
                        + 2.30% executeNextItem
                          0.79% findJsonbValueFromContainer
                        + 0.68% check_stack_depth
                  + 1.51% jspGetNext
               + 0.73% check_stack_depth
           1.27% jspInitByBuffer
           0.85% JsonbExtractScalar
      + 4.91% DatumGetJsonbP (inlined)

Patch here for b.sql:
---------------------

-   19.98%     2.10%  postgres  postgres           [.] jsonb_object_field_start
   - 17.88% jsonb_object_field_start
      - 17.70% jsonb_object_field_internal (inlined)
         + 11.03% getKeyJsonValueFromContainer
         - 6.26% DatumGetJsonbP (inlined)
            + 5.78% detoast_attr
   + 1.21% _start
   + 0.54% 0x55ddb44552a

JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer,
then I think JSON_VALUE probably is designed for some more complex path 
which need to pay extra effort which bring the above performance
difference. 

I added Amit and Alvaro to this thread in case they can have more
insight on this.

> So some questions to think about:
>
> 1. Compare performance of base case vs. this patch vs. json_value.

Done, as above. 
>
> 2. Can json_value be optimized further?

hmm, I have some troubles to understand A's performance boost over B,
then who is better. But in my test above, looks the patch here is better
on the given case and the differece may comes from JSON_VALUE is
designed to handle more generic purpose.

> 3. Is this patch still needed?

I think yes. One reason is the patch here have better performance, the
other reason is the patch here prevent user from changing their existing
queries.

>
> 3a. If yes, should the internal rewriting make use of json_value or
> share code with it?

As for now, looks json_value is designed for more generic case, not sure
if we could share some code. My patch actually doesn't add much code on
the json function part. 

-- 
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Shared detoast Datum proposal
Next
From: Tomas Vondra
Date:
Subject: Re: a wrong index choose when statistics is out of date