Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Extract numeric filed in JSONB more effectively
Date
Msg-id 87r0hmvuvr.fsf@163.com
Whole thread Raw
In response to Re: Extract numeric filed in JSONB more effectively  (jian he <jian.universality@gmail.com>)
Responses Re: Extract numeric filed in JSONB more effectively
List pgsql-hackers
Hi,

Here is the update of this patch.

1. What is it for?

commit f7b93acc24b4a152984048fefc6d71db606e3204 (HEAD -> jsonb_numeric)
Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
Date:   Fri Feb 9 16:54:06 2024 +0800

    Improve the performance of Jsonb numeric/bool extraction.
    
    JSONB object uses a binary compatible numeric format with the numeric
    data type in SQL. However in the past, extracting a numeric value from a
    JSONB field still needs to find the corresponding JsonbValue first,
    then convert the JsonbValue to Jsonb, and finally use the cast system to
    convert the Jsonb to a Numeric data type. This approach was very
    inefficient in terms of performance.
    
    In the current patch, It is handled that the JsonbValue is converted to
    numeric data type directly.  This is done by the planner support
    function which detects the above case and simplify the expression.
    Because the boolean type and numeric type share certain similarities in
    their attributes, we have implemented the same optimization approach for
    both.  In the ideal test case, the performance can be 2x than before.
    
    The optimized functions and operators includes:
    1. jsonb_object_field / ->
    2. jsonb_array_element / ->
    3. jsonb_extract_path / #>
    4. jsonb_path_query
    5. jsonb_path_query_first

example:
create table tb(a jsonb);
insert into tb select '{"a": 1, "b": "a"}'::jsonb;


master:
explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
                        QUERY PLAN                         
-----------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: (((tb.a -> 'a'::text))::numeric > '3'::numeric)
(3 rows)

patched:

postgres=# explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: (jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'a'::text), '1700'::oid) > '3'::numeric)
(3 rows)

The final expression generated by planner support function includes:

1).
jsonb_object_field_start((tb.a)::internal, 'a'::text) first, this
function returns the internal datum which is JsonbValue in fact.
2).
jsonb_finish_numeric(internal (jsonbvalue), '1700::oid) convert the
jsonbvalue to numeric directly without the jsonb as a intermedia result.

the reason why "1700::oid" will be explained later, that's the key issue
right now.

The reason why we need the 2 steps rather than 1 step is because the
code can be better abstracted, the idea comes from Chap, the detailed
explaination is at [1]. You can search "Now, it would make me happy to
further reduce some of the code duplication" and read the following
graph. 


2. Where is the current feature blocked for the past few months?

It's error message compatible issue! Continue with above setup:

master:

select * from tb where (a->'b')::numeric > 3::numeric;
ERROR:  cannot cast jsonb string to type numeric

select * from tb where (a->'b')::int4 > 3::numeric;
ERROR:  cannot cast jsonb string to type integer

You can see the error message is different (numeric vs integer). 


Patched:

We still can get the same error message as master BUT the code
looks odd.

select * from tb where (a->'b')::int4 > 3;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.tb
   Output: a
   Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3)
(3 rows)

You can see "jsonb_finish_numeric(..,  '23::oid)" the '23::oid' is just
for the *"integer"* output in error message:

"cannot cast jsonb string to type *integer*"

Now the sistuation is either we use the odd argument (23::oid) in
jsonb_finish_numeric, or we use a incompatible error message with the
previous version. I'm not sure which way is better, but this is the
place the current feature is blocked.

3. what do I want now?

Since this feature uses the planner support function which needs some
catalog changes, so it is better that we can merge this feature in PG17,
or else, we have to target it in PG18. So if some senior developers can
chime in, for the current blocking issue at least, will be pretty
helpful.

[1]
https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net 

-- 
Best Regards
Andy Fan


Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Printing backtrace of postgres processes
Next
From: Anthonin Bonnefoy
Date:
Subject: Fix parallel vacuum buffer usage reporting