Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Extract numeric filed in JSONB more effectively |
Date | |
Msg-id | 87r0hmvuvr.fsf@163.com Whole thread Raw |
In response to | Re: Extract numeric filed in JSONB more effectively (jian he <jian.universality@gmail.com>) |
Responses |
Re: Extract numeric filed in JSONB more effectively
|
List | pgsql-hackers |
Hi, Here is the update of this patch. 1. What is it for? commit f7b93acc24b4a152984048fefc6d71db606e3204 (HEAD -> jsonb_numeric) Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com> Date: Fri Feb 9 16:54:06 2024 +0800 Improve the performance of Jsonb numeric/bool extraction. JSONB object uses a binary compatible numeric format with the numeric data type in SQL. However in the past, extracting a numeric value from a JSONB field still needs to find the corresponding JsonbValue first, then convert the JsonbValue to Jsonb, and finally use the cast system to convert the Jsonb to a Numeric data type. This approach was very inefficient in terms of performance. In the current patch, It is handled that the JsonbValue is converted to numeric data type directly. This is done by the planner support function which detects the above case and simplify the expression. Because the boolean type and numeric type share certain similarities in their attributes, we have implemented the same optimization approach for both. In the ideal test case, the performance can be 2x than before. The optimized functions and operators includes: 1. jsonb_object_field / -> 2. jsonb_array_element / -> 3. jsonb_extract_path / #> 4. jsonb_path_query 5. jsonb_path_query_first example: create table tb(a jsonb); insert into tb select '{"a": 1, "b": "a"}'::jsonb; master: explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric; QUERY PLAN ----------------------------------------------------------- Seq Scan on public.tb Output: a Filter: (((tb.a -> 'a'::text))::numeric > '3'::numeric) (3 rows) patched: postgres=# explain (costs off, verbose) select * from tb where (a->'a')::numeric > 3::numeric; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Seq Scan on public.tb Output: a Filter: (jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'a'::text), '1700'::oid) > '3'::numeric) (3 rows) The final expression generated by planner support function includes: 1). jsonb_object_field_start((tb.a)::internal, 'a'::text) first, this function returns the internal datum which is JsonbValue in fact. 2). jsonb_finish_numeric(internal (jsonbvalue), '1700::oid) convert the jsonbvalue to numeric directly without the jsonb as a intermedia result. the reason why "1700::oid" will be explained later, that's the key issue right now. The reason why we need the 2 steps rather than 1 step is because the code can be better abstracted, the idea comes from Chap, the detailed explaination is at [1]. You can search "Now, it would make me happy to further reduce some of the code duplication" and read the following graph. 2. Where is the current feature blocked for the past few months? It's error message compatible issue! Continue with above setup: master: select * from tb where (a->'b')::numeric > 3::numeric; ERROR: cannot cast jsonb string to type numeric select * from tb where (a->'b')::int4 > 3::numeric; ERROR: cannot cast jsonb string to type integer You can see the error message is different (numeric vs integer). Patched: We still can get the same error message as master BUT the code looks odd. select * from tb where (a->'b')::int4 > 3; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Seq Scan on public.tb Output: a Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3) (3 rows) You can see "jsonb_finish_numeric(.., '23::oid)" the '23::oid' is just for the *"integer"* output in error message: "cannot cast jsonb string to type *integer*" Now the sistuation is either we use the odd argument (23::oid) in jsonb_finish_numeric, or we use a incompatible error message with the previous version. I'm not sure which way is better, but this is the place the current feature is blocked. 3. what do I want now? Since this feature uses the planner support function which needs some catalog changes, so it is better that we can merge this feature in PG17, or else, we have to target it in PG18. So if some senior developers can chime in, for the current blocking issue at least, will be pretty helpful. [1] https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net -- Best Regards Andy Fan
Attachment
pgsql-hackers by date: