Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Extract numeric filed in JSONB more effectively |
Date | |
Msg-id | 8734t6c5rh.fsf@163.com Whole thread Raw |
In response to | Re: Extract numeric filed in JSONB more effectively (Peter Eisentraut <peter@eisentraut.org>) |
Responses |
Re: Extract numeric filed in JSONB more effectively
|
List | pgsql-hackers |
Peter Eisentraut <peter@eisentraut.org> writes: > On 09.02.24 10:05, Andy Fan wrote: >> 2. Where is the current feature blocked for the past few months? >> It's error message compatible issue! Continue with above setup: >> master: >> select * from tb where (a->'b')::numeric > 3::numeric; >> ERROR: cannot cast jsonb string to type numeric >> select * from tb where (a->'b')::int4 > 3::numeric; >> ERROR: cannot cast jsonb string to type integer >> You can see the error message is different (numeric vs integer). >> Patched: >> We still can get the same error message as master BUT the code >> looks odd. >> select * from tb where (a->'b')::int4 > 3; >> QUERY PLAN >> ------------------------------------------------------------------------------------------------------------------- >> Seq Scan on public.tb >> Output: a >> Filter: ((jsonb_finish_numeric(jsonb_object_field_start((tb.a)::internal, 'b'::text), '23'::oid))::integer > 3) >> (3 rows) >> You can see "jsonb_finish_numeric(.., '23::oid)" the '23::oid' is >> just >> for the *"integer"* output in error message: >> "cannot cast jsonb string to type*integer*" >> Now the sistuation is either we use the odd argument (23::oid) in >> jsonb_finish_numeric, or we use a incompatible error message with the >> previous version. I'm not sure which way is better, but this is the >> place the current feature is blocked. > > I'm not bothered by that. It also happens on occasion in the backend C > code that we pass around extra information to be able to construct > better error messages. The functions here are not backend C code, but > they are internal functions, so similar considerations can apply. Thanks for speaking on this! > > But I have a different question about this patch set. This has some > overlap with the JSON_VALUE function that is being discussed at > [0][1]. For example, if I apply the patch > v39-0001-Add-SQL-JSON-query-functions.patch from that thread, I can run > > select count(*) from tb where json_value(a, '$.a' returning numeric) = 2; > > and I get a noticeable performance boost over > > select count(*) from tb where cast (a->'a' as numeric) = 2; Here is my test and profile about the above 2 queries. create table tb(a jsonb); insert into tb select jsonb_build_object('a', i) from generate_series(1, 10000)i; cat a.sql select count(*) from tb where json_value(a, '$.a' returning numeric) = 2; cat b.sql select count(*) from tb where cast (a->'a' as numeric) = 2; pgbench -n -f xxx.sql postgres -T100 | grep lat Then here is the result: | query | master | patched (patch here and jsonb_value) | |-------+---------+-------------------------------------| | a.sql | / | 2.59 (ms) | | b.sql | 3.34 ms | 1.75 (ms) | As we can see the patch here has the best performance (this result looks be different from yours?). After I check the code, I am sure both patches *don't* have the problem in master where it get a jsonbvalue first and convert it to jsonb and then cast to numeric. Then I perf the result, and find the below stuff: JSOB_VALUE ------------ - 32.02% 4.30% postgres postgres [.] JsonPathValue - 27.72% JsonPathValue - 22.63% executeJsonPath (inlined) - 19.97% executeItem (inlined) - executeItemOptUnwrapTarget - 17.79% executeNextItem - 15.49% executeItem (inlined) - executeItemOptUnwrapTarget + 8.50% getKeyJsonValueFromContainer (note here..) + 2.30% executeNextItem 0.79% findJsonbValueFromContainer + 0.68% check_stack_depth + 1.51% jspGetNext + 0.73% check_stack_depth 1.27% jspInitByBuffer 0.85% JsonbExtractScalar + 4.91% DatumGetJsonbP (inlined) Patch here for b.sql: --------------------- - 19.98% 2.10% postgres postgres [.] jsonb_object_field_start - 17.88% jsonb_object_field_start - 17.70% jsonb_object_field_internal (inlined) + 11.03% getKeyJsonValueFromContainer - 6.26% DatumGetJsonbP (inlined) + 5.78% detoast_attr + 1.21% _start + 0.54% 0x55ddb44552a JSONB_VALUE has a much longer way to get getKeyJsonValueFromContainer, then I think JSON_VALUE probably is designed for some more complex path which need to pay extra effort which bring the above performance difference. I added Amit and Alvaro to this thread in case they can have more insight on this. > So some questions to think about: > > 1. Compare performance of base case vs. this patch vs. json_value. Done, as above. > > 2. Can json_value be optimized further? hmm, I have some troubles to understand A's performance boost over B, then who is better. But in my test above, looks the patch here is better on the given case and the differece may comes from JSON_VALUE is designed to handle more generic purpose. > 3. Is this patch still needed? I think yes. One reason is the patch here have better performance, the other reason is the patch here prevent user from changing their existing queries. > > 3a. If yes, should the internal rewriting make use of json_value or > share code with it? As for now, looks json_value is designed for more generic case, not sure if we could share some code. My patch actually doesn't add much code on the json function part. -- Best Regards Andy Fan
pgsql-hackers by date: