Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers
| From | Haibo Yan |
|---|---|
| Subject | Re: Extract numeric filed in JSONB more effectively |
| Date | |
| Msg-id | CABXr29GhnWxrGXD2YijYSDxwObGLqOsf8gvfzrFu=UcYU71XYA@mail.gmail.com Whole thread |
| In response to | Re: Extract numeric filed in JSONB more effectively (Andy Fan <zhihuifan1213@163.com>) |
| List | pgsql-hackers |
Hi Haibo,
> I’d like to continue pushing this patch forward.
> Based on the earlier discussion, I reworked the patch into a smaller
> stage-1 version with a narrower scope and a simpler rewrite
> strategy. ..
> and does not yet try to cover array/path extraction or integer/float
> typed extractors.
Thanks for working on this. I did a quick comparison between this
version and my last patch v18 [1], here is the difference.
My previous v18[1]: 542 insertions(+), 59 deletions(-)
Your patch: 543 insertions(+), 5 deletions(-)
However as what you have realized, your current patch still lack of many
optimizations, Not only the integer/float stuff, but also many
functions, e.g. jsonb_array_element, jsonb_extract_path,
jsonb_path_query and jsonb_path_query_first. After considering this,
what patch will look like in your approach? I guess you can see many
duplications. Less code doesn't always mean better, but I think this
still be a key consideration to address/check.
Yes, I agree the current patch is still missing a large part of the full matrix from v18 — not only int4/int8/float8, but also other extractor families such as jsonb_array_element, jsonb_extract_path, jsonb_path_query, and jsonb_path_query_first.
My intent with this version was not to cover the whole space at once, but to first reduce it to a smaller stage-1 subset that is easier to review and easier to reason about. In particular, I wanted to validate the simpler rewrite shape first: keep the support-function-based approach, but rewrite directly to explicit typed extractor functions, without the previous start/finish/internal pipeline.
I agree that if this approach is extended to the full matrix naively, duplication will become a real issue. So I am not claiming that the current patch shape should simply be copied mechanically across every extractor/type combination. My thought was to first see whether this narrower subset is acceptable in principle, and then decide how best to extend it without reintroducing too much abstraction or too much duplication.
So for now I would prefer to keep the current scope intentionally narrow:
jsonb_object_field / -> / equivalent subscripting form
casts to numeric and bool
and treat the rest as follow-up work, rather than trying to solve the entire matrix in one patch.
> I also ran a small microbenchmark to isolate the cast-over-object-field path. On my setup, the current patch shows the following gains:
> Query Before After Speedup
> --------------------------------------------------------------------------------
> SELECT sum((j->'n')::numeric) FROM t 118.028 ms 56.082 ms 2.10x
> SELECT count(*) FROM t WHERE (j->'b')::bool 115.665 ms 51.945 ms 2.23x
> --------------------------------------------------------------------------------
Thanks for running the test.
[1] https://www.postgresql.org/message-id/87ttk0lgcx.fsf%40163.com
--
Best Regards
Andy Fan
Thanks,
pgsql-hackers by date: