Re: Extract numeric filed in JSONB more effectively - Mailing list pgsql-hackers

From Haibo Yan
Subject Re: Extract numeric filed in JSONB more effectively
Date
Msg-id 7487A577-7BDC-4B94-9FFA-741E95439BFC@gmail.com
Whole thread Raw
In response to Re: Extract numeric filed in JSONB more effectively  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Extract numeric filed in JSONB more effectively
Re: Extract numeric filed in JSONB more effectively
List pgsql-hackers
On Nov 22, 2024, at 10:14 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
>> On Mon, Nov 18, 2024 at 08:23:52AM GMT, Andy Fan wrote:
>>
>>>>> I imagined you'd the patch should create a SupportRequestSimplify
>>>>> support function for jsonb_numeric() that checks if the input
>>>>> expression is an OpExpr with funcid of jsonb_object_field().  All you
>>>>> do then is ditch the cast and change the OpExpr to call a new function
>>>>> named jsonb_object_field_numeric() which returns the val.numeric
>>>>> directly.  Likely the same support function could handle jsonb casts
>>>>> to other types too, in which case you'd just call some other function,
>>>>> e.g jsonb_object_field_timestamp() or jsonb_object_field_boolean().
>>>>
>>>> Basically yes. The reason complexity comes when we many operators we
>>>> want to optimize AND my patch I want to reduce the number of function
>>>> created.
>>>>
>>>> [...]
>>>>
>>>> Within the start / finish function, we need to create *7* functions.
>>>
>>> Any particular reason you want to keep number of functions minimal? Is
>>> it just to make the patch smaller? I might be missing something without
>>> looking at the implementation in details, but the difference between 10
>>> and 7 functions doesn't seem to be significant.
>>
>> Another reason is for reducing code duplication, writting too many
>> similar function looks not good to me. Chapman expressed this idea
>> first at [1]. Search "it would make me happy to further reduce some
>> of the code" in the message.
>>
>> Acutally this doesn't make the patch complexer too much.
>>
>> [1]
>> https://www.postgresql.org/message-id/5138c6b5fd239e7ce4e1a4e63826ac27%40anastigmatix.net
>
> It might not make everything too much complex, but e.g. relabeling of
> the first argument for a "finish" function into an internal one sounds
> strange to me. Maybe there is a way to avoid duplication of the code,
> but keep all needed functions in pg_proc?
>
> Btw, sorry to complain about small details, but I find start / finish
> naming pattern not quite fitting here. Their main purpose is to extract
> / convert a value, the order in which they are happening is less
> relevant.
>
>
>
>

Hi all,
I’d like to continue pushing this patch forward.
Based on the earlier discussion, I reworked the patch into a smaller stage-1 version with a narrower scope and a
simplerrewrite strategy. The current patch keeps the normal SQL syntax unchanged and uses support-function
simplificationto rewrite only the following patterns: 
    (jsonb_object_field(...))::numeric
    jsonb_object_field(...))::bool
into explicit typed extractor calls.
So at this stage it intentionally covers only:
    jsonb_object_field(...) / ->
    casts to numeric
    casts to bool
and does not yet try to cover array/path extraction or integer/float typed extractors.
I also ran a small microbenchmark to isolate the cast-over-object-field path. On my setup, the current patch shows the
followinggains: 
Query                                        Before      After       Speedup
--------------------------------------------------------------------------------
SELECT sum((j->'n')::numeric) FROM t         118.028 ms  56.082 ms   2.10x
SELECT count(*) FROM t WHERE (j->'b')::bool  115.665 ms  51.945 ms   2.23x
--------------------------------------------------------------------------------
These are microbenchmark numbers rather than end-to-end workload results, but they suggest that the simplified rewrite
pathis worth pursuing. 
My goal with this version is not to solve the full matrix at once, but to first land a reviewer-friendly subset that:
1. does not introduce new user-visible operators,
2. keeps ordinary cast syntax unchanged,
3. avoids the more abstract internal/start-finish style machinery,
4. and uses explicit rewrite targets that are easier to review.
If this direction looks reasonable, I’d appreciate another round of review on the updated patch. If people think the
stage-1scope is acceptable, I can continue with follow-up patches for additional extractor families and target types. 
Thanks,
Haibo


Attachment

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication
Next
From: Pavel Stehule
Date:
Subject: Re: Extract numeric filed in JSONB more effectively