Thread: cache type info in json_agg and friends

cache type info in json_agg and friends

From
Andrew Dunstan
Date:
Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do
type classification on their arguments on each call to the transition
function. This is quite unnecessary, as the argument types won't change.
This patch remedies the defect by caching the necessary values in the
aggregate state object.

While this doesn't change the performance much, since these functions
are essentially dominated by other bits of the processing, I think it is
nevertheless worth doing.

There are other areas where we might attack this, also. In particular,
if one of the arguments is a record, then composite_to_json(b) will do
this for every attribute of every record. However, it's much less clear
to me how we can cache this information sensibly.

cheers

andrew

Attachment

Re: cache type info in json_agg and friends

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:

> Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do type
> classification on their arguments on each call to the transition function.
> This is quite unnecessary, as the argument types won't change. This patch
> remedies the defect by caching the necessary values in the aggregate state
> object.

Seems a reasonable idea to me.  This is 9.6 only, right?

What's the reason for this pattern?

!       json_categorize_type(val_type,&tcategory, &outfuncoid);
!       state->val_category = tcategory;
!       state->val_output_func = outfuncoid;

I think you could just as well call json_categorize_type() with the
final pointer values, and save the two separate variables, as there is
no gain in clarity or ease of reading; I mean

!       json_categorize_type(tmptyp, &state->val_category, &state->val_output_func);

Also, and this is not new in this patch, this code reuses a variable
named "val_type" for both values and keys, which reads a bit odd.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: cache type info in json_agg and friends

From
Teodor Sigaev
Date:
> Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do
> type classification on their arguments on each call to the transition
> function. This is quite unnecessary, as the argument types won't change.
> This patch remedies the defect by caching the necessary values in the
> aggregate state object.
+1

>
> While this doesn't change the performance much, since these functions
> are essentially dominated by other bits of the processing, I think it is
> nevertheless worth doing.
Agree

After quick observation of your patch, why don't you use FmgrInfo 
instead of  JsonAggState.val_output_func/JsonAggState.key_category? 
FmgrInfo could be filled by fmgr_info_cxt() in aggcontext memory 
context. Suppose, direct usage of FmgrInfo with FunctionCall a bit 
faster than OidFunctionCall.
-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/



Re: cache type info in json_agg and friends

From
Andrew Dunstan
Date:

On 09/14/2015 03:42 PM, Teodor Sigaev wrote:
>> Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do
>> type classification on their arguments on each call to the transition
>> function. This is quite unnecessary, as the argument types won't change.
>> This patch remedies the defect by caching the necessary values in the
>> aggregate state object.
> +1
>
>>
>> While this doesn't change the performance much, since these functions
>> are essentially dominated by other bits of the processing, I think it is
>> nevertheless worth doing.
> Agree
>
> After quick observation of your patch, why don't you use FmgrInfo 
> instead of JsonAggState.val_output_func/JsonAggState.key_category? 
> FmgrInfo could be filled by fmgr_info_cxt() in aggcontext memory 
> context. Suppose, direct usage of FmgrInfo with FunctionCall a bit 
> faster than OidFunctionCall.


Well, we need the category to help data_to_json(b) do its work. 
Nevertheless, it might be doable to pass an FmgrInfo* to datum_to_json. 
I'll see what I can do.

cheers

andrew




Re: cache type info in json_agg and friends

From
Andrew Dunstan
Date:

On 09/14/2015 04:24 PM, Andrew Dunstan wrote:
>
>
> On 09/14/2015 03:42 PM, Teodor Sigaev wrote:
>>> Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do
>>> type classification on their arguments on each call to the transition
>>> function. This is quite unnecessary, as the argument types won't 
>>> change.
>>> This patch remedies the defect by caching the necessary values in the
>>> aggregate state object.
>> +1
>>
>>>
>>> While this doesn't change the performance much, since these functions
>>> are essentially dominated by other bits of the processing, I think 
>>> it is
>>> nevertheless worth doing.
>> Agree
>>
>> After quick observation of your patch, why don't you use FmgrInfo 
>> instead of JsonAggState.val_output_func/JsonAggState.key_category? 
>> FmgrInfo could be filled by fmgr_info_cxt() in aggcontext memory 
>> context. Suppose, direct usage of FmgrInfo with FunctionCall a bit 
>> faster than OidFunctionCall.
>
>
> Well, we need the category to help data_to_json(b) do its work. 
> Nevertheless, it might be doable to pass an FmgrInfo* to 
> datum_to_json. I'll see what I can do.
>
>


The real problem about this is that in the most important cases to 
improve (composite_to_json(b) and the array processing functions) we'll 
still end up calling fmgr_info for every attribute of every record and 
for every array, although not for every array element, which is what 
OidOutputFunctionCall does for us anyway.

It's not obvious to me how to fix that, and before I put lots of effort 
into it I want to do some profiling to see where the time is actually 
being spent - I don't want to add a whole lot of code for a very 
marginal improvement.

One thought I did have that might be worth testing is that in the case 
of jsonb all the micro operations might be killing us, and that it might 
well be faster to generate a JSON string in the aggregates and then 
parse that into jsonb in the final function.

cheers

andrew




Re: cache type info in json_agg and friends

From
Andrew Dunstan
Date:

On 09/14/2015 03:41 PM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Currently, json_agg, jsonb_agg, json_object_agg and jsonb_object_agg do type
>> classification on their arguments on each call to the transition function.
>> This is quite unnecessary, as the argument types won't change. This patch
>> remedies the defect by caching the necessary values in the aggregate state
>> object.
> Seems a reasonable idea to me.  This is 9.6 only, right?


I think we can reasonably backpatch it to 9.5, which is where the jsonb 
functions were actually introduced. It's not at all user visible, and 
we're still in alpha. Seem fair?

I have addressed your stylistic concerns, but I'll leave the fmgr_info 
question Teodor raised for another day. Before I do anything more than 
this I want to do some profiling to find out where the time is actually 
going for various workloads.

cheers

andrew