Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Deduplicate aggregates and transition functions in planner
Date
Msg-id 97969583-7f1b-c115-638d-fd8197818650@iki.fi
Whole thread Raw
In response to Re: Deduplicate aggregates and transition functions in planner  (Andres Freund <andres@anarazel.de>)
Responses Re: Deduplicate aggregates and transition functions in planner
List pgsql-hackers
On 29/10/2020 19:48, Andres Freund wrote:
> On 2020-10-29 10:17:20 +0200, Heikki Linnakangas wrote:
>> On 28/10/2020 21:59, Andres Freund wrote:
>>> It wouldn't surprise me to see a small execution time speedup here -
>>> I've seen the load of the aggno show up in profiles.
>>
>> I think you'd be hard-pressed to find a real-life query where it
>> matters. But if you don't care about real life:
> 
> I was actually thinking about a different angle - that the evaluation of
> an Aggref can be faster, because we need less indirection to find the
> aggno. As you have already implemented for the JITed code, but removing
> it for the expression code looks easy enough too. You'd need a lot of
> groups and presumably a fair number of Aggrefs to see it.
> 
> Attached is a quick version of what I am thinking wrt AggrefExprState.

Ah, I see, makes sense.

>> The Agg node itself doesn't include any information about the aggregates and
>> transition functions. Because of that, ExecInitAgg needs a "representive"
>> Aggref for each transition state and agg, to initialize the per-trans and
>> per-agg structs. The expression initialization makes those Aggrefs available
>> for ExecInitAgg.
> 
>> Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could set
>> each representative Aggref directly in the right per-trans and per-agg
>> struct, based on the 'aggno' and 'aggtransno' fields.
> 
> Hold on a second: To me the question is why is it the right design that
> the Agg node doesn't have the information about "aggregates and
> transition functions"? Agg e.g. already does directly contains the group
> keys...
> 
> My concern wouldn't really be addressed if we replace the lappend() in
> ExecInitExprRec() with setting something "directly in the right
> per-trans...". I think it's structurally wrong to have to discover
> Aggrefs at execution time at all.
> 
> Perhaps the easiest incremental step would be to have something like a
> CookedAggref { int aggno; } and then just store the Aggref nodes in
> Agg->aggs, with aggno referencing that...

I started hacking on that CookedAggref approach, but it wasn't as simple 
as it seemed. I tried to replace the Aggrefs with CookedAggrefs in 
setrefs.c, but when set_plan_references() replaces expressions with Vars 
referring to the output of a subnode, it needs to be able to match an 
Aggref at an upper node to a CookedAggref on the node below. 
Furthermore, the deparsing code in ruleutils.c needs to be able to find 
the original Aggrefs, in order to print them nicely. All of that is 
solvable, I'm sure, but it's not trivial. And it's new code that mostly 
builds on top of attached patch, so I think that can be done separately 
later, it doesn't need to block this patch.

So barring objections, I'm going to push the attached updated patch that 
includes the removal of AggrefExprState, and leave CookedAggrefs or 
other further refactorings for the future.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Add LWLock blocker(s) information
Next
From: Craig Ringer
Date:
Subject: Re: ResourceOwner refactoring