Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Deduplicate aggregates and transition functions in planner |
Date | |
Msg-id | 20201029174810.tk4s7ycfrwde7m3x@alap3.anarazel.de Whole thread Raw |
In response to | Re: Deduplicate aggregates and transition functions in planner (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Deduplicate aggregates and transition functions in planner
|
List | pgsql-hackers |
Hi, On 2020-10-29 10:17:20 +0200, Heikki Linnakangas wrote: > On 28/10/2020 21:59, Andres Freund wrote: > > It wouldn't surprise me to see a small execution time speedup here - > > I've seen the load of the aggno show up in profiles. > > I think you'd be hard-pressed to find a real-life query where it > matters. But if you don't care about real life: I was actually thinking about a different angle - that the evaluation of an Aggref can be faster, because we need less indirection to find the aggno. As you have already implemented for the JITed code, but removing it for the expression code looks easy enough too. You'd need a lot of groups and presumably a fair number of Aggrefs to see it. Attached is a quick version of what I am thinking wrt AggrefExprState. > > > @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state, > > > scratch.opcode = EEOP_AGGREF; > > > scratch.d.aggref.astate = astate; > > > - astate->aggref = aggref; > > > + astate->aggno = aggref->aggno; > > > if (state->parent && IsA(state->parent, AggState)) > > > { > > > AggState *aggstate = (AggState *) state->parent; > > > - aggstate->aggs = lappend(aggstate->aggs, astate); > > > - aggstate->numaggs++; > > > + aggstate->aggs = lappend(aggstate->aggs, aggref); > > > > Hm. Why is aggstate->aggs still built during expression initialization? > > Imo that's a pretty huge wart that also introduces more > > order-of-operation brittleness to executor startup. > > The Agg node itself doesn't include any information about the aggregates and > transition functions. Because of that, ExecInitAgg needs a "representive" > Aggref for each transition state and agg, to initialize the per-trans and > per-agg structs. The expression initialization makes those Aggrefs available > for ExecInitAgg. > Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could set > each representative Aggref directly in the right per-trans and per-agg > struct, based on the 'aggno' and 'aggtransno' fields. Hold on a second: To me the question is why is it the right design that the Agg node doesn't have the information about "aggregates and transition functions"? Agg e.g. already does directly contains the group keys... My concern wouldn't really be addressed if we replace the lappend() in ExecInitExprRec() with setting something "directly in the right per-trans...". I think it's structurally wrong to have to discover Aggrefs at execution time at all. Perhaps the easiest incremental step would be to have something like a CookedAggref { int aggno; } and then just store the Aggref nodes in Agg->aggs, with aggno referencing that... > I'd like to get rid of the "representative Aggrefs" altogether, and pass > information about the transition and final functions from planner to > executor in some other form. But that's exactly what got me into the > refactoring that was ballooning out of hand that I mentioned. Fair. Greetings, Andres Freund
Attachment
pgsql-hackers by date: