Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Deduplicate aggregates and transition functions in planner
Date
Msg-id 20201029174810.tk4s7ycfrwde7m3x@alap3.anarazel.de
Whole thread Raw
In response to Re: Deduplicate aggregates and transition functions in planner  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Deduplicate aggregates and transition functions in planner
List pgsql-hackers
Hi,

On 2020-10-29 10:17:20 +0200, Heikki Linnakangas wrote:
> On 28/10/2020 21:59, Andres Freund wrote:
> > It wouldn't surprise me to see a small execution time speedup here -
> > I've seen the load of the aggno show up in profiles.
> 
> I think you'd be hard-pressed to find a real-life query where it
> matters. But if you don't care about real life:

I was actually thinking about a different angle - that the evaluation of
an Aggref can be faster, because we need less indirection to find the
aggno. As you have already implemented for the JITed code, but removing
it for the expression code looks easy enough too. You'd need a lot of
groups and presumably a fair number of Aggrefs to see it.

Attached is a quick version of what I am thinking wrt AggrefExprState.


> > > @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
> > >                   scratch.opcode = EEOP_AGGREF;
> > >                   scratch.d.aggref.astate = astate;
> > > -                astate->aggref = aggref;
> > > +                astate->aggno = aggref->aggno;
> > >                   if (state->parent && IsA(state->parent, AggState))
> > >                   {
> > >                       AggState   *aggstate = (AggState *) state->parent;
> > > -                    aggstate->aggs = lappend(aggstate->aggs, astate);
> > > -                    aggstate->numaggs++;
> > > +                    aggstate->aggs = lappend(aggstate->aggs, aggref);
> > 
> > Hm. Why is aggstate->aggs still built during expression initialization?
> > Imo that's a pretty huge wart that also introduces more
> > order-of-operation brittleness to executor startup.
> 
> The Agg node itself doesn't include any information about the aggregates and
> transition functions. Because of that, ExecInitAgg needs a "representive"
> Aggref for each transition state and agg, to initialize the per-trans and
> per-agg structs. The expression initialization makes those Aggrefs available
> for ExecInitAgg.

> Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could set
> each representative Aggref directly in the right per-trans and per-agg
> struct, based on the 'aggno' and 'aggtransno' fields.

Hold on a second: To me the question is why is it the right design that
the Agg node doesn't have the information about "aggregates and
transition functions"? Agg e.g. already does directly contains the group
keys...

My concern wouldn't really be addressed if we replace the lappend() in
ExecInitExprRec() with setting something "directly in the right
per-trans...". I think it's structurally wrong to have to discover
Aggrefs at execution time at all.

Perhaps the easiest incremental step would be to have something like a
CookedAggref { int aggno; } and then just store the Aggref nodes in
Agg->aggs, with aggno referencing that...


> I'd like to get rid of the "representative Aggrefs" altogether, and pass
> information about the transition and final functions from planner to
> executor in some other form. But that's exactly what got me into the
> refactoring that was ballooning out of hand that I mentioned.

Fair.


Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: duplicate function oid symbols
Next
From: Andres Freund
Date:
Subject: Re: Autovacuum worker doesn't immediately exit on postmaster death