On Tue, Jun 23, 2020 at 10:06 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-06-23 09:23:57 -0700, Melanie Plageman wrote: > On Mon, Jun 22, 2020 at 9:02 PM Andres Freund <andres@anarazel.de> wrote: > > It's not this patch's fault, but none, really none, of this stuff should > > be in the executor. > > > > > Were you thinking it could be done in grouping_planner() and then the > bitmaps could be saved in the PlannedStmt? > Or would you have to wait until query_planner()? Or are you imagining > somewhere else entirely?
I haven't thought about it in too much detail, but I would say create_agg_plan() et al. I guess there's some argument to be made to do it in setrefs.c, because we already do convert_combining_aggrefs() there (but I don't like that much).
There's no reason to do it before we actually decided on one specific path, so doing it earlier than create_plan() seems unnecessary. And having it in agg specific code seems better than putting it into global routines.
There's probably an argument for having a bit more shared code between create_agg_plan(), create_group_plan() and create_groupingsets_plan(). But even just adding a new extract_*_cols() call to each of those would probably be ok.
So, my summary of this point in the context of the other discussion upthread is:
Planner should extract the columns that hashagg will need later during planning. Planner should not have HashAgg/MixedAgg nodes request smaller targetlists from their children with CP_SMALL_TLIST to avoid unneeded projection overhead.
Also, even this extraction should only be done when the number of groups is large enough to suspect a spill.
So, I wrote a patch that extracts the columns the same way as in ExecInitAgg but in create_agg_plan() and it doesn't work because we haven't called set_plan_references().
Then, I wrote a patch that does this in set_upper_references(), and it seems to work. I've attached that one. It is basically Jeff's patch (based somewhat on my patch) which extracts the columns in ExecInitAgg but I moved the functions over to setrefs.c and gave them a different name.
It's not very elegant. I shoved it in at the end of set_upper_references(), but I think there should be a nice way to do it while setting the references for each var instead of walking over the nodes again. Also, I think that the bitmapsets for the colnos should maybe be put somewhere less prominent (than in the main Agg plan node?), since they are only used in one small place. I tried putting both bitmaps in an array of two bitmaps in the Agg node (since there will always be two) to make it look a bit neater, but it was pretty confusing and error prone to remember which one was aggregated and which one was unaggregated.
Note that I didn't do anything with costing like only extracting the columns if there are a lot of groups.
Also, I didn't revert the CP_SMALL_TLIST change in create_agg_plan() or create_groupingsets_plan().
Not to stir the pot, but I did notice that hashjoin uses CP_SMALL_TLIST in create_hashjoin_plan() for the inner side sub-tree and the outer side one if there are multiple batches. I wondered what was different about that vs hashagg (i.e. why it is okay to do that there).