Better solution to final adjustment of combining Aggrefs - Mailing list pgsql-hackers

From Tom Lane
Subject Better solution to final adjustment of combining Aggrefs
Date
Msg-id 15693.1466875803@sss.pgh.pa.us
Whole thread Raw
Responses Re: Better solution to final adjustment of combining Aggrefs  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I complained earlier about the brute-force way that the partial
aggregation patch deals with constructing Aggrefs for the upper stage of
aggregation.  It copied-and-pasted several hundred lines of setrefs.c
so as to inject a nonstandard rule for comparing upper and lower Aggrefs.
That's bulky and will create a constant risk of omissions in future
maintenance.  I felt there had to be a better way to do that, and after
some thought I propose the attached not-quite-complete patch.

Basically what this does is to explicitly construct the representation of
an upper combining Aggref with a lower partial Aggref as input.  After
that, the regular set_upper_references pass can match the lower partial
Aggref to what it will find in the output tlist of the child plan node,
producing the desired result of a combining Aggref with a Var as input.

The objection that could be raised to this is that it's slightly less
efficient than the original code, since it requires an additional
mutation pass over the tlist and qual of a combining Agg node.  I think
that is negligible, though, in comparison to all the other setup costs
of a parallel aggregation plan.

The patch is not quite finished: as noted in the XXX comment, it'd be
a good idea to refactor apply_partialaggref_adjustment so that it can
share code with this function, to ensure they produce identical
representations of the lower partial Aggref.  But that will just make
the patch longer, not any more interesting, so I left it out for now.

Another bit of followon work is to get rid of Aggref.aggoutputtype,
which I've also complained about and which is no longer particularly
necessary.  I'm inclined to do that in the same commit that adjusts
the partial-aggregation-control fields in Aggref as per the thread at
https://www.postgresql.org/message-id/29309.1466699160@sss.pgh.pa.us

Comments/objections?

            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 17edc27..3088af1 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** static Node *fix_scan_expr_mutator(Node
*** 104,111 ****
  static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
  static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
  static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
! static void set_combineagg_references(PlannerInfo *root, Plan *plan,
!                           int rtoffset);
  static void set_dummy_tlist_references(Plan *plan, int rtoffset);
  static indexed_tlist *build_tlist_index(List *tlist);
  static Var *search_indexed_tlist_for_var(Var *var,
--- 104,110 ----
  static bool fix_scan_expr_walker(Node *node, fix_scan_expr_context *context);
  static void set_join_references(PlannerInfo *root, Join *join, int rtoffset);
  static void set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset);
! static Node *convert_combining_aggrefs(Node *node, void *context);
  static void set_dummy_tlist_references(Plan *plan, int rtoffset);
  static indexed_tlist *build_tlist_index(List *tlist);
  static Var *search_indexed_tlist_for_var(Var *var,
*************** static Var *search_indexed_tlist_for_sor
*** 119,126 ****
                                        Index sortgroupref,
                                        indexed_tlist *itlist,
                                        Index newvarno);
- static Var *search_indexed_tlist_for_partial_aggref(Aggref *aggref,
-                                       indexed_tlist *itlist, Index newvarno);
  static List *fix_join_expr(PlannerInfo *root,
                List *clauses,
                indexed_tlist *outer_itlist,
--- 118,123 ----
*************** static Node *fix_upper_expr(PlannerInfo
*** 135,147 ****
                 int rtoffset);
  static Node *fix_upper_expr_mutator(Node *node,
                         fix_upper_expr_context *context);
- static Node *fix_combine_agg_expr(PlannerInfo *root,
-                      Node *node,
-                      indexed_tlist *subplan_itlist,
-                      Index newvarno,
-                      int rtoffset);
- static Node *fix_combine_agg_expr_mutator(Node *node,
-                              fix_upper_expr_context *context);
  static List *set_returning_clause_references(PlannerInfo *root,
                                  List *rlist,
                                  Plan *topplan,
--- 132,137 ----
*************** static bool extract_query_dependencies_w
*** 171,190 ****
   * 3. We adjust Vars in upper plan nodes to refer to the outputs of their
   * subplans.
   *
!  * 4. PARAM_MULTIEXPR Params are replaced by regular PARAM_EXEC Params,
   * now that we have finished planning all MULTIEXPR subplans.
   *
!  * 5. We compute regproc OIDs for operators (ie, we look up the function
   * that implements each op).
   *
!  * 6. We create lists of specific objects that the plan depends on.
   * This will be used by plancache.c to drive invalidation of cached plans.
   * Relation dependencies are represented by OIDs, and everything else by
   * PlanInvalItems (this distinction is motivated by the shared-inval APIs).
   * Currently, relations and user-defined functions are the only types of
   * objects that are explicitly tracked this way.
   *
!  * 7. We assign every plan node in the tree a unique ID.
   *
   * We also perform one final optimization step, which is to delete
   * SubqueryScan plan nodes that aren't doing anything useful (ie, have
--- 161,183 ----
   * 3. We adjust Vars in upper plan nodes to refer to the outputs of their
   * subplans.
   *
!  * 4. Aggrefs in Agg plan nodes need to be adjusted in some cases involving
!  * partial aggregation or minmax aggregate optimization.
!  *
!  * 5. PARAM_MULTIEXPR Params are replaced by regular PARAM_EXEC Params,
   * now that we have finished planning all MULTIEXPR subplans.
   *
!  * 6. We compute regproc OIDs for operators (ie, we look up the function
   * that implements each op).
   *
!  * 7. We create lists of specific objects that the plan depends on.
   * This will be used by plancache.c to drive invalidation of cached plans.
   * Relation dependencies are represented by OIDs, and everything else by
   * PlanInvalItems (this distinction is motivated by the shared-inval APIs).
   * Currently, relations and user-defined functions are the only types of
   * objects that are explicitly tracked this way.
   *
!  * 8. We assign every plan node in the tree a unique ID.
   *
   * We also perform one final optimization step, which is to delete
   * SubqueryScan plan nodes that aren't doing anything useful (ie, have
*************** set_plan_refs(PlannerInfo *root, Plan *p
*** 678,692 ****
              break;
          case T_Agg:
              {
!                 Agg           *aggplan = (Agg *) plan;

!                 if (aggplan->combineStates)
!                     set_combineagg_references(root, plan, rtoffset);
!                 else
!                     set_upper_references(root, plan, rtoffset);

!                 break;
              }
          case T_Group:
              set_upper_references(root, plan, rtoffset);
              break;
--- 671,697 ----
              break;
          case T_Agg:
              {
!                 Agg           *agg = (Agg *) plan;

!                 /*
!                  * If this node is combining partial-aggregation results, we
!                  * must convert its Aggrefs to contain references to the
!                  * partial-aggregate subexpressions that will be available
!                  * from the child plan node.
!                  */
!                 if (agg->combineStates)
!                 {
!                     plan->targetlist = (List *)
!                         convert_combining_aggrefs((Node *) plan->targetlist,
!                                                   NULL);
!                     plan->qual = (List *)
!                         convert_combining_aggrefs((Node *) plan->qual,
!                                                   NULL);
!                 }

!                 set_upper_references(root, plan, rtoffset);
              }
+             break;
          case T_Group:
              set_upper_references(root, plan, rtoffset);
              break;
*************** set_upper_references(PlannerInfo *root,
*** 1720,1789 ****
  }

  /*
!  * set_combineagg_references
!  *      This serves the same function as set_upper_references(), but treats
!  *      Aggrefs differently. Here we transform Aggref nodes args to suit the
!  *      combine aggregate phase. This means that the Aggref->args are converted
!  *      to reference the corresponding aggregate function in the subplan rather
!  *      than simple Var(s), as would be the case for a non-combine aggregate
!  *      node.
   */
! static void
! set_combineagg_references(PlannerInfo *root, Plan *plan, int rtoffset)
  {
!     Plan       *subplan = plan->lefttree;
!     indexed_tlist *subplan_itlist;
!     List       *output_targetlist;
!     ListCell   *l;
!
!     Assert(IsA(plan, Agg));
!     Assert(((Agg *) plan)->combineStates);
!
!     subplan_itlist = build_tlist_index(subplan->targetlist);

!     output_targetlist = NIL;

!     foreach(l, plan->targetlist)
!     {
!         TargetEntry *tle = (TargetEntry *) lfirst(l);
!         Node       *newexpr;

!         /* If it's a non-Var sort/group item, first try to match by sortref */
!         if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
!         {
!             newexpr = (Node *)
!                 search_indexed_tlist_for_sortgroupref((Node *) tle->expr,
!                                                       tle->ressortgroupref,
!                                                       subplan_itlist,
!                                                       OUTER_VAR);
!             if (!newexpr)
!                 newexpr = fix_combine_agg_expr(root,
!                                                (Node *) tle->expr,
!                                                subplan_itlist,
!                                                OUTER_VAR,
!                                                rtoffset);
!         }
          else
!             newexpr = fix_combine_agg_expr(root,
!                                            (Node *) tle->expr,
!                                            subplan_itlist,
!                                            OUTER_VAR,
!                                            rtoffset);
!         tle = flatCopyTargetEntry(tle);
!         tle->expr = (Expr *) newexpr;
!         output_targetlist = lappend(output_targetlist, tle);
!     }
!
!     plan->targetlist = output_targetlist;

!     plan->qual = (List *)
!         fix_combine_agg_expr(root,
!                              (Node *) plan->qual,
!                              subplan_itlist,
!                              OUTER_VAR,
!                              rtoffset);

!     pfree(subplan_itlist);
  }

  /*
--- 1725,1799 ----
  }

  /*
!  * Recursively scan an expression tree and convert Aggrefs to the proper
!  * intermediate form for combining aggregates.  This means (1) replacing each
!  * one's argument list with a single argument that is the original Aggref
!  * modified to show partial aggregation and (2) changing the upper Aggref to
!  * show combining aggregation.
!  *
!  * After this step, set_upper_references will replace the partial Aggrefs
!  * with Vars referencing the lower Agg plan node's outputs, so that the final
!  * form seen by the executor is a combining Aggref with a Var as input.
!  *
!  * It's rather messy to postpone this step until setrefs.c; ideally it'd be
!  * done in createplan.c.  The difficulty is that once we modify the Aggref
!  * expressions, they will no longer be equal() to their original form and
!  * so cross-plan-node-level matches will fail.  So this has to happen after
!  * the plan node above the Agg has resolved its subplan references.
   */
! static Node *
! convert_combining_aggrefs(Node *node, void *context)
  {
!     if (node == NULL)
!         return NULL;
!     if (IsA(node, Aggref))
!     {
!         Aggref       *orig_agg = (Aggref *) node;
!         Aggref       *child_agg;
!         Aggref       *parent_agg;

!         /*
!          * Since aggregate calls can't be nested, we needn't recurse into the
!          * arguments.  But for safety, flat-copy the Aggref node itself rather
!          * than modifying it in-place.
!          */
!         child_agg = makeNode(Aggref);
!         memcpy(child_agg, orig_agg, sizeof(Aggref));

!         /*
!          * For the parent Aggref, we want to copy all the fields of the
!          * original aggregate *except* the args list.  Rather than explicitly
!          * knowing what they all are here, we can momentarily modify child_agg
!          * to provide a source for copyObject.
!          */
!         child_agg->args = NIL;
!         parent_agg = (Aggref *) copyObject(child_agg);
!         child_agg->args = orig_agg->args;

!         /*
!          * Now, set child_agg to represent the first phase of partial
!          * aggregation.  (XXX this needs to match
!          * apply_partialaggref_adjustment; consider refactoring so as to share
!          * code.)
!          */
!         Assert(OidIsValid(child_agg->aggtranstype));
!         if (child_agg->aggtranstype == INTERNALOID)
!             child_agg->aggoutputtype = BYTEAOID;
          else
!             child_agg->aggoutputtype = child_agg->aggtranstype;
!         child_agg->aggpartial = true;

!         /*
!          * And set up parent_agg to represent the second phase.
!          */
!         parent_agg->args = list_make1(makeTargetEntry((Expr *) child_agg,
!                                                       1, NULL, false));
!         parent_agg->aggcombine = true;

!         return (Node *) parent_agg;
!     }
!     return expression_tree_mutator(node, convert_combining_aggrefs,
!                                    (void *) context);
  }

  /*
*************** search_indexed_tlist_for_sortgroupref(No
*** 2053,2126 ****
  }

  /*
-  * search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist
-  *
-  * Aggrefs for partial aggregates have their aggoutputtype adjusted to set it
-  * to the aggregate state's type, or serialization type. This means that a
-  * standard equal() comparison won't match when comparing an Aggref which is
-  * in partial mode with an Aggref which is not. Here we manually compare all of
-  * the fields apart from aggoutputtype.
-  */
- static Var *
- search_indexed_tlist_for_partial_aggref(Aggref *aggref, indexed_tlist *itlist,
-                                         Index newvarno)
- {
-     ListCell   *lc;
-
-     foreach(lc, itlist->tlist)
-     {
-         TargetEntry *tle = (TargetEntry *) lfirst(lc);
-
-         if (IsA(tle->expr, Aggref))
-         {
-             Aggref       *tlistaggref = (Aggref *) tle->expr;
-             Var           *newvar;
-
-             if (aggref->aggfnoid != tlistaggref->aggfnoid)
-                 continue;
-             if (aggref->aggtype != tlistaggref->aggtype)
-                 continue;
-             /* ignore aggoutputtype */
-             if (aggref->aggcollid != tlistaggref->aggcollid)
-                 continue;
-             if (aggref->inputcollid != tlistaggref->inputcollid)
-                 continue;
-             /* ignore aggtranstype and aggargtypes, should be redundant */
-             if (!equal(aggref->aggdirectargs, tlistaggref->aggdirectargs))
-                 continue;
-             if (!equal(aggref->args, tlistaggref->args))
-                 continue;
-             if (!equal(aggref->aggorder, tlistaggref->aggorder))
-                 continue;
-             if (!equal(aggref->aggdistinct, tlistaggref->aggdistinct))
-                 continue;
-             if (!equal(aggref->aggfilter, tlistaggref->aggfilter))
-                 continue;
-             if (aggref->aggstar != tlistaggref->aggstar)
-                 continue;
-             if (aggref->aggvariadic != tlistaggref->aggvariadic)
-                 continue;
-
-             /*
-              * it would be harmless to compare aggcombine and aggpartial, but
-              * it's also unnecessary
-              */
-             if (aggref->aggkind != tlistaggref->aggkind)
-                 continue;
-             if (aggref->agglevelsup != tlistaggref->agglevelsup)
-                 continue;
-
-             newvar = makeVarFromTargetEntry(newvarno, tle);
-             newvar->varnoold = 0;        /* wasn't ever a plain Var */
-             newvar->varoattno = 0;
-
-             return newvar;
-         }
-     }
-     return NULL;
- }
-
- /*
   * fix_join_expr
   *       Create a new set of targetlist entries or join qual clauses by
   *       changing the varno/varattno values of variables in the clauses
--- 2063,2068 ----
*************** fix_upper_expr_mutator(Node *node, fix_u
*** 2391,2496 ****
  }

  /*
-  * fix_combine_agg_expr
-  *      Like fix_upper_expr() but additionally adjusts the Aggref->args of
-  *      Aggrefs so that they references the corresponding Aggref in the subplan.
-  */
- static Node *
- fix_combine_agg_expr(PlannerInfo *root,
-                      Node *node,
-                      indexed_tlist *subplan_itlist,
-                      Index newvarno,
-                      int rtoffset)
- {
-     fix_upper_expr_context context;
-
-     context.root = root;
-     context.subplan_itlist = subplan_itlist;
-     context.newvarno = newvarno;
-     context.rtoffset = rtoffset;
-     return fix_combine_agg_expr_mutator(node, &context);
- }
-
- static Node *
- fix_combine_agg_expr_mutator(Node *node, fix_upper_expr_context *context)
- {
-     Var           *newvar;
-
-     if (node == NULL)
-         return NULL;
-     if (IsA(node, Var))
-     {
-         Var           *var = (Var *) node;
-
-         newvar = search_indexed_tlist_for_var(var,
-                                               context->subplan_itlist,
-                                               context->newvarno,
-                                               context->rtoffset);
-         if (!newvar)
-             elog(ERROR, "variable not found in subplan target list");
-         return (Node *) newvar;
-     }
-     if (IsA(node, PlaceHolderVar))
-     {
-         PlaceHolderVar *phv = (PlaceHolderVar *) node;
-
-         /* See if the PlaceHolderVar has bubbled up from a lower plan node */
-         if (context->subplan_itlist->has_ph_vars)
-         {
-             newvar = search_indexed_tlist_for_non_var((Node *) phv,
-                                                       context->subplan_itlist,
-                                                       context->newvarno);
-             if (newvar)
-                 return (Node *) newvar;
-         }
-         /* If not supplied by input plan, evaluate the contained expr */
-         return fix_upper_expr_mutator((Node *) phv->phexpr, context);
-     }
-     if (IsA(node, Param))
-         return fix_param_node(context->root, (Param *) node);
-     if (IsA(node, Aggref))
-     {
-         Aggref       *aggref = (Aggref *) node;
-
-         newvar = search_indexed_tlist_for_partial_aggref(aggref,
-                                                      context->subplan_itlist,
-                                                          context->newvarno);
-         if (newvar)
-         {
-             Aggref       *newaggref;
-             TargetEntry *newtle;
-
-             /*
-              * Now build a new TargetEntry for the Aggref's arguments which is
-              * a single Var which references the corresponding AggRef in the
-              * node below.
-              */
-             newtle = makeTargetEntry((Expr *) newvar, 1, NULL, false);
-             newaggref = (Aggref *) copyObject(aggref);
-             newaggref->args = list_make1(newtle);
-             newaggref->aggcombine = true;
-
-             return (Node *) newaggref;
-         }
-         else
-             elog(ERROR, "Aggref not found in subplan target list");
-     }
-     /* Try matching more complex expressions too, if tlist has any */
-     if (context->subplan_itlist->has_non_vars)
-     {
-         newvar = search_indexed_tlist_for_non_var(node,
-                                                   context->subplan_itlist,
-                                                   context->newvarno);
-         if (newvar)
-             return (Node *) newvar;
-     }
-     fix_expr_common(context->root, node);
-     return expression_tree_mutator(node,
-                                    fix_combine_agg_expr_mutator,
-                                    (void *) context);
- }
-
- /*
   * set_returning_clause_references
   *        Perform setrefs.c's work on a RETURNING targetlist
   *
--- 2333,2338 ----

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Rethinking representation of partial-aggregate steps
Next
From: Tom Lane
Date:
Subject: Re: Better solution to final adjustment of combining Aggrefs