Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=W+16AGny3i8g+vTNH1jeGyczpHaykKuw==Xo0Cqo7hjw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers


On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Hi Jeevan,
I am back reviewing this. Here are some comments.

@@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
          * the unparameterized Append path we are constructing for the parent.
          * If not, there's no workable unparameterized path.
          */
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
             accumulate_append_subpath(childrel->cheapest_total_path,
                                       &subpaths, NULL);
         else
@@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
             RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
             Path       *subpath;

+            if (childrel->pathlist == NIL)
+            {
+                /* failed to make a suitable path for this child */
+                subpaths_valid = false;
+                break;
+            }
+
When can childrel->pathlist be NIL?

diff --git a/src/backend/optimizer/plan/createplan.c
b/src/backend/optimizer/plan/createplan.c
index 9ae1bf3..f90626c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
*best_path, int flags)
     subplan = create_plan_recurse(root, best_path->subpath,
                                   flags | CP_SMALL_TLIST);

-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
+    /*
+     * In find_ec_member_for_tle(), child EC members are ignored if they don't
+     * belong to the given relids. Thus, if this sort path is based on a child
+     * relation, we must pass the relids of it. Otherwise, we will end-up into
+     * an error requiring pathkey item.
+     */
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent) ?
+                                   best_path->path.parent->relids : NULL);

     copy_generic_path_info(&plan->plan, (Path *) best_path);

Please separate this small adjustment in a patch of its own, with some
explanation of why we need it i.e. now this function can see SortPaths from
child (other) relations.

+    if (child_data)
+    {
+        /* Must be other rel as all child relations are marked OTHER_RELs */
+        Assert(IS_OTHER_REL(input_rel));

I think we should check IS_OTHER_REL() and Assert(child_data). That way we know
that the code in the if block is executed for OTHER relation.

-    if ((root->hasHavingQual || parse->groupingSets) &&
+    if (!child_data && (root->hasHavingQual || parse->groupingSets) &&

Degenerate grouping will never see child relations, so instead of checking for
child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
explaining the assertion.

+     *
+     * If we are performing grouping for a child relation, fetch can_sort from
+     * the child_data to avoid re-calculating same.
      */
-    can_sort = (gd && gd->rollups != NIL)
-        || grouping_is_sortable(parse->groupClause);
+    can_sort = child_data ? child_data->can_sort : ((gd &&
gd->rollups != NIL) ||
+
grouping_is_sortable(parse->groupClause));

Instead of adding a conditional here, we can compute these values before
create_grouping_paths() is called from grouping_planner() and then pass them
down to try_partitionwise_grouping(). I have attached a patch which refactors
this code. Please see if this refactoring is useful. In the attached patch, I
have handled can_sort, can_hash and partial aggregation costs. More on the last
component below.

Changes look good to me and refactoring will be useful for partitionwise patches.

However, will it be good if we add agg_costs into the GroupPathExtraData too?
Also can we pass this to the add_partial_paths_to_grouping_rel() and add_paths_to_grouping_rel() to avoid passing can_sort, can_hash and costs related details individually to them?
 


     /*
      * Figure out whether a PartialAggregate/Finalize Aggregate execution
@@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
      * partial paths for partially_grouped_rel; that way, later code can
      * easily consider both parallel and non-parallel approaches to grouping.
      */
-    if (try_parallel_aggregation)
+    if (!child_data && !(agg_costs->hasNonPartial || agg_costs->hasNonSerial))
     {
-        PathTarget *partial_grouping_target;
-
[... clipped ...]
+            get_agg_clause_costs(root, havingQual,
                                  AGGSPLIT_FINAL_DESERIAL,
-                                 &agg_final_costs);
+                                 agg_final_costs);
         }
+    }

With this change, we are computing partial aggregation costs even in
the cases when
those will not be used e.g. when there are no children and parallel paths can
not be created. In the attached patch, I have refactored the code such that
they are computed when they are needed the first time and re-used later.

+    if (child_data)
+    {
+        partial_grouping_target = child_data->partialRelTarget;
+        partially_grouped_rel->reltarget = partial_grouping_target;
+        agg_partial_costs = child_data->agg_partial_costs;
+        agg_final_costs = child_data->agg_final_costs;
+    }

I think, with the refactoring, we can get rid of the last two lines here. I
think we can get rid of this block entirely, but I have not reviewed the entire
code to confirm that.

 static PathTarget *
-make_partial_grouping_target(PlannerInfo *root, PathTarget *grouping_target)
+make_partial_grouping_target(PlannerInfo *root,
+                             PathTarget *grouping_target,
+                             Node *havingQual)
This looks like a refactoring change. Should go to one of the refactoring
patches or in a patch of its own.

This isn't full review. I will continue reviewing this further.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Aleksandr Parfenov
Date:
Subject: Re: Flexible configuration for full-text search
Next
From: David Steele
Date:
Subject: Re: Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE