Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UNION ALL has higher cost than inheritance
Date
Msg-id 8400.1289188808@sss.pgh.pa.us
Whole thread Raw
In response to Re: UNION ALL has higher cost than inheritance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: UNION ALL has higher cost than inheritance
List pgsql-hackers
I wrote:
> The oversight here is that we don't use appendrel planning for
> a top-level UNION ALL construct.  That didn't use to matter,
> because you always got the same stupid Append plan either way.
> Now it seems like we ought to have some more intelligence for the
> top-level SetOp case.  I smell some code refactoring coming up.

I did some hacking on this and came up with the attached patch, which
could use a bit more work on the comments but passes regression tests.
However, this just solves the issue of being smart about top-level
UNION ALL cases.  It might be worth looking into using MergeAppend
for the sorting required for other types of set operations.  That would
involve quite a different patch, and I'm not sure if it'd remove the
need for this one or not.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 07301c77fbf4f2550367ea8ff58c685462835223..a1055978869c5072db6e3a172169d934062ae2aa 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** subquery_planner(PlannerGlobal *glob, Qu
*** 341,353 ****
      inline_set_returning_functions(root);

      /*
!      * Check to see if any subqueries in the rangetable can be merged into
       * this query.
       */
      parse->jointree = (FromExpr *)
          pull_up_subqueries(root, (Node *) parse->jointree, NULL, NULL);

      /*
       * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can
       * avoid the expense of doing flatten_join_alias_vars().  Also check for
       * outer joins --- if none, we can skip reduce_outer_joins(). This must be
--- 341,362 ----
      inline_set_returning_functions(root);

      /*
!      * Check to see if any subqueries in the jointree can be merged into
       * this query.
       */
      parse->jointree = (FromExpr *)
          pull_up_subqueries(root, (Node *) parse->jointree, NULL, NULL);

      /*
+      * If this is a simple UNION ALL query, flatten it into an appendrel.
+      * We do this now because it requires applying pull_up_subqueries to the
+      * leaf queries of the UNION ALL, which weren't touched above because they
+      * aren't referenced by the jointree until we do this.
+      */
+     if (parse->setOperations)
+         flatten_union_all(root);
+
+     /*
       * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can
       * avoid the expense of doing flatten_join_alias_vars().  Also check for
       * outer joins --- if none, we can skip reduce_outer_joins(). This must be
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index e337751328bf0b05e3025a2ecc302f754b766855..847e67959a8b81a5cf8ea55ae48d57ec98f32285 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
***************
*** 7,12 ****
--- 7,13 ----
   *        pull_up_sublinks
   *        inline_set_returning_functions
   *        pull_up_subqueries
+  *        flatten_union_all
   *        do expression preprocessing (including flattening JOIN alias vars)
   *        reduce_outer_joins
   *
*************** pull_up_simple_union_all(PlannerInfo *ro
*** 869,879 ****
      List       *rtable;

      /*
-      * Append the subquery rtable entries to upper query.
-      */
-     rtoffset = list_length(root->parse->rtable);
-
-     /*
       * Append child RTEs to parent rtable.
       *
       * Upper-level vars in subquery are now one level closer to their parent
--- 870,875 ----
*************** pull_up_simple_union_all(PlannerInfo *ro
*** 881,886 ****
--- 877,883 ----
       * because any such vars must refer to stuff above the level of the query
       * we are pulling into.
       */
+     rtoffset = list_length(root->parse->rtable);
      rtable = copyObject(subquery->rtable);
      IncrementVarSublevelsUp_rtable(rtable, -1, 1);
      root->parse->rtable = list_concat(root->parse->rtable, rtable);
*************** pullup_replace_vars_callback(Var *var,
*** 1418,1423 ****
--- 1415,1518 ----
      return newnode;
  }

+
+ /*
+  * flatten_union_all
+  *        Try to optimize a top-level UNION ALL structure
+  *
+  * If a query's setOperations tree consists entirely of simple UNION ALL
+  * operations, flatten it into an append relation, which we can process more
+  * intelligently than the general set-ops case.
+  *
+  * In most cases, this can succeed only for a top-level query, because for a
+  * subquery in FROM, the parent query's invocation of pull_up_subqueries would
+  * already have flattened the UNION via pull_up_simple_union_all.  But there
+  * are a few cases we can support here but not in that code path, for example
+  * when the subquery also contains ORDER BY.
+  */
+ void
+ flatten_union_all(PlannerInfo *root)
+ {
+     Query       *parse = root->parse;
+     SetOperationStmt *topop;
+     Node       *leftmostjtnode;
+     int            leftmostRTI;
+     RangeTblEntry *leftmostRTE;
+     int            childRTI;
+     RangeTblEntry *childRTE;
+     RangeTblRef *rtr;
+
+     /* Shouldn't be called unless query has setops */
+     topop = (SetOperationStmt *) parse->setOperations;
+     Assert(topop && IsA(topop, SetOperationStmt));
+
+     /* Can't optimize away a recursive UNION */
+     if (root->hasRecursion)
+         return;
+
+     /*
+      * Recursively check the tree of set operations.  If not all UNION ALL
+      * with identical column types, punt.
+      */
+     if (!is_simple_union_all_recurse((Node *) topop, parse,
+                                      topop->colTypes))
+         return;
+
+     /*
+      * Locate the leftmost leaf query in the setops tree.  The upper query's
+      * Vars all refer to this RTE (see transformSetOperationStmt).
+      */
+     leftmostjtnode = topop->larg;
+     while (leftmostjtnode && IsA(leftmostjtnode, SetOperationStmt))
+         leftmostjtnode = ((SetOperationStmt *) leftmostjtnode)->larg;
+     Assert(leftmostjtnode && IsA(leftmostjtnode, RangeTblRef));
+     leftmostRTI = ((RangeTblRef *) leftmostjtnode)->rtindex;
+     leftmostRTE = rt_fetch(leftmostRTI, parse->rtable);
+     Assert(leftmostRTE->rtekind == RTE_SUBQUERY);
+
+     /*
+      * Make a copy of the leftmost RTE and add it to the rtable.  This copy
+      * will represent the leftmost leaf query in its capacity as a member
+      * of the appendrel.  The original will represent the appendrel as a
+      * whole.  (We must do things this way because the upper query's Vars
+      * have to be seen as referring to the whole appendrel.)
+      */
+     childRTE = copyObject(leftmostRTE);
+     parse->rtable = lappend(parse->rtable, childRTE);
+     childRTI = list_length(parse->rtable);
+
+     /* Modify the setops tree to reference the child copy */
+     ((RangeTblRef *) leftmostjtnode)->rtindex = childRTI;
+
+     /*
+      * Modify the formerly-leftmost RTE to mark it as an appendrel parent
+      * (compare pull_up_simple_union_all).
+      */
+     leftmostRTE->inh = true;
+
+     /*
+      * Form a RangeTblRef for the appendrel, and insert it into FROM.  The top
+      * Query of a setops tree should have had an empty FromClause initially.
+      */
+     rtr = makeNode(RangeTblRef);
+     rtr->rtindex = leftmostRTI;
+     Assert(parse->jointree->fromlist == NIL);
+     parse->jointree->fromlist = list_make1(rtr);
+
+     /*
+      * Now pretend the query has no setops.  We must do this before trying
+      * to do subquery pullup, because of Assert in pull_up_simple_subquery.
+      */
+     parse->setOperations = NULL;
+
+     /*
+      * Build AppendRelInfo information and try to pull up the leaf queries.
+      * XXX need to adjust comments for pull_up_union_leaf_queries
+      */
+     pull_up_union_leaf_queries((Node *) topop, root, leftmostRTI, parse, 0);
+ }
+
+
  /*
   * reduce_outer_joins
   *        Attempt to reduce outer joins to plain inner joins.
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index f8dd5428ee47a74637d37f5a293d2acff117eae4..e4654d9eeab0ef3908f679652866adb2589bd1ce 100644
*** a/src/include/optimizer/prep.h
--- b/src/include/optimizer/prep.h
*************** extern void inline_set_returning_functio
*** 26,31 ****
--- 26,32 ----
  extern Node *pull_up_subqueries(PlannerInfo *root, Node *jtnode,
                     JoinExpr *lowest_outer_join,
                     AppendRelInfo *containing_appendrel);
+ extern void flatten_union_all(PlannerInfo *root);
  extern void reduce_outer_joins(PlannerInfo *root);
  extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins);
  extern Relids get_relids_for_join(PlannerInfo *root, int joinrelid);

pgsql-hackers by date:

Previous
From: Hitoshi Harada
Date:
Subject: Re: SQL2011 and writeable CTE
Next
From: Pavel Stehule
Date:
Subject: Re: SQL2011 and writeable CTE