Re: Optimization issue of branching UNION ALL - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Optimization issue of branching UNION ALL
Date
Msg-id 1788831.1671673839@sss.pgh.pa.us
Whole thread Raw
In response to Optimization issue of branching UNION ALL  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: Optimization issue of branching UNION ALL  (Richard Guo <guofenglinux@gmail.com>)
Re: Optimization issue of branching UNION ALL  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-hackers
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> Complaint is about auto-generated query with 1E4 simple union all's (see 
> t.sh to generate a demo script). The reason: in REL_11_STABLE it is 
> planned and executed in a second, but REL_12_STABLE and beyond makes 
> matters worse: planning of such a query needs tons of gigabytes of RAM.

v11 (and prior versions) sucks just as badly.  In this example it
accidentally escapes trouble because it doesn't know how to pull up
a subquery with empty FROM.  But if you make the query look like

SELECT 1,1 FROM dual
  UNION ALL
SELECT 2,2 FROM dual
  UNION ALL
SELECT 3,3 FROM dual
...

then v11 chokes as well.  (Seems like we've overlooked the need
for check_stack_depth() and CHECK_FOR_INTERRUPTS() here ...)

> Superficial study revealed possibly unnecessary operations that could be 
> avoided:
> 1. Walking across a query by calling substitute_phv_relids() even if 
> lastPHId shows that no one phv is presented.

Yeah, we could do that, and it'd help some.

> 2. Iterative passes along the append_rel_list for replacing vars in the 
> translated_vars field. I can't grasp real necessity of passing all the 
> append_rel_list during flattening of an union all leaf subquery. No one 
> can reference this leaf, isn't it?

After thinking about that for awhile, I believe we can go further:
the containing_appendrel is actually the *only* part of the upper
query that needs to be adjusted.  So we could do something like
the attached.

This passes check-world, but I don't have quite enough confidence
in it to just commit it.

            regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index c2239d18b4..cb2c1ef5e0 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -27,6 +27,7 @@

 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/multibitmapset.h"
 #include "nodes/nodeFuncs.h"
@@ -127,7 +128,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root,
                                             Node *node, int varno);
 static void substitute_phv_relids(Node *node,
                                   int varno, Relids subrelids);
-static void fix_append_rel_relids(List *append_rel_list, int varno,
+static void fix_append_rel_relids(PlannerInfo *root, int varno,
                                   Relids subrelids);
 static Node *find_jointree_node_for_rel(Node *jtnode, int relid);

@@ -805,6 +806,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
                            JoinExpr *lowest_nulling_outer_join,
                            AppendRelInfo *containing_appendrel)
 {
+    /* Since this function recurses, it could be driven to stack overflow. */
+    check_stack_depth();
+    /* Also, since it's a bit expensive, let's check for query cancel. */
+    CHECK_FOR_INTERRUPTS();
+
     Assert(jtnode != NULL);
     if (IsA(jtnode, RangeTblRef))
     {
@@ -1229,8 +1235,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
         Relids        subrelids;

         subrelids = get_relids_in_jointree((Node *) subquery->jointree, false);
-        substitute_phv_relids((Node *) parse, varno, subrelids);
-        fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+        if (root->glob->lastPHId != 0)
+            substitute_phv_relids((Node *) parse, varno, subrelids);
+        fix_append_rel_relids(root, varno, subrelids);
     }

     /*
@@ -1409,7 +1416,10 @@ pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root, int parentRTindex,

         /*
          * Recursively apply pull_up_subqueries to the new child RTE.  (We
-         * must build the AppendRelInfo first, because this will modify it.)
+         * must build the AppendRelInfo first, because this will modify it;
+         * indeed, that's the only part of the upper query where Vars
+         * referencing childRTindex can exist at this point.)
+         *
          * Note that we can pass NULL for containing-join info even if we're
          * actually under an outer join, because the child's expressions
          * aren't going to propagate up to the join.  Also, we ignore the
@@ -2107,6 +2117,25 @@ perform_pullup_replace_vars(PlannerInfo *root,
     Query       *parse = root->parse;
     ListCell   *lc;

+    /*
+     * If we are considering an appendrel child subquery (that is, a UNION ALL
+     * member query that we're pulling up), then the only part of the upper
+     * query that could reference the child yet is the translated_vars list of
+     * the containing appendrel.  Furthermore, we do not need to insert PHVs
+     * in the appendrel --- there isn't any outer join between.
+     */
+    if (containing_appendrel)
+    {
+        bool        save_need_phvs = rvcontext->need_phvs;
+
+        rvcontext->need_phvs = false;
+        containing_appendrel->translated_vars = (List *)
+            pullup_replace_vars((Node *) containing_appendrel->translated_vars,
+                                rvcontext);
+        rvcontext->need_phvs = save_need_phvs;
+        return;
+    }
+
     /*
      * Replace all of the top query's references to the subquery's outputs
      * with copies of the adjusted subtlist items, being careful not to
@@ -2160,22 +2189,14 @@ perform_pullup_replace_vars(PlannerInfo *root,
     parse->havingQual = pullup_replace_vars(parse->havingQual, rvcontext);

     /*
-     * Replace references in the translated_vars lists of appendrels.  When
-     * pulling up an appendrel member, we do not need PHVs in the list of the
-     * parent appendrel --- there isn't any outer join between.  Elsewhere,
-     * use PHVs for safety.  (This analysis could be made tighter but it seems
-     * unlikely to be worth much trouble.)
+     * Replace references in the translated_vars lists of appendrels.
      */
     foreach(lc, root->append_rel_list)
     {
         AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
-        bool        save_need_phvs = rvcontext->need_phvs;

-        if (appinfo == containing_appendrel)
-            rvcontext->need_phvs = false;
         appinfo->translated_vars = (List *)
             pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
-        rvcontext->need_phvs = save_need_phvs;
     }

     /*
@@ -3345,8 +3366,9 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc)

         subrelids = get_relids_in_jointree(newjtloc, false);
         Assert(!bms_is_empty(subrelids));
-        substitute_phv_relids((Node *) root->parse, varno, subrelids);
-        fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+        if (root->glob->lastPHId != 0)
+            substitute_phv_relids((Node *) root->parse, varno, subrelids);
+        fix_append_rel_relids(root, varno, subrelids);
     }

     /*
@@ -3565,7 +3587,7 @@ substitute_phv_relids(Node *node, int varno, Relids subrelids)
  * We assume we may modify the AppendRelInfo nodes in-place.
  */
 static void
-fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
+fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids)
 {
     ListCell   *l;
     int            subvarno = -1;
@@ -3576,7 +3598,7 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
      * AppendRelInfo nodes refer to it.  So compute it on first use. Note that
      * bms_singleton_member will complain if set is not singleton.
      */
-    foreach(l, append_rel_list)
+    foreach(l, root->append_rel_list)
     {
         AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);

@@ -3591,8 +3613,9 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
         }

         /* Also fix up any PHVs in its translated vars */
-        substitute_phv_relids((Node *) appinfo->translated_vars,
-                              varno, subrelids);
+        if (root->glob->lastPHId != 0)
+            substitute_phv_relids((Node *) appinfo->translated_vars,
+                                  varno, subrelids);
     }
 }


pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Next
From: Thomas Munro
Date:
Subject: Re: Direct I/O