Re: Calculation of unused columns - Mailing list pgsql-performance

From Tom Lane
Subject Re: Calculation of unused columns
Date
Msg-id 14664.1255909298@sss.pgh.pa.us
Whole thread Raw
In response to Re: Calculation of unused columns  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Calculation of unused columns
List pgsql-performance
Robert Haas <robertmhaas@gmail.com> writes:
> It's probably true that in MOST of the cases where this comes up, the
> subquery can be flattened, from_collapse_limit permitting.  But I
> think there are other cases, too.

Right ... and from_collapse_limit is not relevant here; only the form of
the subquery is.  So I'd sure like to see some actual use cases before
we decide to expend planning cycles on this.

Just for fun, I hacked together a first cut at this.  It's only about
120 lines but it's a bit cheesy (the limitation to not handling
appendrel members in particular).  It passes regression tests and
seems to do what's wanted, but I'm not convinced it's worth the extra
cycles as-is, let alone with the appendrel limitation fixed.

            regards, tom lane

Index: allpaths.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v
retrieving revision 1.187
diff -c -r1.187 allpaths.c
*** allpaths.c    12 Oct 2009 18:10:45 -0000    1.187
--- allpaths.c    18 Oct 2009 23:26:12 -0000
***************
*** 17,22 ****
--- 17,23 ----

  #include <math.h>

+ #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #ifdef OPTIMIZER_DEBUG
  #include "nodes/print.h"
***************
*** 75,80 ****
--- 76,83 ----
                     RangeTblEntry *rte, Index rti, Node *qual);
  static void recurse_push_qual(Node *setOp, Query *topquery,
                    RangeTblEntry *rte, Index rti, Node *qual);
+ static void remove_unused_subquery_outputs(PlannerInfo *root, RelOptInfo *rel,
+                                Query *subquery);


  /*
***************
*** 611,616 ****
--- 614,624 ----
      pfree(differentTypes);

      /*
+      * Remove unreferenced subquery outputs, if possible.
+      */
+     remove_unused_subquery_outputs(root, rel, subquery);
+
+     /*
       * We can safely pass the outer tuple_fraction down to the subquery if the
       * outer level has no joining, aggregation, or sorting to do. Otherwise
       * we'd better tell the subquery to plan for full retrieval. (XXX This
***************
*** 1308,1313 ****
--- 1316,1437 ----
  }

  /*****************************************************************************
+  *            REMOVING UNUSED SUBQUERY OUTPUTS
+  *****************************************************************************/
+
+ /*
+  * remove_unused_subquery_outputs
+  *        Modify the subquery's targetlist to suppress useless output columns
+  *
+  * Views very often produce output columns that are unused by a particular
+  * calling query.  Removing unused outputs can be useful if it avoids the
+  * need to calculate expensive functions or permits other optimizations
+  * (such as join removal) within the subquery.
+  *
+  * To avoid messing up the correlation between upper-query Vars and subquery
+  * output resnos, we don't actually delete unwanted tlist items, but replace
+  * them with null Consts.  It's okay to modify the subquery tlist in-place
+  * because caller already made a copy of it.
+  */
+ static void
+ remove_unused_subquery_outputs(PlannerInfo *root, RelOptInfo *rel,
+                                Query *subquery)
+ {
+     Bitmapset  *atts_used;
+     ListCell   *lc;
+     int            i;
+
+     /*
+      * Can't do anything for "other rels" (appendrel members), because we
+      * don't track attr_needed for them --- see set_append_rel_pathlist.
+      * This could be fixed by making that routine do more work, or by
+      * having this routine find the appendrel parent and consult its
+      * attr_needed.  (The attr numbers should be the same, in cases where
+      * a subquery could appear in an appendrel.)  For the moment it's not
+      * clear that handling the case is worth any extra trouble.
+      */
+     if (rel->reloptkind != RELOPT_BASEREL)
+         return;
+
+     /*
+      * Can't do anything if subquery involves a setop or DISTINCT (but not
+      * DISTINCT ON), because then the tlist values are all important to the
+      * subquery's internal semantics.  UNION ALL could possibly be handled,
+      * but we will normally not see that here because it will get flattened
+      * into an appendrel, so there's little point in expending effort on it.
+      * We would actually reject DISTINCT below anyhow, but there's no point
+      * in spending cycles just to discover that all the columns are needed.
+      */
+     if (subquery->setOperations ||
+         (subquery->distinctClause && !subquery->hasDistinctOn))
+         return;
+
+     /*
+      * Identify all the subquery outputs that are used by the outer query,
+      * and build a bitmapset of their attnos.  attr_needed tells us about
+      * everything used in joinquals or tlist, and we have to add anything
+      * used in the surviving restriction clauses.  (So it's important that
+      * this runs only after pushing down any pushable restrictclauses.)
+      * Subqueries can't have negative attnos, so we need not offset the
+      * attnos.
+      */
+     atts_used = NULL;
+     for (i = rel->min_attr; i <= rel->max_attr; i++)
+     {
+         if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
+             atts_used = bms_add_member(atts_used, i);
+
+     }
+     foreach(lc, rel->baserestrictinfo)
+     {
+         RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+         List       *vars;
+         ListCell   *vl;
+
+         vars = pull_var_clause((Node *) rinfo->clause,
+                                PVC_RECURSE_PLACEHOLDERS);
+         foreach(vl, vars)
+         {
+             Var           *var = (Var *) lfirst(vl);
+
+             Assert(IsA(var, Var));
+             atts_used = bms_add_member(atts_used, var->varattno);
+         }
+         list_free(vars);
+     }
+
+     /*
+      * If there's a whole-row Var reference then we can't remove anything.
+      */
+     if (bms_is_member(0, atts_used))
+         return;
+
+     /*
+      * Okay, scan the tlist and replace unwanted items with null constants
+      * of the same datatype.
+      */
+     foreach(lc, subquery->targetList)
+     {
+         TargetEntry *tle = (TargetEntry *) lfirst(lc);
+
+         /*
+          * TLEs with nonzero ressortgroupref mustn't be zapped because
+          * they are relevant to the subquery's internal semantics.  We
+          * also don't risk zapping any resjunk entries, though that test
+          * is probably redundant with the ressortgroupref test.
+          */
+         if (tle->ressortgroupref || tle->resjunk)
+             continue;
+
+         if (bms_is_member(tle->resno, atts_used))
+             continue;
+
+         tle->expr = (Expr *) makeNullConst(exprType((Node *) tle->expr),
+                                            exprTypmod((Node *) tle->expr));
+     }
+ }
+
+ /*****************************************************************************
   *            DEBUG SUPPORT
   *****************************************************************************/


pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Calculation of unused columns
Next
From: Robert Haas
Date:
Subject: Re: Improving join performance over multiple moderately wide tables