Speeding up ExecProject - Mailing list pgsql-hackers

From Tom Lane
Subject Speeding up ExecProject
Date
Msg-id 20179.1238622695@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
I got interested in why trivial window functions looked really slow,
when using this example in the regression database:

regression=# explain analyze select *, row_number() over (order by unique2) from tenk1;
                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=0.00..781.25 rows=10000 width=244) (actual time=0.036..28.923 rows=10000 loops=1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..631.25 rows=10000 width=244) (actual time=0.023..5.727
rows=10000loops=1) 
 Total runtime: 30.423 ms
(3 rows)

(Note: all examples in this post are the median of three tries, since
the timings are a bit noisy.  That means the data is fully cached.
This is CVS HEAD on Fedora 10 x86_64.)

Profiling soon revealed that the bulk of the runtime was going into
ExecProject() and subsidiary expression-evaluation functions.  In fact,
the problem can be illustrated without any window functions at all:

regression=# explain analyze select * from tenk1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.008..2.250 rows=10000 loops=1)
 Total runtime: 3.652 ms
(2 rows)

regression=# explain analyze select *,1 from tenk1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.011..13.232 rows=10000 loops=1)
 Total runtime: 14.602 ms
(2 rows)

The above does not show that ExecEvalConst is 5x slower than the rest
of the query put together.  Rather, what is happening is that the moment
the query targetlist contains anything but Vars, we stop using the
optimized ExecVariableList() code and fall back on the generic code in
ExecTargetList().  There are rather a lot of columns in tenk1 (16 to be
exact) and so this is a big cost.

It occurred to me that we could refactor ExecProject and associated
logic so that we use ExecVariableList-like code for all simple Vars in a
targetlist, and only invoke the full expression evaluation machinery for
the non-Var members of a tlist (if any).  The attached patch prototypes
this idea (it's code-complete, but the comments suck...).  With the
patch, adding ,1 costs a lot less:

regression=# explain analyze select * from tenk1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.009..2.345 rows=10000 loops=1)
 Total runtime: 3.780 ms
(2 rows)

regression=# explain analyze select *,1 from tenk1;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244) (actual time=0.010..5.949 rows=10000 loops=1)
 Total runtime: 7.396 ms
(2 rows)

and the original example looks better too:

regression=# explain analyze select *, row_number() over (order by unique2) from tenk1;
                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=0.00..781.25 rows=10000 width=244) (actual time=0.035..18.727 rows=10000 loops=1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.00..631.25 rows=10000 width=244) (actual time=0.025..5.146
rows=10000loops=1) 
 Total runtime: 20.228 ms
(3 rows)

I may be overreacting to the EXPLAIN ANALYZE timings.  In practice,
if we were actually returning so many columns to the client, I/O
conversions and data transmission would exceed the ExecProject effort
by a significant margin.  Still, there are lots of cases where
targetlists contain a lot of simple Vars, so this looks like it's
probably worth doing.  Comments/objections?

            regards, tom lane

Index: src/backend/executor/execQual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execQual.c,v
retrieving revision 1.243
diff -c -r1.243 execQual.c
*** src/backend/executor/execQual.c    27 Mar 2009 18:30:21 -0000    1.243
--- src/backend/executor/execQual.c    1 Apr 2009 21:44:07 -0000
***************
*** 4987,4995 ****
      /*
       * evaluate all the expressions in the target list
       */
-     if (isDone)
-         *isDone = ExprSingleResult;        /* until proven otherwise */
-
      haveDoneSets = false;        /* any exhausted set exprs in tlist? */

      foreach(tl, targetlist)
--- 4987,4992 ----
***************
*** 5116,5150 ****
                   bool *isnull)
  {
      ExprContext *econtext = projInfo->pi_exprContext;
-     int           *varSlotOffsets = projInfo->pi_varSlotOffsets;
-     int           *varNumbers = projInfo->pi_varNumbers;
-     int            i;

      /*
!      * Force extraction of all input values that we need.
       */
!     if (projInfo->pi_lastInnerVar > 0)
!         slot_getsomeattrs(econtext->ecxt_innertuple,
!                           projInfo->pi_lastInnerVar);
!     if (projInfo->pi_lastOuterVar > 0)
!         slot_getsomeattrs(econtext->ecxt_outertuple,
!                           projInfo->pi_lastOuterVar);
!     if (projInfo->pi_lastScanVar > 0)
!         slot_getsomeattrs(econtext->ecxt_scantuple,
!                           projInfo->pi_lastScanVar);

!     /*
!      * Assign to result by direct extraction of fields from source slots ... a
!      * mite ugly, but fast ...
!      */
!     for (i = list_length(projInfo->pi_targetlist) - 1; i >= 0; i--)
      {
!         char       *slotptr = ((char *) econtext) + varSlotOffsets[i];
!         TupleTableSlot *varSlot = *((TupleTableSlot **) slotptr);
!         int            varNumber = varNumbers[i] - 1;

!         values[i] = varSlot->tts_values[varNumber];
!         isnull[i] = varSlot->tts_isnull[varNumber];
      }
  }

--- 5113,5159 ----
                   bool *isnull)
  {
      ExprContext *econtext = projInfo->pi_exprContext;

      /*
!      * Assign simple Vars to result by direct extraction of fields from source
!      * slots ... a mite ugly, but fast ...
       */
!     if (projInfo->pi_directMap)
!     {
!         /* especially simple case where vars go to output in order */
!         int       *varSlotOffsets = projInfo->pi_varSlotOffsets;
!         int       *varNumbers = projInfo->pi_varNumbers;
!         int        numSimpleVars = projInfo->pi_numSimpleVars;
!         int        i;

!         for (i = 0; i < numSimpleVars; i++)
!         {
!             char       *slotptr = ((char *) econtext) + varSlotOffsets[i];
!             TupleTableSlot *varSlot = *((TupleTableSlot **) slotptr);
!             int            varNumber = varNumbers[i] - 1;
!
!             values[i] = varSlot->tts_values[varNumber];
!             isnull[i] = varSlot->tts_isnull[varNumber];
!         }
!     }
!     else
      {
!         int       *varSlotOffsets = projInfo->pi_varSlotOffsets;
!         int       *varNumbers = projInfo->pi_varNumbers;
!         int       *varOutputCols = projInfo->pi_varOutputCols;
!         int        numSimpleVars = projInfo->pi_numSimpleVars;
!         int        i;
!
!         for (i = 0; i < numSimpleVars; i++)
!         {
!             char       *slotptr = ((char *) econtext) + varSlotOffsets[i];
!             TupleTableSlot *varSlot = *((TupleTableSlot **) slotptr);
!             int            varNumber = varNumbers[i] - 1;
!             int            varOutputCol = varOutputCols[i] - 1;

!             values[varOutputCol] = varSlot->tts_values[varNumber];
!             isnull[varOutputCol] = varSlot->tts_isnull[varNumber];
!         }
      }
  }

***************
*** 5165,5170 ****
--- 5174,5180 ----
  ExecProject(ProjectionInfo *projInfo, ExprDoneCond *isDone)
  {
      TupleTableSlot *slot;
+     ExprContext *econtext = projInfo->pi_exprContext;

      /*
       * sanity checks
***************
*** 5184,5212 ****
      ExecClearTuple(slot);

      /*
       * form a new result tuple (if possible); if successful, mark the result
       * slot as containing a valid virtual tuple
       */
!     if (projInfo->pi_isVarList)
      {
!         /* simple Var list: this always succeeds with one result row */
!         if (isDone)
!             *isDone = ExprSingleResult;
!         ExecVariableList(projInfo,
!                          slot->tts_values,
!                          slot->tts_isnull);
!         ExecStoreVirtualTuple(slot);
!     }
!     else
!     {
!         if (ExecTargetList(projInfo->pi_targetlist,
!                            projInfo->pi_exprContext,
!                            slot->tts_values,
!                            slot->tts_isnull,
!                            projInfo->pi_itemIsDone,
!                            isDone))
!             ExecStoreVirtualTuple(slot);
      }

      return slot;
  }
--- 5194,5235 ----
      ExecClearTuple(slot);

      /*
+      * Force extraction of all input values that we'll need.
+      */
+     if (projInfo->pi_lastInnerVar > 0)
+         slot_getsomeattrs(econtext->ecxt_innertuple,
+                           projInfo->pi_lastInnerVar);
+     if (projInfo->pi_lastOuterVar > 0)
+         slot_getsomeattrs(econtext->ecxt_outertuple,
+                           projInfo->pi_lastOuterVar);
+     if (projInfo->pi_lastScanVar > 0)
+         slot_getsomeattrs(econtext->ecxt_scantuple,
+                           projInfo->pi_lastScanVar);
+
+     ExecVariableList(projInfo,
+                      slot->tts_values,
+                      slot->tts_isnull);
+
+     /* Assume single result row until proven otherwise */
+     if (isDone)
+         *isDone = ExprSingleResult;
+
+     /*
       * form a new result tuple (if possible); if successful, mark the result
       * slot as containing a valid virtual tuple
       */
!     if (projInfo->pi_targetlist)
      {
!         if (!ExecTargetList(projInfo->pi_targetlist,
!                             projInfo->pi_exprContext,
!                             slot->tts_values,
!                             slot->tts_isnull,
!                             projInfo->pi_itemIsDone,
!                             isDone))
!             return slot;        /* no result rows, return empty slot */
      }

+     ExecStoreVirtualTuple(slot);
+
      return slot;
  }
Index: src/backend/executor/execUtils.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.157
diff -c -r1.157 execUtils.c
*** src/backend/executor/execUtils.c    1 Jan 2009 17:23:41 -0000    1.157
--- src/backend/executor/execUtils.c    1 Apr 2009 21:44:07 -0000
***************
*** 46,51 ****
--- 46,52 ----
  #include "access/heapam.h"
  #include "catalog/index.h"
  #include "executor/execdebug.h"
+ #include "nodes/nodeFuncs.h"
  #include "parser/parsetree.h"
  #include "utils/memutils.h"
  #include "utils/relcache.h"
***************
*** 66,71 ****
--- 67,73 ----
  int            NIndexTupleProcessed;


+ static bool get_last_attnums(Node *node, ProjectionInfo *projInfo);
  static void ShutdownExprContext(ExprContext *econtext);


***************
*** 559,679 ****
                          TupleDesc inputDesc)
  {
      ProjectionInfo *projInfo = makeNode(ProjectionInfo);
!     int            len;
!     bool        isVarList;
      ListCell   *tl;

-     len = ExecTargetListLength(targetList);
-
-     projInfo->pi_targetlist = targetList;
      projInfo->pi_exprContext = econtext;
      projInfo->pi_slot = slot;
!
!     /*
!      * Determine whether the target list consists entirely of simple Var
!      * references (ie, references to non-system attributes) that match the
!      * input.  If so, we can use the simpler ExecVariableList instead of
!      * ExecTargetList.    (Note: if there is a type mismatch then ExecEvalVar
!      * will probably throw an error at runtime, but we leave that to it.)
!      */
!     isVarList = true;
      foreach(tl, targetList)
      {
          GenericExprState *gstate = (GenericExprState *) lfirst(tl);
          Var           *variable = (Var *) gstate->arg->expr;
!         Form_pg_attribute attr;

!         if (variable == NULL ||
!             !IsA(variable, Var) ||
!             variable->varattno <= 0)
!         {
!             isVarList = false;
!             break;
!         }
!         if (!inputDesc)
!             continue;            /* can't check type, assume OK */
!         if (variable->varattno > inputDesc->natts)
          {
!             isVarList = false;
!             break;
!         }
!         attr = inputDesc->attrs[variable->varattno - 1];
!         if (attr->attisdropped || variable->vartype != attr->atttypid)
!         {
!             isVarList = false;
!             break;
!         }
!     }
!     projInfo->pi_isVarList = isVarList;
!
!     if (isVarList)
!     {
!         int           *varSlotOffsets;
!         int           *varNumbers;
!         AttrNumber    lastInnerVar = 0;
!         AttrNumber    lastOuterVar = 0;
!         AttrNumber    lastScanVar = 0;

!         projInfo->pi_itemIsDone = NULL; /* not needed */
!         projInfo->pi_varSlotOffsets = varSlotOffsets = (int *)
!             palloc0(len * sizeof(int));
!         projInfo->pi_varNumbers = varNumbers = (int *)
!             palloc0(len * sizeof(int));

!         /*
!          * Set up the data needed by ExecVariableList.    The slots in which the
!          * variables can be found at runtime are denoted by the offsets of
!          * their slot pointers within the econtext.  This rather grotty
!          * representation is needed because the caller may not have given us
!          * the real econtext yet (see hacks in nodeSubplan.c).
!          */
!         foreach(tl, targetList)
          {
-             GenericExprState *gstate = (GenericExprState *) lfirst(tl);
-             Var           *variable = (Var *) gstate->arg->expr;
-             AttrNumber    attnum = variable->varattno;
              TargetEntry *tle = (TargetEntry *) gstate->xprstate.expr;
!             AttrNumber    resind = tle->resno - 1;

!             Assert(resind >= 0 && resind < len);
!             varNumbers[resind] = attnum;

              switch (variable->varno)
              {
                  case INNER:
!                     varSlotOffsets[resind] = offsetof(ExprContext,
!                                                       ecxt_innertuple);
!                     lastInnerVar = Max(lastInnerVar, attnum);
                      break;

                  case OUTER:
!                     varSlotOffsets[resind] = offsetof(ExprContext,
!                                                       ecxt_outertuple);
!                     lastOuterVar = Max(lastOuterVar, attnum);
                      break;

                  default:
!                     varSlotOffsets[resind] = offsetof(ExprContext,
!                                                       ecxt_scantuple);
!                     lastScanVar = Max(lastScanVar, attnum);
                      break;
              }
          }
-         projInfo->pi_lastInnerVar = lastInnerVar;
-         projInfo->pi_lastOuterVar = lastOuterVar;
-         projInfo->pi_lastScanVar = lastScanVar;
      }
      else
-     {
          projInfo->pi_itemIsDone = (ExprDoneCond *)
              palloc(len * sizeof(ExprDoneCond));
-         projInfo->pi_varSlotOffsets = NULL;
-         projInfo->pi_varNumbers = NULL;
-     }

      return projInfo;
  }

  /* ----------------
   *        ExecAssignProjectionInfo
   *
--- 561,716 ----
                          TupleDesc inputDesc)
  {
      ProjectionInfo *projInfo = makeNode(ProjectionInfo);
!     int            len = ExecTargetListLength(targetList);
!     int           *varSlotOffsets;
!     int           *varNumbers;
!     int           *varOutputCols;
!     List       *exprlist;
!     int            numSimpleVars;
!     bool        directMap;
      ListCell   *tl;

      projInfo->pi_exprContext = econtext;
      projInfo->pi_slot = slot;
!     projInfo->pi_varSlotOffsets = varSlotOffsets = (int *)
!         palloc(len * sizeof(int));
!     projInfo->pi_varNumbers = varNumbers = (int *)
!         palloc(len * sizeof(int));
!     projInfo->pi_varOutputCols = varOutputCols = (int *)
!         palloc(len * sizeof(int));
!     projInfo->pi_lastInnerVar = 0;
!     projInfo->pi_lastOuterVar = 0;
!     projInfo->pi_lastScanVar = 0;
!
!     /*
!      * We separate the target list into simple Var references and expressions
!      * which require the full ExecTargetList machinery.  To be a simple Var,
!      * a Var has to be a user attribute and not mismatch the inputDesc.
!      * (Note: if there is a type mismatch then ExecEvalVar will probably throw
!      * an error at runtime, but we leave that to it.)
!      */
!     exprlist = NIL;
!     numSimpleVars = 0;
!     directMap = true;
      foreach(tl, targetList)
      {
          GenericExprState *gstate = (GenericExprState *) lfirst(tl);
          Var           *variable = (Var *) gstate->arg->expr;
!         bool        isSimpleVar = false;

!         if (variable != NULL &&
!             IsA(variable, Var) &&
!             variable->varattno > 0)
          {
!             if (!inputDesc)
!                 isSimpleVar = true;        /* can't check type, assume OK */
!             else if (variable->varattno <= inputDesc->natts)
!             {
!                 Form_pg_attribute attr;

!                 attr = inputDesc->attrs[variable->varattno - 1];
!                 if (!attr->attisdropped && variable->vartype == attr->atttypid)
!                     isSimpleVar = true;
!             }
!         }

!         if (isSimpleVar)
          {
              TargetEntry *tle = (TargetEntry *) gstate->xprstate.expr;
!             AttrNumber    attnum = variable->varattno;

!             varNumbers[numSimpleVars] = attnum;
!             varOutputCols[numSimpleVars] = tle->resno;
!             if (tle->resno != numSimpleVars+1)
!                 directMap = false;

              switch (variable->varno)
              {
                  case INNER:
!                     varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
!                                                              ecxt_innertuple);
!                     if (projInfo->pi_lastInnerVar < attnum)
!                         projInfo->pi_lastInnerVar = attnum;
                      break;

                  case OUTER:
!                     varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
!                                                              ecxt_outertuple);
!                     if (projInfo->pi_lastOuterVar < attnum)
!                         projInfo->pi_lastOuterVar = attnum;
                      break;

                  default:
!                     varSlotOffsets[numSimpleVars] = offsetof(ExprContext,
!                                                              ecxt_scantuple);
!                     if (projInfo->pi_lastScanVar < attnum)
!                         projInfo->pi_lastScanVar = attnum;
                      break;
              }
+             numSimpleVars++;
+         }
+         else
+         {
+             /* Not a simple variable, add it to generic targetlist */
+             exprlist = lappend(exprlist, gstate);
+             get_last_attnums((Node *) variable, projInfo);
          }
      }
+     projInfo->pi_targetlist = exprlist;
+     projInfo->pi_numSimpleVars = numSimpleVars;
+     projInfo->pi_directMap = directMap;
+
+     if (exprlist == NIL)
+         projInfo->pi_itemIsDone = NULL; /* not needed */
      else
          projInfo->pi_itemIsDone = (ExprDoneCond *)
              palloc(len * sizeof(ExprDoneCond));

      return projInfo;
  }

+ static bool
+ get_last_attnums(Node *node, ProjectionInfo *projInfo)
+ {
+     if (node == NULL)
+         return false;
+     if (IsA(node, Var))
+     {
+         Var       *variable = (Var *) node;
+         AttrNumber    attnum = variable->varattno;
+
+         switch (variable->varno)
+         {
+             case INNER:
+                 if (projInfo->pi_lastInnerVar < attnum)
+                     projInfo->pi_lastInnerVar = attnum;
+                 break;
+
+             case OUTER:
+                 if (projInfo->pi_lastOuterVar < attnum)
+                     projInfo->pi_lastOuterVar = attnum;
+                 break;
+
+             default:
+                 if (projInfo->pi_lastScanVar < attnum)
+                     projInfo->pi_lastScanVar = attnum;
+                 break;
+         }
+         return false;
+     }
+     /*
+      * Don't examine the arguments of Aggrefs or WindowFuncs, because those
+      * do not represent expressions to be evaluated with respect to the
+      * overall targetlist's econtext.
+      */
+     if (IsA(node, Aggref))
+         return false;
+     if (IsA(node, WindowFunc))
+         return false;
+     return expression_tree_walker(node, get_last_attnums,
+                                   (void *) projInfo);
+ }
+
  /* ----------------
   *        ExecAssignProjectionInfo
   *
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.202
diff -c -r1.202 execnodes.h
*** src/include/nodes/execnodes.h    21 Mar 2009 00:04:40 -0000    1.202
--- src/include/nodes/execnodes.h    1 Apr 2009 21:44:07 -0000
***************
*** 201,216 ****
   *
   *        The planner very often produces tlists that consist entirely of
   *        simple Var references (lower levels of a plan tree almost always
!  *        look like that).  So we have an optimization to handle that case
!  *        with minimum overhead.
   *
!  *        targetlist        target list for projection
   *        exprContext        expression context in which to evaluate targetlist
   *        slot            slot to place projection result in
!  *        itemIsDone        workspace for ExecProject
!  *        isVarList        TRUE if simple-Var-list optimization applies
   *        varSlotOffsets    array indicating which slot each simple Var is from
!  *        varNumbers        array indicating attr numbers of simple Vars
   *        lastInnerVar    highest attnum from inner tuple slot (0 if none)
   *        lastOuterVar    highest attnum from outer tuple slot (0 if none)
   *        lastScanVar        highest attnum from scan tuple slot (0 if none)
--- 201,225 ----
   *
   *        The planner very often produces tlists that consist entirely of
   *        simple Var references (lower levels of a plan tree almost always
!  *        look like that).  And top-level tlists are often mostly Vars too.
!  *        We therefore optimize execution of simple-Var tlist entries.
!  *        The pi_targetlist list actually contains only the tlist entries that
!  *        aren't simple Vars, and the Vars are processed using the
!  *        varSlotOffsets/varNumbers/varOutputCols arrays.
!  *
!  *        The lastXXXVar fields are used to optimize fetching of fields from
!  *        input tuples: they let us do a slot_getsomeattrs() call to ensure
!  *        that all needed fields are extracted in one pass.
   *
!  *        targetlist        target list for projection (non-Var expressions only)
   *        exprContext        expression context in which to evaluate targetlist
   *        slot            slot to place projection result in
!  *        itemIsDone        workspace array for ExecProject
!  *        directMap        true if varOutputCols[] is an identity map
!  *        numSimpleVars    number of simple Vars found in original tlist
   *        varSlotOffsets    array indicating which slot each simple Var is from
!  *        varNumbers        array indicating input attr numbers of simple Vars
!  *        varOutputCols    array indicating output attr numbers of simple Vars
   *        lastInnerVar    highest attnum from inner tuple slot (0 if none)
   *        lastOuterVar    highest attnum from outer tuple slot (0 if none)
   *        lastScanVar        highest attnum from scan tuple slot (0 if none)
***************
*** 223,231 ****
      ExprContext *pi_exprContext;
      TupleTableSlot *pi_slot;
      ExprDoneCond *pi_itemIsDone;
!     bool        pi_isVarList;
      int           *pi_varSlotOffsets;
      int           *pi_varNumbers;
      int            pi_lastInnerVar;
      int            pi_lastOuterVar;
      int            pi_lastScanVar;
--- 232,242 ----
      ExprContext *pi_exprContext;
      TupleTableSlot *pi_slot;
      ExprDoneCond *pi_itemIsDone;
!     bool        pi_directMap;
!     int            pi_numSimpleVars;
      int           *pi_varSlotOffsets;
      int           *pi_varNumbers;
+     int           *pi_varOutputCols;
      int            pi_lastInnerVar;
      int            pi_lastOuterVar;
      int            pi_lastScanVar;

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: [Snowball-discuss] Snowball release cycle ?
Next
From: Grant Ingersoll
Date:
Subject: Re: [Snowball-discuss] Snowball release cycle ?