
From: Tom Lane
Subject: Proposed fixes for planner regressions from June to release
Msg-id: 13943.1165780664@sss.pgh.pa.us
Responses: Re: Proposed fixes for planner regressions from June to release ("Simon Riggs" <simon@2ndquadrant.com>)
List: pgsql-patches
Arjen van der Meijden reported here:
http://archives.postgresql.org/pgsql-performance/2006-12/msg00041.php
that 8.2rc1 was noticeably slower on his query mix than a CVS pull of
3-June-2006 had been.  I've been looking into it with him off-list,
and I believe that the problem is one of poor planning due to
misestimation of costs for queries with large IN-lists (that is,
indexable ScalarArrayOpExpr conditions with large array constants).
Specifically, the planner tends to choose bitmap scans when it shouldn't.
This code is all new in 8.2, so it's not exactly surprising that there
might be a few glitches.  His development snapshot coincidentally
avoided the problem because that was just before we tweaked the planner
to favor indexscans (including bitmap scans) more on the inside of
nestloops.

The attached proposed patch brings Arjen's query mix back to being as
fast or faster than the 3-June snapshot.  There are four elements to the
patch; in order of increasing controversialness they are:

1. Fix a flat-out logic error in genericcostestimate()'s handling of the
CPU cost estimate for a scan involving a ScalarArrayOpExpr indexqual
(which results in multiple physical indexscans within the BitmapIndexScan
plan node).  numIndexTuples is per physical scan, so it has to be
multiplied by num_sa_scans to get the correct per-plan-node-execution
cost estimate.
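
To put numbers on it, here's a standalone C sketch (illustration only;
the stock cpu_index_tuple_cost and cpu_operator_cost settings are
assumed) comparing the old and corrected charges for a 1000-element
array where each physical scan fetches about one index tuple:

    #include <stdio.h>

    int main(void)
    {
        /* assumed default planner parameters */
        double cpu_index_tuple_cost = 0.005;
        double qual_op_cost = 0.0025;   /* one cpu_operator_cost per qual */

        double numIndexTuples = 1.0;    /* tuples per physical indexscan */
        double num_sa_scans = 1000.0;   /* scans induced by the array */

        /* buggy: charges CPU for only one of the 1000 physical scans */
        double before = numIndexTuples *
                        (cpu_index_tuple_cost + qual_op_cost);
        /* fixed: charges CPU for every scan the plan node will run */
        double after = numIndexTuples * num_sa_scans *
                       (cpu_index_tuple_cost + qual_op_cost);

        printf("before: %g  after: %g\n", before, after);  /* 0.0075 vs 7.5 */
        return 0;
    }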

2. Fix another logic error that occurred when a ScalarArrayOpExpr qual
applies to a lower-order index column that doesn't have equality quals
on all the index columns to its left.  In this situation the
ScalarArrayOpExpr doesn't contribute to reducing the fraction of the
index that has to be scanned, but genericcostestimate() was still
dividing numIndexTuples by num_sa_scans.  This is a case of "premature
optimization is the root of all evil" :-( ... I was trying to avoid
calculating num_sa_scans twice, but in fact it's necessary to compute it
separately in btcostestimate and genericcostestimate, since lower-order
ScalarArrayOpExprs should be included in the count for the latter's
purposes and not the former's.  So, do the correction of numIndexTuples
in btcostestimate, and don't let genericcostestimate adjust a passed-in
estimate.
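
To illustrate, here's a standalone C sketch of the corrected rule (a
hypothetical model with assumed numbers, not the patch's code), for a
two-column index on (a, b) and a 100-element array qual on b:

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Hypothetical model: the array qual narrows each physical scan only
     * if every index column to its left has an equality qual; otherwise
     * each scan still walks the whole qualifying index range.
     */
    static double
    per_scan_tuples(double reltuples, double selectivity,
                    bool saop_is_boundary_qual, int alength)
    {
        double numIndexTuples = selectivity * reltuples;

        if (saop_is_boundary_qual && alength > 1)
            numIndexTuples = rint(numIndexTuples / alength);
        return numIndexTuples;
    }

    int main(void)
    {
        /* 1M-row table, combined selectivity 0.001, 100-element array */
        printf("a = 5  AND b = ANY(...): %g tuples/scan\n",
               per_scan_tuples(1e6, 0.001, true, 100));    /* 10 */
        printf("a >= 5 AND b = ANY(...): %g tuples/scan\n",
               per_scan_tuples(1e6, 0.001, false, 100));   /* 1000 */
        return 0;
    }

Either way the per-tuple CPU charge in genericcostestimate() is still
multiplied by the number of scans, which is why the two counts have to
be computed separately.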

3. In costsize.c, charge a small amount extra per tuple retrieved by a
bitmap indexscan (I set it at 0.1 * cpu_operator_cost), on the grounds
that entering the tuple into the bitmap should cost something.  The real
reason for doing this though is that for the case where a nestloop with
inner indexscan expects to retrieve a single tuple from the inner
relation on each iteration, 8.2 release computes essentially the same
cost (within roundoff error) for the inner scan whether it's done as a
plain indexscan or a bitmap indexscan --- and depending on how the
roundoff error goes, it not
infrequently chooses the bitmap scan.  This is obviously silly, since a
bitmap scan has more overhead than a plain indexscan, and no way to win
when retrieving a single heap tuple.  There is more than one way we
could deal with this problem, but adding an explicit CPU-cost charge for
the extra overhead seems reasonable.
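
Here's a minimal standalone C sketch of the tie-break; the base cost
figure is made up and the stock cpu_operator_cost setting is assumed,
but the 0.1 * cpu_operator_cost surcharge is the patch's:

    #include <stdio.h>

    int main(void)
    {
        double cpu_operator_cost = 0.0025;  /* assumed default setting */
        double rows = 1.0;                  /* one inner tuple per iteration */

        double plain_cost = 3.01;           /* made-up, identical base costs */
        double bitmap_cost = 3.01;

        /* the patch's surcharge for entering tuples into the bitmap */
        bitmap_cost += 0.1 * cpu_operator_cost * rows;

        printf("plain: %.5f  bitmap: %.5f\n", plain_cost, bitmap_cost);
        /* 3.01000 vs 3.01025: the plain indexscan now wins the coin flip */
        return 0;
    }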

4. In genericcostestimate(), add a CPU-cost charge to reflect the CPU
effort involved in initiating an indexscan (for instance, the analysis
of the index keys that btree does).  To some extent this charge also
covers the cost of descending the btree.  Before 8.2 we had effectively
priced those costs in by counting an explicit disk access to the btree
metapage, which was clearly too high --- but it seems that zero is too
low, too.  I set the number at 100 * cpu_operator_cost, which may sound
high but it seems about right on the strength of Arjen's measurements.
It's possible that we should make it vary with the number of indexquals;
anyone have an opinion about that?
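
For a sense of scale, a standalone C sketch (assuming the stock
cpu_operator_cost and random_page_cost settings, and that the old
metapage fetch was priced at one random page access):

    #include <stdio.h>

    int main(void)
    {
        double cpu_operator_cost = 0.0025;  /* assumed default setting */
        double random_page_cost = 4.0;      /* assumed default setting */

        double pre_82   = 1.0 * random_page_cost;    /* metapage fetch: 4.0 */
        double rc1      = 0.0;                       /* no charge at all */
        double proposed = 100.0 * cpu_operator_cost; /* 0.25 per scan start */

        printf("pre-8.2: %g  8.2rc1: %g  proposed: %g\n",
               pre_82, rc1, proposed);
        return 0;
    }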

Comments in general?  I'm a bit worried that these changes will bias us
against indexscans too much, but there's no sign of that in Arjen's
results.

            regards, tom lane

Index: src/backend/optimizer/path/costsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.169
diff -c -r1.169 costsize.c
*** src/backend/optimizer/path/costsize.c    11 Nov 2006 01:14:19 -0000    1.169
--- src/backend/optimizer/path/costsize.c    10 Dec 2006 19:19:56 -0000
***************
*** 614,619 ****
--- 614,626 ----
      {
          *cost = ((IndexPath *) path)->indextotalcost;
          *selec = ((IndexPath *) path)->indexselectivity;
+         /*
+          * Charge a small amount per retrieved tuple to reflect the costs of
+          * manipulating the bitmap.  This is mostly to make sure that a bitmap
+          * scan doesn't look to be the same cost as an indexscan to retrieve
+          * a single tuple.
+          */
+         *cost += 0.1 * cpu_operator_cost * ((IndexPath *) path)->rows;
      }
      else if (IsA(path, BitmapAndPath))
      {
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.214
diff -c -r1.214 selfuncs.c
*** src/backend/utils/adt/selfuncs.c    4 Oct 2006 00:29:59 -0000    1.214
--- src/backend/utils/adt/selfuncs.c    10 Dec 2006 19:19:56 -0000
***************
*** 4673,4686 ****
       * way in order to get the right answer for partial indexes.
       */
      if (numIndexTuples <= 0.0)
          numIndexTuples = *indexSelectivity * index->rel->tuples;

!     /*
!      * The estimate obtained so far counts all the tuples returned by all
!      * scans of ScalarArrayOpExpr calls.  We want to consider the per-scan
!      * number, so adjust.  This is a handy place to round to integer, too.
!      */
!     numIndexTuples = rint(numIndexTuples / num_sa_scans);

      /*
       * We can bound the number of tuples by the index size in any case. Also,
--- 4673,4690 ----
       * way in order to get the right answer for partial indexes.
       */
      if (numIndexTuples <= 0.0)
+     {
          numIndexTuples = *indexSelectivity * index->rel->tuples;

!         /*
!          * The above calculation counts all the tuples visited across all
!          * scans induced by ScalarArrayOpExpr nodes.  We want to consider the
!          * average per-indexscan number, so adjust.  This is a handy place to
!          * round to integer, too.  (If caller supplied tuple estimate, it's
!          * responsible for handling these considerations.)
!          */
!         numIndexTuples = rint(numIndexTuples / num_sa_scans);
!     }

      /*
       * We can bound the number of tuples by the index size in any case. Also,
***************
*** 4786,4792 ****
       * evaluated once at the start of the scan to reduce them to runtime keys
       * to pass to the index AM (see nodeIndexscan.c).  We model the per-tuple
       * CPU costs as cpu_index_tuple_cost plus one cpu_operator_cost per
!      * indexqual operator.
       *
       * Note: this neglects the possible costs of rechecking lossy operators
       * and OR-clause expressions.  Detecting that that might be needed seems
--- 4790,4798 ----
       * evaluated once at the start of the scan to reduce them to runtime keys
       * to pass to the index AM (see nodeIndexscan.c).  We model the per-tuple
       * CPU costs as cpu_index_tuple_cost plus one cpu_operator_cost per
!      * indexqual operator.  Because we have numIndexTuples as a per-scan
!      * number, we have to multiply by num_sa_scans to get the correct result
!      * for ScalarArrayOpExpr cases.
       *
       * Note: this neglects the possible costs of rechecking lossy operators
       * and OR-clause expressions.  Detecting that that might be needed seems
***************
*** 4801,4807 ****
          qual_arg_cost = 0;
      *indexStartupCost = qual_arg_cost;
      *indexTotalCost += qual_arg_cost;
!     *indexTotalCost += numIndexTuples * (cpu_index_tuple_cost + qual_op_cost);

      /*
       * Generic assumption about index correlation: there isn't any.
--- 4807,4826 ----
          qual_arg_cost = 0;
      *indexStartupCost = qual_arg_cost;
      *indexTotalCost += qual_arg_cost;
!     *indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost);
!
!     /*
!      * We also add a CPU-cost component to represent the general costs of
!      * starting an indexscan (such as analysis of btree index keys).  This
!      * is estimated at 100x cpu_operator_cost, which is a bit arbitrary but
!      * seems the right order of magnitude.
!      *
!      * Although this is startup cost with respect to any one scan, we add
!      * it to the "total" cost component because it's only very interesting
!      * in the many-ScalarArrayOpExpr-scan case, and there it will be paid
!      * over the life of the scan node.
!      */
!     *indexTotalCost += num_sa_scans * 100.0 * cpu_operator_cost;

      /*
       * Generic assumption about index correlation: there isn't any.
***************
*** 4829,4834 ****
--- 4848,4854 ----
      int            indexcol;
      bool        eqQualHere;
      bool        found_saop;
+     double        num_sa_scans;
      ListCell   *l;

      /*
***************
*** 4852,4857 ****
--- 4872,4878 ----
      indexcol = 0;
      eqQualHere = false;
      found_saop = false;
+     num_sa_scans = 1;
      foreach(l, indexQuals)
      {
          RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
***************
*** 4928,4933 ****
--- 4949,4963 ----
          Assert(op_strategy != 0);        /* not a member of opclass?? */
          if (op_strategy == BTEqualStrategyNumber)
              eqQualHere = true;
+         /* count up number of SA scans induced by indexBoundQuals only */
+         if (IsA(clause, ScalarArrayOpExpr))
+         {
+             ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+             int            alength = estimate_array_length(lsecond(saop->args));
+
+             if (alength > 1)
+                 num_sa_scans *= alength;
+         }
          indexBoundQuals = lappend(indexBoundQuals, rinfo);
      }

***************
*** 4949,4954 ****
--- 4979,4990 ----
                                                    index->rel->relid,
                                                    JOIN_INNER);
          numIndexTuples = btreeSelectivity * index->rel->tuples;
+         /*
+          * As in genericcostestimate(), we have to adjust for any
+          * ScalarArrayOpExpr quals included in indexBoundQuals, and then
+          * round to integer.
+          */
+         numIndexTuples = rint(numIndexTuples / num_sa_scans);
      }

      genericcostestimate(root, index, indexQuals, outer_rel, numIndexTuples,
