Limiting the number of parameterized indexpaths created - Mailing list pgsql-hackers

From Tom Lane
Subject Limiting the number of parameterized indexpaths created
Date
Msg-id 21634.1351634224@sss.pgh.pa.us
Whole thread Raw
Responses Re: Limiting the number of parameterized indexpaths created
Re: Limiting the number of parameterized indexpaths created
List pgsql-hackers
I looked into the complaint of unreasonable planner runtime in bug #7626,
http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php

In the given example, the indexed relation "foo" has join clauses with
30 other relations.  The code that I added in commit
3b8968f25232ad09001bf35ab4cc59f5a501193e will try all 2^30 combinations
of those rels as possible outer relations for a parameterized indexscan
:-(.  So clearly, the idea that we can just try everything and not have
some kind of heuristic restriction isn't going to work.

This particular example, and probably a lot of similar examples, does
have a principled non-heuristic fix, which comes from the fact that we
recognize all the join clauses as belonging to the same equivalence
class.  Therefore, using more than one of them in an indexscan is
useless.  (The code already does know that and discards the extra
clauses, but that happens too far downstream to prevent the exponential
search time here.)  The extra function eclass_already_used() added in
the attached patch attacks the problem this way, and is sufficient to
resolve the bug for the case presented.

However, we still have an issue for cases where the join clauses aren't
equivalence clauses.  (For instance, if you just change all the "="
signs to "<" in Brian's example, it still takes forever.)  So we also
need some heuristic rule to limit the number of cases considered.

I spent a good deal of time thinking about how the indexscan code might
make use of the joinlist data structure (which prevents exponential
planning time growth overall by limiting which joins we'll consider)
to fix this; the idea being to not consider outer-relation sets that
couldn't actually be presented for use because of joinlist restrictions.
But this looked messy, and probably expensive in itself.  Moreover it
seemed like practical implementations of the idea would carry the
restriction that we could never generate parameterized paths for
joinlist subproblems, which would be a real shame.  And we'd be doing a
lot of work for what is probably a very unusual corner case, given that
I think eclass_already_used() will fix the problem in most real cases.

So what I'm proposing instead, which is implemented in the other half of
the attached patch, is that we simply put an arbitrary limit on how many
outer-relation sets we'll consider while generating parameterized
indexscans.  As written, the patch limits the number of relid sets to 10
times the number of join clauses it's working with, which is small
enough to keep the runtime of this test case to something reasonable.
I think that in practice this will still allow useful
combination-outer-rel cases to be found.  I wonder though if anyone has
a better idea?

            regards, tom lane

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 444ec2a40e4ac2b0ed8435017149c7664869e34c..78aaca4f279d6ba580643c75d82d82bfc88fba38 100644
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
*************** static void consider_index_join_outer_re
*** 92,97 ****
--- 92,98 ----
                                 IndexClauseSet *eclauseset,
                                 List **bitindexpaths,
                                 List *indexjoinclauses,
+                                int considered_clauses,
                                 List **considered_relids);
  static void get_join_index_paths(PlannerInfo *root, RelOptInfo *rel,
                       IndexOptInfo *index,
*************** static void get_join_index_paths(Planner
*** 101,106 ****
--- 102,109 ----
                       List **bitindexpaths,
                       Relids relids,
                       List **considered_relids);
+ static bool eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids,
+                     List *indexjoinclauses);
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
                  IndexOptInfo *index, IndexClauseSet *clauses,
*************** consider_index_join_clauses(PlannerInfo
*** 447,452 ****
--- 450,456 ----
                              IndexClauseSet *eclauseset,
                              List **bitindexpaths)
  {
+     int            considered_clauses = 0;
      List       *considered_relids = NIL;
      int            indexcol;

*************** consider_index_join_clauses(PlannerInfo
*** 460,467 ****
       * filter (qpqual); which is where an available clause would end up being
       * applied if we omit it from the indexquals.
       *
!      * This looks expensive, but in practical cases there won't be very many
!      * distinct sets of outer rels to consider.
       *
       * For simplicity in selecting relevant clauses, we represent each set of
       * outer rels as a maximum set of clause_relids --- that is, the indexed
--- 464,474 ----
       * filter (qpqual); which is where an available clause would end up being
       * applied if we omit it from the indexquals.
       *
!      * This looks expensive, but in most practical cases there won't be very
!      * many distinct sets of outer rels to consider.  As a safety valve when
!      * that's not true, we use a heuristic: limit the number of outer rel sets
!      * considered to a multiple of the number of clauses considered.  (We'll
!      * always consider using each individual join clause, though.)
       *
       * For simplicity in selecting relevant clauses, we represent each set of
       * outer rels as a maximum set of clause_relids --- that is, the indexed
*************** consider_index_join_clauses(PlannerInfo
*** 471,486 ****
--- 478,497 ----
      for (indexcol = 0; indexcol < index->ncolumns; indexcol++)
      {
          /* Consider each applicable simple join clause */
+         considered_clauses += list_length(jclauseset->indexclauses[indexcol]);
          consider_index_join_outer_rels(root, rel, index,
                                         rclauseset, jclauseset, eclauseset,
                                         bitindexpaths,
                                         jclauseset->indexclauses[indexcol],
+                                        considered_clauses,
                                         &considered_relids);
          /* Consider each applicable eclass join clause */
+         considered_clauses += list_length(eclauseset->indexclauses[indexcol]);
          consider_index_join_outer_rels(root, rel, index,
                                         rclauseset, jclauseset, eclauseset,
                                         bitindexpaths,
                                         eclauseset->indexclauses[indexcol],
+                                        considered_clauses,
                                         &considered_relids);
      }
  }
*************** consider_index_join_clauses(PlannerInfo
*** 494,499 ****
--- 505,511 ----
   * 'rel', 'index', 'rclauseset', 'jclauseset', 'eclauseset', and
   *        'bitindexpaths' as above
   * 'indexjoinclauses' is a list of RestrictInfos for join clauses
+  * considered_clauses is the total number of clauses considered so far
   * '*considered_relids' is a list of all relids sets already considered
   */
  static void
*************** consider_index_join_outer_rels(PlannerIn
*** 504,509 ****
--- 516,522 ----
                                 IndexClauseSet *eclauseset,
                                 List **bitindexpaths,
                                 List *indexjoinclauses,
+                                int considered_clauses,
                                 List **considered_relids)
  {
      ListCell   *lc;
*************** consider_index_join_outer_rels(PlannerIn
*** 522,528 ****
          /*
           * Generate the union of this clause's relids set with each
           * previously-tried set.  This ensures we try this clause along with
!          * every interesting subset of previous clauses.
           *
           * Note: get_join_index_paths adds entries to *considered_relids, but
           * it prepends them to the list, so that we won't visit new entries
--- 535,543 ----
          /*
           * Generate the union of this clause's relids set with each
           * previously-tried set.  This ensures we try this clause along with
!          * every interesting subset of previous clauses.  However, to avoid
!          * exponential growth of planning time when there are many clauses,
!          * limit the number of relid sets accepted to 10 * considered_clauses.
           *
           * Note: get_join_index_paths adds entries to *considered_relids, but
           * it prepends them to the list, so that we won't visit new entries
*************** consider_index_join_outer_rels(PlannerIn
*** 543,548 ****
--- 558,584 ----
              if (bms_subset_compare(clause_relids, oldrelids) != BMS_DIFFERENT)
                  continue;

+             /*
+              * If this clause was derived from an equivalence class, the
+              * clause list may contain other clauses derived from the same
+              * eclass.  We should not consider that combining this clause with
+              * one of those clauses generates a usefully different
+              * parameterization; so skip if any clause derived from the same
+              * eclass would already have been included by oldrelids.
+              */
+             if (rinfo->parent_ec &&
+                 eclass_already_used(rinfo->parent_ec, oldrelids,
+                                     indexjoinclauses))
+                 continue;
+
+             /*
+              * If the number of relid sets considered exceeds our heuristic
+              * limit, stop considering combinations of clauses.  We'll still
+              * consider the current clause alone, though (below this loop).
+              */
+             if (list_length(*considered_relids) >= 10 * considered_clauses)
+                 break;
+
              /* OK, try the union set */
              get_join_index_paths(root, rel, index,
                                   rclauseset, jclauseset, eclauseset,
*************** get_join_index_paths(PlannerInfo *root,
*** 648,653 ****
--- 684,711 ----
  }

  /*
+  * eclass_already_used
+  *        True if any join clause usable with oldrelids was generated from
+  *        the specified equivalence class.
+  */
+ static bool
+ eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids,
+                     List *indexjoinclauses)
+ {
+     ListCell   *lc;
+
+     foreach(lc, indexjoinclauses)
+     {
+         RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+         if (rinfo->parent_ec == parent_ec &&
+             bms_is_subset(rinfo->clause_relids, oldrelids))
+             return true;
+     }
+     return false;
+ }
+
+ /*
   * bms_equal_any
   *        True if relids is bms_equal to any member of relids_list
   *

pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Proposal for Allow postgresql.conf values to be changed via SQL
Next
From: Hannu Krosing
Date:
Subject: Re: Proposal for Allow postgresql.conf values to be changed via SQL