Limiting the number of parameterized indexpaths created - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Limiting the number of parameterized indexpaths created |
Date | |
Msg-id | 21634.1351634224@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Limiting the number of parameterized indexpaths created
Re: Limiting the number of parameterized indexpaths created |
List | pgsql-hackers |
I looked into the complaint of unreasonable planner runtime in bug #7626, http://archives.postgresql.org/pgsql-bugs/2012-10/msg00232.php In the given example, the indexed relation "foo" has join clauses with 30 other relations. The code that I added in commit 3b8968f25232ad09001bf35ab4cc59f5a501193e will try all 2^30 combinations of those rels as possible outer relations for a parameterized indexscan :-(. So clearly, the idea that we can just try everything and not have some kind of heuristic restriction isn't going to work. This particular example, and probably a lot of similar examples, does have a principled non-heuristic fix, which comes from the fact that we recognize all the join clauses as belonging to the same equivalence class. Therefore, using more than one of them in an indexscan is useless. (The code already does know that and discards the extra clauses, but that happens too far downstream to prevent the exponential search time here.) The extra function eclass_already_used() added in the attached patch attacks the problem this way, and is sufficient to resolve the bug for the case presented. However, we still have an issue for cases where the join clauses aren't equivalence clauses. (For instance, if you just change all the "=" signs to "<" in Brian's example, it still takes forever.) So we also need some heuristic rule to limit the number of cases considered. I spent a good deal of time thinking about how the indexscan code might make use of the joinlist data structure (which prevents exponential planning time growth overall by limiting which joins we'll consider) to fix this; the idea being to not consider outer-relation sets that couldn't actually be presented for use because of joinlist restrictions. But this looked messy, and probably expensive in itself. Moreover it seemed like practical implementations of the idea would carry the restriction that we could never generate parameterized paths for joinlist subproblems, which would be a real shame. And we'd be doing a lot of work for what is probably a very unusual corner case, given that I think eclass_already_used() will fix the problem in most real cases. So what I'm proposing instead, which is implemented in the other half of the attached patch, is that we simply put an arbitrary limit on how many outer-relation sets we'll consider while generating parameterized indexscans. As written, the patch limits the number of relid sets to 10 times the number of join clauses it's working with, which is small enough to keep the runtime of this test case to something reasonable. I think that in practice this will still allow useful combination-outer-rel cases to be found. I wonder though if anyone has a better idea? regards, tom lane diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 444ec2a40e4ac2b0ed8435017149c7664869e34c..78aaca4f279d6ba580643c75d82d82bfc88fba38 100644 *** a/src/backend/optimizer/path/indxpath.c --- b/src/backend/optimizer/path/indxpath.c *************** static void consider_index_join_outer_re *** 92,97 **** --- 92,98 ---- IndexClauseSet *eclauseset, List **bitindexpaths, List *indexjoinclauses, + int considered_clauses, List **considered_relids); static void get_join_index_paths(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index, *************** static void get_join_index_paths(Planner *** 101,106 **** --- 102,109 ---- List **bitindexpaths, Relids relids, List **considered_relids); + static bool eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids, + List *indexjoinclauses); static bool bms_equal_any(Relids relids, List *relids_list); static void get_index_paths(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index, IndexClauseSet *clauses, *************** consider_index_join_clauses(PlannerInfo *** 447,452 **** --- 450,456 ---- IndexClauseSet *eclauseset, List **bitindexpaths) { + int considered_clauses = 0; List *considered_relids = NIL; int indexcol; *************** consider_index_join_clauses(PlannerInfo *** 460,467 **** * filter (qpqual); which is where an available clause would end up being * applied if we omit it from the indexquals. * ! * This looks expensive, but in practical cases there won't be very many ! * distinct sets of outer rels to consider. * * For simplicity in selecting relevant clauses, we represent each set of * outer rels as a maximum set of clause_relids --- that is, the indexed --- 464,474 ---- * filter (qpqual); which is where an available clause would end up being * applied if we omit it from the indexquals. * ! * This looks expensive, but in most practical cases there won't be very ! * many distinct sets of outer rels to consider. As a safety valve when ! * that's not true, we use a heuristic: limit the number of outer rel sets ! * considered to a multiple of the number of clauses considered. (We'll ! * always consider using each individual join clause, though.) * * For simplicity in selecting relevant clauses, we represent each set of * outer rels as a maximum set of clause_relids --- that is, the indexed *************** consider_index_join_clauses(PlannerInfo *** 471,486 **** --- 478,497 ---- for (indexcol = 0; indexcol < index->ncolumns; indexcol++) { /* Consider each applicable simple join clause */ + considered_clauses += list_length(jclauseset->indexclauses[indexcol]); consider_index_join_outer_rels(root, rel, index, rclauseset, jclauseset, eclauseset, bitindexpaths, jclauseset->indexclauses[indexcol], + considered_clauses, &considered_relids); /* Consider each applicable eclass join clause */ + considered_clauses += list_length(eclauseset->indexclauses[indexcol]); consider_index_join_outer_rels(root, rel, index, rclauseset, jclauseset, eclauseset, bitindexpaths, eclauseset->indexclauses[indexcol], + considered_clauses, &considered_relids); } } *************** consider_index_join_clauses(PlannerInfo *** 494,499 **** --- 505,511 ---- * 'rel', 'index', 'rclauseset', 'jclauseset', 'eclauseset', and * 'bitindexpaths' as above * 'indexjoinclauses' is a list of RestrictInfos for join clauses + * considered_clauses is the total number of clauses considered so far * '*considered_relids' is a list of all relids sets already considered */ static void *************** consider_index_join_outer_rels(PlannerIn *** 504,509 **** --- 516,522 ---- IndexClauseSet *eclauseset, List **bitindexpaths, List *indexjoinclauses, + int considered_clauses, List **considered_relids) { ListCell *lc; *************** consider_index_join_outer_rels(PlannerIn *** 522,528 **** /* * Generate the union of this clause's relids set with each * previously-tried set. This ensures we try this clause along with ! * every interesting subset of previous clauses. * * Note: get_join_index_paths adds entries to *considered_relids, but * it prepends them to the list, so that we won't visit new entries --- 535,543 ---- /* * Generate the union of this clause's relids set with each * previously-tried set. This ensures we try this clause along with ! * every interesting subset of previous clauses. However, to avoid ! * exponential growth of planning time when there are many clauses, ! * limit the number of relid sets accepted to 10 * considered_clauses. * * Note: get_join_index_paths adds entries to *considered_relids, but * it prepends them to the list, so that we won't visit new entries *************** consider_index_join_outer_rels(PlannerIn *** 543,548 **** --- 558,584 ---- if (bms_subset_compare(clause_relids, oldrelids) != BMS_DIFFERENT) continue; + /* + * If this clause was derived from an equivalence class, the + * clause list may contain other clauses derived from the same + * eclass. We should not consider that combining this clause with + * one of those clauses generates a usefully different + * parameterization; so skip if any clause derived from the same + * eclass would already have been included by oldrelids. + */ + if (rinfo->parent_ec && + eclass_already_used(rinfo->parent_ec, oldrelids, + indexjoinclauses)) + continue; + + /* + * If the number of relid sets considered exceeds our heuristic + * limit, stop considering combinations of clauses. We'll still + * consider the current clause alone, though (below this loop). + */ + if (list_length(*considered_relids) >= 10 * considered_clauses) + break; + /* OK, try the union set */ get_join_index_paths(root, rel, index, rclauseset, jclauseset, eclauseset, *************** get_join_index_paths(PlannerInfo *root, *** 648,653 **** --- 684,711 ---- } /* + * eclass_already_used + * True if any join clause usable with oldrelids was generated from + * the specified equivalence class. + */ + static bool + eclass_already_used(EquivalenceClass *parent_ec, Relids oldrelids, + List *indexjoinclauses) + { + ListCell *lc; + + foreach(lc, indexjoinclauses) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); + + if (rinfo->parent_ec == parent_ec && + bms_is_subset(rinfo->clause_relids, oldrelids)) + return true; + } + return false; + } + + /* * bms_equal_any * True if relids is bms_equal to any member of relids_list *
pgsql-hackers by date: