Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error |
Date | |
Msg-id | 1458112.1756330944@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error
|
List | pgsql-hackers |
Richard Guo <guofenglinux@gmail.com> writes: > On Wed, Aug 27, 2025 at 2:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wonder if the right answer in v18 is to back out a3179ab69. >> Not fixing that performance regression for another year would >> be mildly annoying; but AFAIR we've still had only the one >> complaint, so it seems like it's not hurting many people. >> And we are getting mighty late in the release calendar to be >> inventing new stuff in v18. > Yeah, inventing new stuff in v18 at this point seems risky. I think > backing out a3179ab69 should be fine for v18. I had a go at doing that (attached) and the news is not good: it dumps core in some regression tests added by the SJE patch, seemingly because there are still references to the removed relid. Now it might be that I failed to correctly adjust the places where the revert didn't apply because of SJE-related changes. But I think what is more likely is that the SJE code is not bothering to update relids everywhere because it's expecting a3179ab69's "rebuild" logic to take care of things later. This leaves us with some pretty unappetizing choices about what to do in v18: 1. Try to emulate the proposed HEAD fix. 2. Try to fix up the SJE patch so that it calculates relid changes honestly, or at least no less honestly than what happened before a3179ab69. 3. Revert SJE as well as a3179ab69 in v18. I will have a look at #1. If there's somebody who wants to look at #2, feel free, but it won't be me because I don't understand the SJE patch well enough. Either way, it's not great to be doing stuff like this just days before rc1 ... regards, tom lane diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 441f12f6c50..54e0b5cbdd1 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -1453,8 +1453,7 @@ generate_base_implied_equalities_no_const(PlannerInfo *root, * For the moment we force all the Vars to be available at all join nodes * for this eclass. Perhaps this could be improved by doing some * pre-analysis of which members we prefer to join, but it's no worse than - * what happened in the pre-8.3 code. (Note: rebuild_eclass_attr_needed - * needs to match this code.) + * what happened in the pre-8.3 code. */ foreach(lc, ec->ec_members) { @@ -2561,51 +2560,6 @@ reconsider_full_join_clause(PlannerInfo *root, OuterJoinClauseInfo *ojcinfo) return false; /* failed to make any deduction */ } -/* - * rebuild_eclass_attr_needed - * Put back attr_needed bits for Vars/PHVs needed for join eclasses. - * - * This is used to rebuild attr_needed/ph_needed sets after removal of a - * useless outer join. It should match what - * generate_base_implied_equalities_no_const did, except that we call - * add_vars_to_attr_needed not add_vars_to_targetlist. - */ -void -rebuild_eclass_attr_needed(PlannerInfo *root) -{ - ListCell *lc; - - foreach(lc, root->eq_classes) - { - EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc); - - /* - * We don't expect any EC child members to exist at this point. Ensure - * that's the case, otherwise, we might be getting asked to do - * something this function hasn't been coded for. - */ - Assert(ec->ec_childmembers == NULL); - - /* Need do anything only for a multi-member, no-const EC. */ - if (list_length(ec->ec_members) > 1 && !ec->ec_has_const) - { - ListCell *lc2; - - foreach(lc2, ec->ec_members) - { - EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2); - List *vars = pull_var_clause((Node *) cur_em->em_expr, - PVC_RECURSE_AGGREGATES | - PVC_RECURSE_WINDOWFUNCS | - PVC_INCLUDE_PLACEHOLDERS); - - add_vars_to_attr_needed(root, vars, ec->ec_relids); - list_free(vars); - } - } - } -} - /* * find_join_domain * Find the highest JoinDomain enclosed within the given relid set. diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 4455a046d23..11ce7e33573 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -28,7 +28,6 @@ #include "optimizer/optimizer.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" -#include "optimizer/placeholder.h" #include "optimizer/planmain.h" #include "optimizer/restrictinfo.h" #include "rewrite/rewriteManip.h" @@ -344,6 +343,37 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel, sjinfo->ojrelid); } + /* + * Remove references to the rel from other baserels' attr_needed arrays. + */ + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *otherrel = root->simple_rel_array[rti]; + int attroff; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (otherrel == NULL) + continue; + + Assert(otherrel->relid == rti); /* sanity check on array */ + + /* no point in processing target rel itself */ + if (otherrel == rel) + continue; + + for (attroff = otherrel->max_attr - otherrel->min_attr; + attroff >= 0; + attroff--) + { + otherrel->attr_needed[attroff] = + bms_del_member(otherrel->attr_needed[attroff], relid); + if (sjinfo != NULL) + otherrel->attr_needed[attroff] = + bms_del_member(otherrel->attr_needed[attroff], + sjinfo->ojrelid); + } + } + /* * Likewise remove references from SpecialJoinInfo data structures. * @@ -451,11 +481,13 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel, phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, sjinfo->ojrelid, subst); Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */ - /* Reduce ph_needed to contain only "relation 0"; see below */ - if (bms_is_member(0, phinfo->ph_needed)) - phinfo->ph_needed = bms_make_singleton(0); - else - phinfo->ph_needed = NULL; + + phinfo->ph_needed = adjust_relid_set(phinfo->ph_needed, + relid, subst); + if (sjinfo != NULL) + phinfo->ph_needed = adjust_relid_set(phinfo->ph_needed, + sjinfo->ojrelid, subst); + /* ph_needed might or might not become empty */ phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst); @@ -490,46 +522,6 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel, (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids))) remove_rel_from_eclass(ec, sjinfo, relid, subst); } - - /* - * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar - * ph_needed relid sets. These have to be known accurately, else we may - * fail to remove other now-removable outer joins. And our removal of the - * join clause(s) for this outer join may mean that Vars that were - * formerly needed no longer are. So we have to do this honestly by - * repeating the construction of those relid sets. We can cheat to one - * small extent: we can avoid re-examining the targetlist and HAVING qual - * by preserving "relation 0" bits from the existing relid sets. This is - * safe because we'd never remove such references. - * - * So, start by removing all other bits from attr_needed sets and - * lateral_vars lists. (We already did this above for ph_needed.) - */ - for (rti = 1; rti < root->simple_rel_array_size; rti++) - { - RelOptInfo *otherrel = root->simple_rel_array[rti]; - int attroff; - - /* there may be empty slots corresponding to non-baserel RTEs */ - if (otherrel == NULL) - continue; - - Assert(otherrel->relid == rti); /* sanity check on array */ - - for (attroff = otherrel->max_attr - otherrel->min_attr; - attroff >= 0; - attroff--) - { - if (bms_is_member(0, otherrel->attr_needed[attroff])) - otherrel->attr_needed[attroff] = bms_make_singleton(0); - else - otherrel->attr_needed[attroff] = NULL; - } - - if (subst > 0) - ChangeVarNodesExtended((Node *) otherrel->lateral_vars, relid, - subst, 0, replace_relid_callback); - } } /* @@ -626,7 +618,7 @@ remove_leftjoinrel_from_query(PlannerInfo *root, int relid, */ /* - * Now remove the rel from the baserel array to prevent it from being + * Finally, remove the rel from the baserel array to prevent it from being * referenced again. (We can't do this earlier because * remove_join_clause_from_rels will touch it.) */ @@ -635,15 +627,6 @@ remove_leftjoinrel_from_query(PlannerInfo *root, int relid, /* And nuke the RelOptInfo, just in case there's another access path */ pfree(rel); - - /* - * Now repeat construction of attr_needed bits coming from all other - * sources. - */ - rebuild_placeholder_attr_needed(root); - rebuild_joinclause_attr_needed(root); - rebuild_eclass_attr_needed(root); - rebuild_lateral_attr_needed(root); } /* @@ -1984,16 +1967,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark, /* And nuke the RelOptInfo, just in case there's another access path. */ pfree(toRemove); - - - /* - * Now repeat construction of attr_needed bits coming from all other - * sources. - */ - rebuild_placeholder_attr_needed(root); - rebuild_joinclause_attr_needed(root); - rebuild_eclass_attr_needed(root); - rebuild_lateral_attr_needed(root); } /* diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 01804b085b3..0ed6feb8cdd 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -275,8 +275,6 @@ build_base_rel_tlists(PlannerInfo *root, List *final_tlist) * have a single owning relation; we keep their attr_needed info in * root->placeholder_list instead. Find or create the associated * PlaceHolderInfo entry, and update its ph_needed. - * - * See also add_vars_to_attr_needed. */ void add_vars_to_targetlist(PlannerInfo *root, List *vars, @@ -330,62 +328,6 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } -/* - * add_vars_to_attr_needed - * This does a subset of what add_vars_to_targetlist does: it just - * updates attr_needed for Vars and ph_needed for PlaceHolderVars. - * We assume the Vars are already in their relations' targetlists. - * - * This is used to rebuild attr_needed/ph_needed sets after removal - * of a useless outer join. The removed join clause might have been - * the only upper-level use of some other relation's Var, in which - * case we can reduce that Var's attr_needed and thereby possibly - * open the door to further join removals. But we can't tell that - * without tedious reconstruction of the attr_needed data. - * - * Note that if a Var's attr_needed is successfully reduced to empty, - * it will still be in the relation's targetlist even though we do - * not really need the scan plan node to emit it. The extra plan - * inefficiency seems tiny enough to not be worth spending planner - * cycles to get rid of it. - */ -void -add_vars_to_attr_needed(PlannerInfo *root, List *vars, - Relids where_needed) -{ - ListCell *temp; - - Assert(!bms_is_empty(where_needed)); - - foreach(temp, vars) - { - Node *node = (Node *) lfirst(temp); - - if (IsA(node, Var)) - { - Var *var = (Var *) node; - RelOptInfo *rel = find_base_rel(root, var->varno); - int attno = var->varattno; - - if (bms_is_subset(where_needed, rel->relids)) - continue; - Assert(attno >= rel->min_attr && attno <= rel->max_attr); - attno -= rel->min_attr; - rel->attr_needed[attno] = bms_add_members(rel->attr_needed[attno], - where_needed); - } - else if (IsA(node, PlaceHolderVar)) - { - PlaceHolderVar *phv = (PlaceHolderVar *) node; - PlaceHolderInfo *phinfo = find_placeholder_info(root, phv); - - phinfo->ph_needed = bms_add_members(phinfo->ph_needed, - where_needed); - } - else - elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); - } -} /***************************************************************************** * @@ -788,54 +730,10 @@ extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex) */ add_vars_to_targetlist(root, newvars, where_needed); - /* - * Remember the lateral references for rebuild_lateral_attr_needed and - * create_lateral_join_info. - */ + /* Remember the lateral references for create_lateral_join_info */ brel->lateral_vars = newvars; } -/* - * rebuild_lateral_attr_needed - * Put back attr_needed bits for Vars/PHVs needed for lateral references. - * - * This is used to rebuild attr_needed/ph_needed sets after removal of a - * useless outer join. It should match what find_lateral_references did, - * except that we call add_vars_to_attr_needed not add_vars_to_targetlist. - */ -void -rebuild_lateral_attr_needed(PlannerInfo *root) -{ - Index rti; - - /* We need do nothing if the query contains no LATERAL RTEs */ - if (!root->hasLateralRTEs) - return; - - /* Examine the same baserels that find_lateral_references did */ - for (rti = 1; rti < root->simple_rel_array_size; rti++) - { - RelOptInfo *brel = root->simple_rel_array[rti]; - Relids where_needed; - - if (brel == NULL) - continue; - if (brel->reloptkind != RELOPT_BASEREL) - continue; - - /* - * We don't need to repeat all of extract_lateral_references, since it - * kindly saved the extracted Vars/PHVs in lateral_vars. - */ - if (brel->lateral_vars == NIL) - continue; - - where_needed = bms_make_singleton(rti); - - add_vars_to_attr_needed(root, brel->lateral_vars, where_needed); - } -} - /* * create_lateral_join_info * Fill in the per-base-relation direct_lateral_relids, lateral_relids @@ -2789,9 +2687,6 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, * var propagation is ensured by making ojscope include input rels from * both sides of the join. * - * See also rebuild_joinclause_attr_needed, which has to partially repeat - * this work after removal of an outer join. - * * Note: if the clause gets absorbed into an EquivalenceClass then this * may be unnecessary, but for now we have to do it to cover the case * where the EC becomes ec_broken and we end up reinserting the original @@ -3401,11 +3296,6 @@ process_implied_equality(PlannerInfo *root, * some of the Vars could have missed having that done because they only * appeared in single-relation clauses originally. So do it here for * safety. - * - * See also rebuild_joinclause_attr_needed, which has to partially repeat - * this work after removal of an outer join. (Since we will put this - * clause into the joininfo lists, that function needn't do any extra work - * to find it.) */ if (bms_membership(relids) == BMS_MULTIPLE) { @@ -3547,72 +3437,6 @@ get_join_domain_min_rels(PlannerInfo *root, Relids domain_relids) } -/* - * rebuild_joinclause_attr_needed - * Put back attr_needed bits for Vars/PHVs needed for join clauses. - * - * This is used to rebuild attr_needed/ph_needed sets after removal of a - * useless outer join. It should match what distribute_qual_to_rels did, - * except that we call add_vars_to_attr_needed not add_vars_to_targetlist. - */ -void -rebuild_joinclause_attr_needed(PlannerInfo *root) -{ - /* - * We must examine all join clauses, but there's no value in processing - * any join clause more than once. So it's slightly annoying that we have - * to find them via the per-base-relation joininfo lists. Avoid duplicate - * processing by tracking the rinfo_serial numbers of join clauses we've - * already seen. (This doesn't work for is_clone clauses, so we must - * waste effort on them.) - */ - Bitmapset *seen_serials = NULL; - Index rti; - - /* Scan all baserels for join clauses */ - for (rti = 1; rti < root->simple_rel_array_size; rti++) - { - RelOptInfo *brel = root->simple_rel_array[rti]; - ListCell *lc; - - if (brel == NULL) - continue; - if (brel->reloptkind != RELOPT_BASEREL) - continue; - - foreach(lc, brel->joininfo) - { - RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); - Relids relids = rinfo->required_relids; - - if (!rinfo->is_clone) /* else serial number is not unique */ - { - if (bms_is_member(rinfo->rinfo_serial, seen_serials)) - continue; /* saw it already */ - seen_serials = bms_add_member(seen_serials, - rinfo->rinfo_serial); - } - - if (bms_membership(relids) == BMS_MULTIPLE) - { - List *vars = pull_var_clause((Node *) rinfo->clause, - PVC_RECURSE_AGGREGATES | - PVC_RECURSE_WINDOWFUNCS | - PVC_INCLUDE_PLACEHOLDERS); - Relids where_needed; - - if (rinfo->is_clone) - where_needed = bms_intersect(relids, root->all_baserels); - else - where_needed = relids; - add_vars_to_attr_needed(root, vars, where_needed); - list_free(vars); - } - } - } -} - - /* * match_foreign_keys_to_quals * Match foreign-key constraints to equivalence classes and join quals diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c index e1cd00a72fb..d1ba8923b26 100644 --- a/src/backend/optimizer/util/placeholder.c +++ b/src/backend/optimizer/util/placeholder.c @@ -314,33 +314,6 @@ fix_placeholder_input_needed_levels(PlannerInfo *root) } } -/* - * rebuild_placeholder_attr_needed - * Put back attr_needed bits for Vars/PHVs needed in PlaceHolderVars. - * - * This is used to rebuild attr_needed/ph_needed sets after removal of a - * useless outer join. It should match what - * fix_placeholder_input_needed_levels did, except that we call - * add_vars_to_attr_needed not add_vars_to_targetlist. - */ -void -rebuild_placeholder_attr_needed(PlannerInfo *root) -{ - ListCell *lc; - - foreach(lc, root->placeholder_list) - { - PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc); - List *vars = pull_var_clause((Node *) phinfo->ph_var->phexpr, - PVC_RECURSE_AGGREGATES | - PVC_RECURSE_WINDOWFUNCS | - PVC_INCLUDE_PLACEHOLDERS); - - add_vars_to_attr_needed(root, vars, phinfo->ph_eval_at); - list_free(vars); - } -} - /* * add_placeholders_to_base_rels * Add any required PlaceHolderVars to base rels' targetlists. diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 8410531f2d6..e0924e622d4 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -129,7 +129,6 @@ extern bool process_equivalence(PlannerInfo *root, extern Expr *canonicalize_ec_expression(Expr *expr, Oid req_type, Oid req_collation); extern void reconsider_outer_join_clauses(PlannerInfo *root); -extern void rebuild_eclass_attr_needed(PlannerInfo *root); extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root, Expr *expr, List *opfamilies, diff --git a/src/include/optimizer/placeholder.h b/src/include/optimizer/placeholder.h index db92d8861ba..a729e73d561 100644 --- a/src/include/optimizer/placeholder.h +++ b/src/include/optimizer/placeholder.h @@ -23,7 +23,6 @@ extern PlaceHolderInfo *find_placeholder_info(PlannerInfo *root, PlaceHolderVar *phv); extern void find_placeholders_in_jointree(PlannerInfo *root); extern void fix_placeholder_input_needed_levels(PlannerInfo *root); -extern void rebuild_placeholder_attr_needed(PlannerInfo *root); extern void add_placeholders_to_base_rels(PlannerInfo *root); extern void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 9d3debcab28..e3bf9db6b6e 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -73,11 +73,8 @@ extern void add_other_rels_to_query(PlannerInfo *root); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed); -extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars, - Relids where_needed); extern void remove_useless_groupby_columns(PlannerInfo *root); extern void find_lateral_references(PlannerInfo *root); -extern void rebuild_lateral_attr_needed(PlannerInfo *root); extern void create_lateral_join_info(PlannerInfo *root); extern List *deconstruct_jointree(PlannerInfo *root); extern bool restriction_is_always_true(PlannerInfo *root, @@ -101,7 +98,6 @@ extern RestrictInfo *build_implied_join_equality(PlannerInfo *root, Expr *item2, Relids qualscope, Index security_level); -extern void rebuild_joinclause_attr_needed(PlannerInfo *root); extern void match_foreign_keys_to_quals(PlannerInfo *root); /* diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 390aabfb34b..336def509bc 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -5853,13 +5853,11 @@ CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int); CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int); CREATE TEMP TABLE c (id int PRIMARY KEY); CREATE TEMP TABLE d (a int, b int); -CREATE TEMP TABLE e (id1 int, id2 int, PRIMARY KEY(id1, id2)); INSERT INTO a VALUES (0, 0), (1, NULL); INSERT INTO b VALUES (0, 0), (1, NULL); INSERT INTO c VALUES (0), (1); INSERT INTO d VALUES (1,3), (2,2), (3,1); -INSERT INTO e VALUES (0,0), (2,2), (3,1); --- all these cases should be optimizable into a simple seqscan +-- all three cases should be optimizable into a simple seqscan explain (costs off) SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id; QUERY PLAN --------------- @@ -5880,14 +5878,6 @@ explain (costs off) Seq Scan on a (1 row) -explain (costs off) - SELECT a.* FROM a LEFT JOIN b ON a.id = b.id - LEFT JOIN e ON e.id1 = a.b_id AND b.c_id = e.id2; - QUERY PLAN ---------------- - Seq Scan on a -(1 row) - -- check optimization of outer join within another special join explain (costs off) select id from a where id in ( diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index f6e7070db65..2850c20eb37 100644 --- a/src/test/regress/sql/join.sql +++ b/src/test/regress/sql/join.sql @@ -2086,22 +2086,17 @@ CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int); CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int); CREATE TEMP TABLE c (id int PRIMARY KEY); CREATE TEMP TABLE d (a int, b int); -CREATE TEMP TABLE e (id1 int, id2 int, PRIMARY KEY(id1, id2)); INSERT INTO a VALUES (0, 0), (1, NULL); INSERT INTO b VALUES (0, 0), (1, NULL); INSERT INTO c VALUES (0), (1); INSERT INTO d VALUES (1,3), (2,2), (3,1); -INSERT INTO e VALUES (0,0), (2,2), (3,1); --- all these cases should be optimizable into a simple seqscan +-- all three cases should be optimizable into a simple seqscan explain (costs off) SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id; explain (costs off) SELECT b.* FROM b LEFT JOIN c ON b.c_id = c.id; explain (costs off) SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id = b.id); -explain (costs off) - SELECT a.* FROM a LEFT JOIN b ON a.id = b.id - LEFT JOIN e ON e.id1 = a.b_id AND b.c_id = e.id2; -- check optimization of outer join within another special join explain (costs off)
pgsql-hackers by date: