Re: Making Vars outer-join aware - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Making Vars outer-join aware |
Date | |
Msg-id | 553080.1657481916@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Making Vars outer-join aware (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Making Vars outer-join aware
Re: Making Vars outer-join aware |
List | pgsql-hackers |
Here's v2 of this patch series. It's functionally identical to v1, but I've rebased it over the recent auto-node-support-generation changes, and also extracted a few separable bits in hopes of making the main planner patch smaller. (It's still pretty durn large, unfortunately.) Unlike the original submission, each step will compile on its own, though the intermediate states mostly don't pass all regression tests. regards, tom lane commit 59846b51f350064b05e4a7bf643c4ce73b7f629a Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 11:16:38 2022 -0400 Add overview documentation. diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 41c120e0cd..2b30d22aed 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -295,6 +295,191 @@ Therefore, we don't merge FROM-lists if the result would have too many FROM-items in one list. +Vars and PlaceHolderVars +------------------------ + +A Var node is simply the parse-tree representation of a table column +reference. However, in the presence of outer joins, that concept is +more subtle than it might seem. We need to distinguish the values of +a Var "above" and "below" any outer join that could force the Var to +null. As an example, consider + + SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE foo(t2.z) + +(Assume foo() is not strict, so that we can't reduce the left join to +a plain join.) A naive implementation might try to push the foo(t2.z) +call down to the scan of t2, but that is not correct because +(a) what foo() should actually see for a null-extended join row is NULL, +and (b) if foo() returns false, we should suppress the t1 row from the +join altogether, not emit it with a null-extended t2 row. On the other +hand, it *would* be correct (and desirable) to push the call down to +the scan level if the query were + + SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y AND foo(t2.z)) + +This motivates considering "t2.z" within the left join's ON clause +to be a different value from "t2.z" outside the JOIN clause. The +former can be identified with t2.z as seen at the relation scan level, +but the latter can't. + +Another example occurs in connection with EquivalenceClasses (discussed +below). Given + + SELECT * FROM t1 LEFT JOIN t2 ON (t1.x = t2.y) WHERE t1.x = 42 + +we would like to put t1.x and t2.y and 42 into the same EquivalenceClass +and then derive "t2.y = 42" to use as a restriction clause for the scan +of t2. However, it'd be wrong to conclude that t2.y will always have +the value 42, or that it's equal to t1.x in every joined row. We can +solve this problem by deeming that "t2.y" in the ON clause refers to +the relation-scan-level value of t2.y, but not to the value that y will +have in joined rows, where it might be NULL rather than equal to t1.x. + +Therefore, Var nodes are decorated with "varnullingrels", which are sets +of the rangetable indexes of outer joins that potentially null this Var +at the point where it appears in the query. (Using a set, not an +ordered list, is fine since it doesn't matter which join forced the +value to null; and that avoids having to change the representation when +we consider different outer-join orders.) In the examples above, all +occurrences of t1.x would have empty varnullingrels, since the left join +doesn't null t1. The t2 references within the JOIN ON clauses would +also have empty varnullingrels, but other references to t2 columns would +be labeled with the index of the JOIN's rangetable entry (RTE), so that +they'd be understood as potentially different from the t2 values seen at +scan level. Labeling t2.z in the WHERE clause with the JOIN's RT index +lets us recognize that that occurrence of foo(t2.z) cannot be pushed +down to the t2 scan level: we cannot evaluate that value at the scan +level, but only after the join has been done. + +For LEFT and RIGHT outer joins, only Vars coming from the nullable side +of the join are marked with that join's RT index. For FULL joins, all +Vars are marked. (Such marking doesn't let us tell which side of the +full join a Var came from; but that information can be found elsewhere +at need.) + +Notionally, a Var having nonempty varnullingrels can be thought of as + CASE WHEN any-of-these-outer-joins-produced-a-null-extended-row + THEN NULL + ELSE the-scan-level-value-of-the-column + END +It's only notional, because no such calculation is ever done explicitly. +In a finished plan, Vars occurring in scan-level plan nodes represent +the actual table column values, but upper-level Vars are always +references to outputs of lower-level plan nodes. When a join node emits +a null-extended row, it just returns nulls for the relevant output +columns rather than copying up values from its input. Because we don't +ever have to do this calculation explicitly, it's not necessary to +distinguish which side of an outer join got null-extended, which'd +otherwise be essential information for FULL JOIN cases. + +Outer join identity 3 (discussed above) complicates this picture +a bit. In the form + A leftjoin (B leftjoin C on (Pbc)) on (Pab) +all of the Vars in clauses Pbc and Pab will have empty varnullingrels, +but if we start with + (A leftjoin B on (Pab)) leftjoin C on (Pbc) +then the parser will have marked Pbc's B Vars with the A/B join's +RT index, making this form artificially different from the first. +We resolve this by, after noting that Pbc is strict, running +through that clause and removing any varnullingrels references to +left joins in the lefthand side. That makes the clause equivalent +to what it would have looked like if the first form were presented, +so that we can freely consider both join orders. However, because +we have done this, if we do construct a plan based on the second +join order then we cannot cross-check that B Vars appearing above +the A/B join are all marked with that join's RT index. That would +be a useful cross-check to have to catch planner bugs, but it +doesn't seem useful enough to justify the extra complication of +devising a representation that would support it. + +Outer joins also complicate handling of subquery pull-up. Consider + + SELECT ..., ss.x FROM tab1 + LEFT JOIN (SELECT *, 42 AS x FROM tab2) ss ON ... + +We want to be able to pull up the subquery as discussed previously, +but we can't just replace the "ss.x" Var in the top-level SELECT list +with the constant 42. That'd result in always emitting 42, rather +than emitting NULL in null-extended join rows. + +To solve this, we introduce the concept of PlaceHolderVars. +A PlaceHolderVar is somewhat like a Var, in that its value originates +at a relation scan level and can then be forced to null by higher-level +outer joins; hence PlaceHolderVars carry a set of nulling rel IDs just +like Vars. Unlike a Var, whose original value comes from a table, +a PlaceHolderVar's original value is defined by a query-determined +expression ("42" in this example); so we represent the PlaceHolderVar +as a node with that expression as child. We insert a PlaceHolderVar +whenever subquery pullup needs to replace a subquery-referencing Var +that has nonempty varnullingrels with an expression that is not simply a +Var. (When the replacement expression is a pulled-up Var, we can just +add the replaced Var's varnullingrels to its set. Also, if the replaced +Var has empty varnullingrels, we don't need a PlaceHolderVar: there is +nothing that'd force the value to null, so the pulled-up expression is +fine to use as-is.) In a finished plan, a PlaceHolderVar becomes just +the contained expression at whatever plan level it's supposed to be +evaluated at, and then upper-level occurrences are replaced by +references to that output column of the lower plan level. That causes +the value to go to null when appropriate at an outer join, in the same +way as for Vars. Thus, PlaceHolderVars are never seen outside the +planner. + +PlaceHolderVars (PHVs) are more complicated than Vars in another way: +their original value might need to be calculated at a join, not a +base-level relation scan. This can happen if a pulled-up subquery +contains a join. Because of this, a PHV can create a join order +constraint that wouldn't otherwise exist, to ensure that it can +be calculated before it is used. A PHV's expression can also contain +LATERAL references, adding complications that are discussed below. + + +Relation Identification and Qual Clause Placement +------------------------------------------------- + +A qual clause obtained from WHERE or JOIN/ON can be enforced at the lowest +scan or join level that includes all relations used in the clause. For +this purpose we consider that outer joins listed in varnullingrels or +phnullingrels are used in the clause, since we can't compute the qual's +result correctly until we know whether such Vars have gone to null. + +The one exception to this general rule is that a non-degenerate outer +JOIN/ON qual (one that references the non-nullable side of the join) +cannot be enforced below that join, even if it doesn't reference the +nullable side. Pushing it down into the non-nullable side would result +in rows disappearing from the join's result, rather than appearing as +null-extended rows. To handle that, when we identify such a qual we +artificially add the join's minimum input relid set to the set of +relations it is considered to use, forcing it to be evaluated exactly at +that join level. The same happens for outer-join quals that mention no +relations at all. + +When attaching a qual clause to a join plan node that is performing +an outer join, the qual clause is considered a "join clause" (that +is, it is applied before the join) if it does not use that specific +outer join, or a "filter clause" (applied after the join) if it does +use that outer join. + +These things lead us to identify join relations within the planner +by the sets of base relation RT indexes plus outer join RT indexes +that they include. In that way, the sets of relations used by qual +clauses can be directly compared to join relations' relid sets to +see where to place the clauses. These identifying sets are unique +because, for any given collection of base relations, there is only +one valid set of outer joins to have performed along the way to +joining that set of base relations (although the order of applying +them could vary, as discussed above). + +SEMI joins do not have RT indexes, because they are artifacts made by +the planner rather than the parser. (We could create rangetable +entries for them, but there seems no need at present.) This does not +cause a problem for qual placement, because the nullable side of a +semijoin is not referenceable from above the join, so there is never a +need to cite it in varnullingrels or phnullingrels. It does not cause +a problem for join relation identification either, since again whether +a semijoin has been completed is implicit in the set of base relations +included in the join. + + Optimizer Functions ------------------- @@ -437,11 +622,10 @@ inputs. EquivalenceClasses ------------------ -During the deconstruct_jointree() scan of the query's qual clauses, we look -for mergejoinable equality clauses A = B whose applicability is not delayed -by an outer join; these are called "equivalence clauses". When we find -one, we create an EquivalenceClass containing the expressions A and B to -record this knowledge. If we later find another equivalence clause B = C, +During the deconstruct_jointree() scan of the query's qual clauses, we +look for mergejoinable equality clauses A = B. When we find one, we +create an EquivalenceClass containing the expressions A and B to record +that they are equal. If we later find another equivalence clause B = C, we add C to the existing EquivalenceClass for {A B}; this may require merging two existing EquivalenceClasses. At the end of the scan, we have sets of values that are known all transitively equal to each other. We can @@ -473,15 +657,26 @@ asserts that at any plan node where more than one of its member values can be computed, output rows in which the values are not all equal may be discarded without affecting the query result. (We require all levels of the plan to enforce EquivalenceClasses, hence a join need not recheck -equality of values that were computable by one of its children.) For an -ordinary EquivalenceClass that is "valid everywhere", we can further infer -that the values are all non-null, because all mergejoinable operators are -strict. However, we also allow equivalence clauses that appear below the -nullable side of an outer join to form EquivalenceClasses; for these -classes, the interpretation is that either all the values are equal, or -all (except pseudo-constants) have gone to null. (This requires a -limitation that non-constant members be strict, else they might not go -to null when the other members do.) Consider for example +equality of values that were computable by one of its children.) + +We can further infer that the values are all non-null, because all +mergejoinable operators are strict. This is a little tricky in the +presence of outer joins. Consider + + SELECT * + FROM a LEFT JOIN + (SELECT * FROM b LEFT JOIN c ON b.y = c.z WHERE b.y = 10) ss + ON a.x = ss.y + WHERE a.x = 42; + +We can form the EquivalenceClass {b.y c.z 10} and thereby apply c.z = 10 +while scanning c. However it would be incorrect to conclude that a.x +is also a member of that EquivalenceClass. Instead, we form a second +EquivalenceClass {a.x ss.y 42}, where (as discussed earlier) ss.y +references the same table column as b.y but has a different +varnullingrels label and is therefore considered a distinct Var. + +If the lower join were INNER: SELECT * FROM a LEFT JOIN @@ -489,40 +684,23 @@ to null when the other members do.) Consider for example ON a.x = ss.y WHERE a.x = 42; -We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby -apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed -clauses from forming EquivalenceClasses is exactly that we want to be able -to push any derived clauses as far down as possible.) But once above the -outer join it's no longer necessarily the case that b.y = 10, and thus we -cannot use such EquivalenceClasses to conclude that sorting is unnecessary -(see discussion of PathKeys below). - -In this example, notice also that a.x = ss.y (really a.x = b.y) is not an -equivalence clause because its applicability to b is delayed by the outer -join; thus we do not try to insert b.y into the equivalence class {a.x 42}. -But since we see that a.x has been equated to 42 above the outer join, we -are able to form a below-outer-join class {b.y 42}; this restriction can be -added because no b/c row not having b.y = 42 can contribute to the result -of the outer join, and so we need not compute such rows. Now this class -will get merged with {b.y c.z 10}, leading to the contradiction 10 = 42, -which lets the planner deduce that the b/c join need not be computed at all -because none of its rows can contribute to the outer join. (This gets -implemented as a gating Result filter, since more usually the potential -contradiction involves Param values rather than just Consts, and thus has -to be checked at runtime.) +then ss.y is not any different from b.y and we'd end up with the +EquivalenceClass {a.x b.y c.z 10 42}. This leads to the contradiction +10 = 42, which lets the planner deduce that the b/c join need not be +computed at all because none of its rows can contribute to the outer +join. (This gets implemented as a gating Result filter, since more +usually the potential contradiction involves Param values rather than +just Consts, and thus has to be checked at runtime.) To aid in determining the sort ordering(s) that can work with a mergejoin, we mark each mergejoinable clause with the EquivalenceClasses of its left -and right inputs. For an equivalence clause, these are of course the same -EquivalenceClass. For a non-equivalence mergejoinable clause (such as an -outer-join qualification), we generate two separate EquivalenceClasses for -the left and right inputs. This may result in creating single-item -equivalence "classes", though of course these are still subject to merging -if other equivalence clauses are later found to bear on the same -expressions. - -Another way that we may form a single-item EquivalenceClass is in creation -of a PathKey to represent a desired sort order (see below). This is a bit +and right inputs. (These are in fact always the same EquivalenceClass.) + +In some cases we will form single-item EquivalenceClasses. This happens +if an ORDER BY or GROUP BY key is not mentioned in any equivalence +clause. We need to reason about sort orders in such queries, and our +representation of sort ordering is a PathKey (see below) which uses an +EquivalenceClass, so we have to make an EquivalenceClass. This is a bit different from the above cases because such an EquivalenceClass might contain an aggregate function or volatile expression. (A clause containing a volatile function will never be considered mergejoinable, even if its top @@ -579,7 +757,7 @@ Index scans have Path.pathkeys that represent the chosen index's ordering, if any. A single-key index would create a single-PathKey list, while a multi-column index generates a list with one element per key index column. Non-key columns specified in the INCLUDE clause of covering indexes don't -have corresponding PathKeys in the list, because the have no influence on +have corresponding PathKeys in the list, because they have no influence on index ordering. (Actually, since an index can be scanned either forward or backward, there are two possible sort orders and two possible PathKey lists it can generate.) @@ -655,14 +833,9 @@ redundancy, we save time and improve planning, since the planner will more easily recognize equivalent orderings as being equivalent. Another interesting property is that if the underlying EquivalenceClass -contains a constant and is not below an outer join, then the pathkey is -completely redundant and need not be sorted by at all! Every row must -contain the same constant value, so there's no need to sort. (If the EC is -below an outer join, we still have to sort, since some of the rows might -have gone to null and others not. In this case we must be careful to pick -a non-const member to sort by. The assumption that all the non-const -members go to null at the same plan level is critical here, else they might -not produce the same sort order.) This might seem pointless because users +contains a constant, then the pathkey is completely redundant and need +not be sorted by at all! Every row must contain the same value, so +there's no need to sort. This might seem pointless because users are unlikely to write "... WHERE x = 42 ORDER BY x", but it allows us to recognize when particular index columns are irrelevant to the sort order: if we have "... WHERE x = 42 ORDER BY y", scanning an index on (x,y) @@ -670,15 +843,6 @@ produces correctly ordered data without a sort step. We used to have very ugly ad-hoc code to recognize that in limited contexts, but discarding constant ECs from pathkeys makes it happen cleanly and automatically. -You might object that a below-outer-join EquivalenceClass doesn't always -represent the same values at every level of the join tree, and so using -it to uniquely identify a sort order is dubious. This is true, but we -can avoid dealing with the fact explicitly because we always consider that -an outer join destroys any ordering of its nullable inputs. Thus, even -if a path was sorted by {a.x} below an outer join, we'll re-sort if that -sort ordering was important; and so using the same PathKey for both sort -orderings doesn't create any real problem. - Order of processing for EquivalenceClasses and PathKeys ------------------------------------------------------- commit 7ae3b66331c268862ef9058fb7ca5419949efbd9 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 12:04:07 2022 -0400 Improve performance of adjust_appendrel_attrs_multilevel. The present implementations of adjust_appendrel_attrs_multilevel and its sibling adjust_child_relids_multilevel are very messy, because they work by reconstructing the relids of the child's immediate parent and then seeing if that's bms_equal to the relids of the target parent. Aside from being quite inefficient, this will not work for joinrels whose relids contain outer-join relids in addition to baserels. The whole thing can be solved at a stroke by adding explicit parent and top_parent links to child RelOptInfos, and making these functions work with RelOptInfo pointers instead of relids. Doing that is simpler for most callers, too. In my original version of this patch, I got rid of RelOptInfo.top_parent_relids on the grounds that it was now redundant. However, that adds a lot of code churn in places that otherwise would not need changing, and arguably the extra indirection needed to fetch top_parent->relids in those places costs something. So this version leaves that field in place. diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 60c0e3f108..f8a97622b1 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -1760,8 +1760,8 @@ generate_join_implied_equalities_broken(PlannerInfo *root, if (IS_OTHER_REL(inner_rel) && result != NIL) result = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) result, - inner_rel->relids, - inner_rel->top_parent_relids); + inner_rel, + inner_rel->top_parent); return result; } @@ -2626,8 +2626,8 @@ add_child_rel_equivalences(PlannerInfo *root, child_expr = (Expr *) adjust_appendrel_attrs_multilevel(root, (Node *) cur_em->em_expr, - child_relids, - top_parent_relids); + child_rel, + child_rel->top_parent); } /* @@ -2768,8 +2768,8 @@ add_child_join_rel_equivalences(PlannerInfo *root, child_expr = (Expr *) adjust_appendrel_attrs_multilevel(root, (Node *) cur_em->em_expr, - child_relids, - top_parent_relids); + child_joinrel, + child_joinrel->top_parent); } /* @@ -2791,8 +2791,8 @@ add_child_join_rel_equivalences(PlannerInfo *root, new_nullable_relids = adjust_child_relids_multilevel(root, new_nullable_relids, - child_relids, - top_parent_relids); + child_joinrel, + child_joinrel->top_parent); (void) add_eq_member(cur_ec, child_expr, new_relids, new_nullable_relids, diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 06ad856eac..f6baa2a765 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1791,8 +1791,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) withCheckOptions = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) withCheckOptions, - this_result_rel->relids, - top_result_rel->relids); + this_result_rel, + top_result_rel); withCheckOptionLists = lappend(withCheckOptionLists, withCheckOptions); } @@ -1804,8 +1804,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) returningList = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) returningList, - this_result_rel->relids, - top_result_rel->relids); + this_result_rel, + top_result_rel); returningLists = lappend(returningLists, returningList); } @@ -1826,13 +1826,13 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) leaf_action->qual = adjust_appendrel_attrs_multilevel(root, (Node *) action->qual, - this_result_rel->relids, - top_result_rel->relids); + this_result_rel, + top_result_rel); leaf_action->targetList = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) action->targetList, - this_result_rel->relids, - top_result_rel->relids); + this_result_rel, + top_result_rel); if (leaf_action->commandType == CMD_UPDATE) leaf_action->updateColnos = adjust_inherited_attnums_multilevel(root, diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 9d4bb47027..62cccf9d87 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -479,39 +479,34 @@ adjust_appendrel_attrs_mutator(Node *node, /* * adjust_appendrel_attrs_multilevel - * Apply Var translations from a toplevel appendrel parent down to a child. + * Apply Var translations from an appendrel parent down to a child. * - * In some cases we need to translate expressions referencing a parent relation - * to reference an appendrel child that's multiple levels removed from it. + * Replace Vars in the "node" expression that reference "parentrel" with + * the appropriate Vars for "childrel". childrel can be more than one + * inheritance level removed from parentrel. */ Node * adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node, - Relids child_relids, - Relids top_parent_relids) + RelOptInfo *childrel, + RelOptInfo *parentrel) { AppendRelInfo **appinfos; - Bitmapset *parent_relids = NULL; int nappinfos; - int cnt; - - Assert(bms_num_members(child_relids) == bms_num_members(top_parent_relids)); - - appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos); - /* Construct relids set for the immediate parent of given child. */ - for (cnt = 0; cnt < nappinfos; cnt++) + /* Recurse if immediate parent is not the top parent. */ + if (childrel->parent != parentrel) { - AppendRelInfo *appinfo = appinfos[cnt]; - - parent_relids = bms_add_member(parent_relids, appinfo->parent_relid); + if (childrel->parent) + node = adjust_appendrel_attrs_multilevel(root, node, + childrel->parent, + parentrel); + else + elog(ERROR, "childrel is not a child of parentrel"); } - /* Recurse if immediate parent is not the top parent. */ - if (!bms_equal(parent_relids, top_parent_relids)) - node = adjust_appendrel_attrs_multilevel(root, node, parent_relids, - top_parent_relids); + /* Now translate for this child. */ + appinfos = find_appinfos_by_relids(root, childrel->relids, &nappinfos); - /* Now translate for this child */ node = adjust_appendrel_attrs(root, node, nappinfos, appinfos); pfree(appinfos); @@ -554,56 +549,43 @@ adjust_child_relids(Relids relids, int nappinfos, AppendRelInfo **appinfos) } /* - * Replace any relid present in top_parent_relids with its child in - * child_relids. Members of child_relids can be multiple levels below top - * parent in the partition hierarchy. + * Substitute child's relids for parent's relids in a Relid set. + * The childrel can be multiple inheritance levels below the parent. */ Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, - Relids child_relids, Relids top_parent_relids) + RelOptInfo *childrel, + RelOptInfo *parentrel) { AppendRelInfo **appinfos; int nappinfos; - Relids parent_relids = NULL; - Relids result; - Relids tmp_result = NULL; - int cnt; /* - * If the given relids set doesn't contain any of the top parent relids, - * it will remain unchanged. + * If the given relids set doesn't contain any of the parent relids, it + * will remain unchanged. */ - if (!bms_overlap(relids, top_parent_relids)) + if (!bms_overlap(relids, parentrel->relids)) return relids; - appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos); - - /* Construct relids set for the immediate parent of the given child. */ - for (cnt = 0; cnt < nappinfos; cnt++) - { - AppendRelInfo *appinfo = appinfos[cnt]; - - parent_relids = bms_add_member(parent_relids, appinfo->parent_relid); - } - /* Recurse if immediate parent is not the top parent. */ - if (!bms_equal(parent_relids, top_parent_relids)) + if (childrel->parent != parentrel) { - tmp_result = adjust_child_relids_multilevel(root, relids, - parent_relids, - top_parent_relids); - relids = tmp_result; + if (childrel->parent) + relids = adjust_child_relids_multilevel(root, relids, + childrel->parent, + parentrel); + else + elog(ERROR, "childrel is not a child of parentrel"); } - result = adjust_child_relids(relids, nappinfos, appinfos); + /* Now translate for this child. */ + appinfos = find_appinfos_by_relids(root, childrel->relids, &nappinfos); + + relids = adjust_child_relids(relids, nappinfos, appinfos); - /* Free memory consumed by any intermediate result. */ - if (tmp_result) - bms_free(tmp_result); - bms_free(parent_relids); pfree(appinfos); - return result; + return relids; } /* @@ -694,8 +676,8 @@ get_translated_update_targetlist(PlannerInfo *root, Index relid, *processed_tlist = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) root->processed_tlist, - bms_make_singleton(relid), - bms_make_singleton(root->parse->resultRelation)); + find_base_rel(root, relid), + find_base_rel(root, root->parse->resultRelation)); if (update_colnos) *update_colnos = adjust_inherited_attnums_multilevel(root, root->update_colnos, diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 483c4f4137..833b440f39 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -3994,8 +3994,8 @@ reparameterize_path_by_child(PlannerInfo *root, Path *path, #define ADJUST_CHILD_ATTRS(node) \ ((node) = \ (List *) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \ - child_rel->relids, \ - child_rel->top_parent_relids)) + child_rel, \ + child_rel->top_parent)) #define REPARAMETERIZE_CHILD_PATH(path) \ do { \ @@ -4212,8 +4212,8 @@ do { \ old_ppi = new_path->param_info; required_outer = adjust_child_relids_multilevel(root, old_ppi->ppi_req_outer, - child_rel->relids, - child_rel->top_parent_relids); + child_rel, + child_rel->top_parent); /* If we already have a PPI for this parameterization, just return it */ new_ppi = find_param_path_info(new_path->parent, required_outer); diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 520409f4ba..a163853bed 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -265,14 +265,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) */ if (parent) { - /* - * Each direct or indirect child wants to know the relids of its - * topmost parent. - */ - if (parent->top_parent_relids) - rel->top_parent_relids = parent->top_parent_relids; - else - rel->top_parent_relids = bms_copy(parent->relids); + /* We keep back-links to immediate parent and topmost parent. */ + rel->parent = parent; + rel->top_parent = parent->top_parent ? parent->top_parent : parent; + rel->top_parent_relids = rel->top_parent->relids; /* * Also propagate lateral-reference information from appendrel parent @@ -294,6 +290,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) } else { + rel->parent = NULL; + rel->top_parent = NULL; rel->top_parent_relids = NULL; rel->direct_lateral_relids = NULL; rel->lateral_relids = NULL; @@ -663,6 +661,8 @@ build_join_rel(PlannerInfo *root, joinrel->joininfo = NIL; joinrel->has_eclass_joins = false; joinrel->consider_partitionwise_join = false; /* might get changed later */ + joinrel->parent = NULL; + joinrel->top_parent = NULL; joinrel->top_parent_relids = NULL; joinrel->part_scheme = NULL; joinrel->nparts = -1; @@ -842,7 +842,9 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->joininfo = NIL; joinrel->has_eclass_joins = false; joinrel->consider_partitionwise_join = false; /* might get changed later */ - joinrel->top_parent_relids = NULL; + joinrel->parent = parent_joinrel; + joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel; + joinrel->top_parent_relids = joinrel->top_parent->relids; joinrel->part_scheme = NULL; joinrel->nparts = -1; joinrel->boundinfo = NULL; @@ -854,9 +856,6 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->partexprs = NULL; joinrel->nullable_partexprs = NULL; - joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids, - inner_rel->top_parent_relids); - /* Compute information relevant to foreign relations. */ set_foreign_rel_properties(joinrel, outer_rel, inner_rel); diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index 9d3c05aed3..aeabc2aca9 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -529,8 +529,8 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, partprunequal = (List *) adjust_appendrel_attrs_multilevel(root, (Node *) prunequal, - subpart->relids, - targetpart->relids); + subpart, + targetpart); } /* diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 44ffc73f15..ee8bed6452 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -915,7 +915,15 @@ typedef struct RelOptInfo */ /* consider partitionwise join paths? (if partitioned rel) */ bool consider_partitionwise_join; - /* Relids of topmost parents (if "other" rel) */ + + /* + * inheritance links, if this is an otherrel (otherwise NULL): + */ + /* Immediate parent relation (dumping it would be too verbose) */ + struct RelOptInfo *parent pg_node_attr(read_write_ignore); + /* Topmost parent relation (dumping it would be too verbose) */ + struct RelOptInfo *top_parent pg_node_attr(read_write_ignore); + /* Relids of topmost parent (redundant, but handy) */ Relids top_parent_relids; /* diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h index fc808dcd27..5e80a741a4 100644 --- a/src/include/optimizer/appendinfo.h +++ b/src/include/optimizer/appendinfo.h @@ -23,13 +23,13 @@ extern AppendRelInfo *make_append_rel_info(Relation parentrel, extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos, AppendRelInfo **appinfos); extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node, - Relids child_relids, - Relids top_parent_relids); + RelOptInfo *childrel, + RelOptInfo *parentrel); extern Relids adjust_child_relids(Relids relids, int nappinfos, AppendRelInfo **appinfos); extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids, - Relids child_relids, - Relids top_parent_relids); + RelOptInfo *childrel, + RelOptInfo *parentrel); extern List *adjust_inherited_attnums(List *attnums, AppendRelInfo *context); extern List *adjust_inherited_attnums_multilevel(PlannerInfo *root, List *attnums, commit bd58b68da9d15c0e2cce86a6a837302a8a764598 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 13:26:16 2022 -0400 Add Var.varnullingrels and PlaceHolderVar.phnullingrels fields. These fields are always empty as of this commit, so they don't affect any behavior, even though equal() will compare them. Update backend/nodes/ and backend/rewrite/ infrastructure as needed. Also add some rewrite functions we'll need later. diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c index 28288dcfc1..19606c495f 100644 --- a/src/backend/nodes/makefuncs.c +++ b/src/backend/nodes/makefuncs.c @@ -81,11 +81,13 @@ makeVar(int varno, var->varlevelsup = varlevelsup; /* - * Only a few callers need to make Var nodes with varnosyn/varattnosyn - * different from varno/varattno. We don't provide separate arguments for - * them, but just initialize them to the given varno/varattno. This - * reduces code clutter and chance of error for most callers. + * Only a few callers need to make Var nodes with non-null varnullingrels, + * or with varnosyn/varattnosyn different from varno/varattno. We don't + * provide separate arguments for them, but just initialize them to NULL + * and the given varno/varattno. This reduces code clutter and chance of + * error for most callers. */ + var->varnullingrels = NULL; var->varnosyn = (Index) varno; var->varattnosyn = varattno; diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 4cb1744da6..ccf63515fa 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -2847,6 +2847,7 @@ expression_tree_mutator(Node *node, Var *newnode; FLATCOPY(newnode, var, Var); + /* Assume we need not copy the varnullingrels bitmapset */ return (Node *) newnode; } break; @@ -3442,7 +3443,7 @@ expression_tree_mutator(Node *node, FLATCOPY(newnode, phv, PlaceHolderVar); MUTATE(newnode->phexpr, phv->phexpr, Expr *); - /* Assume we need not copy the relids bitmapset */ + /* Assume we need not copy the relids bitmapsets */ return (Node *) newnode; } break; diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c index 101c39553a..a0a0026469 100644 --- a/src/backend/rewrite/rewriteManip.c +++ b/src/backend/rewrite/rewriteManip.c @@ -40,6 +40,13 @@ typedef struct int win_location; } locate_windowfunc_context; +typedef struct +{ + Bitmapset *removable_relids; + Bitmapset *except_relids; + int sublevels_up; +} remove_nulling_relids_context; + static bool contain_aggs_of_level_walker(Node *node, contain_aggs_of_level_context *context); static bool locate_agg_of_level_walker(Node *node, @@ -50,6 +57,9 @@ static bool locate_windowfunc_walker(Node *node, static bool checkExprHasSubLink_walker(Node *node, void *context); static Relids offset_relid_set(Relids relids, int offset); static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid); +static bool get_nulling_relids_walker(Node *node, Bitmapset **context); +static Node *remove_nulling_relids_mutator(Node *node, + remove_nulling_relids_context *context); /* @@ -348,6 +358,8 @@ OffsetVarNodes_walker(Node *node, OffsetVarNodes_context *context) if (var->varlevelsup == context->sublevels_up) { var->varno += context->offset; + var->varnullingrels = offset_relid_set(var->varnullingrels, + context->offset); if (var->varnosyn > 0) var->varnosyn += context->offset; } @@ -386,6 +398,8 @@ OffsetVarNodes_walker(Node *node, OffsetVarNodes_context *context) { phv->phrels = offset_relid_set(phv->phrels, context->offset); + phv->phnullingrels = offset_relid_set(phv->phnullingrels, + context->offset); } /* fall through to examine children */ } @@ -510,11 +524,13 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context) { Var *var = (Var *) node; - if (var->varlevelsup == context->sublevels_up && - var->varno == context->rt_index) + if (var->varlevelsup == context->sublevels_up) { - var->varno = context->new_index; - /* If the syntactic referent is same RTE, fix it too */ + if (var->varno == context->rt_index) + var->varno = context->new_index; + var->varnullingrels = adjust_relid_set(var->varnullingrels, + context->rt_index, + context->new_index); if (var->varnosyn == context->rt_index) var->varnosyn = context->new_index; } @@ -557,6 +573,9 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context) phv->phrels = adjust_relid_set(phv->phrels, context->rt_index, context->new_index); + phv->phnullingrels = adjust_relid_set(phv->phnullingrels, + context->rt_index, + context->new_index); } /* fall through to examine children */ } @@ -833,7 +852,8 @@ rangeTableEntry_used_walker(Node *node, Var *var = (Var *) node; if (var->varlevelsup == context->sublevels_up && - var->varno == context->rt_index) + (var->varno == context->rt_index || + bms_is_member(context->rt_index, var->varnullingrels))) return true; return false; } @@ -1061,6 +1081,154 @@ AddInvertedQual(Query *parsetree, Node *qual) } +/* + * get_nulling_relids collects all the level-zero RT indexes mentioned in + * Var.varnullingrels and PlaceHolderVar.phnullingrels fields within the + * given expression. + */ +Bitmapset * +get_nulling_relids(Node *node) +{ + Bitmapset *result = NULL; + + (void) get_nulling_relids_walker(node, &result); + return result; +} + +static bool +get_nulling_relids_walker(Node *node, Bitmapset **context) +{ + if (node == NULL) + return false; + if (IsA(node, Var)) + { + Var *var = (Var *) node; + + if (var->varlevelsup == 0) + *context = bms_add_members(*context, var->varnullingrels); + } + else if (IsA(node, PlaceHolderVar)) + { + PlaceHolderVar *phv = (PlaceHolderVar *) node; + + if (phv->phlevelsup == 0) + *context = bms_add_members(*context, phv->phnullingrels); + } + + /* + * Currently, this is only used after the planner has converted SubLinks + * to SubPlans, so we don't need to support recursion into sub-Queries; so + * no sublevels_up counting is needed. + */ + Assert(!IsA(node, SubLink)); + Assert(!IsA(node, Query)); + return expression_tree_walker(node, get_nulling_relids_walker, context); +} + +/* + * remove_nulling_relids removes mentions of the specified RT index(es) + * in Var.varnullingrels and PlaceHolderVar.phnullingrels fields within + * the given expression, except in nodes belonging to rels listed in + * except_relids. + * + * XXX consider making this a destructive walker. + */ +Node * +remove_nulling_relids(Node *node, Bitmapset *removable_relids, + Bitmapset *except_relids) +{ + remove_nulling_relids_context context; + + context.removable_relids = removable_relids; + context.except_relids = except_relids; + context.sublevels_up = 0; + return query_or_expression_tree_mutator(node, + remove_nulling_relids_mutator, + &context, + 0); +} + +static Node * +remove_nulling_relids_mutator(Node *node, + remove_nulling_relids_context *context) +{ + if (node == NULL) + return NULL; + if (IsA(node, Var)) + { + Var *var = (Var *) node; + + if (var->varlevelsup == context->sublevels_up && + !bms_is_member(var->varno, context->except_relids) && + bms_overlap(var->varnullingrels, context->removable_relids)) + { + Relids newnullingrels = bms_difference(var->varnullingrels, + context->removable_relids); + + /* Micro-optimization: ensure nullingrels is NULL if empty */ + if (bms_is_empty(newnullingrels)) + newnullingrels = NULL; + /* Copy the Var ... */ + var = copyObject(var); + /* ... and replace the copy's varnullingrels field */ + var->varnullingrels = newnullingrels; + return (Node *) var; + } + /* Otherwise fall through to copy the Var normally */ + } + else if (IsA(node, PlaceHolderVar)) + { + PlaceHolderVar *phv = (PlaceHolderVar *) node; + + if (phv->phlevelsup == context->sublevels_up && + !bms_overlap(phv->phrels, context->except_relids)) + { + Relids newnullingrels = bms_difference(phv->phnullingrels, + context->removable_relids); + + /* + * Micro-optimization: ensure nullingrels is NULL if empty. + * + * Note: it might seem desirable to remove the PHV altogether if + * phnullingrels goes to empty. Currently we dare not do that + * because we use PHVs in some cases to enforce separate identity + * of subexpressions; see wrap_non_vars usages in prepjointree.c. + */ + if (bms_is_empty(newnullingrels)) + newnullingrels = NULL; + /* Copy the PlaceHolderVar and mutate what's below ... */ + phv = (PlaceHolderVar *) + expression_tree_mutator(node, + remove_nulling_relids_mutator, + (void *) context); + /* ... and replace the copy's phnullingrels field */ + phv->phnullingrels = newnullingrels; + /* We must also update phrels, if it contains a removable RTI */ + phv->phrels = bms_difference(phv->phrels, + context->removable_relids); + Assert(!bms_is_empty(phv->phrels)); + return (Node *) phv; + } + /* Otherwise fall through to copy the PlaceHolderVar normally */ + } + else if (IsA(node, Query)) + { + /* Recurse into RTE or sublink subquery */ + Query *newnode; + + context->sublevels_up++; + newnode = query_tree_mutator((Query *) node, + remove_nulling_relids_mutator, + (void *) context, + 0); + context->sublevels_up--; + return (Node *) newnode; + } + return expression_tree_mutator(node, remove_nulling_relids_mutator, + (void *) context); +} + + /* * replace_rte_variables() finds all Vars in an expression tree * that reference a particular RTE, and replaces them with substitute diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c index eeaa0b31fe..e517e0363c 100644 --- a/src/backend/utils/misc/queryjumble.c +++ b/src/backend/utils/misc/queryjumble.c @@ -381,6 +381,11 @@ JumbleExpr(JumbleState *jstate, Node *node) APP_JUMB(var->varno); APP_JUMB(var->varattno); APP_JUMB(var->varlevelsup); + + /* + * We can omit varnullingrels, because it's fully determined + * by varno/varlevelsup plus the Var's query location. + */ } break; case T_Const: diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index ee8bed6452..369ddfbddc 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -2574,10 +2574,15 @@ typedef struct MergeScanSelCache * of a plan tree. This is used during planning to represent the contained * expression. At the end of the planning process it is replaced by either * the contained expression or a Var referring to a lower-level evaluation of - * the contained expression. Typically the evaluation occurs below an outer + * the contained expression. Generally the evaluation occurs below an outer * join, and Var references above the outer join might thereby yield NULL * instead of the expression value. * + * phrels and phlevelsup correspond to the varno/varlevelsup fields of a + * plain Var, except that phrels has to be a relid set since the evaluation + * level of a PlaceHolderVar might be a join rather than a base relation. + * Likewise, phnullingrels corresponds to varnullingrels. + * * Although the planner treats this as an expression node type, it is not * recognized by the parser or executor, so we declare it here rather than * in primnodes.h. @@ -2590,8 +2595,10 @@ typedef struct MergeScanSelCache * PHV. Another way in which it can happen is that initplan sublinks * could get replaced by differently-numbered Params when sublink folding * is done. (The end result of such a situation would be some - * unreferenced initplans, which is annoying but not really a problem.) On - * the same reasoning, there is no need to examine phrels. + * unreferenced initplans, which is annoying but not really a problem.) On + * the same reasoning, there is no need to examine phrels. But we do need + * to compare phnullingrels, as that represents effects that are external + * to the original value of the PHV. */ typedef struct PlaceHolderVar @@ -2601,9 +2608,12 @@ typedef struct PlaceHolderVar /* the represented expression */ Expr *phexpr pg_node_attr(equal_ignore); - /* base relids syntactically within expr src */ + /* base+OJ relids syntactically within expr src */ Relids phrels pg_node_attr(equal_ignore); + /* RT indexes of outer joins that can null PHV's value */ + Relids phnullingrels; + /* ID for PHV (unique within planner run) */ Index phid; diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 1fc2fbffa3..f2a0739e6e 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -189,6 +189,14 @@ typedef struct Expr * row identity information during UPDATE/DELETE. This value should never * be seen outside the planner. * + * varnullingrels is the set of RT indexes of outer joins that can force + * the Var's value to null (at the point where it appears in the query). + * See optimizer/README for discussion of that. + * + * varlevelsup is greater than zero in Vars that represent outer references. + * Note that it affects the meaning of all of varno, varnullingrels, and + * varnosyn, all of which refer to the range table of that query level. + * * In the parser, varnosyn and varattnosyn are either identical to * varno/varattno, or they specify the column's position in an aliased JOIN * RTE that hides the semantic referent RTE's refname. This is a syntactic @@ -231,6 +239,8 @@ typedef struct Var int32 vartypmod; /* OID of collation, or InvalidOid if none */ Oid varcollid; + /* RT indexes of outer joins that can replace the Var's value with null */ + Bitmapset *varnullingrels; /* * for subquery variables referencing outer relations; 0 in a normal var, diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h index 98b9b3a288..a3f902c1bb 100644 --- a/src/include/rewrite/rewriteManip.h +++ b/src/include/rewrite/rewriteManip.h @@ -63,6 +63,10 @@ extern bool contain_windowfuncs(Node *node); extern int locate_windowfunc(Node *node); extern bool checkExprHasSubLink(Node *node); +extern Bitmapset *get_nulling_relids(Node *node); +extern Node *remove_nulling_relids(Node *node, Bitmapset *removable_relids, + Bitmapset *except_relids); + extern Node *replace_rte_variables(Node *node, int target_varno, int sublevels_up, replace_rte_variables_callback callback, commit 2ffacd89273b53d8cc94ae788ad2b6d4d1c3b7fc Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 13:44:30 2022 -0400 Teach the parser to fill Var.varnullingrels correctly. Vars emitted by the parser are now marked with RT indexes of outer joins that can null them. (This is done purely according to the syntax of the query; we don't consider whether an outer join could be strength-reduced, for example.) Although the result of this step compiles, it will fail some regression tests due to the planner not yet knowing what to do. diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 8ed2c4b8c7..a7fe6f4e66 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -670,6 +670,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) */ sub_pstate->p_rtable = sub_rtable; sub_pstate->p_joinexprs = NIL; /* sub_rtable has no joins */ + sub_pstate->p_nullingrels = NIL; sub_pstate->p_namespace = sub_namespace; sub_pstate->p_resolve_unknowns = false; @@ -851,7 +852,7 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt) /* * Generate list of Vars referencing the RTE */ - exprList = expandNSItemVars(nsitem, 0, -1, NULL); + exprList = expandNSItemVars(pstate, nsitem, 0, -1, NULL); /* * Re-apply any indirection on the target column specs to the Vars diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index c655d188c7..2abd164380 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -52,7 +52,8 @@ #include "utils/syscache.h" -static int extractRemainingColumns(ParseNamespaceColumn *src_nscolumns, +static int extractRemainingColumns(ParseState *pstate, + ParseNamespaceColumn *src_nscolumns, List *src_colnames, List **src_colnos, List **res_colnames, List **res_colvars, @@ -75,9 +76,11 @@ static ParseNamespaceItem *getNSItemForSpecialRelationTypes(ParseState *pstate, static Node *transformFromClauseItem(ParseState *pstate, Node *n, ParseNamespaceItem **top_nsitem, List **namespace); -static Var *buildVarFromNSColumn(ParseNamespaceColumn *nscol); +static Var *buildVarFromNSColumn(ParseState *pstate, + ParseNamespaceColumn *nscol); static Node *buildMergedJoinVar(ParseState *pstate, JoinType jointype, Var *l_colvar, Var *r_colvar); +static void markRelsAsNulledBy(ParseState *pstate, Node *n, int jindex); static void setNamespaceColumnVisibility(List *namespace, bool cols_visible); static void setNamespaceLateralState(List *namespace, bool lateral_only, bool lateral_ok); @@ -249,7 +252,8 @@ setTargetTable(ParseState *pstate, RangeVar *relation, * Returns the number of columns added. */ static int -extractRemainingColumns(ParseNamespaceColumn *src_nscolumns, +extractRemainingColumns(ParseState *pstate, + ParseNamespaceColumn *src_nscolumns, List *src_colnames, List **src_colnos, List **res_colnames, List **res_colvars, @@ -285,7 +289,8 @@ extractRemainingColumns(ParseNamespaceColumn *src_nscolumns, *src_colnos = lappend_int(*src_colnos, attnum); *res_colnames = lappend(*res_colnames, lfirst(lc)); *res_colvars = lappend(*res_colvars, - buildVarFromNSColumn(src_nscolumns + attnum - 1)); + buildVarFromNSColumn(pstate, + src_nscolumns + attnum - 1)); /* Copy the input relation's nscolumn data for this column */ res_nscolumns[colcount] = src_nscolumns[attnum - 1]; colcount++; @@ -1295,8 +1300,7 @@ transformFromClauseItem(ParseState *pstate, Node *n, { /* * JOIN/USING (or NATURAL JOIN, as transformed above). Transform - * the list into an explicit ON-condition, and generate a list of - * merged result columns. + * the list into an explicit ON-condition. */ List *ucols = j->usingClause; List *l_usingvars = NIL; @@ -1314,8 +1318,6 @@ transformFromClauseItem(ParseState *pstate, Node *n, int r_index = -1; Var *l_colvar, *r_colvar; - Node *u_colvar; - ParseNamespaceColumn *res_nscolumn; Assert(u_colname[0] != '\0'); @@ -1379,17 +1381,109 @@ transformFromClauseItem(ParseState *pstate, Node *n, u_colname))); r_colnos = lappend_int(r_colnos, r_index + 1); - l_colvar = buildVarFromNSColumn(l_nscolumns + l_index); + /* Build Vars to use in the generated JOIN ON clause */ + l_colvar = buildVarFromNSColumn(pstate, l_nscolumns + l_index); l_usingvars = lappend(l_usingvars, l_colvar); - r_colvar = buildVarFromNSColumn(r_nscolumns + r_index); + r_colvar = buildVarFromNSColumn(pstate, r_nscolumns + r_index); r_usingvars = lappend(r_usingvars, r_colvar); + /* + * While we're here, add column names to the res_colnames + * list. It's a bit ugly to do this here while the + * corresponding res_colvars entries are not made till later, + * but doing this later would require an additional traversal + * of the usingClause list. + */ res_colnames = lappend(res_colnames, lfirst(ucol)); + } + + /* Construct the generated JOIN ON clause */ + j->quals = transformJoinUsingClause(pstate, + l_usingvars, + r_usingvars); + } + else if (j->quals) + { + /* User-written ON-condition; transform it */ + j->quals = transformJoinOnClause(pstate, j, my_namespace); + } + else + { + /* CROSS JOIN: no quals */ + } + + /* + * If this is an outer join, now mark the appropriate child RTEs as + * being nulled by this join. We have finished processing the child + * join expressions as well as the current join's quals, which deal in + * non-nulled input columns. All future references to those RTEs will + * see possibly-nulled values, and we should mark generated Vars to + * account for that. In particular, the join alias Vars that we're + * about to build should reflect the nulling effects of this join. + * + * A difficulty with doing this is that we need the join's RT index, + * which we don't officially have yet. However, no other RTE can get + * made between here and the addRangeTableEntryForJoin call, so we can + * predict what the assignment will be. (Alternatively, we could call + * addRangeTableEntryForJoin before we have all the data computed, but + * this seems less ugly.) + */ + j->rtindex = list_length(pstate->p_rtable) + 1; + + switch (j->jointype) + { + case JOIN_INNER: + break; + case JOIN_LEFT: + markRelsAsNulledBy(pstate, j->rarg, j->rtindex); + break; + case JOIN_FULL: + markRelsAsNulledBy(pstate, j->larg, j->rtindex); + markRelsAsNulledBy(pstate, j->rarg, j->rtindex); + break; + case JOIN_RIGHT: + markRelsAsNulledBy(pstate, j->larg, j->rtindex); + break; + default: + /* shouldn't see any other types here */ + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + break; + } + + /* + * Now we can construct join alias expressions for the USING columns. + */ + if (j->usingClause) + { + ListCell *lc1, + *lc2; + + /* Scan the colnos lists to recover info from the previous loop */ + forboth(lc1, l_colnos, lc2, r_colnos) + { + int l_index = lfirst_int(lc1) - 1; + int r_index = lfirst_int(lc2) - 1; + Var *l_colvar, + *r_colvar; + Node *u_colvar; + ParseNamespaceColumn *res_nscolumn; + + /* + * Note we re-build these Vars: they might have different + * varnullingrels than the ones made in the previous loop. + */ + l_colvar = buildVarFromNSColumn(pstate, l_nscolumns + l_index); + r_colvar = buildVarFromNSColumn(pstate, r_nscolumns + r_index); + + /* Construct the join alias Var for this column */ u_colvar = buildMergedJoinVar(pstate, j->jointype, l_colvar, r_colvar); res_colvars = lappend(res_colvars, u_colvar); + + /* Construct column's res_nscolumns[] entry */ res_nscolumn = res_nscolumns + res_colindex; res_colindex++; if (u_colvar == (Node *) l_colvar) @@ -1407,47 +1501,45 @@ transformFromClauseItem(ParseState *pstate, Node *n, /* * Merged column is not semantically equivalent to either * input, so it needs to be referenced as the join output - * column. We don't know the join's varno yet, so we'll - * replace these zeroes below. + * column. */ - res_nscolumn->p_varno = 0; + res_nscolumn->p_varno = j->rtindex; res_nscolumn->p_varattno = res_colindex; res_nscolumn->p_vartype = exprType(u_colvar); res_nscolumn->p_vartypmod = exprTypmod(u_colvar); res_nscolumn->p_varcollid = exprCollation(u_colvar); - res_nscolumn->p_varnosyn = 0; + res_nscolumn->p_varnosyn = j->rtindex; res_nscolumn->p_varattnosyn = res_colindex; } } - - j->quals = transformJoinUsingClause(pstate, - l_usingvars, - r_usingvars); - } - else if (j->quals) - { - /* User-written ON-condition; transform it */ - j->quals = transformJoinOnClause(pstate, j, my_namespace); - } - else - { - /* CROSS JOIN: no quals */ } /* Add remaining columns from each side to the output columns */ res_colindex += - extractRemainingColumns(l_nscolumns, l_colnames, &l_colnos, + extractRemainingColumns(pstate, + l_nscolumns, l_colnames, &l_colnos, &res_colnames, &res_colvars, res_nscolumns + res_colindex); res_colindex += - extractRemainingColumns(r_nscolumns, r_colnames, &r_colnos, + extractRemainingColumns(pstate, + r_nscolumns, r_colnames, &r_colnos, &res_colnames, &res_colvars, res_nscolumns + res_colindex); + /* If join has an alias, it syntactically hides all inputs */ + if (j->alias) + { + for (k = 0; k < res_colindex; k++) + { + ParseNamespaceColumn *nscol = res_nscolumns + k; + + nscol->p_varnosyn = j->rtindex; + nscol->p_varattnosyn = k + 1; + } + } + /* * Now build an RTE and nsitem for the result of the join. - * res_nscolumns isn't totally done yet, but that's OK because - * addRangeTableEntryForJoin doesn't examine it, only store a pointer. */ nsitem = addRangeTableEntryForJoin(pstate, res_colnames, @@ -1461,31 +1553,16 @@ transformFromClauseItem(ParseState *pstate, Node *n, j->alias, true); - j->rtindex = nsitem->p_rtindex; + /* Verify that we correctly predicted the join's RT index */ + Assert(j->rtindex == nsitem->p_rtindex); + /* Cross-check number of columns, too */ + Assert(res_colindex == list_length(nsitem->p_names->colnames)); /* - * Now that we know the join RTE's rangetable index, we can fix up the - * res_nscolumns data in places where it should contain that. + * Save a link to the JoinExpr in the proper element of p_joinexprs. + * Since we maintain that list lazily, it may be necessary to fill in + * empty entries before we can add the JoinExpr in the right place. */ - Assert(res_colindex == list_length(nsitem->p_names->colnames)); - for (k = 0; k < res_colindex; k++) - { - ParseNamespaceColumn *nscol = res_nscolumns + k; - - /* fill in join RTI for merged columns */ - if (nscol->p_varno == 0) - nscol->p_varno = j->rtindex; - if (nscol->p_varnosyn == 0) - nscol->p_varnosyn = j->rtindex; - /* if join has an alias, it syntactically hides all inputs */ - if (j->alias) - { - nscol->p_varnosyn = j->rtindex; - nscol->p_varattnosyn = k + 1; - } - } - - /* make a matching link to the JoinExpr for later use */ for (k = list_length(pstate->p_joinexprs) + 1; k < j->rtindex; k++) pstate->p_joinexprs = lappend(pstate->p_joinexprs, NULL); pstate->p_joinexprs = lappend(pstate->p_joinexprs, j); @@ -1554,10 +1631,13 @@ transformFromClauseItem(ParseState *pstate, Node *n, * buildVarFromNSColumn - * build a Var node using ParseNamespaceColumn data * - * We assume varlevelsup should be 0, and no location is specified + * This is used to construct joinaliasvars entries. + * We can assume varlevelsup should be 0, and no location is specified. + * Note also that no column SELECT privilege is requested here; that would + * happen only if the column is actually referenced in the query. */ static Var * -buildVarFromNSColumn(ParseNamespaceColumn *nscol) +buildVarFromNSColumn(ParseState *pstate, ParseNamespaceColumn *nscol) { Var *var; @@ -1571,6 +1651,10 @@ buildVarFromNSColumn(ParseNamespaceColumn *nscol) /* makeVar doesn't offer parameters for these, so set by hand: */ var->varnosyn = nscol->p_varnosyn; var->varattnosyn = nscol->p_varattnosyn; + + /* ... and update varnullingrels */ + markNullableIfNeeded(pstate, var); + return var; } @@ -1682,6 +1766,47 @@ buildMergedJoinVar(ParseState *pstate, JoinType jointype, return res_node; } +/* + * markRelsAsNulledBy - + * Mark the given jointree node and its children as nulled by join jindex + */ +static void +markRelsAsNulledBy(ParseState *pstate, Node *n, int jindex) +{ + int varno; + ListCell *lc; + + /* Note: we can't see FromExpr here */ + if (IsA(n, RangeTblRef)) + { + varno = ((RangeTblRef *) n)->rtindex; + } + else if (IsA(n, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) n; + + /* recurse to children */ + markRelsAsNulledBy(pstate, j->larg, jindex); + markRelsAsNulledBy(pstate, j->rarg, jindex); + varno = j->rtindex; + } + else + { + elog(ERROR, "unrecognized node type: %d", (int) nodeTag(n)); + varno = 0; /* keep compiler quiet */ + } + + /* + * Now add jindex to the p_nullingrels set for relation varno. Since we + * maintain the p_nullingrels list lazily, we might need to extend it to + * make the varno'th entry exist. + */ + while (list_length(pstate->p_nullingrels) < varno) + pstate->p_nullingrels = lappend(pstate->p_nullingrels, NULL); + lc = list_nth_cell(pstate->p_nullingrels, varno - 1); + lfirst(lc) = bms_add_member((Bitmapset *) lfirst(lc), jindex); +} + /* * setNamespaceColumnVisibility - * Convenience subroutine to update cols_visible flags in a namespace list. diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c index c4e958e4aa..4ded12e873 100644 --- a/src/backend/parser/parse_coerce.c +++ b/src/backend/parser/parse_coerce.c @@ -1042,7 +1042,7 @@ coerce_record_to_complex(ParseState *pstate, Node *node, ParseNamespaceItem *nsitem; nsitem = GetNSItemByRangeTablePosn(pstate, rtindex, sublevels_up); - args = expandNSItemVars(nsitem, sublevels_up, vlocation, NULL); + args = expandNSItemVars(pstate, nsitem, sublevels_up, vlocation, NULL); } else ereport(ERROR, diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index efcf1cd5ab..2f6ace0b15 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -2602,6 +2602,9 @@ transformWholeRowRef(ParseState *pstate, ParseNamespaceItem *nsitem, /* location is not filled in by makeWholeRowVar */ result->location = location; + /* mark Var if it's nulled by any outer joins */ + markNullableIfNeeded(pstate, result); + /* mark relation as requiring whole-row SELECT access */ markVarForSelectPriv(pstate, result); @@ -2629,6 +2632,8 @@ transformWholeRowRef(ParseState *pstate, ParseNamespaceItem *nsitem, rowexpr->colnames = copyObject(nsitem->p_names->colnames); rowexpr->location = location; + /* XXX we ought to mark the row as possibly nullable */ + return (Node *) rowexpr; } } diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 926dcbf30e..deec58e4b1 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -751,6 +751,9 @@ scanNSItemForColumn(ParseState *pstate, ParseNamespaceItem *nsitem, } var->location = location; + /* Mark Var if it's nulled by any outer joins */ + markNullableIfNeeded(pstate, var); + /* Require read access to the column */ markVarForSelectPriv(pstate, var); @@ -1007,6 +1010,35 @@ searchRangeTableForCol(ParseState *pstate, const char *alias, const char *colnam return fuzzystate; } +/* + * markNullableIfNeeded + * If the RTE referenced by the Var is nullable by outer join(s) + * at this point in the query, set var->varnullingrels to show that. + */ +void +markNullableIfNeeded(ParseState *pstate, Var *var) +{ + int rtindex = var->varno; + Bitmapset *relids; + + /* Find the appropriate pstate */ + for (int lv = 0; lv < var->varlevelsup; lv++) + pstate = pstate->parentParseState; + + /* Find currently-relevant join relids for the Var's rel */ + if (rtindex > 0 && rtindex <= list_length(pstate->p_nullingrels)) + relids = (Bitmapset *) list_nth(pstate->p_nullingrels, rtindex - 1); + else + relids = NULL; + + /* + * Merge with any already-declared nulling rels. (Typically there won't + * be any, but let's get it right if there are.) + */ + if (relids != NULL) + var->varnullingrels = bms_union(var->varnullingrels, relids); +} + /* * markRTEForSelectPriv * Mark the specified column of the RTE with index rtindex @@ -3066,7 +3098,7 @@ expandTupleDesc(TupleDesc tupdesc, Alias *eref, int count, int offset, * the list elements mustn't be modified. */ List * -expandNSItemVars(ParseNamespaceItem *nsitem, +expandNSItemVars(ParseState *pstate, ParseNamespaceItem *nsitem, int sublevels_up, int location, List **colnames) { @@ -3102,6 +3134,10 @@ expandNSItemVars(ParseNamespaceItem *nsitem, var->varnosyn = nscol->p_varnosyn; var->varattnosyn = nscol->p_varattnosyn; var->location = location; + + /* ... and update varnullingrels */ + markNullableIfNeeded(pstate, var); + result = lappend(result, var); if (colnames) *colnames = lappend(*colnames, colnameval); @@ -3136,7 +3172,7 @@ expandNSItemAttrs(ParseState *pstate, ParseNamespaceItem *nsitem, *var; List *te_list = NIL; - vars = expandNSItemVars(nsitem, sublevels_up, location, &names); + vars = expandNSItemVars(pstate, nsitem, sublevels_up, location, &names); /* * Require read access to the table. This is normally redundant with the diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c index 2a1d44b813..22834a5bf3 100644 --- a/src/backend/parser/parse_target.c +++ b/src/backend/parser/parse_target.c @@ -1379,7 +1379,7 @@ ExpandSingleTable(ParseState *pstate, ParseNamespaceItem *nsitem, List *vars; ListCell *l; - vars = expandNSItemVars(nsitem, sublevels_up, location, NULL); + vars = expandNSItemVars(pstate, nsitem, sublevels_up, location, NULL); /* * Require read access to the table. This is normally redundant with diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 0b6a7bb365..13b6f1a8e0 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1081,6 +1081,14 @@ typedef struct RangeTblEntry * alias Vars are generated only for merged columns). We keep these * entries only because they're needed in expandRTE() and similar code. * + * Vars appearing within joinaliasvars are marked with varnullingrels sets + * that describe the nulling effects of this join and lower ones. This is + * essential for FULL JOIN cases, because the COALESCE expression only + * describes the semantics correctly if its inputs have been nulled by the + * join. For other cases, it allows expandRTE() to generate a valid + * representation of the join's output without consulting additional + * parser state. + * * Within a Query loaded from a stored rule, it is possible for non-merged * joinaliasvars items to be null pointers, which are placeholders for * (necessarily unreferenced) columns dropped since the rule was made. diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index cf9c759025..8bef98487d 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -115,6 +115,13 @@ typedef Node *(*CoerceParamHook) (ParseState *pstate, Param *param, * This is one-for-one with p_rtable, but contains NULLs for non-join * RTEs, and may be shorter than p_rtable if the last RTE(s) aren't joins. * + * p_nullingrels: list of Bitmapsets associated with p_rtable entries, each + * containing the set of outer-join RTE indexes that can null that relation + * at the current point in the parse tree. This is one-for-one with p_rtable, + * but may be shorter than p_rtable, in which case the missing entries are + * implicitly empty (NULL). That rule allows us to save work when the query + * contains no outer joins. + * * p_joinlist: list of join items (RangeTblRef and JoinExpr nodes) that * will become the fromlist of the query's top-level FromExpr node. * @@ -182,6 +189,7 @@ struct ParseState const char *p_sourcetext; /* source text, or NULL if not available */ List *p_rtable; /* range table so far */ List *p_joinexprs; /* JoinExprs for RTE_JOIN p_rtable entries */ + List *p_nullingrels; /* Bitmapsets showing nulling outer joins */ List *p_joinlist; /* join items so far (will become FromExpr * node's fromlist) */ List *p_namespace; /* currently-referenceable RTEs (List of diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h index de21c3c649..85d96563f3 100644 --- a/src/include/parser/parse_relation.h +++ b/src/include/parser/parse_relation.h @@ -41,6 +41,7 @@ extern Node *scanNSItemForColumn(ParseState *pstate, ParseNamespaceItem *nsitem, int location); extern Node *colNameToVar(ParseState *pstate, const char *colname, bool localonly, int location); +extern void markNullableIfNeeded(ParseState *pstate, Var *var); extern void markVarForSelectPriv(ParseState *pstate, Var *var); extern Relation parserOpenTable(ParseState *pstate, const RangeVar *relation, int lockmode); @@ -109,7 +110,7 @@ extern void errorMissingColumn(ParseState *pstate, extern void expandRTE(RangeTblEntry *rte, int rtindex, int sublevels_up, int location, bool include_dropped, List **colnames, List **colvars); -extern List *expandNSItemVars(ParseNamespaceItem *nsitem, +extern List *expandNSItemVars(ParseState *pstate, ParseNamespaceItem *nsitem, int sublevels_up, int location, List **colnames); extern List *expandNSItemAttrs(ParseState *pstate, ParseNamespaceItem *nsitem, commit d7a2e3fdc5f502af367ff27bc14bcdebe48ba933 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 14:31:47 2022 -0400 Teach the planner to cope with Vars bearing nullingrels. The core idea of this step is to include varnullingrels in the relid sets that qual clauses are considered to depend on. So that we can still easily compare quals' relids to RelOptInfos' relids, that means also adding outer join relids to the identifying relids of join relations. Much of the bulk of this step is concerned with fallout from the latter change. Also, in setrefs.c and some other places, we have to intentionally ignore varnullingrels when comparing the outputs of a lower plan node to the Vars required by upper expressions. I'd like to tighten that up, by accounting for whether a given plan node implements an outer join and expecting the upper Vars to have that OJ relid added to their varnullingrels if so. But because of the hackery involved in implementing outer join identity 3, there are some cases where the upper Var legitimately won't have that bit set, and it's unclear how to make a check that doesn't reject such plans. So that issue is left for later. It would only be a bug-detection aid anyway, since all the interesting decisions have been made already. This step removes some low-hanging fruit from the old implementation, such as the need to track lowest_nulling_outer_join during subquery pullup. There's much more to do in that line, though. The result of this step passes most core regression tests, but there are still some failure cases involving full joins. diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index e9342097e5..97c8cd0711 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -179,6 +179,9 @@ make_one_rel(PlannerInfo *root, List *joinlist) root->all_baserels = bms_add_member(root->all_baserels, brel->relid); } + /* Now we can form the value of all_query_rels, too */ + root->all_query_rels = bms_union(root->all_baserels, root->outer_join_rels); + /* Mark base rels as to whether we care about fast-start plans */ set_base_rel_consider_startup(root); @@ -230,9 +233,9 @@ make_one_rel(PlannerInfo *root, List *joinlist) rel = make_rel_from_joinlist(root, joinlist); /* - * The result should join all and only the query's base rels. + * The result should join all and only the query's base + outer-join rels. */ - Assert(bms_equal(rel->relids, root->all_baserels)); + Assert(bms_equal(rel->relids, root->all_query_rels)); return rel; } @@ -558,7 +561,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, * (see grouping_planner). */ if (rel->reloptkind == RELOPT_BASEREL && - bms_membership(root->all_baserels) != BMS_SINGLETON) + bms_membership(root->all_query_rels) != BMS_SINGLETON) generate_useful_gather_paths(root, rel, false); /* Now find the cheapest of the paths for this rel */ @@ -879,7 +882,7 @@ set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry * * to support an uncommon usage of second-rate sampling methods. Instead, * if there is a risk that the query might perform an unsafe join, just * wrap the SampleScan in a Materialize node. We can check for joins by - * counting the membership of all_baserels (note that this correctly + * counting the membership of all_query_rels (note that this correctly * counts inheritance trees as single rels). If we're inside a subquery, * we can't easily check whether a join might occur in the outer query, so * just assume one is possible. @@ -888,7 +891,7 @@ set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry * * so check repeatable_across_scans last, even though that's a bit odd. */ if ((root->query_level > 1 || - bms_membership(root->all_baserels) != BMS_SINGLETON) && + bms_membership(root->all_query_rels) != BMS_SINGLETON) && !(GetTsmRoutine(rte->tablesample->tsmhandler)->repeatable_across_scans)) { path = (Path *) create_material_path(rel, path); diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c index 06f836308d..6a5182c0d7 100644 --- a/src/backend/optimizer/path/clausesel.c +++ b/src/backend/optimizer/path/clausesel.c @@ -218,7 +218,7 @@ clauselist_selectivity_ext(PlannerInfo *root, if (rinfo) { - ok = (bms_membership(rinfo->clause_relids) == BMS_SINGLETON) && + ok = (rinfo->num_base_rels == 1) && (is_pseudo_constant_clause_relids(lsecond(expr->args), rinfo->right_relids) || (varonleft = false, @@ -579,30 +579,6 @@ find_single_rel_for_clauses(PlannerInfo *root, List *clauses) return NULL; /* no clauses */ } -/* - * bms_is_subset_singleton - * - * Same result as bms_is_subset(s, bms_make_singleton(x)), - * but a little faster and doesn't leak memory. - * - * Is this of use anywhere else? If so move to bitmapset.c ... - */ -static bool -bms_is_subset_singleton(const Bitmapset *s, int x) -{ - switch (bms_membership(s)) - { - case BMS_EMPTY_SET: - return true; - case BMS_SINGLETON: - return bms_is_member(x, s); - case BMS_MULTIPLE: - return false; - } - /* can't get here... */ - return false; -} - /* * treat_as_join_clause - * Decide whether an operator clause is to be handled by the @@ -631,17 +607,20 @@ treat_as_join_clause(PlannerInfo *root, Node *clause, RestrictInfo *rinfo, else { /* - * Otherwise, it's a join if there's more than one relation used. We - * can optimize this calculation if an rinfo was passed. + * Otherwise, it's a join if there's more than one base relation used. + * We can optimize this calculation if an rinfo was passed. * * XXX Since we know the clause is being evaluated at a join, the * only way it could be single-relation is if it was delayed by outer - * joins. Although we can make use of the restriction qual estimators - * anyway, it seems likely that we ought to account for the - * probability of injected nulls somehow. + * joins. We intentionally count only baserels here, not OJs that + * might be present in rinfo->clause_relids, so that we direct such + * cases to the restriction qual estimators not join estimators. + * Eventually some notice should be taken of the possibility of + * injected nulls, but we'll likely want to do that in the restriction + * estimators rather than starting to treat such cases as join quals. */ if (rinfo) - return (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE); + return (rinfo->num_base_rels > 1); else return (NumRelids(root, clause) > 1); } @@ -753,8 +732,7 @@ clause_selectivity_ext(PlannerInfo *root, * considering a unique-ified case, so we only need one cache variable * for all non-JOIN_INNER cases. */ - if (varRelid == 0 || - bms_is_subset_singleton(rinfo->clause_relids, varRelid)) + if (varRelid == 0 || rinfo->num_base_rels <= 1) { /* Cacheable --- do we already have the result? */ if (jointype == JOIN_INNER) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index fcc26b01a4..3ca598830e 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -5083,7 +5083,9 @@ compute_semi_anti_join_factors(PlannerInfo *root, norm_sjinfo.syn_lefthand = outerrel->relids; norm_sjinfo.syn_righthand = innerrel->relids; norm_sjinfo.jointype = JOIN_INNER; + norm_sjinfo.ojrelid = 0; /* we don't bother trying to make the remaining fields valid */ + norm_sjinfo.strict_relids = NULL; norm_sjinfo.lhs_strict = false; norm_sjinfo.delay_upper_joins = false; norm_sjinfo.semi_can_btree = false; @@ -5248,7 +5250,9 @@ approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals) sjinfo.syn_lefthand = path->outerjoinpath->parent->relids; sjinfo.syn_righthand = path->innerjoinpath->parent->relids; sjinfo.jointype = JOIN_INNER; + sjinfo.ojrelid = 0; /* we don't bother trying to make the remaining fields valid */ + sjinfo.strict_relids = NULL; sjinfo.lhs_strict = false; sjinfo.delay_upper_joins = false; sjinfo.semi_can_btree = false; diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index f8a97622b1..d31ca5e527 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -29,6 +29,7 @@ #include "optimizer/paths.h" #include "optimizer/planmain.h" #include "optimizer/restrictinfo.h" +#include "rewrite/rewriteManip.h" #include "utils/lsyscache.h" @@ -64,7 +65,7 @@ static bool reconsider_outer_join_clause(PlannerInfo *root, RestrictInfo *rinfo, bool outer_on_left); static bool reconsider_full_join_clause(PlannerInfo *root, - RestrictInfo *rinfo); + FullJoinClauseInfo *fjinfo); static Bitmapset *get_eclass_indexes_for_relids(PlannerInfo *root, Relids relids); static Bitmapset *get_common_eclass_indexes(PlannerInfo *root, Relids relids1, @@ -768,6 +769,9 @@ get_eclass_for_sort_expr(PlannerInfo *root, { RelOptInfo *rel = root->simple_rel_array[i]; + if (rel == NULL) + continue; /* must be an outer join */ + Assert(rel->reloptkind == RELOPT_BASEREL || rel->reloptkind == RELOPT_DEADREL); @@ -936,7 +940,36 @@ is_exprlist_member(Expr *node, List *exprs) if (expr && IsA(expr, TargetEntry)) expr = ((TargetEntry *) expr)->expr; - if (equal(node, expr)) + /* + * For Vars and PlaceHolderVars, match using the same rules as + * setrefs.c will, in particular ignoring nullingrels. XXX when that + * gets tightened up, this should too. + */ + if (IsA(node, Var)) + { + if (expr && IsA(expr, Var)) + { + Var *v1 = (Var *) node; + Var *v2 = (Var *) expr; + + if (v1->varno == v2->varno && + v1->varattno == v2->varattno && + v1->varlevelsup == v2->varlevelsup) + return true; + } + } + else if (IsA(node, PlaceHolderVar)) + { + if (expr && IsA(expr, PlaceHolderVar)) + { + PlaceHolderVar *v1 = (PlaceHolderVar *) node; + PlaceHolderVar *v2 = (PlaceHolderVar *) expr; + + if (v1->phid == v2->phid) + return true; + } + } + else if (equal(node, expr)) return true; } return false; @@ -1124,6 +1157,9 @@ generate_base_implied_equalities(PlannerInfo *root) { RelOptInfo *rel = root->simple_rel_array[i]; + if (rel == NULL) + continue; /* must be an outer join */ + Assert(rel->reloptkind == RELOPT_BASEREL); rel->eclass_indexes = bms_add_member(rel->eclass_indexes, @@ -2014,10 +2050,12 @@ reconsider_outer_join_clauses(PlannerInfo *root) /* Process the FULL JOIN clauses */ foreach(cell, root->full_join_clauses) { - RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell); + FullJoinClauseInfo *fjinfo = (FullJoinClauseInfo *) lfirst(cell); - if (reconsider_full_join_clause(root, rinfo)) + if (reconsider_full_join_clause(root, fjinfo)) { + RestrictInfo *rinfo = fjinfo->rinfo; + found = true; /* remove it from the list */ root->full_join_clauses = @@ -2046,9 +2084,9 @@ reconsider_outer_join_clauses(PlannerInfo *root) } foreach(cell, root->full_join_clauses) { - RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell); + FullJoinClauseInfo *fjinfo = (FullJoinClauseInfo *) lfirst(cell); - distribute_restrictinfo_to_rels(root, rinfo); + distribute_restrictinfo_to_rels(root, fjinfo->rinfo); } } @@ -2184,8 +2222,11 @@ reconsider_outer_join_clause(PlannerInfo *root, RestrictInfo *rinfo, * Returns true if we were able to propagate a constant through the clause. */ static bool -reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo) +reconsider_full_join_clause(PlannerInfo *root, FullJoinClauseInfo *fjinfo) { + RestrictInfo *rinfo = fjinfo->rinfo; + SpecialJoinInfo *sjinfo = fjinfo->sjinfo; + Relids fjrelids = bms_make_singleton(sjinfo->ojrelid); Expr *leftvar; Expr *rightvar; Oid opno, @@ -2267,6 +2308,18 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo) cfirst = (Node *) linitial(cexpr->args); csecond = (Node *) lsecond(cexpr->args); + /* + * The COALESCE arguments will be marked as possibly nulled by + * the full join, while we wish to generate clauses that apply + * to the join's inputs. So we must strip the join from the + * nullingrels fields of cfirst/csecond before comparing them + * to leftvar/rightvar. (Perhaps with a less hokey + * representation for FULL JOIN USING output columns, this + * wouldn't be needed?) + */ + cfirst = remove_nulling_relids(cfirst, fjrelids, NULL); + csecond = remove_nulling_relids(csecond, fjrelids, NULL); + if (equal(leftvar, cfirst) && equal(rightvar, csecond)) { coal_idx = foreach_current_index(lc2); @@ -3203,6 +3256,8 @@ get_eclass_indexes_for_relids(PlannerInfo *root, Relids relids) { RelOptInfo *rel = root->simple_rel_array[i]; + if (rel == NULL) + continue; /* must be an outer join */ ec_indexes = bms_add_members(ec_indexes, rel->eclass_indexes); } return ec_indexes; diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 0ef70ad7f1..ba451f8952 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3357,13 +3357,13 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel) * Add on any equivalence-derivable join clauses. Computing the correct * relid sets for generate_join_implied_equalities is slightly tricky * because the rel could be a child rel rather than a true baserel, and in - * that case we must remove its parents' relid(s) from all_baserels. + * that case we must subtract its parents' relid(s) from all_query_rels. */ if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) - otherrels = bms_difference(root->all_baserels, + otherrels = bms_difference(root->all_query_rels, find_childrel_parents(root, rel)); else - otherrels = bms_difference(root->all_baserels, rel->relids); + otherrels = bms_difference(root->all_query_rels, rel->relids); if (!bms_is_empty(otherrels)) clauselist = diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 2a3f0ab7bf..a1fc72c394 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -250,7 +250,7 @@ add_paths_to_joinrel(PlannerInfo *root, if (bms_overlap(joinrelids, sjinfo2->min_righthand) && !bms_overlap(joinrelids, sjinfo2->min_lefthand)) extra.param_source_rels = bms_join(extra.param_source_rels, - bms_difference(root->all_baserels, + bms_difference(root->all_query_rels, sjinfo2->min_righthand)); /* full joins constrain both sides symmetrically */ @@ -258,7 +258,7 @@ add_paths_to_joinrel(PlannerInfo *root, bms_overlap(joinrelids, sjinfo2->min_lefthand) && !bms_overlap(joinrelids, sjinfo2->min_righthand)) extra.param_source_rels = bms_join(extra.param_source_rels, - bms_difference(root->all_baserels, + bms_difference(root->all_query_rels, sjinfo2->min_lefthand)); } diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 9da3ff2f9a..b64c37f089 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -353,7 +353,10 @@ make_rels_by_clauseless_joins(PlannerInfo *root, * * Caller must supply not only the two rels, but the union of their relids. * (We could simplify the API by computing joinrelids locally, but this - * would be redundant work in the normal path through make_join_rel.) + * would be redundant work in the normal path through make_join_rel. + * Note that this value does NOT include the RT index of any outer join that + * might need to be performed here, so it's not the canonical identifier + * of the join relation.) * * On success, *sjinfo_p is set to NULL if this is to be a plain inner join, * else it's set to point to the associated SpecialJoinInfo node. Also, @@ -695,7 +698,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) /* We should never try to join two overlapping sets of rels. */ Assert(!bms_overlap(rel1->relids, rel2->relids)); - /* Construct Relids set that identifies the joinrel. */ + /* Construct Relids set that identifies the joinrel (without OJ as yet). */ joinrelids = bms_union(rel1->relids, rel2->relids); /* Check validity and determine join type. */ @@ -707,6 +710,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return NULL; } + /* If we have an outer join, add its RTI to form the canonical relids. */ + if (sjinfo && sjinfo->ojrelid != 0) + joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); + /* Swap rels if needed to match the join info. */ if (reversed) { @@ -730,7 +737,9 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) sjinfo->syn_lefthand = rel1->relids; sjinfo->syn_righthand = rel2->relids; sjinfo->jointype = JOIN_INNER; + sjinfo->ojrelid = 0; /* we don't bother trying to make the remaining fields valid */ + sjinfo->strict_relids = NULL; sjinfo->lhs_strict = false; sjinfo->delay_upper_joins = false; sjinfo->semi_can_btree = false; @@ -1510,8 +1519,6 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, /* We should never try to join two overlapping sets of rels. */ Assert(!bms_overlap(child_rel1->relids, child_rel2->relids)); - child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids); - appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos); /* * Construct SpecialJoinInfo from parent join relations's @@ -1521,6 +1528,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, child_rel1->relids, child_rel2->relids); + /* Build correct join relids for child join */ + child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids); + if (child_sjinfo->ojrelid != 0) + child_joinrelids = bms_add_member(child_joinrelids, + child_sjinfo->ojrelid); + + /* Find the AppendRelInfo structures */ + appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos); + /* * Construct restrictions applicable to the child join from those * applicable to the parent join. @@ -1536,8 +1552,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, { child_joinrel = build_child_join_rel(root, child_rel1, child_rel2, joinrel, child_restrictlist, - child_sjinfo, - child_sjinfo->jointype); + child_sjinfo); joinrel->part_rels[cnt_parts] = child_joinrel; joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts); joinrel->all_partrels = bms_add_members(joinrel->all_partrels, diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 337f470d58..bbe31c03fe 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -34,7 +34,7 @@ /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); -static void remove_rel_from_query(PlannerInfo *root, int relid, +static void remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid, Relids joinrelids); static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved); static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel); @@ -70,6 +70,7 @@ restart: foreach(lc, root->join_info_list) { SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); + Relids joinrelids; int innerrelid; int nremoved; @@ -84,9 +85,12 @@ restart: */ innerrelid = bms_singleton_member(sjinfo->min_righthand); - remove_rel_from_query(root, innerrelid, - bms_union(sjinfo->min_lefthand, - sjinfo->min_righthand)); + /* Compute the relid set for the join we are considering */ + joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + if (sjinfo->ojrelid != 0) + joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); + + remove_rel_from_query(root, innerrelid, sjinfo->ojrelid, joinrelids); /* We verify that exactly one reference gets removed from joinlist */ nremoved = 0; @@ -188,6 +192,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) /* Compute the relid set for the join we are considering */ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + if (sjinfo->ojrelid != 0) + joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); /* * We can't remove the join if any inner-rel attributes are used above the @@ -306,10 +312,12 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) * no longer treated as a baserel, and that attributes of other baserels * are no longer marked as being needed at joins involving this rel. * Also, join quals involving the rel have to be removed from the joininfo - * lists, but only if they belong to the outer join identified by joinrelids. + * lists, but only if they belong to the outer join identified by ojrelid + * and joinrelids. */ static void -remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids) +remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid, + Relids joinrelids) { RelOptInfo *rel = find_base_rel(root, relid); List *joininfos; @@ -349,6 +357,13 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids) } } + /* + * The removed outer join has to be dropped from root->outer_join_rels. + * (We'd need to update all_baserels and all_query_rels too, but those + * haven't been computed yet.) + */ + root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid); + /* * Likewise remove references from SpecialJoinInfo data structures. * @@ -365,6 +380,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids) sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid); sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid); sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid); + sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, ojrelid); + sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, ojrelid); + sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, ojrelid); + sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, ojrelid); } /* @@ -393,8 +412,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids) else { phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid); + phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid); Assert(!bms_is_empty(phinfo->ph_eval_at)); phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid); + phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid); } } @@ -431,6 +452,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids) rinfo->required_relids = bms_copy(rinfo->required_relids); rinfo->required_relids = bms_del_member(rinfo->required_relids, relid); + rinfo->required_relids = bms_del_member(rinfo->required_relids, + ojrelid); distribute_restrictinfo_to_rels(root, rinfo); } } @@ -545,6 +568,7 @@ reduce_unique_semijoins(PlannerInfo *root) /* Compute the relid set for the join we are considering */ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + Assert(sjinfo->ojrelid == 0); /* SEMI joins don't have RT indexes */ /* * Since we're only considering a single-rel RHS, any join clauses it diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 023efbaf09..b438085af6 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -60,12 +60,15 @@ static void process_security_barrier_quals(PlannerInfo *root, static SpecialJoinInfo *make_outerjoininfo(PlannerInfo *root, Relids left_rels, Relids right_rels, Relids inner_join_rels, - JoinType jointype, List *clause); + JoinType jointype, Index ojrelid, + List *clause); static void compute_semijoin_info(PlannerInfo *root, SpecialJoinInfo *sjinfo, List *clause); +static List *remove_unneeded_nulling_relids(PlannerInfo *root, List *quals, + SpecialJoinInfo *sjinfo); static void distribute_qual_to_rels(PlannerInfo *root, Node *clause, bool below_outer_join, - JoinType jointype, + SpecialJoinInfo *sjinfo, Index security_level, Relids qualscope, Relids ojscope, @@ -250,10 +253,16 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, attno -= rel->min_attr; if (rel->attr_needed[attno] == NULL) { - /* Variable not yet requested, so add to rel's targetlist */ - /* XXX is copyObject necessary here? */ - rel->reltarget->exprs = lappend(rel->reltarget->exprs, - copyObject(var)); + /* + * Variable not yet requested, so add to rel's targetlist. + * + * The value available at the rel's scan level has not been + * nulled by any outer join, so drop its varnullingrels. + * (We'll put those back as we climb up the join tree.) + */ + var = copyObject(var); + var->varnullingrels = NULL; + rel->reltarget->exprs = lappend(rel->reltarget->exprs, var); /* reltarget cost and width will be computed later */ } rel->attr_needed[attno] = bms_add_members(rel->attr_needed[attno], @@ -551,8 +560,10 @@ create_lateral_join_info(PlannerInfo *root) varno = -1; while ((varno = bms_next_member(eval_at, varno)) >= 0) { - RelOptInfo *brel = find_base_rel(root, varno); + RelOptInfo *brel = find_base_rel_ignore_join(root, varno); + if (brel == NULL) + continue; /* ignore outer joins in eval_at */ brel->lateral_relids = bms_add_members(brel->lateral_relids, phinfo->ph_lateral); } @@ -643,7 +654,10 @@ create_lateral_join_info(PlannerInfo *root) { RelOptInfo *brel2 = root->simple_rel_array[rti2]; - Assert(brel2 != NULL && brel2->reloptkind == RELOPT_BASEREL); + if (brel2 == NULL) + continue; /* must be an OJ */ + + Assert(brel2->reloptkind == RELOPT_BASEREL); brel2->lateral_referencers = bms_add_member(brel2->lateral_referencers, rti); } @@ -695,7 +709,8 @@ deconstruct_jointree(PlannerInfo *root) Assert(root->parse->jointree != NULL && IsA(root->parse->jointree, FromExpr)); - /* this is filled as we scan the jointree */ + /* These are filled as we scan the jointree */ + root->outer_join_rels = NULL; root->nullable_baserels = NULL; result = deconstruct_recurse(root, (Node *) root->parse->jointree, false, @@ -717,7 +732,7 @@ deconstruct_jointree(PlannerInfo *root) * below_outer_join is true if this node is within the nullable side of a * higher-level outer join * Outputs: - * *qualscope gets the set of base Relids syntactically included in this + * *qualscope gets the set of base+OJ Relids syntactically included in this * jointree node (do not modify or free this, as it may also be pointed * to by RestrictInfo and SpecialJoinInfo nodes) * *inner_join_rels gets the set of base Relids syntactically included in @@ -802,6 +817,8 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, * there was exactly one element, we should (and already did) report * whatever its inner_join_rels were. If there were no elements (is * that still possible?) the initialization before the loop fixed it. + * + * XXX now wrong, do we care? */ if (list_length(f->fromlist) > 1) *inner_join_rels = *qualscope; @@ -816,7 +833,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, if (bms_is_subset(pq->relids, *qualscope)) distribute_qual_to_rels(root, pq->qual, - below_outer_join, JOIN_INNER, + below_outer_join, NULL, root->qual_security_level, *qualscope, NULL, NULL, NULL); @@ -832,7 +849,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, Node *qual = (Node *) lfirst(l); distribute_qual_to_rels(root, qual, - below_outer_join, JOIN_INNER, + below_outer_join, NULL, root->qual_security_level, *qualscope, NULL, NULL, postponed_qual_list); @@ -896,6 +913,13 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, &rightids, &right_inners, &child_postponed_quals); *qualscope = bms_union(leftids, rightids); + /* caution: ANTI join derived from SEMI will lack rtindex */ + if (j->rtindex != 0) + { + *qualscope = bms_add_member(*qualscope, j->rtindex); + root->outer_join_rels = bms_add_member(root->outer_join_rels, + j->rtindex); + } *inner_join_rels = bms_union(left_inners, right_inners); nonnullable_rels = leftids; nullable_rels = rightids; @@ -910,6 +934,8 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, &rightids, &right_inners, &child_postponed_quals); *qualscope = bms_union(leftids, rightids); + /* SEMI join never has rtindex, so don't add to qualscope */ + Assert(j->rtindex == 0); *inner_join_rels = bms_union(left_inners, right_inners); /* Semi join adds no restrictions for quals */ nonnullable_rels = NULL; @@ -931,6 +957,10 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, &rightids, &right_inners, &child_postponed_quals); *qualscope = bms_union(leftids, rightids); + Assert(j->rtindex != 0); + *qualscope = bms_add_member(*qualscope, j->rtindex); + root->outer_join_rels = bms_add_member(root->outer_join_rels, + j->rtindex); *inner_join_rels = bms_union(left_inners, right_inners); /* each side is both outer and inner */ nonnullable_rels = *qualscope; @@ -976,32 +1006,44 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, my_quals = list_concat(my_quals, (List *) j->quals); /* - * For an OJ, form the SpecialJoinInfo now, because we need the OJ's - * semantic scope (ojscope) to pass to distribute_qual_to_rels. But - * we mustn't add it to join_info_list just yet, because we don't want - * distribute_qual_to_rels to think it is an outer join below us. - * - * Semijoins are a bit of a hybrid: we build a SpecialJoinInfo, but we - * want ojscope = NULL for distribute_qual_to_rels. + * For an OJ, form the SpecialJoinInfo now, because we need it for + * distribute_qual_to_rels. But we mustn't add it to join_info_list + * just yet, because we don't want distribute_qual_to_rels to think it + * is an outer join below us. */ if (j->jointype != JOIN_INNER) - { sjinfo = make_outerjoininfo(root, leftids, rightids, *inner_join_rels, j->jointype, + j->rtindex, my_quals); - if (j->jointype == JOIN_SEMI) - ojscope = NULL; - else - ojscope = bms_union(sjinfo->min_lefthand, - sjinfo->min_righthand); - } else - { sjinfo = NULL; + + /* + * If we have a LEFT JOIN whose ON qual is strict for any LHS + * relations, we may be able to commute the join with lower outer + * joins that null those relations. To do that, we must remove such + * lower outer joins from Var.varnullingrels fields within the qual, + * else subsequent processing will think that the qual has to be + * evaluated above such lower outer joins. + */ + if (j->jointype == JOIN_LEFT && sjinfo->lhs_strict) + my_quals = remove_unneeded_nulling_relids(root, my_quals, sjinfo); + + /* + * Now we can compute ojscope (we can't do it earlier, because + * remove_unneeded_nulling_relids might change the scope). + * + * Semijoins are a bit of a hybrid: we build a SpecialJoinInfo, but we + * want ojscope = NULL for distribute_qual_to_rels. + */ + if (j->jointype == JOIN_INNER || j->jointype == JOIN_SEMI) ojscope = NULL; - } + else + ojscope = bms_union(sjinfo->min_lefthand, + sjinfo->min_righthand); /* Process the JOIN's qual clauses */ foreach(l, my_quals) @@ -1009,7 +1051,7 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, Node *qual = (Node *) lfirst(l); distribute_qual_to_rels(root, qual, - below_outer_join, j->jointype, + below_outer_join, sjinfo, root->qual_security_level, *qualscope, ojscope, nonnullable_rels, @@ -1112,7 +1154,7 @@ process_security_barrier_quals(PlannerInfo *root, */ distribute_qual_to_rels(root, qual, below_outer_join, - JOIN_INNER, + NULL, security_level, qualscope, qualscope, @@ -1135,6 +1177,7 @@ process_security_barrier_quals(PlannerInfo *root, * right_rels: the base Relids syntactically on inner side of join * inner_join_rels: base Relids participating in inner joins below this one * jointype: what it says (must always be LEFT, FULL, SEMI, or ANTI) + * ojrelid: RT index of the join RTE (0 for SEMI, which isn't in the RT list) * clause: the outer join's join condition (in implicit-AND format) * * The node should eventually be appended to root->join_info_list, but we @@ -1148,7 +1191,8 @@ static SpecialJoinInfo * make_outerjoininfo(PlannerInfo *root, Relids left_rels, Relids right_rels, Relids inner_join_rels, - JoinType jointype, List *clause) + JoinType jointype, Index ojrelid, + List *clause) { SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo); Relids clause_relids; @@ -1196,6 +1240,7 @@ make_outerjoininfo(PlannerInfo *root, sjinfo->syn_lefthand = left_rels; sjinfo->syn_righthand = right_rels; sjinfo->jointype = jointype; + sjinfo->ojrelid = ojrelid; /* this always starts out false */ sjinfo->delay_upper_joins = false; @@ -1206,6 +1251,7 @@ make_outerjoininfo(PlannerInfo *root, { sjinfo->min_lefthand = bms_copy(left_rels); sjinfo->min_righthand = bms_copy(right_rels); + sjinfo->strict_relids = NULL; /* don't care about this */ sjinfo->lhs_strict = false; /* don't care about this */ return sjinfo; } @@ -1220,6 +1266,7 @@ make_outerjoininfo(PlannerInfo *root, * rel's columns are all NULL? */ strict_relids = find_nonnullable_rels((Node *) clause); + sjinfo->strict_relids = strict_relids; /* Remember whether the clause is strict for any LHS relations */ sjinfo->lhs_strict = bms_overlap(strict_relids, left_rels); @@ -1258,6 +1305,9 @@ make_outerjoininfo(PlannerInfo *root, otherinfo->syn_lefthand); min_lefthand = bms_add_members(min_lefthand, otherinfo->syn_righthand); + if (otherinfo->ojrelid != 0) + min_lefthand = bms_add_member(min_lefthand, + otherinfo->ojrelid); } if (bms_overlap(right_rels, otherinfo->syn_lefthand) || bms_overlap(right_rels, otherinfo->syn_righthand)) @@ -1266,6 +1316,9 @@ make_outerjoininfo(PlannerInfo *root, otherinfo->syn_lefthand); min_righthand = bms_add_members(min_righthand, otherinfo->syn_righthand); + if (otherinfo->ojrelid != 0) + min_righthand = bms_add_member(min_righthand, + otherinfo->ojrelid); } /* Needn't do anything else with the full join */ continue; @@ -1295,6 +1348,9 @@ make_outerjoininfo(PlannerInfo *root, otherinfo->syn_lefthand); min_lefthand = bms_add_members(min_lefthand, otherinfo->syn_righthand); + if (otherinfo->ojrelid != 0) + min_lefthand = bms_add_member(min_lefthand, + otherinfo->ojrelid); } } @@ -1337,6 +1393,9 @@ make_outerjoininfo(PlannerInfo *root, otherinfo->syn_lefthand); min_righthand = bms_add_members(min_righthand, otherinfo->syn_righthand); + if (otherinfo->ojrelid != 0) + min_righthand = bms_add_member(min_righthand, + otherinfo->ojrelid); } } } @@ -1561,6 +1620,62 @@ compute_semijoin_info(PlannerInfo *root, SpecialJoinInfo *sjinfo, List *clause) sjinfo->semi_rhs_exprs = semi_rhs_exprs; } +/* + * remove_unneeded_nulling_relids + * Remove lower outer joins from Vars (& PHVs) in the quals, if possible + * + * This paves the way to apply outer join identity 3 to commute the current + * LEFT JOIN with lower outer joins. We already know that the quals are + * strict for at least one LHS relation. + */ +static List * +remove_unneeded_nulling_relids(PlannerInfo *root, List *quals, + SpecialJoinInfo *sjinfo) +{ + Relids old_nulling_relids; + Relids removable_relids; + ListCell *lc; + + /* + * Find outer joins mentioned in nullingrel fields in the quals. If there + * aren't any (the common case), there's no need to work hard. + */ + old_nulling_relids = get_nulling_relids((Node *) quals); + if (bms_is_empty(old_nulling_relids)) + return quals; + + /* + * Thumb through the existing SpecialJoinInfos (which describe all outer + * joins below this one, but not yet this one) to find the ones mentioned + * in the quals. If the current join's quals are strict for any rel of + * one's RHS, we can commute this join with that one, so remove it from + * the current join's min_lefthand and from the quals' nullingrel fields. + */ + removable_relids = NULL; + foreach(lc, root->join_info_list) + { + SpecialJoinInfo *sjinfo2 = (SpecialJoinInfo *) lfirst(lc); + + if (sjinfo2->jointype != JOIN_LEFT || + !bms_is_member(sjinfo2->ojrelid, old_nulling_relids)) + continue; /* it's not relevant */ + if (bms_is_subset(sjinfo2->syn_righthand, sjinfo->syn_lefthand) && + bms_overlap(sjinfo->strict_relids, sjinfo2->min_righthand)) + { + sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, + sjinfo2->ojrelid); + removable_relids = bms_add_member(removable_relids, + sjinfo2->ojrelid); + } + } + + if (removable_relids == NULL) + return quals; /* no hits, nothing to do */ + + return (List *) remove_nulling_relids((Node *) quals, + removable_relids, NULL); +} + /***************************************************************************** * @@ -1582,7 +1697,7 @@ compute_semijoin_info(PlannerInfo *root, SpecialJoinInfo *sjinfo, List *clause) * 'clause': the qual clause to be distributed * 'below_outer_join': true if the qual is from a JOIN/ON that is below the * nullable side of a higher-level outer join - * 'jointype': type of join the qual is from (JOIN_INNER for a WHERE clause) + * 'sjinfo': join's SpecialJoinInfo (NULL for an inner join or WHERE clause) * 'security_level': security_level to assign to the qual * 'qualscope': set of baserels the qual's syntactic scope covers * 'ojscope': NULL if not an outer-join qual, else the minimum set of baserels @@ -1600,12 +1715,13 @@ compute_semijoin_info(PlannerInfo *root, SpecialJoinInfo *sjinfo, List *clause) * level, which will be ojscope not necessarily qualscope. * * At the time this is called, root->join_info_list must contain entries for - * all and only those special joins that are syntactically below this qual. + * all and only those special joins that are syntactically below this qual; + * in particular, the passed-in SpecialJoinInfo isn't yet in that list. */ static void distribute_qual_to_rels(PlannerInfo *root, Node *clause, bool below_outer_join, - JoinType jointype, + SpecialJoinInfo *sjinfo, Index security_level, Relids qualscope, Relids ojscope, @@ -1642,7 +1758,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, PostponedQual *pq = (PostponedQual *) palloc(sizeof(PostponedQual)); Assert(root->hasLateralRTEs); /* shouldn't happen otherwise */ - Assert(jointype == JOIN_INNER); /* mustn't postpone past outer join */ + Assert(sjinfo == NULL); /* mustn't postpone past outer join */ pq->qual = clause; pq->relids = relids; *postponed_qual_list = lappend(*postponed_qual_list, pq); @@ -1704,7 +1820,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, { relids = get_relids_in_jointree((Node *) root->parse->jointree, - false); + true, false); qualscope = bms_copy(relids); } } @@ -1946,11 +2062,15 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause, restrictinfo); return; } - if (jointype == JOIN_FULL) + if (sjinfo && sjinfo->jointype == JOIN_FULL) { /* FULL JOIN (above tests cannot match in this case) */ + FullJoinClauseInfo *fjinfo = makeNode(FullJoinClauseInfo); + + fjinfo->rinfo = restrictinfo; + fjinfo->sjinfo = sjinfo; root->full_join_clauses = lappend(root->full_join_clauses, - restrictinfo); + fjinfo); return; } /* nope, so fall through to distribute_restrictinfo_to_rels */ @@ -2344,7 +2464,7 @@ process_implied_equality(PlannerInfo *root, { relids = get_relids_in_jointree((Node *) root->parse->jointree, - false); + true, false); } } } diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index f6baa2a765..ea077cff4e 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -2221,7 +2221,7 @@ preprocess_rowmarks(PlannerInfo *root) * make a bitmapset of all base rels and then remove the items we don't * need or have FOR [KEY] UPDATE/SHARE marks for. */ - rels = get_relids_in_jointree((Node *) parse->jointree, false); + rels = get_relids_in_jointree((Node *) parse->jointree, false, false); if (parse->resultRelation) rels = bms_del_member(rels, parse->resultRelation); diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 9cef92cab2..14f5c0f897 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -151,6 +151,9 @@ static Var *search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist, int newvarno, int rtoffset); +static Var *search_indexed_tlist_for_phv(PlaceHolderVar *phv, + indexed_tlist *itlist, + int newvarno); static Var *search_indexed_tlist_for_non_var(Expr *node, indexed_tlist *itlist, int newvarno); @@ -2108,6 +2111,7 @@ fix_scan_expr_mutator(Node *node, fix_scan_expr_context *context) /* At scan level, we should always just evaluate the contained expr */ PlaceHolderVar *phv = (PlaceHolderVar *) node; + Assert(phv->phnullingrels == NULL); return fix_scan_expr_mutator((Node *) phv->phexpr, context); } if (IsA(node, AlternativeSubPlan)) @@ -2228,33 +2232,12 @@ set_join_references(PlannerInfo *root, Join *join, int rtoffset) /* * Now we need to fix up the targetlist and qpqual, which are logically * above the join. This means they should not re-use any input expression - * that was computed in the nullable side of an outer join. Vars and - * PlaceHolderVars are fine, so we can implement this restriction just by - * clearing has_non_vars in the indexed_tlist structs. + * that was computed in the nullable side of an outer join. * - * XXX This is a grotty workaround for the fact that we don't clearly - * distinguish between a Var appearing below an outer join and the "same" - * Var appearing above it. If we did, we'd not need to hack the matching - * rules this way. + * XXX we will probably need to pass some flag down to indicate that this + * context applies, so that search_indexed_tlist_for_var() and siblings + * can correctly check for varnullingrels matches. */ - switch (join->jointype) - { - case JOIN_LEFT: - case JOIN_SEMI: - case JOIN_ANTI: - inner_itlist->has_non_vars = false; - break; - case JOIN_RIGHT: - outer_itlist->has_non_vars = false; - break; - case JOIN_FULL: - outer_itlist->has_non_vars = false; - inner_itlist->has_non_vars = false; - break; - default: - break; - } - join->plan.targetlist = fix_join_expr(root, join->plan.targetlist, outer_itlist, @@ -2543,7 +2526,7 @@ set_dummy_tlist_references(Plan *plan, int rtoffset) * tlist_member() searches. * * The result of this function is an indexed_tlist struct to pass to - * search_indexed_tlist_for_var() or search_indexed_tlist_for_non_var(). + * search_indexed_tlist_for_var() and siblings. * When done, the indexed_tlist may be freed with a single pfree(). */ static indexed_tlist * @@ -2665,6 +2648,8 @@ search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist, /* Found a match */ Var *newvar = copyVar(var); + /* XXX we oughta check varnullingrels match here ... */ + newvar->varno = newvarno; newvar->varattno = vinfo->resno; if (newvar->varnosyn > 0) @@ -2677,15 +2662,55 @@ search_indexed_tlist_for_var(Var *var, indexed_tlist *itlist, } /* - * search_indexed_tlist_for_non_var --- find a non-Var in an indexed tlist + * search_indexed_tlist_for_phv --- find a PlaceHolderVar in an indexed tlist * * If a match is found, return a Var constructed to reference the tlist item. * If no match, return NULL. * - * NOTE: it is a waste of time to call this unless itlist->has_ph_vars or - * itlist->has_non_vars. Furthermore, set_join_references() relies on being - * able to prevent matching of non-Vars by clearing itlist->has_non_vars, - * so there's a correctness reason not to call it unless that's set. + * NOTE: it is a waste of time to call this unless itlist->has_ph_vars. + */ +static Var * +search_indexed_tlist_for_phv(PlaceHolderVar *phv, + indexed_tlist *itlist, int newvarno) +{ + ListCell *lc; + + foreach(lc, itlist->tlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(lc); + + if (tle->expr && IsA(tle->expr, PlaceHolderVar)) + { + PlaceHolderVar *subphv = (PlaceHolderVar *) tle->expr; + Var *newvar; + + /* + * Analogously to search_indexed_tlist_for_var, we match on phid + * only. We don't use equal(), partially for speed but mostly + * because phnullingrels might not be exactly equal. + * + * XXX we really oughta verify phnullingrels. + */ + if (phv->phid != subphv->phid) + continue; + + /* Found a matching subplan output expression */ + newvar = makeVarFromTargetEntry(newvarno, tle); + newvar->varnosyn = 0; /* wasn't ever a plain Var */ + newvar->varattnosyn = 0; + return newvar; + } + } + return NULL; /* no match */ +} + +/* + * search_indexed_tlist_for_non_var --- find a non-Var/PHV in an indexed tlist + * + * If a match is found, return a Var constructed to reference the tlist item. + * If no match, return NULL. + * + * NOTE: it is a waste of time to call this unless itlist->has_non_vars. */ static Var * search_indexed_tlist_for_non_var(Expr *node, @@ -2870,22 +2895,23 @@ fix_join_expr_mutator(Node *node, fix_join_expr_context *context) /* See if the PlaceHolderVar has bubbled up from a lower plan node */ if (context->outer_itlist && context->outer_itlist->has_ph_vars) { - newvar = search_indexed_tlist_for_non_var((Expr *) phv, - context->outer_itlist, - OUTER_VAR); + newvar = search_indexed_tlist_for_phv(phv, + context->outer_itlist, + OUTER_VAR); if (newvar) return (Node *) newvar; } if (context->inner_itlist && context->inner_itlist->has_ph_vars) { - newvar = search_indexed_tlist_for_non_var((Expr *) phv, - context->inner_itlist, - INNER_VAR); + newvar = search_indexed_tlist_for_phv(phv, + context->inner_itlist, + INNER_VAR); if (newvar) return (Node *) newvar; } /* If not supplied by input plans, evaluate the contained expr */ + /* XXX assert something about phnullingrels */ return fix_join_expr_mutator((Node *) phv->phexpr, context); } /* Try matching more complex expressions too, if tlists have any */ @@ -2994,13 +3020,14 @@ fix_upper_expr_mutator(Node *node, fix_upper_expr_context *context) /* See if the PlaceHolderVar has bubbled up from a lower plan node */ if (context->subplan_itlist->has_ph_vars) { - newvar = search_indexed_tlist_for_non_var((Expr *) phv, - context->subplan_itlist, - context->newvarno); + newvar = search_indexed_tlist_for_phv(phv, + context->subplan_itlist, + context->newvarno); if (newvar) return (Node *) newvar; } /* If not supplied by input plan, evaluate the contained expr */ + /* XXX assert something about phnullingrels */ return fix_upper_expr_mutator((Node *) phv->phexpr, context); } /* Try matching more complex expressions too, if tlist has any */ diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 0bd99acf83..389f7d9ce7 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -49,17 +49,28 @@ typedef struct pullup_replace_vars_context * pullup (set only if target_rte->lateral) */ bool *outer_hasSubLinks; /* -> outer query's hasSubLinks */ int varno; /* varno of subquery */ - bool need_phvs; /* do we need PlaceHolderVars? */ - bool wrap_non_vars; /* do we need 'em on *all* non-Vars? */ + bool wrap_non_vars; /* do we need all non-Var outputs to be PHVs? */ Node **rv_cache; /* cache for results with PHVs */ } pullup_replace_vars_context; -typedef struct reduce_outer_joins_state +typedef struct reduce_outer_joins_pass1_state { Relids relids; /* base relids within this subtree */ bool contains_outer; /* does subtree contain outer join(s)? */ List *sub_states; /* List of states for subtree components */ -} reduce_outer_joins_state; +} reduce_outer_joins_pass1_state; + +typedef struct reduce_outer_joins_pass2_state +{ + Relids inner_reduced; /* OJ relids reduced to plain inner joins */ + List *partial_reduced; /* List of partially reduced FULL joins */ +} reduce_outer_joins_pass2_state; + +typedef struct reduce_outer_joins_partial_state +{ + int full_join_rti; /* RT index of a formerly-FULL join */ + Relids unreduced_side; /* relids in its still-nullable side */ +} reduce_outer_joins_partial_state; static Node *pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode, Relids *relids); @@ -68,12 +79,10 @@ static Node *pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, Node **jtlink2, Relids available_rels2); static Node *pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, JoinExpr *lowest_outer_join, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel); static Node *pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, JoinExpr *lowest_outer_join, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel); static Node *pull_up_simple_union_all(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte); @@ -90,7 +99,6 @@ static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode, static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte); static Node *pull_up_constant_function(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel); static bool is_simple_union_all(Query *subquery); static bool is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, @@ -101,25 +109,27 @@ static bool jointree_contains_lateral_outer_refs(PlannerInfo *root, Relids safe_upper_varnos); static void perform_pullup_replace_vars(PlannerInfo *root, pullup_replace_vars_context *rvcontext, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel); static void replace_vars_in_jointree(Node *jtnode, - pullup_replace_vars_context *context, - JoinExpr *lowest_nulling_outer_join); + pullup_replace_vars_context *context); static Node *pullup_replace_vars(Node *expr, pullup_replace_vars_context *context); static Node *pullup_replace_vars_callback(Var *var, replace_rte_variables_context *context); static Query *pullup_replace_vars_subquery(Query *query, pullup_replace_vars_context *context); -static reduce_outer_joins_state *reduce_outer_joins_pass1(Node *jtnode); +static reduce_outer_joins_pass1_state *reduce_outer_joins_pass1(Node *jtnode); static void reduce_outer_joins_pass2(Node *jtnode, - reduce_outer_joins_state *state, + reduce_outer_joins_pass1_state *state1, + reduce_outer_joins_pass2_state *state2, PlannerInfo *root, Relids nonnullable_rels, List *nonnullable_vars, List *forced_null_vars); -static Node *remove_useless_results_recurse(PlannerInfo *root, Node *jtnode); +static void report_reduced_full_join(reduce_outer_joins_pass2_state *state2, + int rtindex, Relids relids); +static Node *remove_useless_results_recurse(PlannerInfo *root, Node *jtnode, + Relids *dropped_outer_joins); static int get_result_relid(PlannerInfo *root, Node *jtnode); static void remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc); static bool find_dependent_phvs(PlannerInfo *root, int varno); @@ -764,7 +774,7 @@ pull_up_subqueries(PlannerInfo *root) /* Recursion starts with no containing join nor appendrel */ root->parse->jointree = (FromExpr *) pull_up_subqueries_recurse(root, (Node *) root->parse->jointree, - NULL, NULL, NULL); + NULL, NULL); /* We should still have a FromExpr */ Assert(IsA(root->parse->jointree, FromExpr)); } @@ -779,12 +789,6 @@ pull_up_subqueries(PlannerInfo *root) * lowest_outer_join references the lowest such JoinExpr node; otherwise * it is NULL. We use this to constrain the effects of LATERAL subqueries. * - * If this jointree node is within the nullable side of an outer join, then - * lowest_nulling_outer_join references the lowest such JoinExpr node; - * otherwise it is NULL. This forces use of the PlaceHolderVar mechanism for - * references to non-nullable targetlist items, but only for references above - * that join. - * * If we are looking at a member subquery of an append relation, * containing_appendrel describes that relation; else it is NULL. * This forces use of the PlaceHolderVar mechanism for all non-Var targetlist @@ -801,15 +805,14 @@ pull_up_subqueries(PlannerInfo *root) * Notice also that we can't turn pullup_replace_vars loose on the whole * jointree, because it'd return a mutated copy of the tree; we have to * invoke it just on the quals, instead. This behavior is what makes it - * reasonable to pass lowest_outer_join and lowest_nulling_outer_join as - * pointers rather than some more-indirect way of identifying the lowest - * OJs. Likewise, we don't replace append_rel_list members but only their - * substructure, so the containing_appendrel reference is safe to use. + * reasonable to pass lowest_outer_join as a pointer rather than some + * more-indirect way of identifying the lowest OJ. Likewise, we don't + * replace append_rel_list members but only their substructure, so the + * containing_appendrel reference is safe to use. */ static Node * pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, JoinExpr *lowest_outer_join, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel) { Assert(jtnode != NULL); @@ -831,7 +834,6 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, is_safe_append_member(rte->subquery))) return pull_up_simple_subquery(root, jtnode, rte, lowest_outer_join, - lowest_nulling_outer_join, containing_appendrel); /* @@ -864,7 +866,6 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, */ if (rte->rtekind == RTE_FUNCTION) return pull_up_constant_function(root, jtnode, rte, - lowest_nulling_outer_join, containing_appendrel); /* Otherwise, do nothing at this node. */ @@ -880,7 +881,6 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, { lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l), lowest_outer_join, - lowest_nulling_outer_join, NULL); } } @@ -895,11 +895,9 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, case JOIN_INNER: j->larg = pull_up_subqueries_recurse(root, j->larg, lowest_outer_join, - lowest_nulling_outer_join, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, lowest_outer_join, - lowest_nulling_outer_join, NULL); break; case JOIN_LEFT: @@ -907,31 +905,25 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, case JOIN_ANTI: j->larg = pull_up_subqueries_recurse(root, j->larg, j, - lowest_nulling_outer_join, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, - j, j, NULL); break; case JOIN_FULL: j->larg = pull_up_subqueries_recurse(root, j->larg, - j, j, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, - j, j, NULL); break; case JOIN_RIGHT: j->larg = pull_up_subqueries_recurse(root, j->larg, - j, j, NULL); j->rarg = pull_up_subqueries_recurse(root, j->rarg, j, - lowest_nulling_outer_join, NULL); break; default: @@ -961,7 +953,6 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, static Node * pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, JoinExpr *lowest_outer_join, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel) { Query *parse = root->parse; @@ -1107,31 +1098,25 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, * The subquery's targetlist items are now in the appropriate form to * insert into the top query, except that we may need to wrap them in * PlaceHolderVars. Set up required context data for pullup_replace_vars. + * (Note that we should include the subquery's inner joins in relids, + * since it may include join alias vars referencing them.) */ rvcontext.root = root; rvcontext.targetlist = subquery->targetList; rvcontext.target_rte = rte; if (rte->lateral) rvcontext.relids = get_relids_in_jointree((Node *) subquery->jointree, - true); + true, true); else /* won't need relids */ rvcontext.relids = NULL; rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; - /* these flags will be set below, if needed */ - rvcontext.need_phvs = false; + /* this flag will be set below, if needed */ rvcontext.wrap_non_vars = false; /* initialize cache array with indexes 0 .. length(tlist) */ rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) * sizeof(Node *)); - /* - * If we are under an outer join then non-nullable items and lateral - * references may have to be turned into PlaceHolderVars. - */ - if (lowest_nulling_outer_join != NULL) - rvcontext.need_phvs = true; - /* * If we are dealing with an appendrel member then anything that's not a * simple Var has to be turned into a PlaceHolderVar. We force this to @@ -1140,10 +1125,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, * expression actually available from the appendrel. */ if (containing_appendrel != NULL) - { - rvcontext.need_phvs = true; rvcontext.wrap_non_vars = true; - } /* * If the parent query uses grouping sets, we need a PlaceHolderVar for @@ -1155,10 +1137,7 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, * that pullup_replace_vars hasn't currently got.) */ if (parse->groupingSets) - { - rvcontext.need_phvs = true; rvcontext.wrap_non_vars = true; - } /* * Replace all of the top query's references to the subquery's outputs @@ -1166,7 +1145,6 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, * replace any of the jointree structure. */ perform_pullup_replace_vars(root, &rvcontext, - lowest_nulling_outer_join, containing_appendrel); /* @@ -1233,7 +1211,8 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, { Relids subrelids; - subrelids = get_relids_in_jointree((Node *) subquery->jointree, false); + subrelids = get_relids_in_jointree((Node *) subquery->jointree, + true, false); substitute_phv_relids((Node *) parse, varno, subrelids); fix_append_rel_relids(root->append_rel_list, varno, subrelids); } @@ -1424,7 +1403,7 @@ pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root, int parentRTindex, rtr = makeNode(RangeTblRef); rtr->rtindex = childRTindex; (void) pull_up_subqueries_recurse(root, (Node *) rtr, - NULL, NULL, appinfo); + NULL, appinfo); } else if (IsA(setOp, SetOperationStmt)) { @@ -1561,7 +1540,7 @@ is_simple_subquery(PlannerInfo *root, Query *subquery, RangeTblEntry *rte, { restricted = true; safe_upper_varnos = get_relids_in_jointree((Node *) lowest_outer_join, - true); + true, true); } else { @@ -1673,7 +1652,6 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte) rvcontext.relids = NULL; rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = varno; - rvcontext.need_phvs = false; rvcontext.wrap_non_vars = false; /* initialize cache array with indexes 0 .. length(tlist) */ rvcontext.rv_cache = palloc0((list_length(tlist) + 1) * @@ -1685,7 +1663,7 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte) * any of the jointree structure. We can assume there's no outer joins or * appendrels in the dummy Query that surrounds a VALUES RTE. */ - perform_pullup_replace_vars(root, &rvcontext, NULL, NULL); + perform_pullup_replace_vars(root, &rvcontext, NULL); /* * There should be no appendrels to fix, nor any outer joins and hence no @@ -1784,7 +1762,6 @@ is_simple_values(PlannerInfo *root, RangeTblEntry *rte) static Node * pull_up_constant_function(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel) { Query *parse = root->parse; @@ -1836,40 +1813,26 @@ pull_up_constant_function(PlannerInfo *root, Node *jtnode, rvcontext.outer_hasSubLinks = &parse->hasSubLinks; rvcontext.varno = ((RangeTblRef *) jtnode)->rtindex; - /* these flags will be set below, if needed */ - rvcontext.need_phvs = false; + /* this flag will be set below, if needed */ rvcontext.wrap_non_vars = false; /* initialize cache array with indexes 0 .. length(tlist) */ rvcontext.rv_cache = palloc0((list_length(rvcontext.targetlist) + 1) * sizeof(Node *)); - /* - * If we are under an outer join then non-nullable items and lateral - * references may have to be turned into PlaceHolderVars. - */ - if (lowest_nulling_outer_join != NULL) - rvcontext.need_phvs = true; - /* * If we are dealing with an appendrel member then anything that's not a * simple Var has to be turned into a PlaceHolderVar. (See comments in * pull_up_simple_subquery().) */ if (containing_appendrel != NULL) - { - rvcontext.need_phvs = true; rvcontext.wrap_non_vars = true; - } /* * If the parent query uses grouping sets, we need a PlaceHolderVar for * anything that's not a simple Var. */ if (parse->groupingSets) - { - rvcontext.need_phvs = true; rvcontext.wrap_non_vars = true; - } /* * Replace all of the top query's references to the RTE's output with @@ -1877,7 +1840,6 @@ pull_up_constant_function(PlannerInfo *root, Node *jtnode, * jointree structure. */ perform_pullup_replace_vars(root, &rvcontext, - lowest_nulling_outer_join, containing_appendrel); /* @@ -2099,13 +2061,11 @@ jointree_contains_lateral_outer_refs(PlannerInfo *root, Node *jtnode, * * Caller has already filled *rvcontext with data describing what to * substitute for Vars referencing the target subquery. In addition - * we need the identity of the lowest outer join that can null the - * target subquery, and its containing appendrel if any. + * we need the identity of the containing appendrel if any. */ static void perform_pullup_replace_vars(PlannerInfo *root, pullup_replace_vars_context *rvcontext, - JoinExpr *lowest_nulling_outer_join, AppendRelInfo *containing_appendrel) { Query *parse = root->parse; @@ -2149,38 +2109,31 @@ perform_pullup_replace_vars(PlannerInfo *root, pullup_replace_vars((Node *) action->targetList, rvcontext); } } - replace_vars_in_jointree((Node *) parse->jointree, rvcontext, - lowest_nulling_outer_join); + replace_vars_in_jointree((Node *) parse->jointree, rvcontext); Assert(parse->setOperations == NULL); parse->havingQual = pullup_replace_vars(parse->havingQual, rvcontext); /* * Replace references in the translated_vars lists of appendrels. When - * pulling up an appendrel member, we do not need PHVs in the list of the - * parent appendrel --- there isn't any outer join between. Elsewhere, - * use PHVs for safety. (This analysis could be made tighter but it seems - * unlikely to be worth much trouble.) + * pulling up an appendrel member, we do not want to force PHVs in the + * list of the parent appendrel --- there isn't any outer join between. + * Elsewhere, use PHVs for safety. (This analysis could be made tighter + * but it seems unlikely to be worth much trouble.) */ foreach(lc, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc); - bool save_need_phvs = rvcontext->need_phvs; + bool save_wrap_non_vars = rvcontext->wrap_non_vars; if (appinfo == containing_appendrel) - rvcontext->need_phvs = false; + rvcontext->wrap_non_vars = false; appinfo->translated_vars = (List *) pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext); - rvcontext->need_phvs = save_need_phvs; + rvcontext->wrap_non_vars = save_wrap_non_vars; } /* * Replace references in the joinaliasvars lists of join RTEs. - * - * You might think that we could avoid using PHVs for alias vars of joins - * below lowest_nulling_outer_join, but that doesn't work because the - * alias vars could be referenced above that join; we need the PHVs to be - * present in such references after the alias vars get flattened. (It - * might be worth trying to be smarter here, someday.) */ foreach(lc, parse->rtable) { @@ -2197,14 +2150,10 @@ perform_pullup_replace_vars(PlannerInfo *root, * Helper routine for perform_pullup_replace_vars: do pullup_replace_vars on * every expression in the jointree, without changing the jointree structure * itself. Ugly, but there's no other way... - * - * If we are at or below lowest_nulling_outer_join, we can suppress use of - * PlaceHolderVars wrapped around the replacement expressions. */ static void replace_vars_in_jointree(Node *jtnode, - pullup_replace_vars_context *context, - JoinExpr *lowest_nulling_outer_join) + pullup_replace_vars_context *context) { if (jtnode == NULL) return; @@ -2217,7 +2166,7 @@ replace_vars_in_jointree(Node *jtnode, * jointree scan, rather than a scan of the rtable, for a couple of * reasons: we can avoid processing no-longer-referenced RTEs, and we * can use the appropriate setting of need_phvs depending on whether - * the RTE is above possibly-nulling outer joins or not. + * the RTE is above possibly-nulling outer joins or not. XXX fix */ int varno = ((RangeTblRef *) jtnode)->rtindex; @@ -2274,42 +2223,30 @@ replace_vars_in_jointree(Node *jtnode, ListCell *l; foreach(l, f->fromlist) - replace_vars_in_jointree(lfirst(l), context, - lowest_nulling_outer_join); + replace_vars_in_jointree(lfirst(l), context); f->quals = pullup_replace_vars(f->quals, context); } else if (IsA(jtnode, JoinExpr)) { JoinExpr *j = (JoinExpr *) jtnode; - bool save_need_phvs = context->need_phvs; + bool save_wrap_non_vars = context->wrap_non_vars; - if (j == lowest_nulling_outer_join) - { - /* no more PHVs in or below this join */ - context->need_phvs = false; - lowest_nulling_outer_join = NULL; - } - replace_vars_in_jointree(j->larg, context, lowest_nulling_outer_join); - replace_vars_in_jointree(j->rarg, context, lowest_nulling_outer_join); + replace_vars_in_jointree(j->larg, context); + replace_vars_in_jointree(j->rarg, context); /* - * Use PHVs within the join quals of a full join, even when it's the - * lowest nulling outer join. Otherwise, we cannot identify which - * side of the join a pulled-up var-free expression came from, which - * can lead to failure to make a plan at all because none of the quals - * appear to be mergeable or hashable conditions. For this purpose we - * don't care about the state of wrap_non_vars, so leave it alone. + * Use PHVs within the join quals of a full join. Otherwise, we + * cannot identify which side of the join a pulled-up var-free + * expression came from, which can lead to failure to make a plan at + * all because none of the quals appear to be mergeable or hashable + * conditions. */ if (j->jointype == JOIN_FULL) - context->need_phvs = true; + context->wrap_non_vars = true; j->quals = pullup_replace_vars(j->quals, context); - /* - * We don't bother to update the colvars list, since it won't be used - * again ... - */ - context->need_phvs = save_need_phvs; + context->wrap_non_vars = save_wrap_non_vars; } else elog(ERROR, "unrecognized node type: %d", @@ -2338,8 +2275,18 @@ pullup_replace_vars_callback(Var *var, { pullup_replace_vars_context *rcon = (pullup_replace_vars_context *) context->callback_arg; int varattno = var->varattno; + bool need_phv; Node *newnode; + /* + * We need a PlaceHolderVar if the Var-to-be-replaced has nonempty + * varnullingrels (unless we find below that the replacement expression is + * a Var or PlaceHolderVar that we can just add the nullingrels to). We + * also need one if the caller has instructed us that all non-Var/PHV + * replacements need to be wrapped for identification purposes. + */ + need_phv = (var->varnullingrels != NULL) || rcon->wrap_non_vars; + /* * If PlaceHolderVars are needed, we cache the modified expressions in * rcon->rv_cache[]. This is not in hopes of any material speed gain @@ -2348,13 +2295,16 @@ pullup_replace_vars_callback(Var *var, * and possibly prevent optimizations that rely on recognizing different * references to the same subquery output as being equal(). So it's worth * a bit of extra effort to avoid it. + * + * The cached items have phlevelsup = 0 and phnullingrels = NULL; we'll + * copy them and adjust those values for this reference site below. */ - if (rcon->need_phvs && + if (need_phv && varattno >= InvalidAttrNumber && varattno <= list_length(rcon->targetlist) && rcon->rv_cache[varattno] != NULL) { - /* Just copy the entry and fall through to adjust its varlevelsup */ + /* Just copy the entry and fall through to adjust phlevelsup etc */ newnode = copyObject(rcon->rv_cache[varattno]); } else if (varattno == InvalidAttrNumber) @@ -2363,7 +2313,7 @@ pullup_replace_vars_callback(Var *var, RowExpr *rowexpr; List *colnames; List *fields; - bool save_need_phvs = rcon->need_phvs; + bool save_wrap_non_vars = rcon->wrap_non_vars; int save_sublevelsup = context->sublevels_up; /* @@ -2374,18 +2324,18 @@ pullup_replace_vars_callback(Var *var, * the RowExpr for use of the executor and ruleutils.c. * * In order to be able to cache the results, we always generate the - * expansion with varlevelsup = 0, and then adjust if needed. + * expansion with varlevelsup = 0, and then adjust below if needed. */ expandRTE(rcon->target_rte, var->varno, 0 /* not varlevelsup */ , var->location, (var->vartype != RECORDOID), &colnames, &fields); - /* Adjust the generated per-field Vars, but don't insert PHVs */ - rcon->need_phvs = false; + /* Expand the generated per-field Vars, but don't insert PHVs there */ + rcon->wrap_non_vars = false; context->sublevels_up = 0; /* to match the expandRTE output */ fields = (List *) replace_rte_variables_mutator((Node *) fields, context); - rcon->need_phvs = save_need_phvs; + rcon->wrap_non_vars = save_wrap_non_vars; context->sublevels_up = save_sublevelsup; rowexpr = makeNode(RowExpr); @@ -2403,14 +2353,13 @@ pullup_replace_vars_callback(Var *var, * expression to yield NULL, not ROW(NULL,NULL,...) when it is forced * to null by an outer join. */ - if (rcon->need_phvs) + if (need_phv) { - /* RowExpr is certainly not strict, so always need PHV */ newnode = (Node *) make_placeholder_expr(rcon->root, (Expr *) newnode, bms_make_singleton(rcon->varno)); - /* cache it with the PHV, and with varlevelsup still zero */ + /* cache it with the PHV, and with phlevelsup etc not set yet */ rcon->rv_cache[InvalidAttrNumber] = copyObject(newnode); } } @@ -2427,7 +2376,7 @@ pullup_replace_vars_callback(Var *var, newnode = (Node *) copyObject(tle->expr); /* Insert PlaceHolderVar if needed */ - if (rcon->need_phvs) + if (need_phv) { bool wrap; @@ -2453,69 +2402,61 @@ pullup_replace_vars_callback(Var *var, /* No need to wrap a PlaceHolderVar with another one, either */ wrap = false; } - else if (rcon->wrap_non_vars) - { - /* Wrap all non-Vars in a PlaceHolderVar */ - wrap = true; - } else { /* - * If it contains a Var of the subquery being pulled up, and - * does not contain any non-strict constructs, then it's - * certainly nullable so we don't need to insert a - * PlaceHolderVar. - * - * This analysis could be tighter: in particular, a non-strict - * construct hidden within a lower-level PlaceHolderVar is not - * reason to add another PHV. But for now it doesn't seem - * worth the code to be more exact. - * - * Note: in future maybe we should insert a PlaceHolderVar - * anyway, if the tlist item is expensive to evaluate? - * - * For a LATERAL subquery, we have to check the actual var - * membership of the node, but if it's non-lateral then any - * level-zero var must belong to the subquery. + * Must wrap, either because we need a place to insert + * varnullingrels or because caller told us to wrap + * everything. */ - if ((rcon->target_rte->lateral ? - bms_overlap(pull_varnos(rcon->root, (Node *) newnode), - rcon->relids) : - contain_vars_of_level((Node *) newnode, 0)) && - !contain_nonstrict_functions((Node *) newnode)) - { - /* No wrap needed */ - wrap = false; - } - else - { - /* Else wrap it in a PlaceHolderVar */ - wrap = true; - } + wrap = true; } if (wrap) + { newnode = (Node *) make_placeholder_expr(rcon->root, (Expr *) newnode, bms_make_singleton(rcon->varno)); - /* - * Cache it if possible (ie, if the attno is in range, which it - * probably always should be). We can cache the value even if we - * decided we didn't need a PHV, since this result will be - * suitable for any request that has need_phvs. - */ - if (varattno > InvalidAttrNumber && - varattno <= list_length(rcon->targetlist)) - rcon->rv_cache[varattno] = copyObject(newnode); + /* + * Cache it if possible (ie, if the attno is in range, which + * it probably always should be). + */ + if (varattno > InvalidAttrNumber && + varattno <= list_length(rcon->targetlist)) + rcon->rv_cache[varattno] = copyObject(newnode); + } } } - /* Must adjust varlevelsup if tlist item is from higher query */ + /* Must adjust varlevelsup if replaced Var is within a subquery */ if (var->varlevelsup > 0) IncrementVarSublevelsUp(newnode, var->varlevelsup, 0); + /* Propagate any varnullingrels into the replacement Var or PHV */ + if (var->varnullingrels != NULL) + { + if (IsA(newnode, Var)) + { + Var *newvar = (Var *) newnode; + + Assert(newvar->varlevelsup == var->varlevelsup); + newvar->varnullingrels = bms_add_members(newvar->varnullingrels, + var->varnullingrels); + } + else if (IsA(newnode, PlaceHolderVar)) + { + PlaceHolderVar *newphv = (PlaceHolderVar *) newnode; + + Assert(newphv->phlevelsup == var->varlevelsup); + newphv->phnullingrels = bms_add_members(newphv->phnullingrels, + var->varnullingrels); + } + else + elog(ERROR, "failed to wrap a non-Var"); + } + return newnode; } @@ -2674,7 +2615,9 @@ flatten_simple_union_all(PlannerInfo *root) void reduce_outer_joins(PlannerInfo *root) { - reduce_outer_joins_state *state; + reduce_outer_joins_pass1_state *state1; + reduce_outer_joins_pass2_state state2; + ListCell *lc; /* * To avoid doing strictness checks on more quals than necessary, we want @@ -2685,14 +2628,44 @@ reduce_outer_joins(PlannerInfo *root) * join(s) below each side of each join clause. The second pass examines * qual clauses and changes join types as it descends the tree. */ - state = reduce_outer_joins_pass1((Node *) root->parse->jointree); + state1 = reduce_outer_joins_pass1((Node *) root->parse->jointree); /* planner.c shouldn't have called me if no outer joins */ - if (state == NULL || !state->contains_outer) + if (state1 == NULL || !state1->contains_outer) elog(ERROR, "so where are the outer joins?"); + state2.inner_reduced = NULL; + state2.partial_reduced = NIL; + reduce_outer_joins_pass2((Node *) root->parse->jointree, - state, root, NULL, NIL, NIL); + state1, &state2, + root, NULL, NIL, NIL); + + /* + * If we successfully reduced the strength of any outer joins, we must + * remove references to those joins as nulling rels. This is handled as + * an additional pass, for simplicity and because we can handle all + * fully-reduced joins in a single pass over the parse tree. + */ + if (!bms_is_empty(state2.inner_reduced)) + root->parse = (Query *) + remove_nulling_relids((Node *) root->parse, + state2.inner_reduced, + NULL); + + /* + * Partially-reduced full joins have to be done one at a time, since + * they'll each need a different setting of except_relids. + */ + foreach(lc, state2.partial_reduced) + { + reduce_outer_joins_partial_state *statep = lfirst(lc); + + root->parse = (Query *) + remove_nulling_relids((Node *) root->parse, + bms_make_singleton(statep->full_join_rti), + statep->unreduced_side); + } } /* @@ -2700,13 +2673,13 @@ reduce_outer_joins(PlannerInfo *root) * * Returns a state node describing the given jointree node. */ -static reduce_outer_joins_state * +static reduce_outer_joins_pass1_state * reduce_outer_joins_pass1(Node *jtnode) { - reduce_outer_joins_state *result; + reduce_outer_joins_pass1_state *result; - result = (reduce_outer_joins_state *) - palloc(sizeof(reduce_outer_joins_state)); + result = (reduce_outer_joins_pass1_state *) + palloc(sizeof(reduce_outer_joins_pass1_state)); result->relids = NULL; result->contains_outer = false; result->sub_states = NIL; @@ -2726,7 +2699,7 @@ reduce_outer_joins_pass1(Node *jtnode) foreach(l, f->fromlist) { - reduce_outer_joins_state *sub_state; + reduce_outer_joins_pass1_state *sub_state; sub_state = reduce_outer_joins_pass1(lfirst(l)); result->relids = bms_add_members(result->relids, @@ -2738,7 +2711,7 @@ reduce_outer_joins_pass1(Node *jtnode) else if (IsA(jtnode, JoinExpr)) { JoinExpr *j = (JoinExpr *) jtnode; - reduce_outer_joins_state *sub_state; + reduce_outer_joins_pass1_state *sub_state; /* join's own RT index is not wanted in result->relids */ if (IS_OUTER_JOIN(j->jointype)) @@ -2766,15 +2739,23 @@ reduce_outer_joins_pass1(Node *jtnode) * reduce_outer_joins_pass2 - phase 2 processing * * jtnode: current jointree node - * state: state data collected by phase 1 for this node + * state1: state data collected by phase 1 for this node + * state2: where to accumulate info about successfully-reduced joins * root: toplevel planner state * nonnullable_rels: set of base relids forced non-null by upper quals * nonnullable_vars: list of Vars forced non-null by upper quals * forced_null_vars: list of Vars forced null by upper quals + * + * Returns info in state2 about outer joins that were successfully simplified. + * Joins that were fully reduced to inner joins are all added to + * state2->inner_reduced. If a full join is reduced to a left join, + * it needs its own entry in state2->partial_reduced, since that will + * require custom processing to remove only the correct nullingrel markers. */ static void reduce_outer_joins_pass2(Node *jtnode, - reduce_outer_joins_state *state, + reduce_outer_joins_pass1_state *state1, + reduce_outer_joins_pass2_state *state2, PlannerInfo *root, Relids nonnullable_rels, List *nonnullable_vars, @@ -2808,13 +2789,14 @@ reduce_outer_joins_pass2(Node *jtnode, pass_forced_null_vars = list_concat(pass_forced_null_vars, forced_null_vars); /* And recurse --- but only into interesting subtrees */ - Assert(list_length(f->fromlist) == list_length(state->sub_states)); - forboth(l, f->fromlist, s, state->sub_states) + Assert(list_length(f->fromlist) == list_length(state1->sub_states)); + forboth(l, f->fromlist, s, state1->sub_states) { - reduce_outer_joins_state *sub_state = lfirst(s); + reduce_outer_joins_pass1_state *sub_state = lfirst(s); if (sub_state->contains_outer) - reduce_outer_joins_pass2(lfirst(l), sub_state, root, + reduce_outer_joins_pass2(lfirst(l), sub_state, + state2, root, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); @@ -2827,8 +2809,8 @@ reduce_outer_joins_pass2(Node *jtnode, JoinExpr *j = (JoinExpr *) jtnode; int rtindex = j->rtindex; JoinType jointype = j->jointype; - reduce_outer_joins_state *left_state = linitial(state->sub_states); - reduce_outer_joins_state *right_state = lsecond(state->sub_states); + reduce_outer_joins_pass1_state *left_state = linitial(state1->sub_states); + reduce_outer_joins_pass1_state *right_state = lsecond(state1->sub_states); List *local_nonnullable_vars = NIL; bool computed_local_nonnullable_vars = false; @@ -2851,12 +2833,22 @@ reduce_outer_joins_pass2(Node *jtnode, if (bms_overlap(nonnullable_rels, right_state->relids)) jointype = JOIN_INNER; else + { jointype = JOIN_LEFT; + /* Also report partial reduction in state2 */ + report_reduced_full_join(state2, rtindex, + right_state->relids); + } } else { if (bms_overlap(nonnullable_rels, right_state->relids)) + { jointype = JOIN_RIGHT; + /* Also report partial reduction in state2 */ + report_reduced_full_join(state2, rtindex, + left_state->relids); + } } break; case JOIN_SEMI: @@ -2889,8 +2881,8 @@ reduce_outer_joins_pass2(Node *jtnode, j->larg = j->rarg; j->rarg = tmparg; jointype = JOIN_LEFT; - right_state = linitial(state->sub_states); - left_state = lsecond(state->sub_states); + right_state = linitial(state1->sub_states); + left_state = lsecond(state1->sub_states); } /* @@ -2923,7 +2915,10 @@ reduce_outer_joins_pass2(Node *jtnode, jointype = JOIN_ANTI; } - /* Apply the jointype change, if any, to both jointree node and RTE */ + /* + * Apply the jointype change, if any, to both jointree node and RTE. + * Also, if we changed an RTE to INNER, add its RTI to inner_reduced. + */ if (rtindex && jointype != j->jointype) { RangeTblEntry *rte = rt_fetch(rtindex, root->parse->rtable); @@ -2931,6 +2926,9 @@ reduce_outer_joins_pass2(Node *jtnode, Assert(rte->rtekind == RTE_JOIN); Assert(rte->jointype == j->jointype); rte->jointype = jointype; + if (jointype == JOIN_INNER) + state2->inner_reduced = bms_add_member(state2->inner_reduced, + rtindex); } j->jointype = jointype; @@ -3011,7 +3009,8 @@ reduce_outer_joins_pass2(Node *jtnode, pass_nonnullable_vars = NIL; pass_forced_null_vars = NIL; } - reduce_outer_joins_pass2(j->larg, left_state, root, + reduce_outer_joins_pass2(j->larg, left_state, + state2, root, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); @@ -3033,7 +3032,8 @@ reduce_outer_joins_pass2(Node *jtnode, pass_nonnullable_vars = NIL; pass_forced_null_vars = NIL; } - reduce_outer_joins_pass2(j->rarg, right_state, root, + reduce_outer_joins_pass2(j->rarg, right_state, + state2, root, pass_nonnullable_rels, pass_nonnullable_vars, pass_forced_null_vars); @@ -3046,6 +3046,19 @@ reduce_outer_joins_pass2(Node *jtnode, (int) nodeTag(jtnode)); } +/* Helper for reduce_outer_joins_pass2 */ +static void +report_reduced_full_join(reduce_outer_joins_pass2_state *state2, + int rtindex, Relids relids) +{ + reduce_outer_joins_partial_state *statep; + + statep = palloc(sizeof(reduce_outer_joins_partial_state)); + statep->full_join_rti = rtindex; + statep->unreduced_side = relids; + state2->partial_reduced = lappend(state2->partial_reduced, statep); +} + /* * remove_useless_result_rtes @@ -3087,16 +3100,34 @@ reduce_outer_joins_pass2(Node *jtnode, void remove_useless_result_rtes(PlannerInfo *root) { + Relids dropped_outer_joins = NULL; ListCell *cell; /* Top level of jointree must always be a FromExpr */ Assert(IsA(root->parse->jointree, FromExpr)); /* Recurse ... */ root->parse->jointree = (FromExpr *) - remove_useless_results_recurse(root, (Node *) root->parse->jointree); + remove_useless_results_recurse(root, + (Node *) root->parse->jointree, + &dropped_outer_joins); /* We should still have a FromExpr */ Assert(IsA(root->parse->jointree, FromExpr)); + /* + * If we removed any outer-join nodes from the jointree, run around and + * remove references to those joins as nulling rels. (There could be such + * references in PHVs that we pulled up out of the original subquery that + * the RESULT rel replaced. This is kosher on the grounds that we now + * know that such an outer join wouldn't really have nulled anything.) We + * don't do this during the main recursion, for simplicity and because we + * can handle all such joins in a single pass over the parse tree. + */ + if (!bms_is_empty(dropped_outer_joins)) + root->parse = (Query *) + remove_nulling_relids((Node *) root->parse, + dropped_outer_joins, + NULL); + /* * Remove any PlanRowMark referencing an RTE_RESULT RTE. We obviously * must do that for any RTE_RESULT that we just removed. But one for a @@ -3122,9 +3153,12 @@ remove_useless_result_rtes(PlannerInfo *root) * Recursive guts of remove_useless_result_rtes. * * This recursively processes the jointree and returns a modified jointree. + * In addition, the RT indexes of any removed outer-join nodes are added to + * *dropped_outer_joins. */ static Node * -remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) +remove_useless_results_recurse(PlannerInfo *root, Node *jtnode, + Relids *dropped_outer_joins) { Assert(jtnode != NULL); if (IsA(jtnode, RangeTblRef)) @@ -3152,7 +3186,8 @@ remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) int varno; /* Recursively transform child ... */ - child = remove_useless_results_recurse(root, child); + child = remove_useless_results_recurse(root, child, + dropped_outer_joins); /* ... and stick it back into the tree */ lfirst(cell) = child; @@ -3201,8 +3236,10 @@ remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) int varno; /* First, recurse */ - j->larg = remove_useless_results_recurse(root, j->larg); - j->rarg = remove_useless_results_recurse(root, j->rarg); + j->larg = remove_useless_results_recurse(root, j->larg, + dropped_outer_joins); + j->rarg = remove_useless_results_recurse(root, j->rarg, + dropped_outer_joins); /* Apply join-type-specific optimization rules */ switch (j->jointype) @@ -3270,6 +3307,8 @@ remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) !find_dependent_phvs(root, varno))) { remove_result_refs(root, varno, j->larg); + *dropped_outer_joins = bms_add_member(*dropped_outer_joins, + j->rtindex); jtnode = j->larg; } break; @@ -3280,6 +3319,8 @@ remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) !find_dependent_phvs(root, varno))) { remove_result_refs(root, varno, j->rarg); + *dropped_outer_joins = bms_add_member(*dropped_outer_joins, + j->rtindex); jtnode = j->rarg; } break; @@ -3294,11 +3335,14 @@ remove_useless_results_recurse(PlannerInfo *root, Node *jtnode) * Unlike the LEFT/RIGHT cases, we just Assert that there are * no PHVs that need to be evaluated at the semijoin's RHS, * since the rest of the query couldn't reference any outputs - * of the semijoin's RHS. + * of the semijoin's RHS. Also, we don't need to worry about + * removing traces of the join's rtindex, since it hasn't got + * one. */ if ((varno = get_result_relid(root, j->rarg)) != 0) { Assert(!find_dependent_phvs(root, varno)); + Assert(j->rtindex == 0); remove_result_refs(root, varno, j->larg); if (j->quals) jtnode = (Node *) @@ -3367,7 +3411,7 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc) { Relids subrelids; - subrelids = get_relids_in_jointree(newjtloc, false); + subrelids = get_relids_in_jointree(newjtloc, true, false); Assert(!bms_is_empty(subrelids)); substitute_phv_relids((Node *) root->parse, varno, subrelids); fix_append_rel_relids(root->append_rel_list, varno, subrelids); @@ -3479,7 +3523,7 @@ find_dependent_phvs_in_jointree(PlannerInfo *root, Node *node, int varno) * are not marked LATERAL, though, since they couldn't possibly contain * any cross-references to other RTEs. */ - subrelids = get_relids_in_jointree(node, false); + subrelids = get_relids_in_jointree(node, false, false); relid = -1; while ((relid = bms_next_member(subrelids, relid)) >= 0) { @@ -3623,11 +3667,17 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids) /* * get_relids_in_jointree: get set of RT indexes present in a jointree * - * If include_joins is true, join RT indexes are included; if false, - * only base rels are included. + * Base-relation relids are always included in the result. + * If include_outer_joins is true, outer-join RT indexes are included. + * If include_inner_joins is true, inner-join RT indexes are included. + * + * Note that for most purposes in the planner, outer joins are included + * in standard relid sets. Setting include_inner_joins true is only + * appropriate for special purposes during subquery flattening. */ Relids -get_relids_in_jointree(Node *jtnode, bool include_joins) +get_relids_in_jointree(Node *jtnode, bool include_outer_joins, + bool include_inner_joins) { Relids result = NULL; @@ -3648,18 +3698,34 @@ get_relids_in_jointree(Node *jtnode, bool include_joins) { result = bms_join(result, get_relids_in_jointree(lfirst(l), - include_joins)); + include_outer_joins, + include_inner_joins)); } } else if (IsA(jtnode, JoinExpr)) { JoinExpr *j = (JoinExpr *) jtnode; - result = get_relids_in_jointree(j->larg, include_joins); + result = get_relids_in_jointree(j->larg, + include_outer_joins, + include_inner_joins); result = bms_join(result, - get_relids_in_jointree(j->rarg, include_joins)); - if (include_joins && j->rtindex) - result = bms_add_member(result, j->rtindex); + get_relids_in_jointree(j->rarg, + include_outer_joins, + include_inner_joins)); + if (j->rtindex) + { + if (j->jointype == JOIN_INNER) + { + if (include_inner_joins) + result = bms_add_member(result, j->rtindex); + } + else + { + if (include_outer_joins) + result = bms_add_member(result, j->rtindex); + } + } } else elog(ERROR, "unrecognized node type: %d", @@ -3668,7 +3734,7 @@ get_relids_in_jointree(Node *jtnode, bool include_joins) } /* - * get_relids_for_join: get set of base RT indexes making up a join + * get_relids_for_join: get set of base+OJ RT indexes making up a join */ Relids get_relids_for_join(Query *query, int joinrelid) @@ -3679,7 +3745,7 @@ get_relids_for_join(Query *query, int joinrelid) joinrelid); if (!jtnode) elog(ERROR, "could not find join node %d", joinrelid); - return get_relids_in_jointree(jtnode, false); + return get_relids_in_jointree(jtnode, true, false); } /* diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 62cccf9d87..e793a4c85b 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -228,6 +228,12 @@ adjust_appendrel_attrs_mutator(Node *node, if (var->varlevelsup != 0) return (Node *) var; /* no changes needed */ + /* + * You might think we need to adjust var->varnullingrels, but that + * shouldn't need any changes. It will contain outer-join relids, + * while the transformation we are making affects only baserels. + */ + for (cnt = 0; cnt < nappinfos; cnt++) { if (var->varno == appinfos[cnt]->parent_relid) @@ -348,6 +354,8 @@ adjust_appendrel_attrs_mutator(Node *node, var = copyObject(ridinfo->rowidvar); /* ... but use the correct relid */ var->varno = leaf_relid; + /* identity vars shouldn't have nulling rels */ + Assert(var->varnullingrels == NULL); /* varnosyn in the RowIdentityVarInfo is probably wrong */ var->varnosyn = 0; var->varattnosyn = 0; @@ -392,8 +400,11 @@ adjust_appendrel_attrs_mutator(Node *node, (void *) context); /* now fix PlaceHolderVar's relid sets */ if (phv->phlevelsup == 0) - phv->phrels = adjust_child_relids(phv->phrels, context->nappinfos, - context->appinfos); + { + phv->phrels = adjust_child_relids(phv->phrels, + nappinfos, appinfos); + /* as above, we needn't touch phnullingrels */ + } return (Node *) phv; } /* Shouldn't need to handle planner auxiliary nodes here */ @@ -688,7 +699,11 @@ get_translated_update_targetlist(PlannerInfo *root, Index relid, /* * find_appinfos_by_relids - * Find AppendRelInfo structures for all relations specified by relids. + * Find AppendRelInfo structures for base relations listed in relids. + * + * The relids argument is typically a join relation's relids, which can + * include outer-join RT indexes in addition to baserels. We silently + * ignore the outer joins. * * The AppendRelInfos are returned in an array, which can be pfree'd by the * caller. *nappinfos is set to the number of entries in the array. @@ -700,8 +715,9 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos) int cnt = 0; int i; - *nappinfos = bms_num_members(relids); - appinfos = (AppendRelInfo **) palloc(sizeof(AppendRelInfo *) * *nappinfos); + /* Allocate an array that's certainly big enough */ + appinfos = (AppendRelInfo **) + palloc(sizeof(AppendRelInfo *) * bms_num_members(relids)); i = -1; while ((i = bms_next_member(relids, i)) >= 0) @@ -709,10 +725,17 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos) AppendRelInfo *appinfo = root->append_rel_array[i]; if (!appinfo) + { + /* Probably i is an OJ index, but let's check */ + if (find_base_rel_ignore_join(root, i) == NULL) + continue; + /* It's a base rel, but we lack an append_rel_array entry */ elog(ERROR, "child rel %d not found in append_rel_array", i); + } appinfos[cnt++] = appinfo; } + *nappinfos = cnt; return appinfos; } @@ -754,6 +777,7 @@ add_row_identity_var(PlannerInfo *root, Var *orig_var, Assert(IsA(orig_var, Var)); Assert(orig_var->varno == rtindex); Assert(orig_var->varlevelsup == 0); + Assert(orig_var->varnullingrels == NULL); /* * If we're doing non-inherited UPDATE/DELETE/MERGE, there's little need diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 533df86ff7..c82fd451b2 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2022,14 +2022,16 @@ is_pseudo_constant_clause_relids(Node *clause, Relids relids) * NumRelids * (formerly clause_relids) * - * Returns the number of different relations referenced in 'clause'. + * Returns the number of different base relations referenced in 'clause'. */ int NumRelids(PlannerInfo *root, Node *clause) { + int result; Relids varnos = pull_varnos(root, clause); - int result = bms_num_members(varnos); + varnos = bms_del_members(varnos, root->outer_join_rels); + result = bms_num_members(varnos); bms_free(varnos); return result; } diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c index d4cffdb198..afd243f5d8 100644 --- a/src/backend/optimizer/util/joininfo.c +++ b/src/backend/optimizer/util/joininfo.c @@ -88,8 +88,8 @@ have_relevant_joinclause(PlannerInfo *root, * not depend on context). * * 'restrictinfo' describes the join clause - * 'join_relids' is the list of relations participating in the join clause - * (there must be more than one) + * 'join_relids' is the set of relations participating in the join clause + * (some of these could be outer joins) */ void add_join_clause_to_rels(PlannerInfo *root, @@ -101,8 +101,11 @@ add_join_clause_to_rels(PlannerInfo *root, cur_relid = -1; while ((cur_relid = bms_next_member(join_relids, cur_relid)) >= 0) { - RelOptInfo *rel = find_base_rel(root, cur_relid); + RelOptInfo *rel = find_base_rel_ignore_join(root, cur_relid); + /* We only need to add the clause to baserels */ + if (rel == NULL) + continue; rel->joininfo = lappend(rel->joininfo, restrictinfo); } } @@ -115,8 +118,8 @@ add_join_clause_to_rels(PlannerInfo *root, * discover that a relation need not be joined at all. * * 'restrictinfo' describes the join clause - * 'join_relids' is the list of relations participating in the join clause - * (there must be more than one) + * 'join_relids' is the set of relations participating in the join clause + * (some of these could be outer joins) */ void remove_join_clause_from_rels(PlannerInfo *root, @@ -128,7 +131,11 @@ remove_join_clause_from_rels(PlannerInfo *root, cur_relid = -1; while ((cur_relid = bms_next_member(join_relids, cur_relid)) >= 0) { - RelOptInfo *rel = find_base_rel(root, cur_relid); + RelOptInfo *rel = find_base_rel_ignore_join(root, cur_relid); + + /* We would only have added the clause to baserels */ + if (rel == NULL) + continue; /* * Remove the restrictinfo from the list. Pointer comparison is diff --git a/src/backend/optimizer/util/orclauses.c b/src/backend/optimizer/util/orclauses.c index b1363df065..a62d4587ea 100644 --- a/src/backend/optimizer/util/orclauses.c +++ b/src/backend/optimizer/util/orclauses.c @@ -338,7 +338,9 @@ consider_new_or_clause(PlannerInfo *root, RelOptInfo *rel, sjinfo.syn_lefthand = sjinfo.min_lefthand; sjinfo.syn_righthand = sjinfo.min_righthand; sjinfo.jointype = JOIN_INNER; + sjinfo.ojrelid = 0; /* we don't bother trying to make the remaining fields valid */ + sjinfo.strict_relids = NULL; sjinfo.lhs_strict = false; sjinfo.delay_upper_joins = false; sjinfo.semi_can_btree = false; diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 833b440f39..23e3e430f4 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -1307,7 +1307,7 @@ create_append_path(PlannerInfo *root, * Apply query-wide LIMIT if known and path is for sole base relation. * (Handling this at this low level is a bit klugy.) */ - if (root != NULL && bms_equal(rel->relids, root->all_baserels)) + if (root != NULL && bms_equal(rel->relids, root->all_query_rels)) pathnode->limit_tuples = root->limit_tuples; else pathnode->limit_tuples = -1.0; @@ -1427,7 +1427,7 @@ create_merge_append_path(PlannerInfo *root, * Apply query-wide LIMIT if known and path is for sole base relation. * (Handling this at this low level is a bit klugy.) */ - if (bms_equal(rel->relids, root->all_baserels)) + if (bms_equal(rel->relids, root->all_query_rels)) pathnode->limit_tuples = root->limit_tuples; else pathnode->limit_tuples = -1.0; diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c index 3b0f0584f0..4166e55d9a 100644 --- a/src/backend/optimizer/util/placeholder.c +++ b/src/backend/optimizer/util/placeholder.c @@ -32,8 +32,14 @@ static void find_placeholders_in_expr(PlannerInfo *root, Node *expr); * make_placeholder_expr * Make a PlaceHolderVar for the given expression. * - * phrels is the syntactic location (as a set of baserels) to attribute + * phrels is the syntactic location (as a set of relids) to attribute * to the expression. + * + * The caller is responsible for adjusting phlevelsup and phnullingrels + * as needed. Because we do not know here which query level the PHV + * will be associated with, it's important that this function touches + * only root->glob; messing with other parts of PlannerInfo would be + * likely to do the wrong thing. */ PlaceHolderVar * make_placeholder_expr(PlannerInfo *root, Expr *expr, Relids phrels) @@ -42,8 +48,9 @@ make_placeholder_expr(PlannerInfo *root, Expr *expr, Relids phrels) phv->phexpr = expr; phv->phrels = phrels; + phv->phnullingrels = NULL; /* caller may change this later */ phv->phid = ++(root->glob->lastPHId); - phv->phlevelsup = 0; + phv->phlevelsup = 0; /* caller may change this later */ return phv; } @@ -317,6 +324,8 @@ update_placeholder_eval_levels(PlannerInfo *root, SpecialJoinInfo *new_sjinfo) sjinfo->min_lefthand); eval_at = bms_add_members(eval_at, sjinfo->min_righthand); + if (sjinfo->ojrelid) + eval_at = bms_add_member(eval_at, sjinfo->ojrelid); /* we'll need another iteration */ found_some = true; } @@ -390,9 +399,16 @@ add_placeholders_to_base_rels(PlannerInfo *root) bms_nonempty_difference(phinfo->ph_needed, eval_at)) { RelOptInfo *rel = find_base_rel(root, varno); + PlaceHolderVar *phv; - rel->reltarget->exprs = lappend(rel->reltarget->exprs, - copyObject(phinfo->ph_var)); + /* + * As in add_vars_to_targetlist(), a value computed at scan level + * has not yet been nulled by any outer join, so set its + * phnullingrels to empty. + */ + phv = copyObject(phinfo->ph_var); + phv->phnullingrels = NULL; + rel->reltarget->exprs = lappend(rel->reltarget->exprs, phv); /* reltarget's cost and width fields will be updated later */ } } @@ -411,7 +427,8 @@ add_placeholders_to_base_rels(PlannerInfo *root) */ void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, - RelOptInfo *outer_rel, RelOptInfo *inner_rel) + RelOptInfo *outer_rel, RelOptInfo *inner_rel, + SpecialJoinInfo *sjinfo) { Relids relids = joinrel->relids; ListCell *lc; @@ -426,10 +443,11 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, /* Is it still needed above this joinrel? */ if (bms_nonempty_difference(phinfo->ph_needed, relids)) { - /* Yup, add it to the output */ - joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, - phinfo->ph_var); - joinrel->reltarget->width += phinfo->ph_width; + /* + * Yup, we must add it to the output. Make a copy so we can + * adjust phnullingrels if needed. + */ + PlaceHolderVar *phv = copyObject(phinfo->ph_var); /* * Charge the cost of evaluating the contained expression if @@ -442,16 +460,42 @@ add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, * with that; but we might want to improve it later by * refiguring the reltarget costs for each pair of inputs. */ - if (!bms_is_subset(phinfo->ph_eval_at, outer_rel->relids) && - !bms_is_subset(phinfo->ph_eval_at, inner_rel->relids)) + if (bms_is_subset(phinfo->ph_eval_at, outer_rel->relids)) + { + if (sjinfo->jointype == JOIN_FULL && sjinfo->ojrelid != 0) + { + /* PHV's value can be nulled at this join */ + phv->phnullingrels = bms_add_member(phv->phnullingrels, + sjinfo->ojrelid); + } + } + else if (bms_is_subset(phinfo->ph_eval_at, inner_rel->relids)) { + if (sjinfo->jointype != JOIN_INNER && sjinfo->ojrelid != 0) + { + /* PHV's value can be nulled at this join */ + phv->phnullingrels = bms_add_member(phv->phnullingrels, + sjinfo->ojrelid); + } + } + else + { + /* It must be computed here. */ QualCost cost; + /* It'll start out not nulled by anything */ + phv->phnullingrels = NULL; + /* Add the appropriate cost */ cost_qual_eval_node(&cost, (Node *) phinfo->ph_var->phexpr, root); joinrel->reltarget->cost.startup += cost.startup; joinrel->reltarget->cost.per_tuple += cost.per_tuple; } + + joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, + phv); + /* Update width estimate, too */ + joinrel->reltarget->width += phinfo->ph_width; } /* diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index a163853bed..9544ced296 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -39,7 +39,7 @@ typedef struct JoinHashEntry } JoinHashEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, - RelOptInfo *input_rel); + RelOptInfo *input_rel, int ojrelid); static List *build_joinrel_restrictlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outer_rel, @@ -58,7 +58,8 @@ static void set_foreign_rel_properties(RelOptInfo *joinrel, static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel); static void build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, - List *restrictlist, JoinType jointype); + SpecialJoinInfo *sjinfo, + List *restrictlist); static bool have_partkey_equi_join(RelOptInfo *joinrel, RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype, List *restrictlist); @@ -66,7 +67,8 @@ static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op); static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, - JoinType jointype); + SpecialJoinInfo *sjinfo); +static Node *add_nullingrel_to(Node *node, int relid); static void build_child_join_reltarget(PlannerInfo *root, RelOptInfo *parentrel, RelOptInfo *childrel, @@ -367,7 +369,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) /* * find_base_rel - * Find a base or other relation entry, which must already exist. + * Find a base or otherrel relation entry, which must already exist. */ RelOptInfo * find_base_rel(PlannerInfo *root, int relid) @@ -388,6 +390,44 @@ find_base_rel(PlannerInfo *root, int relid) return NULL; /* keep compiler quiet */ } +/* + * find_base_rel_ignore_join + * Find a base or otherrel relation entry, which must already exist. + * + * Unlike find_base_rel, if relid references an outer join then this + * will return NULL rather than raising an error. This is convenient + * for callers that must deal with relid sets including both base and + * outer joins. + */ +RelOptInfo * +find_base_rel_ignore_join(PlannerInfo *root, int relid) +{ + Assert(relid > 0); + + if (relid < root->simple_rel_array_size) + { + RelOptInfo *rel; + RangeTblEntry *rte; + + rel = root->simple_rel_array[relid]; + if (rel) + return rel; + + /* + * We could just return NULL here, but for debugging purposes it seems + * best to actually verify that the relid is an outer join and not + * something weird. + */ + rte = root->simple_rte_array[relid]; + if (rte && rte->rtekind == RTE_JOIN && rte->jointype != JOIN_INNER) + return NULL; + } + + elog(ERROR, "no relation entry for relid %d", relid); + + return NULL; /* keep compiler quiet */ +} + /* * build_join_rel_hash * Construct the auxiliary hash table for join relations. @@ -686,9 +726,11 @@ build_join_rel(PlannerInfo *root, * and inner rels we first try to build it from. But the contents should * be the same regardless. */ - build_joinrel_tlist(root, joinrel, outer_rel); - build_joinrel_tlist(root, joinrel, inner_rel); - add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel); + build_joinrel_tlist(root, joinrel, outer_rel, + (sjinfo->jointype == JOIN_FULL) ? sjinfo->ojrelid : 0); + build_joinrel_tlist(root, joinrel, inner_rel, + (sjinfo->jointype != JOIN_INNER) ? sjinfo->ojrelid : 0); + add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel, sjinfo); /* * add_placeholders_to_joinrel also took care of adding the ph_lateral @@ -720,8 +762,8 @@ build_join_rel(PlannerInfo *root, joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel); /* Store the partition information. */ - build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist, - sjinfo->jointype); + build_joinrel_partition_info(joinrel, outer_rel, inner_rel, sjinfo, + restrictlist); /* * Set estimates of the joinrel's size. @@ -777,16 +819,14 @@ build_join_rel(PlannerInfo *root, * 'parent_joinrel' is the RelOptInfo representing the join between parent * relations. Some of the members of new RelOptInfo are produced by * translating corresponding members of this RelOptInfo - * 'sjinfo': child-join context info * 'restrictlist': list of RestrictInfo nodes that apply to this particular * pair of joinable relations - * 'jointype' is the join type (inner, left, full, etc) + * 'sjinfo': child join's join-type details */ RelOptInfo * build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, RelOptInfo *inner_rel, RelOptInfo *parent_joinrel, - List *restrictlist, SpecialJoinInfo *sjinfo, - JoinType jointype) + List *restrictlist, SpecialJoinInfo *sjinfo) { RelOptInfo *joinrel = makeNode(RelOptInfo); AppendRelInfo **appinfos; @@ -800,6 +840,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->reloptkind = RELOPT_OTHER_JOINREL; joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids); + if (sjinfo->ojrelid != 0) + joinrel->relids = bms_add_member(joinrel->relids, sjinfo->ojrelid); joinrel->rows = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ joinrel->consider_startup = (root->tuple_fraction > 0); @@ -886,8 +928,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins; /* Is the join between partitions itself partitioned? */ - build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist, - jointype); + build_joinrel_partition_info(joinrel, outer_rel, inner_rel, sjinfo, + restrictlist); /* Child joinrel is parallel safe if parent is parallel safe. */ joinrel->consider_parallel = parent_joinrel->consider_parallel; @@ -966,12 +1008,15 @@ min_join_parameterization(PlannerInfo *root, * will still be needed above the join. This subroutine adds all such * Vars from the specified input rel's tlist to the join rel's tlist. * + * If the join can null Vars from this input relation, pass its RT index + * (if any) as ojrelid; if not, pass zero. + * * We also compute the expected width of the join's output, making use * of data that was cached at the baserel level by set_rel_width(). */ static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, - RelOptInfo *input_rel) + RelOptInfo *input_rel, int ojrelid) { Relids relids = joinrel->relids; ListCell *vars; @@ -1002,9 +1047,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RowIdentityVarInfo *ridinfo = (RowIdentityVarInfo *) list_nth(root->row_identity_vars, var->varattno - 1); - joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, - var); - /* Vars have cost zero, so no need to adjust reltarget->cost */ + /* Update reltarget width estimate from RowIdentityVarInfo */ joinrel->reltarget->width += ridinfo->rowidwidth; } else @@ -1017,15 +1060,28 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, /* Is it still needed above this joinrel? */ ndx = var->varattno - baserel->min_attr; - if (bms_nonempty_difference(baserel->attr_needed[ndx], relids)) - { - /* Yup, add it to the output */ - joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, - var); - /* Vars have cost zero, so no need to adjust reltarget->cost */ - joinrel->reltarget->width += baserel->attr_widths[ndx]; - } + if (!bms_nonempty_difference(baserel->attr_needed[ndx], relids)) + continue; /* nope, skip it */ + + /* Update reltarget width estimate from baserel's attr_widths */ + joinrel->reltarget->width += baserel->attr_widths[ndx]; + } + + /* + * Add the Var to the output. If this join potentially nulls this + * input, we have to update the Var's varnullingrels, which means + * making a copy. + */ + if (ojrelid != 0) + { + var = copyObject(var); + var->varnullingrels = bms_add_member(var->varnullingrels, ojrelid); } + + joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, + var); + + /* Vars have cost zero, so no need to adjust reltarget->cost */ } } @@ -1044,7 +1100,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, * is not handled in the sub-relations, so it depends on which * sub-relations are considered. * - * If a join clause from an input relation refers to base rels still not + * If a join clause from an input relation refers to base+OJ rels still not * present in the joinrel, then it is still a join clause for the joinrel; * we put it into the joininfo list for the joinrel. Otherwise, * the clause is now a restrict clause for the joined relation, and we @@ -1645,8 +1701,8 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer) */ static void build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, - RelOptInfo *inner_rel, List *restrictlist, - JoinType jointype) + RelOptInfo *inner_rel, SpecialJoinInfo *sjinfo, + List *restrictlist) { PartitionScheme part_scheme; @@ -1673,7 +1729,7 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, !inner_rel->consider_partitionwise_join || outer_rel->part_scheme != inner_rel->part_scheme || !have_partkey_equi_join(joinrel, outer_rel, inner_rel, - jointype, restrictlist)) + sjinfo->jointype, restrictlist)) { Assert(!IS_PARTITIONED_REL(joinrel)); return; @@ -1697,7 +1753,7 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, * child-join relations of the join relation in try_partitionwise_join(). */ joinrel->part_scheme = part_scheme; - set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, jointype); + set_joinrel_partition_key_exprs(joinrel, outer_rel, inner_rel, sjinfo); /* * Set the consider_partitionwise_join flag. @@ -1877,6 +1933,23 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op) { if (equal(lfirst(lc), expr)) return cnt; + + /* + * XXX For the moment, also allow a match if we have Vars that + * match except for varnullingrels. This may be indicative of a + * bug, although given the restriction to strict join operators, + * it could be okay. + */ + if (IsA(expr, Var) && IsA(lfirst(lc), Var)) + { + Var *v1 = (Var *) expr; + Var *v2 = (Var *) lfirst(lc); + + if (v1->varno == v2->varno && + v1->varattno == v2->varattno && + v1->varlevelsup == v2->varlevelsup) + return cnt; + } } } @@ -1890,7 +1963,7 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op) static void set_joinrel_partition_key_exprs(RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, - JoinType jointype) + SpecialJoinInfo *sjinfo) { PartitionScheme part_scheme = joinrel->part_scheme; int partnatts = part_scheme->partnatts; @@ -1916,7 +1989,7 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel, List *nullable_partexpr = NIL; ListCell *lc; - switch (jointype) + switch (sjinfo->jointype) { /* * A join relation resulting from an INNER join may be @@ -1992,18 +2065,37 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel, * partitionwise nesting of any outer join.) We assume no * type coercions are needed to make the coalesce expressions, * since columns of different types won't have gotten - * classified as the same PartitionScheme. + * classified as the same PartitionScheme. However, we do + * have to worry about marking the COALESCE inputs as nullable + * by the full join, else these won't match the real thing. */ foreach(lc, list_concat_copy(outer_expr, outer_null_expr)) { Node *larg = (Node *) lfirst(lc); ListCell *lc2; + /* Insert nullingrel, or skip it if we can't */ + larg = add_nullingrel_to(larg, sjinfo->ojrelid); + if (larg == NULL) + continue; + foreach(lc2, list_concat_copy(inner_expr, inner_null_expr)) { Node *rarg = (Node *) lfirst(lc2); - CoalesceExpr *c = makeNode(CoalesceExpr); + CoalesceExpr *c; + + /* Forget it if coercions would be needed */ + if (exprType(larg) != exprType(rarg) || + exprCollation(larg) != exprCollation(rarg)) + continue; + /* Insert nullingrel, or skip it if we can't */ + rarg = add_nullingrel_to(rarg, sjinfo->ojrelid); + if (rarg == NULL) + continue; + + /* Now we can build a valid merged join variable */ + c = makeNode(CoalesceExpr); c->coalescetype = exprType(larg); c->coalescecollid = exprCollation(larg); c->args = list_make2(larg, rarg); @@ -2014,7 +2106,8 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel, break; default: - elog(ERROR, "unrecognized join type: %d", (int) jointype); + elog(ERROR, "unrecognized join type: %d", + (int) sjinfo->jointype); } joinrel->partexprs[cnt] = partexpr; @@ -2022,6 +2115,54 @@ set_joinrel_partition_key_exprs(RelOptInfo *joinrel, } } +/* + * Attempt to add relid to nullingrels of a FULL JOIN USING variable. + * Returns the modified expression if successful, or NULL if we failed. + * + * We currently don't support any cases where type coercion is involved, + * so only plain Vars and COALESCE nodes need be handled. However, we + * do need to support nested COALESCEs, so recursion is required. + */ +static Node * +add_nullingrel_to(Node *node, int relid) +{ + if (IsA(node, Var)) + { + /* Copy so we can modify it... */ + Var *var = (Var *) copyObject(node); + + /* ... and insert the correct nullingrel marker */ + var->varnullingrels = bms_add_member(var->varnullingrels, + relid); + return (Node *) var; + } + if (IsA(node, CoalesceExpr)) + { + CoalesceExpr *cexpr = (CoalesceExpr *) node; + CoalesceExpr *newcexpr; + List *newargs = NIL; + ListCell *lc; + + /* Try to modify each argument ... */ + foreach(lc, cexpr->args) + { + Node *newarg = add_nullingrel_to((Node *) lfirst(lc), relid); + + if (newarg == NULL) + return NULL; + newargs = lappend(newargs, newarg); + } + /* Success, so make the result node */ + newcexpr = makeNode(CoalesceExpr); + newcexpr->coalescetype = cexpr->coalescetype; + newcexpr->coalescecollid = cexpr->coalescecollid; + newcexpr->args = newargs; + newcexpr->location = cexpr->location; + return (Node *) newcexpr; + } + return NULL; +} + /* * build_child_join_reltarget * Set up a child-join relation's reltarget from a parent-join relation. diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c index ef8df3d098..6902c2a9d7 100644 --- a/src/backend/optimizer/util/restrictinfo.c +++ b/src/backend/optimizer/util/restrictinfo.c @@ -116,6 +116,7 @@ make_restrictinfo_internal(PlannerInfo *root, Relids nullable_relids) { RestrictInfo *restrictinfo = makeNode(RestrictInfo); + Relids baserels; restrictinfo->clause = clause; restrictinfo->orclause = orclause; @@ -187,6 +188,20 @@ make_restrictinfo_internal(PlannerInfo *root, else restrictinfo->required_relids = restrictinfo->clause_relids; + /* + * Count the number of base rels appearing in clause_relids. To do this, + * we just delete rels mentioned in root->outer_join_rels and count the + * survivors. Because we are called during deconstruct_jointree which is + * the same tree walk that populates outer_join_rels, this is a little bit + * unsafe-looking; but it should be fine because the recursion in + * deconstruct_jointree should already have visited any outer join that + * could be mentioned in this clause. + */ + baserels = bms_difference(restrictinfo->clause_relids, + root->outer_join_rels); + restrictinfo->num_base_rels = bms_num_members(baserels); + bms_free(baserels); + /* * Fill in all the cacheable fields with "not yet set" markers. None of * these will be computed until/unless needed. Note in particular that we diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index ebc6ce84b0..99126958db 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -88,6 +88,9 @@ static Relids alias_relid_set(Query *query, Relids relids); * Create a set of all the distinct varnos present in a parsetree. * Only varnos that reference level-zero rtable entries are considered. * + * The result includes outer-join relids mentioned in Var.varnullingrels and + * PlaceHolderVar.phnullingrels fields in the parsetree. + * * "root" can be passed as NULL if it is not necessary to process * PlaceHolderVars. * @@ -153,7 +156,11 @@ pull_varnos_walker(Node *node, pull_varnos_context *context) Var *var = (Var *) node; if (var->varlevelsup == context->sublevels_up) + { context->varnos = bms_add_member(context->varnos, var->varno); + context->varnos = bms_add_members(context->varnos, + var->varnullingrels); + } return false; } if (IsA(node, CurrentOfExpr)) @@ -251,6 +258,14 @@ pull_varnos_walker(Node *node, pull_varnos_context *context) context->varnos = bms_join(context->varnos, newevalat); } + + /* + * In all three cases, include phnullingrels in the result. We + * don't worry about possibly needing to translate it, because + * appendrels only translate varnos of baserels, not outer joins. + */ + context->varnos = bms_add_members(context->varnos, + phv->phnullingrels); return false; /* don't recurse into expression */ } } diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index fa1f589fad..e4033b1572 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -2204,7 +2204,7 @@ rowcomparesel(PlannerInfo *root, else { /* - * Otherwise, it's a join if there's more than one relation used. + * Otherwise, it's a join if there's more than one base relation used. */ is_join_clause = (NumRelids(root, (Node *) opargs) > 1); } diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 369ddfbddc..522c432841 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -240,13 +240,26 @@ struct PlannerInfo struct AppendRelInfo **append_rel_array pg_node_attr(read_write_ignore); /* - * all_baserels is a Relids set of all base relids (but not "other" - * relids) in the query; that is, the Relids identifier of the final join - * we need to form. This is computed in make_one_rel, just before we - * start making Paths. + * all_baserels is a Relids set of all base relids (but not joins or + * "other" relids) in the query. This is computed in make_one_rel, just + * before we start making Paths. */ Relids all_baserels; + /* + * outer_join_rels is a Relids set of all outer-join relids in the query. + * This is computed in deconstruct_jointree. + */ + Relids outer_join_rels; + + /* + * all_query_rels is a Relids set of all base relids and outer join relids + * (but not "other" relids) in the query. This is the Relids identifier + * of the final join we need to form. This is computed in make_one_rel, + * just before we start making Paths. + */ + Relids all_query_rels; + /* * nullable_baserels is a Relids set of base relids that are nullable by * some outer join in the jointree; these are rels that are potentially @@ -313,7 +326,7 @@ struct PlannerInfo List *right_join_clauses; /* - * list of RestrictInfos for mergejoinable full join clauses + * list of FullJoinClauseInfos for mergejoinable full join clauses */ List *full_join_clauses; @@ -532,9 +545,10 @@ typedef struct PartitionSchemeData *PartitionScheme; * or the output of a sub-SELECT or function that appears in the range table. * In either case it is uniquely identified by an RT index. A "joinrel" * is the joining of two or more base rels. A joinrel is identified by - * the set of RT indexes for its component baserels. We create RelOptInfo - * nodes for each baserel and joinrel, and store them in the PlannerInfo's - * simple_rel_array and join_rel_list respectively. + * the set of RT indexes for its component baserels, along with RT indexes + * for any outer joins it has computed. We create RelOptInfo nodes for each + * baserel and joinrel, and store them in the PlannerInfo's simple_rel_array + * and join_rel_list respectively. * * Note that there is only one joinrel for any given set of component * baserels, no matter what order we assemble them in; so an unordered @@ -573,8 +587,10 @@ typedef struct PartitionSchemeData *PartitionScheme; * Parts of this data structure are specific to various scan and join * mechanisms. It didn't seem worth creating new node types for them. * - * relids - Set of base-relation identifiers; it is a base relation - * if there is just one, a join relation if more than one + * relids - Set of relation identifiers (RT indexes). This is a base + * relation if there is just one, a join relation if more; + * in the join case, RT indexes of any outer joins formed + * at or below this join are included along with baserels * rows - estimated number of tuples in the relation after restriction * clauses have been applied (ie, output rows of a plan for it) * consider_startup - true if there is any value in keeping plain paths for @@ -786,7 +802,7 @@ typedef struct RelOptInfo RelOptKind reloptkind; /* - * all relations included in this RelOptInfo; set of base relids + * all relations included in this RelOptInfo; set of base + OJ relids * (rangetable indexes) */ Relids relids; @@ -2269,17 +2285,17 @@ typedef struct LimitPath * If a restriction clause references a single base relation, it will appear * in the baserestrictinfo list of the RelOptInfo for that base rel. * - * If a restriction clause references more than one base rel, it will + * If a restriction clause references more than one base+OJ relation, it will * appear in the joininfo list of every RelOptInfo that describes a strict - * subset of the base rels mentioned in the clause. The joininfo lists are + * subset of the relations mentioned in the clause. The joininfo lists are * used to drive join tree building by selecting plausible join candidates. * The clause cannot actually be applied until we have built a join rel - * containing all the base rels it references, however. + * containing all the relations it references, however. * - * When we construct a join rel that includes all the base rels referenced + * When we construct a join rel that includes all the relations referenced * in a multi-relation restriction clause, we place that clause into the * joinrestrictinfo lists of paths for the join rel, if neither left nor - * right sub-path includes all base rels referenced in the clause. The clause + * right sub-path includes all relations referenced in the clause. The clause * will be applied at that join level, and will not propagate any further up * the join tree. (Note: the "predicate migration" code was once intended to * push restriction clauses up and down the plan tree based on evaluation @@ -2300,12 +2316,15 @@ typedef struct LimitPath * or join to enforce that all members of each EquivalenceClass are in fact * equal in all rows emitted by the scan or join. * - * When dealing with outer joins we have to be very careful about pushing qual - * clauses up and down the tree. An outer join's own JOIN/ON conditions must - * be evaluated exactly at that join node, unless they are "degenerate" - * conditions that reference only Vars from the nullable side of the join. - * Quals appearing in WHERE or in a JOIN above the outer join cannot be pushed - * down below the outer join, if they reference any nullable Vars. + * The clause_relids field lists the base plus outer-join RT indexes that + * actually appear in the clause. required_relids lists the minimum set of + * relids needed to evaluate the clause; while this is often equal to + * clause_relids, it can be more. We will add relids to required_relids when + * we need to force an outer join ON clause to be evaluated exactly at the + * level of the outer join, which is true except when it is a "degenerate" + * condition that references only Vars from the nullable side of the join. + * + * XXX rewrite or remove me: * RestrictInfo nodes contain a flag to indicate whether a qual has been * pushed down to a lower level than its original syntactic placement in the * join tree would suggest. If an outer join prevents us from pushing a qual @@ -2427,13 +2446,16 @@ typedef struct RestrictInfo /* true if known to contain no leaked Vars */ bool leakproof pg_node_attr(equal_ignore); - /* to indicate if clause contains any volatile functions. */ + /* indicates if clause contains any volatile functions */ VolatileFunctionStatus has_volatile pg_node_attr(equal_ignore); /* see comment above */ Index security_level; - /* The set of relids (varnos) actually referenced in the clause: */ + /* number of base rels in clause_relids */ + int num_base_rels pg_node_attr(equal_ignore); + + /* The relids (varnos+varnullingrels) actually referenced in the clause: */ Relids clause_relids pg_node_attr(equal_ignore); /* The set of relids required to evaluate the clause: */ @@ -2535,6 +2557,7 @@ typedef struct RestrictInfo } RestrictInfo; /* + * XXX this will need work: * This macro embodies the correct way to test whether a RestrictInfo is * "pushed down" to a given outer join, that is, should be treated as a filter * clause rather than a join clause at that outer join. This is certainly so @@ -2637,17 +2660,20 @@ typedef struct PlaceHolderVar * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility * of planning for them, because this simplifies make_join_rel()'s API. * - * min_lefthand and min_righthand are the sets of base relids that must be - * available on each side when performing the special join. lhs_strict is - * true if the special join's condition cannot succeed when the LHS variables - * are all NULL (this means that an outer join can commute with upper-level + * min_lefthand and min_righthand are the sets of base+OJ relids that must be + * available on each side when performing the special join. + * + * strict_relids is the set of base+OJ relids for which the special join's + * condition is strict, ie it cannot succeed if any of those rels produce + * an all-NULL row. lhs_strict reports whether any LHS rels appear in + * strict_relids (this means that an outer join can commute with upper-level * outer joins even if it appears in their RHS). We don't bother to set - * lhs_strict for FULL JOINs, however. + * strict_relids or lhs_strict for FULL JOINs, however. * * It is not valid for either min_lefthand or min_righthand to be empty sets; * if they were, this would break the logic that enforces join order. * - * syn_lefthand and syn_righthand are the sets of base relids that are + * syn_lefthand and syn_righthand are the sets of base+OJ relids that are * syntactically below this special join. (These are needed to help compute * min_lefthand and min_righthand for higher joins.) * @@ -2669,14 +2695,18 @@ typedef struct PlaceHolderVar * the inputs to make it a LEFT JOIN. So the allowed values of jointype * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI. * + * ojrelid is the RT index of the join RTE representing this outer join, + * if there is one. It is zero when jointype is INNER or SEMI. + * * For purposes of join selectivity estimation, we create transient * SpecialJoinInfo structures for regular inner joins; so it is possible * to have jointype == JOIN_INNER in such a structure, even though this is * not allowed within join_info_list. We also create transient * SpecialJoinInfos with jointype == JOIN_INNER for outer joins, since for * cost estimation purposes it is sometimes useful to know the join size under - * plain innerjoin semantics. Note that lhs_strict, delay_upper_joins, and - * of course the semi_xxx fields are not set meaningfully within such structs. + * plain innerjoin semantics. Note that strict_relids, lhs_strict, + * delay_upper_joins, and of course the semi_xxx fields are not set + * meaningfully within such structs. */ #ifndef HAVE_SPECIALJOININFO_TYPEDEF typedef struct SpecialJoinInfo SpecialJoinInfo; @@ -2688,11 +2718,13 @@ struct SpecialJoinInfo pg_node_attr(no_read) NodeTag type; - Relids min_lefthand; /* base relids in minimum LHS for join */ - Relids min_righthand; /* base relids in minimum RHS for join */ - Relids syn_lefthand; /* base relids syntactically within LHS */ - Relids syn_righthand; /* base relids syntactically within RHS */ + Relids min_lefthand; /* base+OJ relids in minimum LHS for join */ + Relids min_righthand; /* base+OJ relids in minimum RHS for join */ + Relids syn_lefthand; /* base+OJ relids syntactically within LHS */ + Relids syn_righthand; /* base+OJ relids syntactically within RHS */ JoinType jointype; /* always INNER, LEFT, FULL, SEMI, or ANTI */ + Index ojrelid; /* outer join's RT index; 0 if none */ + Relids strict_relids; /* joinclause is strict for these relids */ bool lhs_strict; /* joinclause is strict for some LHS rel */ bool delay_upper_joins; /* can't commute with upper RHS */ /* Remaining fields are set only for JOIN_SEMI jointype: */ @@ -2702,6 +2734,21 @@ struct SpecialJoinInfo List *semi_rhs_exprs; /* righthand-side expressions of these ops */ }; +/* + * FULL JOIN clause info. + * + * We set aside every FULL JOIN ON clause that looks mergejoinable, and + * process it specially at the end of qual distribution. + */ +typedef struct FullJoinClauseInfo +{ + pg_node_attr(no_copy_equal, no_read) + + NodeTag type; + RestrictInfo *rinfo; /* a mergejoinable FULL JOIN clause */ + SpecialJoinInfo *sjinfo; /* the FULL JOIN's SpecialJoinInfo */ +} FullJoinClauseInfo; + /* * Append-relation info. * diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 635cc0a0a6..1255641176 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -301,6 +301,7 @@ extern void expand_planner_arrays(PlannerInfo *root, int add_size); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); +extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids); extern RelOptInfo *build_join_rel(PlannerInfo *root, Relids joinrelids, @@ -332,6 +333,6 @@ extern ParamPathInfo *find_param_path_info(RelOptInfo *rel, extern RelOptInfo *build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, RelOptInfo *inner_rel, RelOptInfo *parent_joinrel, List *restrictlist, - SpecialJoinInfo *sjinfo, JoinType jointype); + SpecialJoinInfo *sjinfo); #endif /* PATHNODE_H */ diff --git a/src/include/optimizer/placeholder.h b/src/include/optimizer/placeholder.h index 39803ea41f..34b118a5c9 100644 --- a/src/include/optimizer/placeholder.h +++ b/src/include/optimizer/placeholder.h @@ -27,6 +27,7 @@ extern void update_placeholder_eval_levels(PlannerInfo *root, extern void fix_placeholder_input_needed_levels(PlannerInfo *root); extern void add_placeholders_to_base_rels(PlannerInfo *root); extern void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, - RelOptInfo *outer_rel, RelOptInfo *inner_rel); + RelOptInfo *outer_rel, RelOptInfo *inner_rel, + SpecialJoinInfo *sjinfo); #endif /* PLACEHOLDER_H */ diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h index 2b11ff1d1f..ca03f32174 100644 --- a/src/include/optimizer/prep.h +++ b/src/include/optimizer/prep.h @@ -29,7 +29,8 @@ extern void pull_up_subqueries(PlannerInfo *root); extern void flatten_simple_union_all(PlannerInfo *root); extern void reduce_outer_joins(PlannerInfo *root); extern void remove_useless_result_rtes(PlannerInfo *root); -extern Relids get_relids_in_jointree(Node *jtnode, bool include_joins); +extern Relids get_relids_in_jointree(Node *jtnode, bool include_outer_joins, + bool include_inner_joins); extern Relids get_relids_for_join(Query *query, int joinrelid); /* commit b6c973fd890a5d7df7ffcc043e6a5defc269a303 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 15:02:41 2022 -0400 Fix flatten_join_alias_vars() to handle varnullingrels correctly. The remaining core regression test failures occur because flatten_join_alias_vars() isn't doing the right thing. The alias Var it needs to replace may have acquired varnullingrels bits signifying the effect of upper outer joins, and if so we must preserve that information in the replacement expression. The simplest way to do that is to wrap the replacement expression in a PlaceHolderVar, and that's what we have to do in the general case where subquery pullup has mutated the replacement joinaliasvars entry into an arbitrary expression. But in simpler cases, such as where the joinaliasvars entry is just a Var, we'd prefer to do it by merging the alias Var's varnullingrels into the replacement Var. In that way the flattened alias will compare equal() to semantically equivalent references that didn't use the alias name. Moreover, the parser also uses this code while checking certain semantic constraints, and in that context we *must not* generate PlaceHolderVars. PHVs shouldn't appear in parse-time expressions, and adding one would certainly cause the parser to decide the query is invalid (because the result wouldn't compare equal() to what it needs to). Fortunately, during parsing the set of possible contents of a joinaliasvars entry is quite constrained, so we can guarantee to apply the nullingrels info to the Vars therein. The result of this step passes all core regression tests, but there are still loose ends for FDWs (so that contrib/postgres_fdw will fail). diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index ea077cff4e..127f06fb04 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -898,7 +898,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse, */ if (rte->lateral && root->hasJoinRTEs) rte->subquery = (Query *) - flatten_join_alias_vars(root->parse, + flatten_join_alias_vars(root, root->parse, (Node *) rte->subquery); } else if (rte->rtekind == RTE_FUNCTION) @@ -1099,7 +1099,7 @@ preprocess_expression(PlannerInfo *root, Node *expr, int kind) kind == EXPRKIND_VALUES || kind == EXPRKIND_TABLESAMPLE || kind == EXPRKIND_TABLEFUNC)) - expr = flatten_join_alias_vars(root->parse, expr); + expr = flatten_join_alias_vars(root, root->parse, expr); /* * Simplify constant expressions. For function RTEs, this was already diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 389f7d9ce7..7b22254173 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -1076,7 +1076,8 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte, * maybe even in the rewriter; but for now let's just fix this case here.) */ subquery->targetList = (List *) - flatten_join_alias_vars(subroot->parse, (Node *) subquery->targetList); + flatten_join_alias_vars(subroot, subroot->parse, + (Node *) subquery->targetList); /* * Adjust level-0 varnos in subquery so that we can append its rangetable diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c index 99126958db..81fc3002fe 100644 --- a/src/backend/optimizer/util/var.c +++ b/src/backend/optimizer/util/var.c @@ -62,6 +62,7 @@ typedef struct typedef struct { + PlannerInfo *root; /* could be NULL! */ Query *query; /* outer Query */ int sublevels_up; bool possible_sublink; /* could aliases include a SubLink? */ @@ -80,6 +81,10 @@ static bool pull_var_clause_walker(Node *node, pull_var_clause_context *context); static Node *flatten_join_alias_vars_mutator(Node *node, flatten_join_alias_vars_context *context); +static Node *add_nullingrels_if_needed(PlannerInfo *root, Node *newnode, + Var *oldvar); +static bool is_standard_join_alias_expression(Node *newnode, Var *oldvar); +static void adjust_standard_join_alias_expression(Node *newnode, Var *oldvar); static Relids alias_relid_set(Query *query, Relids relids); @@ -729,26 +734,42 @@ pull_var_clause_walker(Node *node, pull_var_clause_context *context) * is the only way that the executor can directly handle whole-row Vars. * * This also adjusts relid sets found in some expression node types to - * substitute the contained base rels for any join relid. + * substitute the contained base+OJ rels for any join relid. * * If a JOIN contains sub-selects that have been flattened, its join alias * entries might now be arbitrary expressions, not just Vars. This affects - * this function in one important way: we might find ourselves inserting - * SubLink expressions into subqueries, and we must make sure that their - * Query.hasSubLinks fields get set to true if so. If there are any + * this function in two important ways. First, we might find ourselves + * inserting SubLink expressions into subqueries, and we must make sure that + * their Query.hasSubLinks fields get set to true if so. If there are any * SubLinks in the join alias lists, the outer Query should already have * hasSubLinks = true, so this is only relevant to un-flattened subqueries. + * Second, we have to preserve any varnullingrels info attached to the + * alias Vars we're replacing. If the replacement expression is a Var or + * PlaceHolderVar or constructed from those, we can just add the + * varnullingrels bits to the existing nullingrels field(s); otherwise + * we have to add a PlaceHolderVar wrapper. * - * NOTE: this is used on not-yet-planned expressions. We do not expect it - * to be applied directly to the whole Query, so if we see a Query to start - * with, we do want to increment sublevels_up (this occurs for LATERAL - * subqueries). + * NOTE: this is also used by the parser, to expand join alias Vars before + * checking GROUP BY validity. For that use-case, root will be NULL, which + * is why we have to pass the Query separately. We need the root itself only + * for making PlaceHolderVars. We can avoid making PlaceHolderVars in the + * parser's usage because it won't be dealing with arbitrary expressions: + * so long as adjust_standard_join_alias_expression can handle everything + * the parser would make as a join alias expression, we're OK. */ Node * -flatten_join_alias_vars(Query *query, Node *node) +flatten_join_alias_vars(PlannerInfo *root, Query *query, Node *node) { flatten_join_alias_vars_context context; + /* + * We do not expect this to be applied to the whole Query, only to + * expressions or LATERAL subqueries. Hence, if the top node is a Query, + * it's okay to immediately increment sublevels_up. + */ + Assert(node != (Node *) query); + + context.root = root; context.query = query; context.sublevels_up = 0; /* flag whether join aliases could possibly contain SubLinks */ @@ -819,7 +840,9 @@ flatten_join_alias_vars_mutator(Node *node, rowexpr->colnames = colnames; rowexpr->location = var->location; - return (Node *) rowexpr; + /* Lastly, add any varnullingrels to the replacement expression */ + return add_nullingrels_if_needed(context->root, (Node *) rowexpr, + var); } /* Expand join alias reference */ @@ -846,7 +869,8 @@ flatten_join_alias_vars_mutator(Node *node, if (context->possible_sublink && !context->inserted_sublink) context->inserted_sublink = checkExprHasSubLink(newvar); - return newvar; + /* Lastly, add any varnullingrels to the replacement expression */ + return add_nullingrels_if_needed(context->root, newvar, var); } if (IsA(node, PlaceHolderVar)) { @@ -861,6 +885,7 @@ flatten_join_alias_vars_mutator(Node *node, { phv->phrels = alias_relid_set(context->query, phv->phrels); + /* we *don't* change phnullingrels */ } return (Node *) phv; } @@ -894,9 +919,145 @@ flatten_join_alias_vars_mutator(Node *node, (void *) context); } +/* + * Add oldvar's varnullingrels, if any, to a flattened join alias expression. + * The newnode has been copied, so we can modify it freely. + */ +static Node * +add_nullingrels_if_needed(PlannerInfo *root, Node *newnode, Var *oldvar) +{ + if (oldvar->varnullingrels == NULL) + return newnode; /* nothing to do */ + /* If possible, do it by adding to existing nullingrel fields */ + if (is_standard_join_alias_expression(newnode, oldvar)) + adjust_standard_join_alias_expression(newnode, oldvar); + else if (root) + { + /* We can insert a PlaceHolderVar to carry the nullingrels */ + PlaceHolderVar *newphv; + Relids phrels = pull_varnos(root, newnode); + + /* XXX what if phrels is empty? */ + Assert(!bms_is_empty(phrels)); /* probably wrong */ + newphv = make_placeholder_expr(root, (Expr *) newnode, phrels); + /* newphv has zero phlevelsup and NULL phnullingrels; fix it */ + newphv->phlevelsup = oldvar->varlevelsup; + newphv->phnullingrels = bms_copy(oldvar->varnullingrels); + newnode = (Node *) newphv; + } + else + { + /* ooops, we're missing support for something the parser can make */ + elog(ERROR, "unsupported join alias expression"); + } + return newnode; +} + +/* + * Check to see if we can insert nullingrels into this join alias expression + * without use of a separate PlaceHolderVar. + * + * This will handle Vars, PlaceHolderVars, and implicit-coercion and COALESCE + * expressions built from those. This coverage needs to handle anything + * that the parser would put into joinaliasvars. + * XXX it's probably incomplete at the moment. + */ +static bool +is_standard_join_alias_expression(Node *newnode, Var *oldvar) +{ + if (newnode == NULL) + return false; + if (IsA(newnode, Var) && + ((Var *) newnode)->varlevelsup == oldvar->varlevelsup) + return true; + else if (IsA(newnode, PlaceHolderVar) && + ((PlaceHolderVar *) newnode)->phlevelsup == oldvar->varlevelsup) + return true; + else if (IsA(newnode, FuncExpr)) + { + FuncExpr *fexpr = (FuncExpr *) newnode; + + /* + * We need to assume that the function wouldn't produce non-NULL from + * NULL, which is reasonable for implicit coercions but otherwise not + * so much. (Looking at its strictness is likely overkill, and anyway + * it would cause us to fail if someone forgot to mark an implicit + * coercion as strict.) + */ + if (fexpr->funcformat != COERCE_IMPLICIT_CAST || + fexpr->args == NIL) + return false; + + /* + * Examine only the first argument --- coercions might have additional + * arguments that are constants. + */ + return is_standard_join_alias_expression(linitial(fexpr->args), oldvar); + } + else if (IsA(newnode, CoalesceExpr)) + { + CoalesceExpr *cexpr = (CoalesceExpr *) newnode; + ListCell *lc; + + Assert(cexpr->args != NIL); + foreach(lc, cexpr->args) + { + if (!is_standard_join_alias_expression(lfirst(lc), oldvar)) + return false; + } + return true; + } + else + return false; +} + +/* + * Insert nullingrels into an expression accepted by + * is_standard_join_alias_expression. + */ +static void +adjust_standard_join_alias_expression(Node *newnode, Var *oldvar) +{ + if (IsA(newnode, Var) && + ((Var *) newnode)->varlevelsup == oldvar->varlevelsup) + { + Var *newvar = (Var *) newnode; + + newvar->varnullingrels = bms_add_members(newvar->varnullingrels, + oldvar->varnullingrels); + } + else if (IsA(newnode, PlaceHolderVar) && + ((PlaceHolderVar *) newnode)->phlevelsup == oldvar->varlevelsup) + { + PlaceHolderVar *newphv = (PlaceHolderVar *) newnode; + + newphv->phnullingrels = bms_add_members(newphv->phnullingrels, + oldvar->varnullingrels); + } + else if (IsA(newnode, FuncExpr)) + { + FuncExpr *fexpr = (FuncExpr *) newnode; + + adjust_standard_join_alias_expression(linitial(fexpr->args), oldvar); + } + else if (IsA(newnode, CoalesceExpr)) + { + CoalesceExpr *cexpr = (CoalesceExpr *) newnode; + ListCell *lc; + + Assert(cexpr->args != NIL); + foreach(lc, cexpr->args) + { + adjust_standard_join_alias_expression(lfirst(lc), oldvar); + } + } + else + Assert(false); +} + /* * alias_relid_set: in a set of RT indexes, replace joins by their - * underlying base relids + * underlying base+OJ relids */ static Relids alias_relid_set(Query *query, Relids relids) diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 3ef9e8ee5e..c15fab0f68 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -1162,7 +1162,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry) * entries are RTE_JOIN kind. */ if (hasJoinRTEs) - groupClauses = (List *) flatten_join_alias_vars(qry, + groupClauses = (List *) flatten_join_alias_vars(NULL, qry, (Node *) groupClauses); /* @@ -1206,7 +1206,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry) groupClauses, hasJoinRTEs, have_non_var_grouping); if (hasJoinRTEs) - clause = flatten_join_alias_vars(qry, clause); + clause = flatten_join_alias_vars(NULL, qry, clause); check_ungrouped_columns(clause, pstate, qry, groupClauses, groupClauseCommonVars, have_non_var_grouping, @@ -1217,7 +1217,7 @@ parseCheckAggregates(ParseState *pstate, Query *qry) groupClauses, hasJoinRTEs, have_non_var_grouping); if (hasJoinRTEs) - clause = flatten_join_alias_vars(qry, clause); + clause = flatten_join_alias_vars(NULL, qry, clause); check_ungrouped_columns(clause, pstate, qry, groupClauses, groupClauseCommonVars, have_non_var_grouping, @@ -1546,7 +1546,7 @@ finalize_grouping_exprs_walker(Node *node, Index ref = 0; if (context->hasJoinRTEs) - expr = flatten_join_alias_vars(context->qry, expr); + expr = flatten_join_alias_vars(NULL, context->qry, expr); /* * Each expression must match a grouping entry at the current diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h index 7be1e5906b..1f5e0b24ca 100644 --- a/src/include/optimizer/optimizer.h +++ b/src/include/optimizer/optimizer.h @@ -202,6 +202,6 @@ extern bool contain_var_clause(Node *node); extern bool contain_vars_of_level(Node *node, int levelsup); extern int locate_var_of_level(Node *node, int levelsup); extern List *pull_var_clause(Node *node, int flags); -extern Node *flatten_join_alias_vars(Query *query, Node *node); +extern Node *flatten_join_alias_vars(PlannerInfo *root, Query *query, Node *node); #endif /* OPTIMIZER_H */ commit cac202978c90883ee1a8139d919c3e1ece9264fa Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jul 10 15:27:35 2022 -0400 Teach FDWs about base-plus-outer-join relids. Conversion of the planner to include OJ relids in join relids affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Another way we could do this is to keep fs_relids as just base relids and make the new field be the one with OJ relids added. While that would be more backwards-compatible in some sense, it would be inconsistent with the naming used in the core planner, and I think that it might allow some types of bugs to escape immediate detection. postgres_fdw also has one place where it needs to ignore varnullingrels while matching Vars, similarly to the unfinished work in setrefs.c. (That requirement will only affect join-planning FDWs, too, since Vars seen at a base relation scan should never have any varnullingrels.) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 8f4d8a5022..cfac1ea57b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -3880,7 +3880,17 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, i = 1; foreach(lc, foreignrel->reltarget->exprs) { - if (equal(lfirst(lc), (Node *) node)) + Var *tlvar = (Var *) lfirst(lc); + + /* + * As in setrefs.c, we match only on varno/varattno. Ideally there + * would be some cross-check on varnullingrels, but it's unclear what + * to do exactly; we don't have enough context to know what that value + * should be. + */ + if (IsA(tlvar, Var) && + tlvar->varno == node->varno && + tlvar->varattno == node->varattno) { *colno = i; return; diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 955a428e3d..d91c3622df 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -1513,13 +1513,13 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags) /* * Identify which user to do the remote access as. This should match what * ExecCheckRTEPerms() does. In case of a join or aggregate, use the - * lowest-numbered member RTE as a representative; we would get the same - * result from any. + * lowest-numbered member base RTE as a representative; we would get the + * same result from any. */ if (fsplan->scan.scanrelid > 0) rtindex = fsplan->scan.scanrelid; else - rtindex = bms_next_member(fsplan->fs_relids, -1); + rtindex = bms_next_member(fsplan->fs_base_relids, -1); rte = exec_rt_fetch(rtindex, estate); userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); @@ -2405,7 +2405,7 @@ find_modifytable_subplan(PlannerInfo *root, { ForeignScan *fscan = (ForeignScan *) subplan; - if (bms_is_member(rtindex, fscan->fs_relids)) + if (bms_is_member(rtindex, fscan->fs_base_relids)) return fscan; } @@ -2831,8 +2831,8 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es) * that setrefs.c won't update the string when flattening the * rangetable. To find out what rtoffset was applied, identify the * minimum RT index appearing in the string and compare it to the - * minimum member of plan->fs_relids. (We expect all the relids in - * the join will have been offset by the same amount; the Asserts + * minimum member of plan->fs_base_relids. (We expect all the relids + * in the join will have been offset by the same amount; the Asserts * below should catch it if that ever changes.) */ minrti = INT_MAX; @@ -2849,7 +2849,7 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es) else ptr++; } - rtoffset = bms_next_member(plan->fs_relids, -1) - minrti; + rtoffset = bms_next_member(plan->fs_base_relids, -1) - minrti; /* Now we can translate the string */ relations = makeStringInfo(); @@ -2864,7 +2864,7 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es) char *refname; rti += rtoffset; - Assert(bms_is_member(rti, plan->fs_relids)); + Assert(bms_is_member(rti, plan->fs_base_relids)); rte = rt_fetch(rti, es->rtable); Assert(rte->rtekind == RTE_RELATION); /* This logic should agree with explain.c's ExplainTargetRel */ diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml index d0b5951019..329affa30b 100644 --- a/doc/src/sgml/fdwhandler.sgml +++ b/doc/src/sgml/fdwhandler.sgml @@ -351,6 +351,17 @@ GetForeignJoinPaths(PlannerInfo *root, it will supply at run time in the tuples it returns. </para> + <note> + <para> + Beginning with <productname>PostgreSQL</productname> 16, + <structfield>fs_relids</structfield> includes the rangetable indexes + of outer joins, if any were involved in this join. The new field + <structfield>fs_base_relids</structfield> includes only base + relation indexes, and thus + mimics <structfield>fs_relids</structfield>'s old semantics. + </para> + </note> + <para> See <xref linkend="fdw-planning"/> for additional information. </para> diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index e29c2ae206..ded37cb2e9 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1114,7 +1114,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used) break; case T_ForeignScan: *rels_used = bms_add_members(*rels_used, - ((ForeignScan *) plan)->fs_relids); + ((ForeignScan *) plan)->fs_base_relids); break; case T_CustomScan: *rels_used = bms_add_members(*rels_used, diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c index 043bb83f55..2b37266b6a 100644 --- a/src/backend/executor/execScan.c +++ b/src/backend/executor/execScan.c @@ -325,7 +325,7 @@ ExecScanReScan(ScanState *node) * all of them. */ if (IsA(node->ps.plan, ForeignScan)) - relids = ((ForeignScan *) node->ps.plan)->fs_relids; + relids = ((ForeignScan *) node->ps.plan)->fs_base_relids; else if (IsA(node->ps.plan, CustomScan)) relids = ((CustomScan *) node->ps.plan)->custom_relids; else diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 76606faa3e..16247dd9ce 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -29,6 +29,7 @@ #include "optimizer/cost.h" #include "optimizer/optimizer.h" #include "optimizer/paramassign.h" +#include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/placeholder.h" #include "optimizer/plancat.h" @@ -4106,6 +4107,8 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path, Index scan_relid = rel->relid; Oid rel_oid = InvalidOid; Plan *outer_plan = NULL; + Relids fs_base_relids; + int rtindex; Assert(rel->fdwroutine != NULL); @@ -4154,14 +4157,28 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path, /* * Likewise, copy the relids that are represented by this foreign scan. An - * upper rel doesn't have relids set, but it covers all the base relations - * participating in the underlying scan, so use root's all_baserels. + * upper rel doesn't have relids set, but it covers all the relations + * participating in the underlying scan/join, so use root->all_query_rels. */ if (rel->reloptkind == RELOPT_UPPER_REL) - scan_plan->fs_relids = root->all_baserels; + scan_plan->fs_relids = root->all_query_rels; else scan_plan->fs_relids = best_path->path.parent->relids; + /* + * Join relid sets include relevant outer joins, but FDWs may need to know + * which are the included base rels. That's a bit tedious to get without + * access to the plan-time data structures, so compute it here. + */ + fs_base_relids = NULL; + rtindex = -1; + while ((rtindex = bms_next_member(scan_plan->fs_relids, rtindex)) >= 0) + { + if (find_base_rel_ignore_join(root, rtindex) != NULL) + fs_base_relids = bms_add_member(fs_base_relids, rtindex); + } + scan_plan->fs_base_relids = fs_base_relids; + /* * If this is a foreign join, and to make it valid to push down we had to * assume that the current user is the same as some user explicitly named @@ -5806,8 +5823,9 @@ make_foreignscan(List *qptlist, node->fdw_private = fdw_private; node->fdw_scan_tlist = fdw_scan_tlist; node->fdw_recheck_quals = fdw_recheck_quals; - /* fs_relids will be filled in by create_foreignscan_plan */ + /* fs_relids, fs_base_relids will be filled by create_foreignscan_plan */ node->fs_relids = NULL; + node->fs_base_relids = NULL; /* fsSystemCol will be filled in by create_foreignscan_plan */ node->fsSystemCol = false; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 14f5c0f897..3b720ec5d6 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1533,6 +1533,7 @@ set_foreignscan_references(PlannerInfo *root, } fscan->fs_relids = offset_relid_set(fscan->fs_relids, rtoffset); + fscan->fs_base_relids = offset_relid_set(fscan->fs_base_relids, rtoffset); /* Adjust resultRelation if it's valid */ if (fscan->resultRelation > 0) diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index dca2a21e7a..7e98d0b7a3 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -688,6 +688,7 @@ typedef struct WorkTableScan * When the plan node represents a foreign join, scan.scanrelid is zero and * fs_relids must be consulted to identify the join relation. (fs_relids * is valid for simple scans as well, but will always match scan.scanrelid.) + * fs_relids includes outer joins; fs_base_relids does not. * * If the FDW's PlanDirectModify() callback decides to repurpose a ForeignScan * node to perform the UPDATE or DELETE operation directly in the remote @@ -707,7 +708,8 @@ typedef struct ForeignScan List *fdw_private; /* private data for FDW */ List *fdw_scan_tlist; /* optional tlist describing scan tuple */ List *fdw_recheck_quals; /* original quals not in scan.plan.qual */ - Bitmapset *fs_relids; /* RTIs generated by this scan */ + Bitmapset *fs_relids; /* base+OJ RTIs generated by this scan */ + Bitmapset *fs_base_relids; /* base RTIs generated by this scan */ bool fsSystemCol; /* true if any "system column" is needed */ } ForeignScan;
pgsql-hackers by date: