Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Allowing NOT IN to use ANTI joins |
Date | |
Msg-id | 13766.1405037879@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Allowing NOT IN to use ANTI joins (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Allowing NOT IN to use ANTI joins
|
List | pgsql-hackers |
David Rowley <dgrowleyml@gmail.com> writes: > Attached is the updated version of the patch. I spent some time fooling with this patch, cleaning up the join-alias issue as well as more-cosmetic things. However, while testing it I realized that the whole thing is based on a false premise: to equate a NOT IN with an antijoin, we'd have to prove not only that the inner side is non-nullable, but that the outer comparison values are too. Here's an example: regression=# create table zt1 (f1 int); CREATE TABLE regression=# insert into zt1 values(1); INSERT 0 1 regression=# insert into zt1 values(2); INSERT 0 1 regression=# insert into zt1 values(null); INSERT 0 1 regression=# create table zt2 (f1 int not null); CREATE TABLE regression=# insert into zt2 values(1); INSERT 0 1 With the patch in place, we get regression=# explain select * from zt1 where f1 not in (select f1 from zt2); QUERY PLAN ------------------------------------------------------------------- Hash Anti Join (cost=64.00..144.80 rows=1200 width=4) Hash Cond: (zt1.f1 = zt2.f1) -> Seq Scan on zt1 (cost=0.00..34.00 rows=2400 width=4) -> Hash (cost=34.00..34.00 rows=2400 width=4) -> Seq Scan on zt2 (cost=0.00..34.00 rows=2400 width=4) Planning time: 0.646 ms (6 rows) regression=# select * from zt1 where f1 not in (select f1 from zt2); f1 ---- 2 (2 rows) which is the wrong answer, as demonstrated by comparison to the result without optimization: regression=# alter table zt2 alter column f1 drop not null; ALTER TABLE regression=# explain select * from zt1 where f1 not in (select f1 from zt2); QUERY PLAN --------------------------------------------------------------- Seq Scan on zt1 (cost=40.00..80.00 rows=1200 width=4) Filter: (NOT (hashed SubPlan 1)) SubPlan 1 -> Seq Scan on zt2 (cost=0.00..34.00 rows=2400 width=4) Planning time: 0.163 ms (5 rows) regression=# select * from zt1 where f1 not in (select f1 from zt2); f1 ---- 2 (1 row) We could no doubt fix this by also insisting that the left-side vars be provably not null, but that's going to make the patch even slower and even less often applicable. I'm feeling discouraged about whether this is worth doing in this form. For the archives' sake, I attach an updated version with the fixes I'd applied so far. regards, tom lane diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 3e7dc85..4b44662 100644 *** a/src/backend/optimizer/plan/subselect.c --- b/src/backend/optimizer/plan/subselect.c *************** SS_process_ctes(PlannerInfo *root) *** 1195,1205 **** * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * ! * The only non-obvious input parameter is available_rels: this is the set ! * of query rels that can safely be referenced in the sublink expression. ! * (We must restrict this to avoid changing the semantics when a sublink ! * is present in an outer join's ON qual.) The conversion must fail if ! * the converted qual would reference any but these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to --- 1195,1208 ---- * If so, form a JoinExpr and return it. Return NULL if the SubLink cannot * be converted to a join. * ! * If under_not is true, the caller actually found NOT (ANY SubLink), ! * so that what we must try to build is an ANTI not SEMI join. ! * ! * available_rels is the set of query rels that can safely be referenced ! * in the sublink expression. (We must restrict this to avoid changing the ! * semantics when a sublink is present in an outer join's ON qual.) ! * The conversion must fail if the converted qual would reference any but ! * these parent-query relids. * * On success, the returned JoinExpr has larg = NULL and rarg = the jointree * item representing the pulled-up subquery. The caller must set larg to *************** SS_process_ctes(PlannerInfo *root) *** 1222,1228 **** */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, ! Relids available_rels) { JoinExpr *result; Query *parse = root->parse; --- 1225,1231 ---- */ JoinExpr * convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, ! bool under_not, Relids available_rels) { JoinExpr *result; Query *parse = root->parse; *************** convert_ANY_sublink_to_join(PlannerInfo *** 1237,1242 **** --- 1240,1254 ---- Assert(sublink->subLinkType == ANY_SUBLINK); /* + * Per SQL spec, NOT IN is not ordinarily equivalent to an anti-join, so + * that by default we have to fail when under_not. However, if we can + * prove that the sub-select's output columns are all certainly not NULL, + * then it's safe to convert NOT IN to an anti-join. + */ + if (under_not && !query_outputs_are_not_nullable(subselect)) + return NULL; + + /* * The sub-select must not refer to any Vars of the parent query. (Vars of * higher levels should be okay, though.) */ *************** convert_ANY_sublink_to_join(PlannerInfo *** 1302,1308 **** * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); ! result->jointype = JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; --- 1314,1320 ---- * And finally, build the JoinExpr node. */ result = makeNode(JoinExpr); ! result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI; result->isNatural = false; result->larg = NULL; /* caller must fill this in */ result->rarg = (Node *) rtr; *************** convert_ANY_sublink_to_join(PlannerInfo *** 1317,1325 **** /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * ! * The API of this function is identical to convert_ANY_sublink_to_join's, ! * except that we also support the case where the caller has found NOT EXISTS, ! * so we need an additional input parameter "under_not". */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, --- 1329,1335 ---- /* * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join * ! * The API of this function is identical to convert_ANY_sublink_to_join's. */ JoinExpr * convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 9cb1378..3a116c7 100644 *** a/src/backend/optimizer/prep/prepjointree.c --- b/src/backend/optimizer/prep/prepjointree.c *************** pull_up_sublinks_qual_recurse(PlannerInf *** 334,340 **** /* Is it a convertible ANY or EXISTS clause? */ if (sublink->subLinkType == ANY_SUBLINK) { ! if ((j = convert_ANY_sublink_to_join(root, sublink, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ --- 334,340 ---- /* Is it a convertible ANY or EXISTS clause? */ if (sublink->subLinkType == ANY_SUBLINK) { ! if ((j = convert_ANY_sublink_to_join(root, sublink, false, available_rels1)) != NULL) { /* Yes; insert the new join node into the join tree */ *************** pull_up_sublinks_qual_recurse(PlannerInf *** 360,366 **** return NULL; } if (available_rels2 != NULL && ! (j = convert_ANY_sublink_to_join(root, sublink, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ --- 360,366 ---- return NULL; } if (available_rels2 != NULL && ! (j = convert_ANY_sublink_to_join(root, sublink, false, available_rels2)) != NULL) { /* Yes; insert the new join node into the join tree */ *************** pull_up_sublinks_qual_recurse(PlannerInf *** 445,458 **** } if (not_clause(node)) { ! /* If the immediate argument of NOT is EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { ! if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) --- 445,512 ---- } if (not_clause(node)) { ! /* If the immediate argument of NOT is ANY or EXISTS, try to convert */ SubLink *sublink = (SubLink *) get_notclausearg((Expr *) node); JoinExpr *j; Relids child_rels; if (sublink && IsA(sublink, SubLink)) { ! if (sublink->subLinkType == ANY_SUBLINK) ! { ! if ((j = convert_ANY_sublink_to_join(root, sublink, true, ! available_rels1)) != NULL) ! { ! /* Yes; insert the new join node into the join tree */ ! j->larg = *jtlink1; ! *jtlink1 = (Node *) j; ! /* Recursively process pulled-up jointree nodes */ ! j->rarg = pull_up_sublinks_jointree_recurse(root, ! j->rarg, ! &child_rels); ! ! /* ! * Now recursively process the pulled-up quals. Because ! * we are underneath a NOT, we can't pull up sublinks that ! * reference the left-hand stuff, but it's still okay to ! * pull up sublinks referencing j->rarg. ! */ ! j->quals = pull_up_sublinks_qual_recurse(root, ! j->quals, ! &j->rarg, ! child_rels, ! NULL, NULL); ! /* Return NULL representing constant TRUE */ ! return NULL; ! } ! if (available_rels2 != NULL && ! (j = convert_ANY_sublink_to_join(root, sublink, true, ! available_rels2)) != NULL) ! { ! /* Yes; insert the new join node into the join tree */ ! j->larg = *jtlink2; ! *jtlink2 = (Node *) j; ! /* Recursively process pulled-up jointree nodes */ ! j->rarg = pull_up_sublinks_jointree_recurse(root, ! j->rarg, ! &child_rels); ! ! /* ! * Now recursively process the pulled-up quals. Because ! * we are underneath a NOT, we can't pull up sublinks that ! * reference the left-hand stuff, but it's still okay to ! * pull up sublinks referencing j->rarg. ! */ ! j->quals = pull_up_sublinks_qual_recurse(root, ! j->quals, ! &j->rarg, ! child_rels, ! NULL, NULL); ! /* Return NULL representing constant TRUE */ ! return NULL; ! } ! } ! else if (sublink->subLinkType == EXISTS_SUBLINK) { if ((j = convert_EXISTS_sublink_to_join(root, sublink, true, available_rels1)) != NULL) diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 19b5cf7..f8e3eaa 100644 *** a/src/backend/optimizer/util/clauses.c --- b/src/backend/optimizer/util/clauses.c *************** *** 40,45 **** --- 40,46 ---- #include "parser/parse_agg.h" #include "parser/parse_coerce.h" #include "parser/parse_func.h" + #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" #include "tcop/tcopprot.h" #include "utils/acl.h" *************** static bool contain_nonstrict_functions_ *** 100,105 **** --- 101,108 ---- static bool contain_leaky_functions_walker(Node *node, void *context); static Relids find_nonnullable_rels_walker(Node *node, bool top_level); static List *find_nonnullable_vars_walker(Node *node, bool top_level); + static void find_innerjoined_rels(Node *jtnode, + Relids *innerjoined_rels, List **usable_quals); static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK); static Node *eval_const_expressions_mutator(Node *node, eval_const_expressions_context *context); *************** contain_leaky_functions_walker(Node *nod *** 1459,1464 **** --- 1462,1471 ---- context); } + /***************************************************************************** + * Nullability analysis + *****************************************************************************/ + /* * find_nonnullable_rels * Determine which base rels are forced nonnullable by given clause. *************** find_nonnullable_rels_walker(Node *node, *** 1685,1691 **** * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND ! * format. * * The result is a palloc'd List, but we have not copied the member Var nodes. * Also, we don't bother trying to eliminate duplicate entries. --- 1692,1698 ---- * but here we assume that the input is a Boolean expression, and wish to * see if NULL inputs will provably cause a FALSE-or-NULL result. We expect * the expression to have been AND/OR flattened and converted to implicit-AND ! * format (but the results are still good if it wasn't AND/OR flattened). * * The result is a palloc'd List, but we have not copied the member Var nodes. * Also, we don't bother trying to eliminate duplicate entries. *************** find_forced_null_var(Node *node) *** 1987,1992 **** --- 1994,2241 ---- } /* + * query_outputs_are_not_nullable + * Returns TRUE if the output values of the Query are certainly not NULL. + * All output columns must return non-NULL to answer TRUE. + * + * The reason this takes a Query, and not just an individual tlist expression, + * is so that we can make use of the query's WHERE/ON clauses to prove it does + * not return nulls. + * + * In current usage, the passed sub-Query hasn't yet been through any planner + * processing. This means that applying find_nonnullable_vars() to its WHERE + * clauses isn't really ideal: for lack of const-simplification, we might be + * unable to prove not-nullness in some cases where we could have proved it + * afterwards. However, we should not get any false positive results. + * + * Like the other forms of nullability analysis above, we can err on the + * side of conservatism: if we're not sure, it's okay to return FALSE. + */ + bool + query_outputs_are_not_nullable(Query *query) + { + PlannerInfo subroot; + Relids innerjoined_rels = NULL; + bool computed_innerjoined_rels = false; + List *usable_quals = NIL; + List *nonnullable_vars = NIL; + bool computed_nonnullable_vars = false; + ListCell *tl; + + /* + * If the query contains set operations, punt. The set ops themselves + * couldn't introduce nulls that weren't in their inputs, but the tlist + * present in the top-level query is just dummy and won't give us useful + * info. We could get an answer by recursing to examine each leaf query, + * but for the moment it doesn't seem worth the extra complication. + * + * Note that we needn't consider other top-level operators such as + * DISTINCT, GROUP BY, etc, as those will not introduce nulls either. + */ + if (query->setOperations) + return false; + + /* + * We need a PlannerInfo to pass to flatten_join_alias_vars. Fortunately, + * we can cons up an entirely dummy one, because only the "parse" link in + * the struct is used by flatten_join_alias_vars. + */ + MemSet(&subroot, 0, sizeof(subroot)); + subroot.parse = query; + + /* + * Examine each targetlist entry to prove that it can't produce NULL. + */ + foreach(tl, query->targetList) + { + TargetEntry *tle = (TargetEntry *) lfirst(tl); + Expr *expr = tle->expr; + + /* Resjunk columns can be ignored: they don't produce output values */ + if (tle->resjunk) + continue; + + /* + * For the most part we don't try to deal with anything more complex + * than Consts and Vars; but it seems worthwhile to look through + * binary relabelings, since we know those don't introduce nulls. + */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + if (expr == NULL) /* paranoia */ + return false; + + if (IsA(expr, Const)) + { + /* Consts are easy: they're either null or not. */ + if (((Const *) expr)->constisnull) + return false; + } + else if (IsA(expr, Var)) + { + Var *var = (Var *) expr; + + /* Currently, we punt for any nonlocal Vars */ + if (var->varlevelsup != 0) + return false; + + /* + * Since the subquery hasn't yet been through expression + * preprocessing, we must apply flatten_join_alias_vars to the + * given Var, and to any Vars found by find_nonnullable_vars, to + * avoid being fooled by join aliases. If we get something other + * than a plain Var out of the substitution, punt. + */ + var = (Var *) flatten_join_alias_vars(&subroot, (Node *) var); + + if (!IsA(var, Var)) + return false; + Assert(var->varlevelsup == 0); + + /* + * We don't bother to compute innerjoined_rels and usable_quals + * until we've found a Var we must analyze. + */ + if (!computed_innerjoined_rels) + { + find_innerjoined_rels((Node *) query->jointree, + &innerjoined_rels, &usable_quals); + computed_innerjoined_rels = true; + } + + /* + * If Var is from a plain relation, and that relation is not on + * the nullable side of any outer join, and its column is marked + * NOT NULL according to the catalogs, it can't produce NULL. + */ + if (bms_is_member(var->varno, innerjoined_rels)) + { + RangeTblEntry *rte = rt_fetch(var->varno, query->rtable); + + if (rte->rtekind == RTE_RELATION && + get_attnotnull(rte->relid, var->varattno)) + continue; /* cannot produce NULL */ + } + + /* + * Even if that didn't work, we can conclude that the Var is not + * nullable if find_nonnullable_vars can find a "var IS NOT NULL" + * or similarly strict condition among the usable_quals. Compute + * the list of Vars having such quals if we didn't already. + */ + if (!computed_nonnullable_vars) + { + nonnullable_vars = find_nonnullable_vars((Node *) usable_quals); + nonnullable_vars = (List *) + flatten_join_alias_vars(&subroot, + (Node *) nonnullable_vars); + /* We don't bother removing any non-Vars from the result */ + computed_nonnullable_vars = true; + } + + if (!list_member(nonnullable_vars, var)) + return false; /* we failed to prove the Var non-null */ + } + else + { + /* Not a Const or Var; punt */ + return false; + } + } + + return true; /* query cannot emit NULLs */ + } + + /* + * find_innerjoined_rels + * Traverse jointree to locate non-outerjoined-rels and quals above them + * + * We fill innerjoined_rels with the relids of all rels that are not below + * the nullable side of any outer join (which would cause their Vars to be + * possibly NULL regardless of what's in the catalogs). In the same scan, + * we locate all WHERE and JOIN/ON quals that constrain these rels, and add + * them to the usable_quals list (forming a list with implicit-AND semantics). + * + * Top-level caller must initialize innerjoined_rels/usable_quals to NULL/NIL. + */ + static void + find_innerjoined_rels(Node *jtnode, + Relids *innerjoined_rels, List **usable_quals) + { + if (jtnode == NULL) + return; + if (IsA(jtnode, RangeTblRef)) + { + int varno = ((RangeTblRef *) jtnode)->rtindex; + + *innerjoined_rels = bms_add_member(*innerjoined_rels, varno); + } + else if (IsA(jtnode, FromExpr)) + { + FromExpr *f = (FromExpr *) jtnode; + ListCell *lc; + + /* All elements of the FROM list are allowable */ + foreach(lc, f->fromlist) + find_innerjoined_rels((Node *) lfirst(lc), + innerjoined_rels, usable_quals); + /* ... and its WHERE quals are too */ + if (f->quals) + *usable_quals = lappend(*usable_quals, f->quals); + } + else if (IsA(jtnode, JoinExpr)) + { + JoinExpr *j = (JoinExpr *) jtnode; + + switch (j->jointype) + { + case JOIN_INNER: + /* visit both children */ + find_innerjoined_rels(j->larg, + innerjoined_rels, usable_quals); + find_innerjoined_rels(j->rarg, + innerjoined_rels, usable_quals); + /* and grab the ON quals too */ + if (j->quals) + *usable_quals = lappend(*usable_quals, j->quals); + break; + + case JOIN_LEFT: + case JOIN_SEMI: + case JOIN_ANTI: + + /* + * Only the left input is possibly non-nullable; furthermore, + * the quals of this join don't constrain the left input. + * Note: we probably can't see SEMI or ANTI joins at this + * point, but if we do, we can treat them like LEFT joins. + */ + find_innerjoined_rels(j->larg, + innerjoined_rels, usable_quals); + break; + + case JOIN_RIGHT: + /* Reverse of the above case */ + find_innerjoined_rels(j->rarg, + innerjoined_rels, usable_quals); + break; + + case JOIN_FULL: + /* Neither side is non-nullable, so stop descending */ + break; + + default: + elog(ERROR, "unrecognized join type: %d", + (int) j->jointype); + } + } + else + elog(ERROR, "unrecognized node type: %d", + (int) nodeTag(jtnode)); + } + + /* * Can we treat a ScalarArrayOpExpr as strict? * * If "falseOK" is true, then a "false" result can be considered strict, diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 4b5ef99..1d581a8 100644 *** a/src/backend/utils/cache/lsyscache.c --- b/src/backend/utils/cache/lsyscache.c *************** get_atttypetypmodcoll(Oid relid, AttrNum *** 972,977 **** --- 972,1004 ---- ReleaseSysCache(tp); } + /* + * get_attnotnull + * + * Given the relation id and the attribute number, + * return the "attnotnull" field from the attribute relation. + */ + bool + get_attnotnull(Oid relid, AttrNumber attnum) + { + HeapTuple tp; + + tp = SearchSysCache2(ATTNUM, + ObjectIdGetDatum(relid), + Int16GetDatum(attnum)); + if (HeapTupleIsValid(tp)) + { + Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp); + bool result; + + result = att_tup->attnotnull; + ReleaseSysCache(tp); + return result; + } + else + return false; + } + /* ---------- COLLATION CACHE ---------- */ /* diff --git a/src/include/optimizer/clauses.h b/src/include/optimizer/clauses.h index dd991b1..5b25d01 100644 *** a/src/include/optimizer/clauses.h --- b/src/include/optimizer/clauses.h *************** extern Relids find_nonnullable_rels(Node *** 69,74 **** --- 69,75 ---- extern List *find_nonnullable_vars(Node *clause); extern List *find_forced_null_vars(Node *clause); extern Var *find_forced_null_var(Node *clause); + extern bool query_outputs_are_not_nullable(Query *query); extern bool is_pseudo_constant_clause(Node *clause); extern bool is_pseudo_constant_clause_relids(Node *clause, Relids relids); diff --git a/src/include/optimizer/subselect.h b/src/include/optimizer/subselect.h index 5607e98..3e8bfe7 100644 *** a/src/include/optimizer/subselect.h --- b/src/include/optimizer/subselect.h *************** *** 18,23 **** --- 18,24 ---- extern void SS_process_ctes(PlannerInfo *root); extern JoinExpr *convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, + bool under_not, Relids available_rels); extern JoinExpr *convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink, diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index f46460a..3ec200a 100644 *** a/src/include/utils/lsyscache.h --- b/src/include/utils/lsyscache.h *************** extern Oid get_atttype(Oid relid, AttrNu *** 70,75 **** --- 70,76 ---- extern int32 get_atttypmod(Oid relid, AttrNumber attnum); extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum, Oid *typid, int32 *typmod, Oid *collid); + extern bool get_attnotnull(Oid relid, AttrNumber attnum); extern char *get_collation_name(Oid colloid); extern char *get_constraint_name(Oid conoid); extern Oid get_opclass_family(Oid opclass); diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index d85a717..b63f7ac 100644 *** a/src/test/regress/expected/subselect.out --- b/src/test/regress/expected/subselect.out *************** select nextval('ts1'); *** 803,805 **** --- 803,1103 ---- 11 (1 row) + -- + -- Check NOT IN performs an ANTI JOIN when NULLs are not possible + -- in the target list of the subquery. + -- + BEGIN; + CREATE TEMP TABLE a (id INT PRIMARY KEY); + CREATE TEMP TABLE b (x INT NOT NULL, y INT); + CREATE TEMP TABLE c (z INT NOT NULL); + -- ANTI JOIN. x is defined as NOT NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x FROM b); + QUERY PLAN + ----------------------------------------- + Merge Anti Join + Merge Cond: (a.id = b.x) + -> Index Only Scan using a_pkey on a + -> Sort + Sort Key: b.x + -> Seq Scan on b + (6 rows) + + -- No ANTI JOIN, y can be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on b + (4 rows) + + -- No ANTI JOIN, x is NOT NULL, but we don't know if + 1 will change that. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x+1 FROM b); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on b + (4 rows) + + -- ANTI JOIN 1 is a Const that is not null. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT 1 FROM b); + QUERY PLAN + --------------------------- + Nested Loop Anti Join + Join Filter: (a.id = 1) + -> Seq Scan on a + -> Materialize + -> Seq Scan on b + (5 rows) + + -- No ANTI JOIN, results contain a NULL Const + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT NULL::int FROM b); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on b + (4 rows) + + -- ANTI JOIN y = 1 means y can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE y = 1); + QUERY PLAN + ------------------------------- + Hash Anti Join + Hash Cond: (a.id = b.y) + -> Seq Scan on a + -> Hash + -> Seq Scan on b + Filter: (y = 1) + (6 rows) + + -- No ANTI JOIN, OR condition does not ensure y = 1 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE y = 1 OR x = 1); + QUERY PLAN + ---------------------------------------- + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on b + Filter: ((y = 1) OR (x = 1)) + (5 rows) + + -- No ANTI JOIN, OR condition does not ensure y = 1 or y = 2 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR x = 1) AND (y = 2 OR x = 2)); + QUERY PLAN + ------------------------------------------------------------------- + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Seq Scan on b + Filter: (((y = 1) OR (x = 1)) AND ((y = 2) OR (x = 2))) + (5 rows) + + -- ANTI JOIN y must be 2, so can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR x = 1) AND y = 2); + QUERY PLAN + ---------------------------------------------------------- + Hash Anti Join + Hash Cond: (a.id = b.y) + -> Seq Scan on a + -> Hash + -> Seq Scan on b + Filter: ((y = 2) AND ((y = 1) OR (x = 1))) + (6 rows) + + -- ANTI JOIN y can be 1 or 2, but can't be null. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR y = 2)); + QUERY PLAN + -------------------------------------------- + Hash Anti Join + Hash Cond: (a.id = b.y) + -> Seq Scan on a + -> Hash + -> Seq Scan on b + Filter: ((y = 1) OR (y = 2)) + (6 rows) + + -- No ANTI JOIN c.z is from a left outer join so it can have nulls. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Merge Left Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + (11 rows) + + -- ANTI JOIN, c.z is not from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b RIGHT JOIN c ON b.x = c.z); + QUERY PLAN + ----------------------------------------------------- + Merge Left Join + Merge Cond: (c.z = b.x) + -> Sort + Sort Key: c.z + -> Merge Anti Join + Merge Cond: (a.id = c.z) + -> Index Only Scan using a_pkey on a + -> Sort + Sort Key: c.z + -> Seq Scan on c + -> Sort + Sort Key: b.x + -> Seq Scan on b + (13 rows) + + -- No ANTI JOIN, b.x is from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT b.x FROM b RIGHT JOIN c ON b.x = c.z); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Merge Right Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + (11 rows) + + -- No ANTI JOIN, c.z is not from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b FULL JOIN c ON b.x = c.z); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Merge Full Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + (11 rows) + + -- No ANTI JOIN, b.x is from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT b.x FROM b FULL JOIN c ON b.x = c.z); + QUERY PLAN + ------------------------------------ + Seq Scan on a + Filter: (NOT (hashed SubPlan 1)) + SubPlan 1 + -> Merge Full Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + (11 rows) + + -- ANTI JOIN, c.z is from an inner join and has a NOT NULL constraint. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b INNER JOIN c ON b.x = c.z); + QUERY PLAN + ----------------------------------------- + Merge Anti Join + Merge Cond: (a.id = c.z) + -> Index Only Scan using a_pkey on a + -> Materialize + -> Merge Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + (12 rows) + + -- ANTI JOIN, c.z must be 1 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z WHERE c.z = 1); + QUERY PLAN + ------------------------------------------- + Hash Anti Join + Hash Cond: (a.id = c.z) + -> Seq Scan on a + -> Hash + -> Nested Loop + -> Seq Scan on c + Filter: (z = 1) + -> Materialize + -> Seq Scan on b + Filter: (x = 1) + (10 rows) + + -- ANTI JOIN, c.z can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z WHERE c.z IS NOT NULL); + QUERY PLAN + --------------------------------------------------- + Merge Anti Join + Merge Cond: (a.id = c.z) + -> Index Only Scan using a_pkey on a + -> Materialize + -> Merge Join + Merge Cond: (b.x = c.z) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.z + -> Seq Scan on c + Filter: (z IS NOT NULL) + (13 rows) + + ALTER TABLE c ADD COLUMN x INT; + -- ANTI JOIN, x cannot be NULL as b.x has a NOT NULL constraint + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x FROM b NATURAL JOIN c); + QUERY PLAN + ----------------------------------------- + Merge Anti Join + Merge Cond: (a.id = b.x) + -> Index Only Scan using a_pkey on a + -> Materialize + -> Merge Join + Merge Cond: (b.x = c.x) + -> Sort + Sort Key: b.x + -> Seq Scan on b + -> Sort + Sort Key: c.x + -> Seq Scan on c + (12 rows) + + ROLLBACK; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index c3b4773..c3ca67f 100644 *** a/src/test/regress/sql/subselect.sql --- b/src/test/regress/sql/subselect.sql *************** select * from *** 444,446 **** --- 444,538 ---- order by 1; select nextval('ts1'); + + -- + -- Check NOT IN performs an ANTI JOIN when NULLs are not possible + -- in the target list of the subquery. + -- + + BEGIN; + + CREATE TEMP TABLE a (id INT PRIMARY KEY); + CREATE TEMP TABLE b (x INT NOT NULL, y INT); + CREATE TEMP TABLE c (z INT NOT NULL); + + -- ANTI JOIN. x is defined as NOT NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x FROM b); + + -- No ANTI JOIN, y can be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b); + + -- No ANTI JOIN, x is NOT NULL, but we don't know if + 1 will change that. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x+1 FROM b); + + -- ANTI JOIN 1 is a Const that is not null. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT 1 FROM b); + + -- No ANTI JOIN, results contain a NULL Const + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT NULL::int FROM b); + + -- ANTI JOIN y = 1 means y can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE y = 1); + + -- No ANTI JOIN, OR condition does not ensure y = 1 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE y = 1 OR x = 1); + + -- No ANTI JOIN, OR condition does not ensure y = 1 or y = 2 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR x = 1) AND (y = 2 OR x = 2)); + + -- ANTI JOIN y must be 2, so can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR x = 1) AND y = 2); + + -- ANTI JOIN y can be 1 or 2, but can't be null. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT y FROM b WHERE (y = 1 OR y = 2)); + + -- No ANTI JOIN c.z is from a left outer join so it can have nulls. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z); + + -- ANTI JOIN, c.z is not from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b RIGHT JOIN c ON b.x = c.z); + + -- No ANTI JOIN, b.x is from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT b.x FROM b RIGHT JOIN c ON b.x = c.z); + + -- No ANTI JOIN, c.z is not from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b FULL JOIN c ON b.x = c.z); + + -- No ANTI JOIN, b.x is from an outer join + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT b.x FROM b FULL JOIN c ON b.x = c.z); + + -- ANTI JOIN, c.z is from an inner join and has a NOT NULL constraint. + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b INNER JOIN c ON b.x = c.z); + + -- ANTI JOIN, c.z must be 1 + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z WHERE c.z = 1); + + -- ANTI JOIN, c.z can't be NULL + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT c.z FROM b LEFT JOIN c ON b.x = c.z WHERE c.z IS NOT NULL); + + ALTER TABLE c ADD COLUMN x INT; + + -- ANTI JOIN, x cannot be NULL as b.x has a NOT NULL constraint + EXPLAIN (COSTS OFF) + SELECT * FROM a WHERE id NOT IN (SELECT x FROM b NATURAL JOIN c); + + + ROLLBACK;
pgsql-hackers by date: