Re: UniqueKey v2 - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: UniqueKey v2 |
Date | |
Msg-id | 7971.1713526758@antos Whole thread Raw |
In response to | Re: UniqueKey v2 (zhihuifan1213@163.com) |
Responses |
Re: UniqueKey v2
Re: UniqueKey v2 |
List | pgsql-hackers |
zhihuifan1213@163.com wrote: > Here is the v3, ... I'm trying to enhance the join removal functionality (will post my patch in a separate thread soon) and I consider your patch very helpful in this area. Following is my review. Attached are also some fixes and one enhancement: propagation of the unique keys (UK) from a subquery to the parent query (0004). (Note that 0001 is just your patch rebased). * I think that, before EC is considered suitable for an UK, its ec_opfamilies field needs to be checked. I try to do that in find_ec_position_matching_expr(), see 0004. * Set-returning functions (SRFs) can make a "distinct path" necessary even if the join output is unique. * RelOptInfo.notnullattrs My understanding is that this field contains the set attributes whose uniqueness is guaranteed by the unique key. They are acceptable because they are either 1) not allowed to be NULL due to NOT NULL constraint or 2) NULL value makes the containing row filtered out, so the row cannot break uniqueness of the output. Am I right? If so, I suggest to change the field name to something less generic, maybe 'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment that more checks are needed before particular attribute can actually be used in UniqueKey. * add_uniquekey_for_uniqueindex() I'd appreciate an explanation why the "single-row UK" is created. I think the reason for unique_exprs==NIL is that a restriction clause VAR=CONST exists for each column of the unique index. Whether I'm right or not, a comment should state clearly what the reason is. * uniquekey_useful_for_merging() How does uniqueness relate to merge join? In README.uniquekey you seem to point out that a single row is always sorted, but I don't think this function is related to that fact. (Instead, I'd expect that pathkeys are added to all paths for a single-row relation, but I'm not sure you do that in the current version of the patch.) * is_uniquekey_useful_afterjoin() Now that my patch (0004) allows propagation of the unique keys from a subquery to the upper query, I was wondering if the UniqueKey structure needs the 'use_for_distinct field' I mean we should now propagate the unique keys to the parent query whether the subquery has DISTINCT clause or not. I noticed that the field is checked by is_uniquekey_useful_afterjoin(), so I changed the function to always returned true. However nothing changed in the output of regression tests (installcheck). Do you insist that the 'use_for_distinct' field is needed? * uniquekey_contains_multinulls() ** Instead of calling the function when trying to use the UK, how about checking the ECs when considering creation of the UK? If the tests fail, just don't create the UK. ** What does the 'multi' word in the function name mean? * relation_is_distinct_for() The function name is too similar to rel_is_distinct_for(). I think the name should indicate that you are checking the relation against a set of pathkeys. Maybe rel_is_distinct_for_pathkeys() (and remove 'distinct' from the argument name)? At the same time, it might be good to rename rel_is_distinct_for() to rel_is_distinct_for_clauses(). * uniquekey_contains_in() Shouldn't this be uniquekey_contained_in()? And likewise, shouldn't the comment be " ... if UniqueKey is contained in the list of EquivalenceClass" ? (In general, even though I'm not English native speaker, I think I see quite a few grammar issues, which often make reading of the comments/documentation a bit difficult.) * Combining the UKs IMO this is the most problematic part of the patch. You call populate_joinrel_uniquekeys() for the same join multiple times, each time with a different 'restrictlist', and you try to do two things at the same time: 1) combine the UKs of the input relations into the UKs of the join relation, 2) check if the join relation can be marked single-row. I think that both 1) and 2) should be independent from join order, and thus both computations should only take place once for given set of input relations. And I think they should be done separately: 1) Compute the join UKs As you admit in a comment in populate_joinrel_uniquekeys(), neither join method nor clauses should matter. So I think you only need to pick the "component UKs" (i.e. UKs of the input relations) which are usable above that join (i.e. neither the join itself nor any join below sets any column of the UK to NULL) and combine them. Of course one problem is that the number of combinations can grow exponentially as new relations are joined. I'm not sure it's necessary to combine the UKs (and to discard some of them) immediately. Instead, maybe we can keep lists of UKs only for base relations, and postpone picking the suitable UKs and combining them until we actually need to check the relation uniqueness. 2) Check if the join relation is single-row I in order to get rid of the dependency on 'restrictlist', I think you can use ECs. Consider a query from your regression tests: CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int); SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1; The idea here seems to be that no more than one row of t1 matches the query clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures that no more than one row of t2 matches the query (because t1 cannot provide the clause with more than one input value of 'e'). And therefore, the join also produces at most one row. My theory is that relation is single-row if it has an UK such that each of its ECs meets at least one of the following conditions: a) contains a constant b) contains a column of a relation which has already been proven single-row. b) is referenced by an UK of a relation which has already been proven single-row. I think that in the example above, an EC {t1.e, t2.id} should exist. So when checking whether 't2' is single-row, the condition b) cam be ised: the UK of 't2' should reference the EC {t1.e, t2.id}, which in turn contains the column t1.e. And 't1' is unique because its EC meets the condition a). (Of course you need to check em_jdomain before you use particular EM.) Are you going to submit the patch to the first CF of PG 18? Please let me know if I can contribute to the effort by reviewing or writing some code. -- Antonin Houska Web: https://www.cybertec-postgresql.com From 9fe85dae62ae580f377771935ee399e7f46dc99b Mon Sep 17 00:00:00 2001 From: Antonin Houska <ah@cybertec.at> Date: Thu, 18 Apr 2024 14:43:02 +0200 Subject: [PATCH 1/5] Unique keys rebased. The original patch: https://www.postgresql.org/message-id/871qczm9oc.fsf%40163.com --- src/backend/nodes/list.c | 17 + src/backend/optimizer/path/Makefile | 3 +- src/backend/optimizer/path/README.uniquekey | 220 +++++++ src/backend/optimizer/path/allpaths.c | 2 + src/backend/optimizer/path/equivclass.c | 48 ++ src/backend/optimizer/path/joinrels.c | 2 + src/backend/optimizer/path/uniquekey.c | 654 ++++++++++++++++++++ src/backend/optimizer/plan/initsplan.c | 10 +- src/backend/optimizer/plan/planner.c | 33 + src/backend/optimizer/util/plancat.c | 10 + src/backend/optimizer/util/relnode.c | 2 + src/include/nodes/pathnodes.h | 19 + src/include/nodes/pg_list.h | 2 + src/include/optimizer/paths.h | 13 + src/test/regress/expected/join.out | 11 +- src/test/regress/expected/uniquekey.out | 268 ++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/uniquekey.sql | 104 ++++ 18 files changed, 1410 insertions(+), 10 deletions(-) create mode 100644 src/backend/optimizer/path/README.uniquekey create mode 100644 src/backend/optimizer/path/uniquekey.c create mode 100644 src/test/regress/expected/uniquekey.out create mode 100644 src/test/regress/sql/uniquekey.sql diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c index e2615ab105..20eb1267eb 100644 --- a/src/backend/nodes/list.c +++ b/src/backend/nodes/list.c @@ -695,6 +695,23 @@ list_member_ptr(const List *list, const void *datum) return false; } +/* + * Return index of the datum in list if found. otherwise return -1. + */ +int +list_member_ptr_pos(const List *list, const void *datum) +{ + ListCell *lc; + + foreach(lc, list) + { + if (lfirst(lc) == datum) + return foreach_current_index(lc); + } + + return -1; +} + /* * Return true iff the integer 'datum' is a member of the list. */ diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile index 1e199ff66f..63cc1505d9 100644 --- a/src/backend/optimizer/path/Makefile +++ b/src/backend/optimizer/path/Makefile @@ -21,6 +21,7 @@ OBJS = \ joinpath.o \ joinrels.o \ pathkeys.o \ - tidpath.o + tidpath.o \ + uniquekey.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey new file mode 100644 index 0000000000..be13b113b9 --- /dev/null +++ b/src/backend/optimizer/path/README.uniquekey @@ -0,0 +1,220 @@ +Here is a design document and a part of implementation. + +What is UniqueKey? +----------------- + +UniqueKey represents a uniqueness information for a RelOptInfo. for +example: + +SELECT id FROM t; + +where the ID is the UniqueKey for the RelOptInfo (t). In the real world, +it has the following attributes: + +1). It should be EquivalenceClass based. for example: + +SELECT a FROM t WHERE a = id; + +In this case, the UniqueKey should be 1 EC with two members +- EC(EM(a), EM(id)). + + +2). Each UniqueKey may be made up with 1+ EquivalenceClass. for example: + +CREATE TABLE t(a int not null, b int not null); +CREATE UNIQUE INDEX on t(a, b); +SELECT * FROM t; + +Where the UniqueKey for RelOptInfo (t) will be 2 ECs with each one has 1 +member. + +- EC(em=a), EC(em=b) + +3). Each RelOptInfo may have 1+ UniqueKeys. + +CREATE TABLE t(a int not null, b int not null, c int not null); +CREATE UNIQUE INDEX on t(a, b); +CREATE UNIQUE INDEX on t(c); + +SELECT * FROM t; + +Where the UniqueKey for RelOptInfo (t) will be +- [EC(em=a), EC(em=b)]. +- [EC(em=c)] + +4). A special case is about the one-row case. It works like: +SELECT * FROM t WHERE id = 1; +Here every single expression in the RelOptInfo (t) is unique since only +one row here. + +Where can we use it? +-------------------- +1. mark the distinct as no-op. SELECT DISTINCT uniquekey FROM v; This + optimization has been required several times in our threads. + +2. Figure out more pathkey within the onerow case, then some planning + time can be reduced to be big extend. This user case is not absurd, + a real user case like this: + + CREATE TABLE small_t (id int primary key, b int, c int .. u int); + CREATE INDEX ON small_t(b); + CREATE INDEX ON small_t(c); + .. + + SELECT * FROM small_t s + JOIN t1 on t1.sb = s.b + JOIN T2 on t2.sc = s.c + .. + JOIN t20 on t20.su = s.u + WHERE s.id = 1; + + Without the above optimization, we don't know s.b /s.c is ordered + already, so it might keep more different paths for small_t because of + they have different interesting pathkey, and use more planning time + for sorting to support merge join. + + With the above optimization, the planning time should be reduced since + the seq scan can produce a ordered result for every expression. + +3. Figure out more interesting pathkey after join with normal UniqueKey. + + CREATE TABLE t(id int primary key, b int, c int); + CREATE INDEX on t(c); + ANALYZE t; + + explain (costs off) + select t1.id, t2.c from t t1 + join t1 t2 on t1.id = t2.b + and t2.c > 3 + order by t1.id, t2.c; + + QUERY PLAN + -------------------------------------------------- + Sort Key: t1.id, t2.c <--- this sort can be avoided actually. + -> Nested Loop + Join Filter: (t1.id = t2.b) + -> Index Only Scan using t_pkey on t t1 + -> Index Scan using t1_c_idx on t1 t2 + Index Cond: (c > 3) + + *Without knowing the t1.id is unique*, which means there are some + duplicated data in t1.id, the duplication data in t1 will break the + order of (t1.id, t2.c), but if we know the t1.id is unique, the sort + will be not needed. I'm pretty happy with this finding. + +4. Optimize some group by case, like + + SELECT id, sum(b) FROM t GROUP BY id + is same with + SELECT id, b from t; + + I'm not sure how often it is in the real life, I'm not so excited with + this for now. + + +How to present ECs in UniqueKey? +-------------------------------- + +I choose "Bitmapset *eclass_indexes;" finally, which is because +Bitmapset is memory compact and good at bms_union, bms_is_subset +stuffs. The value in the bitmap is the positions in root->eq_classes. It +is also be able to present the UniqueKey which is made up from multi +relations or upper relation. + +How to present single row in UniqueKey +------------------------------------- + +I just use a 'Index relid', an non-zero value means the +RelOptInfo[relid] is single row. For the case like + +SELECT * FROM t WHERE id = 1; +The UniqueKey is: +- UniqueKey(eclass_indexes=NULL, relid=1) + +during a join, any unique keys join with single row, it's uniqueness can +be kept. + +SELECT t1.uk, t2.a FROM t WHERE t2.id = 1 and any-qual(t1, t2); +- UniqueKey (t1.uk) + +more specially, join two single row like: + +SELECT * FROM t1 join t2 on true where t1.id = 1 and t2.id = 2; + +the UniqueKey for the JoinRel will be: +- UniqueKey(eclass_indexes=NULL, relid=1) +- UniqueKey(eclass_indexes=NULL, relid=2) + +However, the current single row presentation can't works well with Upper +relation, which I think it would be acceptable. See the following case: + +SELECT count(*) FROM t1 JOIN t2 on true; + + +How to maintain the uniquekey? +------------------------------- +the uniquekey is maintained from baserel to join rel then to upper +relation. In the base rel, it comes from unique index. From the join +relation, it is maintained with two rules: + +- the uniquekey in one side is still unique if it can't be duplicated + after the join. for example: + + SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk; + UniqueKey on t1: t1.pk + UniqueKey on t1 Join t2: t1.pk + +- The combined unique key from both sides are unique all the times. + SELECT t1.pk , t2.pk FROM t1 join t2 on true; + UniqueKey on t1 join t2: (t1.pk, t2.pk) + +Some other operations like DISTINCT, GROUP BY can produce UniqueKey as well. + +NULL values +----------- +notnullattrs are added into RelOptInfo, which present if these attributes +may not be NULL after the baserestrictinfo is executed. not-null-attributes +may be generated by not-null constraint in catalog or baserestrictinfo +(only) filter. However it is possible become NULLs because of outer +join, then Var.varnullingrels is used in this case. see +'var_is_nullable' function call. + +To simplify the UniqueKey module, it doesn't care about the null values +during the maintaining, which means it may contains multi NULL values +all the time by design. However whenever a user case care about that, +the user case can get the answer with the above logic, that is what +'mark-distinct-as-noop' does. + +How to reduce the overhead +---------------------------------- +UniqueKey employs the similar strategy like PathKey, it only maintain +the interesting PathKey. Currently the interesting UniqueKey includes: +1). It is subset of distinct_pathkeys. +2). It is used in mergeable join clauses for unique key deduction (for +the join rel case, rule 1). In this case, it can be discarded quickly if +the join has been done. + +To avoid to check if an uniquekey is subset of distinct clause again and +again, I cached the result into UnqiueKey struct during the UniqueKey +creation. + +Since our first goal is just used for marking distinct as no-op, so if +there is no distinct clause at all, unique key will be not maintained at +the beginning. so we can have some codes like: + +if (root->distinct_pathkeys == NULL) +return; + +This fast path is NOT added for now for better code coverage. + +What I have now: +---------------- + +The current patch just maintain the UniqueKey at the baserel/joinrel level +and used it for mark-distinct-as-noop purpose. including the basic idea of + +- How the UniqueKey is defined. +- How to find out the interesting pathkey in the baserel/joinrel level. +- How to figure out the unique key contains NULL values. + +Also the test cases are prepared, see uniquekey.sql. diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index cc51ae1757..ced0fce415 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -455,6 +455,8 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel, } } + populate_baserel_uniquekeys(root, rel); + /* * We insist that all non-dummy rels have a nonzero rowcount estimate. */ diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 1d6bedb399..892cd0000c 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -755,6 +755,54 @@ get_eclass_for_sort_expr(PlannerInfo *root, return newec; } +/* + * find_ec_position_matching_expr + * Locate the position of EquivalenceClass whose members matching + * the given expr, if any; return -1 if no match. + */ +int +find_ec_position_matching_expr(PlannerInfo *root, + Expr *expr, + RelOptInfo *baserel) +{ + int i = -1; + + while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + + if (find_ec_member_matching_expr(ec, expr, baserel->relids)) + return i; + } + return -1; +} + +/* + * build_ec_positions_for_exprs + * + * Given a list of exprs, find the related EC positions for each of + * them. if any exprs has no EC related, return NULL; + */ +Bitmapset * +build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel) +{ + ListCell *lc; + Bitmapset *res = NULL; + + foreach(lc, exprs) + { + int pos = find_ec_position_matching_expr(root, lfirst(lc), rel); + + if (pos < 0) + { + bms_free(res); + return NULL; + } + res = bms_add_member(res, pos); + } + return res; +} + /* * find_ec_member_matching_expr * Locate an EquivalenceClass member matching the given expr, if any; diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index f3a9412d18..b5c2b7d084 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -761,6 +761,8 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) sjinfo, pushed_down_joins, &restrictlist); + populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype); + /* * If we've already proven this join is empty, we needn't consider any * more paths for it. diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c new file mode 100644 index 0000000000..ecc7c6d802 --- /dev/null +++ b/src/backend/optimizer/path/uniquekey.c @@ -0,0 +1,654 @@ +/*------------------------------------------------------------------------- + * + * uniquekey.c + * Utilities for maintaining uniquekey. + * + * + * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/backend/optimizer/path/uniquekey.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "access/sysattr.h" +#include "nodes/nodeFuncs.h" +#include "nodes/pathnodes.h" +#include "optimizer/optimizer.h" +#include "optimizer/paths.h" + + +/* Functions to populate UniqueKey */ +static bool add_uniquekey_for_uniqueindex(PlannerInfo *root, + IndexOptInfo *unique_index, + List *truncatable_exprs, + List *expr_opfamilies); +static bool unique_ecs_useful_for_distinct(PlannerInfo *root, Bitmapset *ec_indexes); + +/* Helper functions to create UniqueKey. */ +static UniqueKey * make_uniquekey(Bitmapset *eclass_indexes, + bool useful_for_distinct); +static void mark_rel_singlerow(RelOptInfo *rel, int relid); + +static UniqueKey * rel_singlerow_uniquekey(RelOptInfo *rel); +static bool uniquekey_contains_multinulls(PlannerInfo *root, RelOptInfo *rel, UniqueKey * ukey); + +/* Debug only */ +static void print_uniquekey(PlannerInfo *root, RelOptInfo *rel); + +static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey * ukey, Bitmapset *ecs, Relids relids); +static bool is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *joinrel); + +/* + * populate_baserel_uniquekeys + * + * UniqueKey on baserel comes from unique indexes. Any expression + * which equals with Const can be stripped and the left expressions are + * still unique. + */ +void +populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + List *truncatable_exprs = NIL, + *expr_opfamilies = NIL; + + /* + * Currently we only use UniqueKey for mark-distinct-as-noop case, so if + * there is no-distinct-clause at all, we can ignore the maintenance at + * the first place. however for code coverage at the development stage, we + * bypass this fastpath on purpose. + * + * XXX: even we want this fastpath, we still need to distinguish even the + * current subquery has no DISTINCT, but the upper query may have. + */ + + /* + * if (root->distinct_pathkeys == NIL) return; + */ + foreach(lc, rel->baserestrictinfo) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); + + if (rinfo->mergeopfamilies == NIL) + continue; + + if (!IsA(rinfo->clause, OpExpr)) + continue; + + if (bms_is_empty(rinfo->left_relids)) + truncatable_exprs = lappend(truncatable_exprs, get_rightop(rinfo->clause)); + else if (bms_is_empty(rinfo->right_relids)) + truncatable_exprs = lappend(truncatable_exprs, get_leftop(rinfo->clause)); + else + continue; + + expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies); + } + + foreach(lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + + if (!index->unique || !index->immediate || + (index->indpred != NIL && !index->predOK)) + continue; + + if (add_uniquekey_for_uniqueindex(root, index, + truncatable_exprs, + expr_opfamilies)) + /* Find a singlerow case, no need to go through other indexes. */ + return; + } + + print_uniquekey(root, rel); +} + + +/* + * add_uniquekey_for_uniqueindex + * + * populate a UniqueKey if it is interesting, return true iff the + * UniqueKey is an SingleRow. Only the interesting UniqueKeys are kept. + */ +static bool +add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, + List *truncatable_exprs, List *expr_opfamilies) +{ + List *unique_exprs = NIL; + Bitmapset *unique_ecs = NULL; + ListCell *indexpr_item; + RelOptInfo *rel = unique_index->rel; + bool used_for_distinct; + int c; + + indexpr_item = list_head(unique_index->indexprs); + + for (c = 0; c < unique_index->nkeycolumns; c++) + { + int attr = unique_index->indexkeys[c]; + Expr *expr; /* The candidate for UniqueKey expression. */ + bool matched_const = false; + ListCell *lc1, + *lc2; + + if (attr > 0) + { + expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr; + } + else if (attr == 0) + { + /* Expression index */ + expr = lfirst(indexpr_item); + indexpr_item = lnext(unique_index->indexprs, indexpr_item); + } + else /* attr < 0 */ + { + /* Index on OID is possible, not handle it for now. */ + return false; + } + + /* Ignore the expr which are equals to const. */ + forboth(lc1, truncatable_exprs, lc2, expr_opfamilies) + { + if (list_member_oid((List *) lfirst(lc2), unique_index->opfamily[c]) && + match_index_to_operand((Node *) lfirst(lc1), c, unique_index)) + { + matched_const = true; + break; + } + } + + if (matched_const) + continue; + + unique_exprs = lappend(unique_exprs, expr); + } + + if (unique_exprs == NIL) + { + /* single row is always interesting. */ + mark_rel_singlerow(rel, rel->relid); + return true; + } + + /* + * if no EquivalenceClass is found for any exprs in unique exprs, we are + * sure the whole exprs are not in the DISTINCT clause or mergeable join + * clauses. so it is not interesting. + */ + unique_ecs = build_ec_positions_for_exprs(root, unique_exprs, rel); + if (unique_ecs == NULL) + return false; + + used_for_distinct = unique_ecs_useful_for_distinct(root, unique_ecs); + + + rel->uniquekeys = lappend(rel->uniquekeys, + make_uniquekey(unique_ecs, + used_for_distinct)); + return false; +} + +/* + * make_uniquekey + * Based on UnqiueKey rules, it is impossible for a UnqiueKey + * which have eclass_indexes and relid both set. This function just + * handle eclass_indexes case. + */ +static UniqueKey * +make_uniquekey(Bitmapset *eclass_indexes, bool useful_for_distinct) +{ + UniqueKey *ukey = makeNode(UniqueKey); + + ukey->eclass_indexes = eclass_indexes; + ukey->relid = 0; + ukey->use_for_distinct = useful_for_distinct; + return ukey; +} + +/* + * mark_rel_singlerow + * mark a relation as singlerow. + */ +static void +mark_rel_singlerow(RelOptInfo *rel, int relid) +{ + UniqueKey *ukey = makeNode(UniqueKey); + + ukey->relid = relid; + rel->uniquekeys = list_make1(ukey); +} + +static inline bool +uniquekey_is_singlerow(UniqueKey * ukey) +{ + return ukey->relid != 0; +} + +/* + * + * Return the UniqueKey if rel is a singlerow Relation. othwise + * return NULL. + */ +static UniqueKey * +rel_singlerow_uniquekey(RelOptInfo *rel) +{ + if (rel->uniquekeys != NIL) + { + UniqueKey *ukey = linitial_node(UniqueKey, rel->uniquekeys); + + if (ukey->relid) + return ukey; + } + return NULL; +} + +/* + * print_uniquekey + * Used for easier reivew, should be removed before commit. + */ +static void +print_uniquekey(PlannerInfo *root, RelOptInfo *rel) +{ + if (!enable_geqo) + { + ListCell *lc; + + elog(INFO, "Rel = %s", bmsToString(rel->relids)); + foreach(lc, rel->uniquekeys) + { + UniqueKey *ukey = lfirst_node(UniqueKey, lc); + + elog(INFO, "UNIQUEKEY{indexes=%s, singlerow_rels=%d, use_for_distinct=%d}", + bmsToString(ukey->eclass_indexes), + ukey->relid, + ukey->use_for_distinct); + } + } +} + +/* + * is it possible that the var contains multi NULL values in the given + * RelOptInfo rel? + */ +static bool +var_is_nullable(PlannerInfo *root, Var *var, RelOptInfo *rel) +{ + RelOptInfo *base_rel; + + /* check if the outer join can add the NULL values. */ + if (bms_overlap(var->varnullingrels, rel->relids)) + return true; + + /* check if the user data has the NULL values. */ + base_rel = root->simple_rel_array[var->varno]; + return !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber, base_rel->notnullattrs); +} + + +/* + * uniquekey_contains_multinulls + * + * Check if the uniquekey contains nulls values. + */ +static bool +uniquekey_contains_multinulls(PlannerInfo *root, RelOptInfo *rel, UniqueKey * ukey) +{ + int i = -1; + + while ((i = bms_next_member(ukey->eclass_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth_node(EquivalenceClass, root->eq_classes, i); + ListCell *lc; + + foreach(lc, ec->ec_members) + { + EquivalenceMember *em = lfirst_node(EquivalenceMember, lc); + Var *var; + + var = (Var *) em->em_expr; + + if (!IsA(var, Var)) + continue; + + if (var_is_nullable(root, var, rel)) + return true; + else + + /* + * If any one of member in the EC is not nullable, we all the + * members are not nullable since they are equal with each + * other. + */ + break; + } + } + + return false; +} + + +/* + * relation_is_distinct_for + * + * Check if the rel is distinct for distinct_pathkey. + */ +bool +relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_pathkey) +{ + ListCell *lc; + UniqueKey *singlerow_ukey = rel_singlerow_uniquekey(rel); + Bitmapset *pathkey_bm = NULL; + + if (singlerow_ukey) + { + return !uniquekey_contains_multinulls(root, rel, singlerow_ukey); + } + + foreach(lc, distinct_pathkey) + { + PathKey *pathkey = lfirst_node(PathKey, lc); + int pos = list_member_ptr_pos(root->eq_classes, pathkey->pk_eclass); + + if (pos == -1) + return false; + + pathkey_bm = bms_add_member(pathkey_bm, pos); + } + + foreach(lc, rel->uniquekeys) + { + UniqueKey *ukey = lfirst_node(UniqueKey, lc); + + if (bms_is_subset(ukey->eclass_indexes, pathkey_bm) && + !uniquekey_contains_multinulls(root, rel, ukey)) + return true; + } + + return false; +} + +/* + * unique_ecs_useful_for_distinct + * + * Return true if all the EquivalenceClass for ecs exists in root->distinct_pathkey. + */ +static bool +unique_ecs_useful_for_distinct(PlannerInfo *root, Bitmapset *ec_indexes) +{ + int i = -1; + + while ((i = bms_next_member(ec_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + ListCell *p; + bool found = false; + + foreach(p, root->distinct_pathkeys) + { + PathKey *pathkey = lfirst_node(PathKey, p); + + if (ec == pathkey->pk_eclass) + { + found = true; + break; + } + } + if (!found) + return false; + } + return true; +} + +/* + * populate_joinrel_uniquekey_for_rel + * + * Check if the pattern of rel.any_column = other_rel.unique_key_column + * exists, if so, the uniquekey in rel is still valid after join and it is + * added into joinrel and return true. otherwise return false. + */ +static bool +populate_joinrel_uniquekey_for_rel(PlannerInfo *root, RelOptInfo *joinrel, + RelOptInfo *rel, RelOptInfo *other_rel, + List *restrictlist) +{ + bool rel_keep_unique = false; + Bitmapset *other_ecs = NULL; + Relids other_relids = NULL; + ListCell *lc; + + if (rel_singlerow_uniquekey(other_rel)) + { + /* + * any uniquekeys stuff join with single-row, its uniqueness is still + * kept. + */ + goto done; + } + + /* find out the ECs which the rel.any_columns equals to. */ + foreach(lc, restrictlist) + { + RestrictInfo *r = lfirst_node(RestrictInfo, lc); + + if (r->mergeopfamilies == NIL) + continue; + + /* Build the Bitmapset for easy comparing. */ + if (bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL) + { + other_ecs = bms_add_member(other_ecs, list_member_ptr_pos(root->eq_classes, r->right_ec)); + other_relids = bms_add_members(other_relids, r->right_relids); + } + else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL) + { + other_ecs = bms_add_member(other_ecs, list_member_ptr_pos(root->eq_classes, r->left_ec)); + other_relids = bms_add_members(other_relids, r->left_relids); + } + } + + /* Check if these ECs include a uniquekey of other_rel */ + foreach(lc, other_rel->uniquekeys) + { + UniqueKey *ukey = lfirst_node(UniqueKey, lc); + + if (uniquekey_contains_in(root, ukey, other_ecs, other_relids)) + { + rel_keep_unique = true; + break; + } + } + + if (!rel_keep_unique) + return false; + +done: + + /* + * Now copy the uniquekey in rel to joinrel, but first we need to know if + * it is useful. + */ + foreach(lc, rel->uniquekeys) + { + UniqueKey *ukey = lfirst_node(UniqueKey, lc); + + if (is_uniquekey_useful_afterjoin(root, ukey, joinrel)) + { + if (uniquekey_is_singlerow(ukey)) + { + /* + * XXX (?): a). NULL values. b). other relids rather than + * ukey->relid. + */ + mark_rel_singlerow(joinrel, ukey->relid); + break; + } + joinrel->uniquekeys = lappend(joinrel->uniquekeys, ukey); + } + } + + return true; +} + +/* + * populate_joinrel_uniquekeys + */ +void +populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel, + RelOptInfo *outerrel, RelOptInfo *innerrel, + List *restrictlist, JoinType jointype) +{ + bool outeruk_still_valid = false, + inneruk_still_valid = false; + ListCell *lc, + *lc2; + + if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) + { + foreach(lc, outerrel->uniquekeys) + { + /* + * the uniquekey on the outer side is not changed after semi/anti + * join. + */ + joinrel->uniquekeys = lappend(joinrel->uniquekeys, lfirst(lc)); + } + return; + } + + if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL) + return; + + outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel, + innerrel, restrictlist); + inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel, + outerrel, restrictlist); + + if (outeruk_still_valid || inneruk_still_valid) + + /* + * the uniquekey on outers or inners have been added into joinrel so + * the combined uniuqekey from both sides is not needed. + */ + return; + + /* + * The combined UniqueKey is still unique no matter the join method or + * join clauses. So let build the combined ones. + */ + foreach(lc, outerrel->uniquekeys) + { + UniqueKey *outer_ukey = lfirst(lc); + + if (!is_uniquekey_useful_afterjoin(root, outer_ukey, joinrel)) + /* discard the uniquekey which is not interesting. */ + continue; + + /* singlerow will make the inneruk_still_valid true */ + Assert(!uniquekey_is_singlerow(outer_ukey)); + + foreach(lc2, innerrel->uniquekeys) + { + UniqueKey *inner_ukey = lfirst(lc2); + + if (!is_uniquekey_useful_afterjoin(root, inner_ukey, joinrel)) + continue; + + /* singlerow will make the outeruk_still_valid true */ + Assert(!uniquekey_is_singlerow(inner_ukey)); + + joinrel->uniquekeys = lappend(joinrel->uniquekeys, + make_uniquekey( + bms_union(outer_ukey->eclass_indexes, inner_ukey->eclass_indexes), + outer_ukey->use_for_distinct || inner_ukey->use_for_distinct)); + } + } +} + +/* + * uniquekey_contains_in + * Return if UniqueKey contains in the list of EquivalenceClass + * or the UniqueKey's SingleRow contains in relids. + */ +static bool +uniquekey_contains_in(PlannerInfo *root, UniqueKey * ukey, Bitmapset *ecs, Relids relids) +{ + + if (uniquekey_is_singlerow(ukey)) + { + return bms_is_member(ukey->relid, relids); + } + + return bms_is_subset(ukey->eclass_indexes, ecs); +} + + +/* + * uniquekey_useful_for_merging + * Check if the uniquekey is useful for mergejoins above the given relation. + * + * similar with pathkeys_useful_for_merging. + */ +static bool +uniquekey_useful_for_merging(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *rel) +{ + + int i = -1; + + while ((i = bms_next_member(ukey->eclass_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + ListCell *j; + bool matched = false; + + if (rel->has_eclass_joins && eclass_useful_for_merging(root, ec, rel)) + { + matched = true; + } + else + { + foreach(j, rel->joininfo) + { + RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(j); + + if (restrictinfo->mergeopfamilies == NIL) + continue; + update_mergeclause_eclasses(root, restrictinfo); + + if (ec == restrictinfo->left_ec || ec == restrictinfo->right_ec) + { + matched = true; + break; + } + } + } + + if (!matched) + return false; + } + + return true; +} + +/* + * is_uniquekey_useful_afterjoin + * + * uniquekey is useful when it contains in distinct_pathkey or in mergable join clauses. + */ +static bool +is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *joinrel) +{ + if (ukey->use_for_distinct) + return true; + + /* XXX might needs a better judgement */ + if (uniquekey_is_singlerow(ukey)) + return true; + + + return uniquekey_useful_for_merging(root, ukey, joinrel); +} diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index e2c68fe6f9..a04a6022bc 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -2678,7 +2678,15 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid, } /* Add clause to rel's restriction list */ - rel->baserestrictinfo = lappend(rel->baserestrictinfo, restrictinfo); + rel->baserestrictinfo = lappend(rel->baserestrictinfo, + restrictinfo); + { + List *not_null_vars = find_nonnullable_vars((Node *) restrictinfo->clause); + + if (not_null_vars != NIL) + rel->notnullattrs = bms_union(rel->notnullattrs, + list_nth(not_null_vars, rel->relid)); + } /* Update security level info */ rel->baserestrict_min_security = Min(rel->baserestrict_min_security, diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 5320da51a0..e40c729e7e 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1671,9 +1671,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, gset_data); /* Fix things up if grouping_target contains SRFs */ if (parse->hasTargetSRFs) + { adjust_paths_for_srfs(root, current_rel, grouping_targets, grouping_targets_contain_srfs); + current_rel->uniquekeys = NIL; + } } /* @@ -1734,6 +1737,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, * Now we are prepared to build the final-output upperrel. */ final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); + /* simple_copy_uniquekeys(final_rel, current_rel); */ /* * If the input rel is marked consider_parallel and there's nothing that's @@ -4009,6 +4013,22 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, gd, extra->targetList); + if (root->parse->groupingSets) + { + /* nothing to do */ + } + else if (root->parse->groupClause && root->group_pathkeys != NIL) + { + /* + * populate_uniquekeys_from_pathkeys(root, grouped_rel, + * root->group_pathkeys); + */ + } + else + { + /* SingleRow Case */ + } + /* Build final grouping paths */ add_paths_to_grouping_rel(root, input_rel, grouped_rel, partially_grouped_rel, agg_costs, gd, @@ -4632,9 +4652,22 @@ create_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel, { RelOptInfo *distinct_rel; + /* + * distinct_pathkeys may be NIL if it distinctClause is not sortable. XXX: + * What should we do for the else? + */ + if (root->distinct_pathkeys && + relation_is_distinct_for(root, input_rel, root->distinct_pathkeys)) + return input_rel; + /* For now, do all work in the (DISTINCT, NULL) upperrel */ distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL); + /* + * populate_uniquekeys_from_pathkeys(root, distinct_rel, + * root->distinct_pathkeys); + */ + /* * We don't compute anything at this level, so distinct_rel will be * parallel-safe if the input rel is parallel-safe. In particular, if diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 130f838629..438ec75689 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -119,6 +119,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, Relation relation; bool hasindex; List *indexinfos = NIL; + Index i; /* * We need not lock the relation since it was already locked, either by @@ -203,6 +204,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, /* Retrieve the parallel_workers reloption, or -1 if not set. */ rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1); + for (i = 0; i < relation->rd_att->natts; i++) + { + FormData_pg_attribute attr = relation->rd_att->attrs[i]; + + if (attr.attnotnull) + rel->notnullattrs = bms_add_member(rel->notnullattrs, + attr.attnum - FirstLowInvalidHeapAttributeNumber); + } + /* * Make list of indexes. Ignore indexes on system catalogs if told to. * Don't bother with indexes from traditional inheritance parents. For diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index e05b21c884..0e3d427fbb 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -286,6 +286,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) rel->all_partrels = NULL; rel->partexprs = NULL; rel->nullable_partexprs = NULL; + rel->uniquekeys = NIL; /* * Pass assorted information down the inheritance hierarchy. @@ -768,6 +769,7 @@ build_join_rel(PlannerInfo *root, joinrel->all_partrels = NULL; joinrel->partexprs = NULL; joinrel->nullable_partexprs = NULL; + joinrel->uniquekeys = NIL; /* Compute information relevant to the foreign relations. */ set_foreign_rel_properties(joinrel, outer_rel, inner_rel); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 6c71098f2d..4e879b0b7c 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -881,6 +881,7 @@ typedef struct RelOptInfo * Vars/Exprs, cost, width */ struct PathTarget *reltarget; + List *uniquekeys; /* A list of UniqueKey. */ /* * materialization information @@ -919,6 +920,9 @@ typedef struct RelOptInfo /* array indexed [min_attr .. max_attr] */ int32 *attr_widths pg_node_attr(read_write_ignore); + /* The not null attrs from catalogs or baserestrictinfo. */ + Bitmapset *notnullattrs; + /* * Zero-based set containing attnums of NOT NULL columns. Not populated * for rels corresponding to non-partitioned inh==true RTEs. @@ -1477,6 +1481,21 @@ typedef struct PathKeyInfo List *clauses; } PathKeyInfo; + +typedef struct UniqueKey +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + Bitmapset *eclass_indexes; /* Indexes in PlannerInfo's eq_class list of + * ECs that is unique for a RelOptInfo. */ + int relid; + bool use_for_distinct; /* true if it is used in distinct-pathkey, + * in this case we would never check if we + * should discard it during join search. */ +} UniqueKey; + + /* * VolatileFunctionStatus -- allows nodes to cache their * contain_volatile_functions properties. VOLATILITY_UNKNOWN means not yet diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h index 52df93759f..9d860f0135 100644 --- a/src/include/nodes/pg_list.h +++ b/src/include/nodes/pg_list.h @@ -632,6 +632,8 @@ extern bool list_member_int(const List *list, int datum); extern bool list_member_oid(const List *list, Oid datum); extern bool list_member_xid(const List *list, TransactionId datum); +extern int list_member_ptr_pos(const List *list, const void *datum); + extern pg_nodiscard List *list_delete(List *list, void *datum); extern pg_nodiscard List *list_delete_ptr(List *list, void *datum); extern pg_nodiscard List *list_delete_int(List *list, int datum); diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 39ba461548..f3c683b2c9 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -142,11 +142,17 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root, extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec, Expr *expr, Relids relids); +extern int find_ec_position_matching_expr(PlannerInfo *root, + Expr *expr, + RelOptInfo *rel); extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root, EquivalenceClass *ec, List *exprs, Relids relids, bool require_parallel_safe); +extern Bitmapset *build_ec_positions_for_exprs(PlannerInfo *root, + List *exprs, + RelOptInfo *rel); extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel, EquivalenceClass *ec, bool require_parallel_safe); @@ -270,4 +276,11 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root, extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, List *live_childrels); +extern bool relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, + List *distinct_pathkey); +extern void populate_baserel_uniquekeys(PlannerInfo *root, + RelOptInfo *baserel); +extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel, + RelOptInfo *outerrel, RelOptInfo *innerrel, + List *restrictlist, JoinType jointype); #endif /* PATHS_H */ diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 8b640c2fc2..675b5b1484 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -5671,18 +5671,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s explain (costs off) select d.* from d left join (select distinct * from b) s on d.a = s.id; - QUERY PLAN --------------------------------------- + QUERY PLAN +------------------------------------ Merge Right Join Merge Cond: (b.id = d.a) - -> Unique - -> Sort - Sort Key: b.id, b.c_id - -> Seq Scan on b + -> Index Scan using b_pkey on b -> Sort Sort Key: d.a -> Seq Scan on d -(9 rows) +(6 rows) -- join removal is not possible here explain (costs off) diff --git a/src/test/regress/expected/uniquekey.out b/src/test/regress/expected/uniquekey.out new file mode 100644 index 0000000000..aa712c2dd7 --- /dev/null +++ b/src/test/regress/expected/uniquekey.out @@ -0,0 +1,268 @@ +CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int); +explain (costs off) +select distinct id from uk_t; + QUERY PLAN +------------------ + Seq Scan on uk_t +(1 row) + +explain (costs off) +select distinct e from uk_t where id = e; + QUERY PLAN +-------------------- + Seq Scan on uk_t + Filter: (id = e) +(2 rows) + +create unique index on uk_t (a, b); +create unique index on uk_t (c, d); +explain (costs off) +select distinct a, b from uk_t; + QUERY PLAN +------------------ + Seq Scan on uk_t +(1 row) + +explain (costs off) +select distinct c, d from uk_t; + QUERY PLAN +------------------------ + HashAggregate + Group Key: c, d + -> Seq Scan on uk_t +(3 rows) + +explain (costs off) +select distinct c, d from uk_t +where c > 0 and d > 0; + QUERY PLAN +------------------------------------------- + Bitmap Heap Scan on uk_t + Recheck Cond: ((c > 0) AND (d > 0)) + -> Bitmap Index Scan on uk_t_c_d_idx + Index Cond: ((c > 0) AND (d > 0)) +(4 rows) + +explain (costs off) +select distinct d from uk_t +where c > 1 and d > 0; + QUERY PLAN +------------------------------------------------- + HashAggregate + Group Key: d + -> Bitmap Heap Scan on uk_t + Recheck Cond: ((c > 1) AND (d > 0)) + -> Bitmap Index Scan on uk_t_c_d_idx + Index Cond: ((c > 1) AND (d > 0)) +(6 rows) + +explain (costs off) +select distinct d from uk_t +where c = 1 and d > 0; + QUERY PLAN +------------------------------------------- + Bitmap Heap Scan on uk_t + Recheck Cond: ((c = 1) AND (d > 0)) + -> Bitmap Index Scan on uk_t_c_d_idx + Index Cond: ((c = 1) AND (d > 0)) +(4 rows) + +-- test the join case -- +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id; + QUERY PLAN +--------------------------------- + Hash Join + Hash Cond: (t1.e = t2.id) + -> Seq Scan on uk_t t1 + -> Hash + -> Seq Scan on uk_t t2 +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.a and t1.d = t2.b; + QUERY PLAN +------------------------------------------------ + Hash Join + Hash Cond: ((t1.e = t2.a) AND (t1.d = t2.b)) + -> Seq Scan on uk_t t1 + -> Hash + -> Seq Scan on uk_t t2 +(5 rows) + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON t1.e = t2.id; + QUERY PLAN +--------------------- + Seq Scan on uk_t t1 +(1 row) + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON t1.e = t2.a and t1.d = t2.b; + QUERY PLAN +--------------------- + Seq Scan on uk_t t1 +(1 row) + +-- outer join makes null values so distinct can't be a no-op. +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 RIGHT JOIN uk_t t2 ON t1.e = t2.id; + QUERY PLAN +--------------------------------------- + HashAggregate + Group Key: t1.id + -> Hash Right Join + Hash Cond: (t1.e = t2.id) + -> Seq Scan on uk_t t1 + -> Hash + -> Seq Scan on uk_t t2 +(7 rows) + +-- single row case. +-- single row is unqiue all the time. +-- case 1: +-- t1.d is single row at base rel level, and because of t1.e = t2.id, it is +-- still single row after join. +EXPLAIN (COSTS OFF) +SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1; + QUERY PLAN +-------------------------------------------------- + Nested Loop + -> Index Scan using uk_t_pkey on uk_t t1 + Index Cond: (id = 1) + -> Index Only Scan using uk_t_pkey on uk_t t2 + Index Cond: (id = t1.e) +(5 rows) + +-- case 2 +-- any uniquekey which join with single row whose uniqueness can be kept. +-- t1.d is single row at base rel level because of t1.id = 1 +-- so t2's uniquekey can be kept when join with t2.any_column. +EXPLAIN (COSTS OFF) +SELECT distinct t2.id FROM uk_t t1 JOIN uk_t t2 ON t2.e = t1.e and t1.id = 1; + QUERY PLAN +--------------------------------------------------- + Hash Join + Hash Cond: (t2.e = t1.e) + -> Seq Scan on uk_t t2 + -> Hash + -> Index Scan using uk_t_pkey on uk_t t1 + Index Cond: (id = 1) +(6 rows) + +insert into uk_t SELECT 1, 2, 3, 4, 5, 6; +-- combined uniquekey cases. +-- the combinations of uniquekeys from the 2 sides is still unique +-- no matter the join method and join clauses. +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON true; + QUERY PLAN +--------------------------------- + Nested Loop + -> Seq Scan on uk_t t1 + -> Materialize + -> Seq Scan on uk_t t2 +(4 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON true; + id | id +----+---- + 1 | 1 +(1 row) + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON false; + QUERY PLAN +-------------------------- + Result + One-Time Filter: false +(2 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON false; + id | id +----+---- +(0 rows) + +-- BAD CASE: the distinct should be marked as noop in the below 4 cases, but +-- since it is nullable by outer join, so it can't be removed. However the rule +-- here should be if both uniquekey are not nullable, the combinations of them +-- is not *mutli* nullable even by outer join. It is not clear to me how to +-- implement it since I am just feeling not maintaining the not-null stuff +-- during the joins is pretty amazing, otherwise there are too many stuff to +-- do to cover this special case. +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON true; + QUERY PLAN +--------------------------------------- + HashAggregate + Group Key: t2.id, t1.id + -> Nested Loop Left Join + -> Seq Scan on uk_t t1 + -> Materialize + -> Seq Scan on uk_t t2 +(6 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON true; + id | id +----+---- + 1 | 1 +(1 row) + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON false; + QUERY PLAN +-------------------------------------- + HashAggregate + Group Key: id, t1.id + -> Nested Loop Left Join + Join Filter: false + -> Seq Scan on uk_t t1 + -> Result + One-Time Filter: false +(7 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON false; + id | id +----+---- + | 1 +(1 row) + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON true; + QUERY PLAN +--------------------------------------- + HashAggregate + Group Key: t2.id, t1.id + -> Merge Full Join + -> Seq Scan on uk_t t1 + -> Materialize + -> Seq Scan on uk_t t2 +(6 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON true; + id | id +----+---- + 1 | 1 +(1 row) + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON false; + QUERY PLAN +--------------------------------------------- + Unique + -> Sort + Sort Key: t2.id, t1.id + -> Merge Full Join + Join Filter: false + -> Seq Scan on uk_t t1 + -> Materialize + -> Seq Scan on uk_t t2 +(8 rows) + +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON false; + id | id +----+---- + 1 | + | 1 +(2 rows) + diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 675c567617..8d9ede091d 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -62,7 +62,7 @@ test: sanity_check # join depends on create_misc # ---------- test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregatestransactions random portals arrays btree_index hash_index update delete namespace prepared_xacts - +test: uniquekey # ---------- # Another group of parallel tests # ---------- diff --git a/src/test/regress/sql/uniquekey.sql b/src/test/regress/sql/uniquekey.sql new file mode 100644 index 0000000000..4db0213a04 --- /dev/null +++ b/src/test/regress/sql/uniquekey.sql @@ -0,0 +1,104 @@ +CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int); + +explain (costs off) +select distinct id from uk_t; + +explain (costs off) +select distinct e from uk_t where id = e; + +create unique index on uk_t (a, b); +create unique index on uk_t (c, d); + +explain (costs off) +select distinct a, b from uk_t; + +explain (costs off) +select distinct c, d from uk_t; + +explain (costs off) +select distinct c, d from uk_t +where c > 0 and d > 0; + +explain (costs off) +select distinct d from uk_t +where c > 1 and d > 0; + +explain (costs off) +select distinct d from uk_t +where c = 1 and d > 0; + + +-- test the join case -- + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id; + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.a and t1.d = t2.b; + + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON t1.e = t2.id; + +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON t1.e = t2.a and t1.d = t2.b; + +-- outer join makes null values so distinct can't be a no-op. +EXPLAIN (COSTS OFF) +SELECT distinct t1.id FROM uk_t t1 RIGHT JOIN uk_t t2 ON t1.e = t2.id; + +-- single row case. + +-- single row is unqiue all the time. + +-- case 1: +-- t1.d is single row at base rel level, and because of t1.e = t2.id, it is +-- still single row after join. +EXPLAIN (COSTS OFF) +SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1; + +-- case 2 +-- any uniquekey which join with single row whose uniqueness can be kept. +-- t1.d is single row at base rel level because of t1.id = 1 +-- so t2's uniquekey can be kept when join with t2.any_column. + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id FROM uk_t t1 JOIN uk_t t2 ON t2.e = t1.e and t1.id = 1; + +insert into uk_t SELECT 1, 2, 3, 4, 5, 6; + +-- combined uniquekey cases. +-- the combinations of uniquekeys from the 2 sides is still unique +-- no matter the join method and join clauses. +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON true; +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON true; + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON false; +SELECT distinct t2.id, t1.id FROM uk_t t1 JOIN uk_t t2 ON false; + + +-- BAD CASE: the distinct should be marked as noop in the below 4 cases, but +-- since it is nullable by outer join, so it can't be removed. However the rule +-- here should be if both uniquekey are not nullable, the combinations of them +-- is not *mutli* nullable even by outer join. It is not clear to me how to +-- implement it since I am just feeling not maintaining the not-null stuff +-- during the joins is pretty amazing, otherwise there are too many stuff to +-- do to cover this special case. +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON true; +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON true; + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON false; +SELECT distinct t2.id, t1.id FROM uk_t t1 LEFT JOIN uk_t t2 ON false; + + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON true; +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON true; + +EXPLAIN (COSTS OFF) +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON false; +SELECT distinct t2.id, t1.id FROM uk_t t1 FULL JOIN uk_t t2 ON false; -- 2.44.0 From 0351ade32c93e9360b11c7dd00b521cebb3c86dd Mon Sep 17 00:00:00 2001 From: Antonin Houska <ah@cybertec.at> Date: Thu, 18 Apr 2024 15:28:36 +0200 Subject: [PATCH 2/5] Moved functions to uniquekey.c. They are not called from other modules. --- src/backend/optimizer/path/equivclass.c | 48 ---------------------- src/backend/optimizer/path/uniquekey.c | 53 +++++++++++++++++++++++++ src/include/optimizer/paths.h | 6 --- 3 files changed, 53 insertions(+), 54 deletions(-) diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 892cd0000c..1d6bedb399 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -755,54 +755,6 @@ get_eclass_for_sort_expr(PlannerInfo *root, return newec; } -/* - * find_ec_position_matching_expr - * Locate the position of EquivalenceClass whose members matching - * the given expr, if any; return -1 if no match. - */ -int -find_ec_position_matching_expr(PlannerInfo *root, - Expr *expr, - RelOptInfo *baserel) -{ - int i = -1; - - while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0) - { - EquivalenceClass *ec = list_nth(root->eq_classes, i); - - if (find_ec_member_matching_expr(ec, expr, baserel->relids)) - return i; - } - return -1; -} - -/* - * build_ec_positions_for_exprs - * - * Given a list of exprs, find the related EC positions for each of - * them. if any exprs has no EC related, return NULL; - */ -Bitmapset * -build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel) -{ - ListCell *lc; - Bitmapset *res = NULL; - - foreach(lc, exprs) - { - int pos = find_ec_position_matching_expr(root, lfirst(lc), rel); - - if (pos < 0) - { - bms_free(res); - return NULL; - } - res = bms_add_member(res, pos); - } - return res; -} - /* * find_ec_member_matching_expr * Locate an EquivalenceClass member matching the given expr, if any; diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c index ecc7c6d802..d3f9aaaac8 100644 --- a/src/backend/optimizer/path/uniquekey.c +++ b/src/backend/optimizer/path/uniquekey.c @@ -26,6 +26,11 @@ static bool add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, List *truncatable_exprs, List *expr_opfamilies); +static Bitmapset *build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, + RelOptInfo *rel); +static int find_ec_position_matching_expr(PlannerInfo *root, + Expr *expr, + RelOptInfo *baserel); static bool unique_ecs_useful_for_distinct(PlannerInfo *root, Bitmapset *ec_indexes); /* Helper functions to create UniqueKey. */ @@ -193,6 +198,54 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, return false; } +/* + * find_ec_position_matching_expr + * Locate the position of EquivalenceClass whose members matching + * the given expr, if any; return -1 if no match. + */ +static int +find_ec_position_matching_expr(PlannerInfo *root, + Expr *expr, + RelOptInfo *baserel) +{ + int i = -1; + + while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + + if (find_ec_member_matching_expr(ec, expr, baserel->relids)) + return i; + } + return -1; +} + +/* + * build_ec_positions_for_exprs + * + * Given a list of exprs, find the related EC positions for each of + * them. if any exprs has no EC related, return NULL; + */ +static Bitmapset * +build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel) +{ + ListCell *lc; + Bitmapset *res = NULL; + + foreach(lc, exprs) + { + int pos = find_ec_position_matching_expr(root, lfirst(lc), rel); + + if (pos < 0) + { + bms_free(res); + return NULL; + } + res = bms_add_member(res, pos); + } + return res; +} + /* * make_uniquekey * Based on UnqiueKey rules, it is impossible for a UnqiueKey diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index f3c683b2c9..b3fadd2f05 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -142,17 +142,11 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root, extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec, Expr *expr, Relids relids); -extern int find_ec_position_matching_expr(PlannerInfo *root, - Expr *expr, - RelOptInfo *rel); extern EquivalenceMember *find_computable_ec_member(PlannerInfo *root, EquivalenceClass *ec, List *exprs, Relids relids, bool require_parallel_safe); -extern Bitmapset *build_ec_positions_for_exprs(PlannerInfo *root, - List *exprs, - RelOptInfo *rel); extern bool relation_can_be_sorted_early(PlannerInfo *root, RelOptInfo *rel, EquivalenceClass *ec, bool require_parallel_safe); -- 2.44.0 From 1a6ccb87c4be7d0f15ff8d273f28929ec0fc9264 Mon Sep 17 00:00:00 2001 From: Antonin Houska <ah@cybertec.at> Date: Thu, 18 Apr 2024 16:20:06 +0200 Subject: [PATCH 3/5] Do not use expression indexes to create unique keys. The existing code does not rely on predOK when checking uniqueness, so don't let the new code do that. --- src/backend/optimizer/path/uniquekey.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c index d3f9aaaac8..0b65ff135f 100644 --- a/src/backend/optimizer/path/uniquekey.c +++ b/src/backend/optimizer/path/uniquekey.c @@ -98,8 +98,12 @@ populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel) { IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); - if (!index->unique || !index->immediate || - (index->indpred != NIL && !index->predOK)) + /* + * Like in relation_has_unique_index_for(), we shouldn't use predOK + * because its validity might depend on join clauses, however here we + * test uniqueness of the join input. + */ + if (!index->unique || !index->immediate || index->indpred != NIL) continue; if (add_uniquekey_for_uniqueindex(root, index, -- 2.44.0 From 47efe758434ba8a58de50b0124853de29f48f6b4 Mon Sep 17 00:00:00 2001 From: Antonin Houska <ah@cybertec.at> Date: Thu, 18 Apr 2024 16:40:36 +0200 Subject: [PATCH 4/5] Teach the planner to pass UniqueKey nodes from a subquery to the parent query. So far it was only possible to recognize uniqueness of the output of relatively simple subqueries. With this feature, more complex subqueries can be checked, even those "hidden" in a view with security_barrier enabled. Thus there should be more opportunities to apply specific optimizations. Typically, when the inner relation of join is unique, then INNER join can be replaced with SEMI join. --- src/backend/optimizer/path/allpaths.c | 17 + src/backend/optimizer/path/equivclass.c | 5 +- src/backend/optimizer/path/indxpath.c | 6 +- src/backend/optimizer/path/pathkeys.c | 6 +- src/backend/optimizer/path/uniquekey.c | 487 ++++++++++++++++++++-- src/backend/optimizer/plan/analyzejoins.c | 46 ++ src/backend/optimizer/plan/planner.c | 84 +++- src/include/nodes/pathnodes.h | 26 +- src/include/optimizer/paths.h | 10 + src/test/regress/expected/join.out | 66 +-- src/test/regress/expected/subselect.out | 24 ++ src/test/regress/sql/join.sql | 13 +- src/test/regress/sql/subselect.sql | 13 + 13 files changed, 729 insertions(+), 74 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index ced0fce415..faedc4a247 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2667,6 +2667,23 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel, return; } + /* + * Translate the subquery unique keys so that 'rel' can use them. + * + * The final relation is not guaranteed to have the target set, so + * retrieve it from ->upper_targets. + * + * Unique keys are currently not supported for RELOPT_OTHER_MEMBER_REL. + */ + if (rel->reloptkind == RELOPT_BASEREL) + { + PathTarget *sub_final_target; + + sub_final_target = rel->subroot->upper_targets[UPPERREL_FINAL]; + convert_unique_keys_for_rel(root, rel, NULL, sub_final_rel, + sub_final_target); + } + /* * Mark rel with estimated output rows, width, etc. Note that we have to * do this before generating outer-query paths, else cost_subqueryscan is diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 1d6bedb399..95d4722b41 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -591,9 +591,9 @@ get_eclass_for_sort_expr(PlannerInfo *root, Oid collation, Index sortref, Relids rel, + JoinDomain *jdomain, bool create_it) { - JoinDomain *jdomain; Relids expr_relids; EquivalenceClass *newec; EquivalenceMember *newem; @@ -609,7 +609,8 @@ get_eclass_for_sort_expr(PlannerInfo *root, * Since SortGroupClause nodes are top-level expressions (GROUP BY, ORDER * BY, etc), they can be presumed to belong to the top JoinDomain. */ - jdomain = linitial_node(JoinDomain, root->join_domains); + if (jdomain == NULL) + jdomain = linitial_node(JoinDomain, root->join_domains); /* * Scan through the existing EquivalenceClasses for a match diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 2230b13104..87081627ac 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -151,7 +151,6 @@ static IndexClause *match_clause_to_indexcol(PlannerInfo *root, RestrictInfo *rinfo, int indexcol, IndexOptInfo *index); -static bool IsBooleanOpfamily(Oid opfamily); static IndexClause *match_boolean_index_clause(PlannerInfo *root, RestrictInfo *rinfo, int indexcol, IndexOptInfo *index); @@ -2275,8 +2274,11 @@ match_clause_to_indexcol(PlannerInfo *root, * If the opfamily OID is in the range of built-in objects, we can rely * on hard-wired knowledge of which built-in opfamilies support this. * For extension opfamilies, there's no choice but to do a catcache lookup. + * + * TODO If the call from equivclass.c remains, move this function to better + * place. */ -static bool +bool IsBooleanOpfamily(Oid opfamily) { if (opfamily < FirstNormalObjectId) diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 1d61881a6b..9e69085b5d 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -233,7 +233,7 @@ make_pathkey_from_sortinfo(PlannerInfo *root, /* Now find or (optionally) create a matching EquivalenceClass */ eclass = get_eclass_for_sort_expr(root, expr, opfamilies, opcintype, collation, - sortref, rel, create_it); + sortref, rel, NULL, create_it); /* Fail if no EC and !create_it */ if (!eclass) @@ -1122,6 +1122,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, sub_eclass->ec_collation, 0, rel->relids, + NULL, false); /* @@ -1203,6 +1204,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, sub_expr_coll, 0, rel->relids, + NULL, false); /* @@ -1467,6 +1469,7 @@ initialize_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo) ((OpExpr *) clause)->inputcollid, 0, NULL, + NULL, true); restrictinfo->right_ec = get_eclass_for_sort_expr(root, @@ -1476,6 +1479,7 @@ initialize_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo) ((OpExpr *) clause)->inputcollid, 0, NULL, + NULL, true); } diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c index 0b65ff135f..4200663b82 100644 --- a/src/backend/optimizer/path/uniquekey.c +++ b/src/backend/optimizer/path/uniquekey.c @@ -19,7 +19,20 @@ #include "nodes/pathnodes.h" #include "optimizer/optimizer.h" #include "optimizer/paths.h" +#include "utils/lsyscache.h" +/* + * Information on a column of an unique index for which we are trying to + * create an unique key. + */ +typedef struct UniqueKeyExpr +{ + /* The expression itself. */ + Expr *expr; + + /* Operator family to find / create suitable equivalence class. */ + Oid opfamily; +} UniqueKeyExpr; /* Functions to populate UniqueKey */ static bool add_uniquekey_for_uniqueindex(PlannerInfo *root, @@ -29,23 +42,24 @@ static bool add_uniquekey_for_uniqueindex(PlannerInfo *root, static Bitmapset *build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel); static int find_ec_position_matching_expr(PlannerInfo *root, - Expr *expr, - RelOptInfo *baserel); + RelOptInfo *baserel, + Expr *expr, List *opfamilies); static bool unique_ecs_useful_for_distinct(PlannerInfo *root, Bitmapset *ec_indexes); /* Helper functions to create UniqueKey. */ -static UniqueKey * make_uniquekey(Bitmapset *eclass_indexes, +static UniqueKey * make_uniquekey(Bitmapset *item_indexes, + List *opfamily_lists, bool useful_for_distinct); static void mark_rel_singlerow(RelOptInfo *rel, int relid); static UniqueKey * rel_singlerow_uniquekey(RelOptInfo *rel); -static bool uniquekey_contains_multinulls(PlannerInfo *root, RelOptInfo *rel, UniqueKey * ukey); /* Debug only */ static void print_uniquekey(PlannerInfo *root, RelOptInfo *rel); static bool uniquekey_contains_in(PlannerInfo *root, UniqueKey * ukey, Bitmapset *ecs, Relids relids); static bool is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *joinrel); +static int find_expr_pos_in_list(Expr *expr, List *list); /* * populate_baserel_uniquekeys @@ -61,6 +75,9 @@ populate_baserel_uniquekeys(PlannerInfo *root, RelOptInfo *rel) List *truncatable_exprs = NIL, *expr_opfamilies = NIL; + if (rel->reloptkind != RELOPT_BASEREL) + return; + /* * Currently we only use UniqueKey for mark-distinct-as-noop case, so if * there is no-distinct-clause at all, we can ignore the maintenance at @@ -132,7 +149,8 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, ListCell *indexpr_item; RelOptInfo *rel = unique_index->rel; bool used_for_distinct; - int c; + int c, i; + List *opfamily_lists = NIL; indexpr_item = list_head(unique_index->indexprs); @@ -143,6 +161,7 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, bool matched_const = false; ListCell *lc1, *lc2; + UniqueKeyExpr *uexpr; if (attr > 0) { @@ -174,7 +193,10 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, if (matched_const) continue; - unique_exprs = lappend(unique_exprs, expr); + uexpr = palloc_object(UniqueKeyExpr); + uexpr->expr = expr; + uexpr->opfamily = unique_index->opfamily[c]; + unique_exprs = lappend(unique_exprs, uexpr); } if (unique_exprs == NIL) @@ -193,35 +215,94 @@ add_uniquekey_for_uniqueindex(PlannerInfo *root, IndexOptInfo *unique_index, if (unique_ecs == NULL) return false; + /* Collect the operator families. */ + i = -1; + while ((i = bms_next_member(unique_ecs, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + + opfamily_lists = lappend(opfamily_lists, ec->ec_opfamilies); + } + used_for_distinct = unique_ecs_useful_for_distinct(root, unique_ecs); rel->uniquekeys = lappend(rel->uniquekeys, - make_uniquekey(unique_ecs, + make_uniquekey(unique_ecs, opfamily_lists, used_for_distinct)); return false; } /* * find_ec_position_matching_expr - * Locate the position of EquivalenceClass whose members matching - * the given expr, if any; return -1 if no match. + * Locate the position of EquivalenceClass whose members matching the + * given expr. Try to create EC if suitable one does not exist. Return -1 + * if there is not enough information in the catalog about the data type + * to create the EC. */ static int -find_ec_position_matching_expr(PlannerInfo *root, - Expr *expr, - RelOptInfo *baserel) +find_ec_position_matching_expr(PlannerInfo *root, RelOptInfo *baserel, + Expr *expr, List *opfamilies) { int i = -1; + EquivalenceClass *ec; + int ec_index; + JoinDomain *jdomain; + ListCell *lc; + + /* + * XXX Currently the function is only used to build unique keys, so we + * don't create ECs for boolean columns: normal EC processing does not + * create them as well and build_index_pathkeys() considers boolean + * pathkeys redundant, so missing boolean EC is o.k. However, if we + * created the boolean ECs, build_index_pathkeys() would return the + * corresponding pathkeys, and those could mismatch query_pathkeys. + * Shouldn't this be handled in pathkeys_useful_for_ordering()? + */ + foreach(lc, opfamilies) + { + /* XXX The result should be the same for each item of the list. */ + if (IsBooleanOpfamily(lfirst_oid(lc))) + return -1; + } while ((i = bms_next_member(baserel->eclass_indexes, i)) >= 0) { - EquivalenceClass *ec = list_nth(root->eq_classes, i); + List *diff; + + ec = list_nth(root->eq_classes, i); + + /* + * The EC must understand equality in the same way as the unique index + * that guarantees the uniqueness. That is, ec_opfamilies must contain + * all the opfamilies passed by the caller. + */ + diff = list_difference_oid(opfamilies, ec->ec_opfamilies); + if (list_length(diff) > 0) + continue; if (find_ec_member_matching_expr(ec, expr, baserel->relids)) return i; } - return -1; + + /* Create the EC. */ + jdomain = makeNode(JoinDomain); + jdomain->jd_relids = pull_varnos(root, (Node *) expr); + ec = get_eclass_for_sort_expr(root, expr, + opfamilies, + exprType((Node *) expr), + exprCollation((Node *) expr), + 0, + NULL, + jdomain, + true); + ec_index = list_length(root->eq_classes) - 1; + + /* The new EC should now be known to both 'root' and 'baserel'. */ + Assert(ec == llast(root->eq_classes)); + Assert(bms_is_member(ec_index, baserel->eclass_indexes)); + + return ec_index; } /* @@ -238,7 +319,10 @@ build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel) foreach(lc, exprs) { - int pos = find_ec_position_matching_expr(root, lfirst(lc), rel); + UniqueKeyExpr *uexpr = (UniqueKeyExpr *) lfirst(lc); + int pos = find_ec_position_matching_expr(root, rel, + uexpr->expr, + list_make1_oid(uexpr->opfamily)); if (pos < 0) { @@ -248,20 +332,23 @@ build_ec_positions_for_exprs(PlannerInfo *root, List *exprs, RelOptInfo *rel) res = bms_add_member(res, pos); } return res; + } /* * make_uniquekey * Based on UnqiueKey rules, it is impossible for a UnqiueKey - * which have eclass_indexes and relid both set. This function just - * handle eclass_indexes case. + * which have item_indexes and relid both set. This function just + * handle item_indexes case. */ static UniqueKey * -make_uniquekey(Bitmapset *eclass_indexes, bool useful_for_distinct) +make_uniquekey(Bitmapset *item_indexes, List *opfamily_lists, + bool useful_for_distinct) { UniqueKey *ukey = makeNode(UniqueKey); - ukey->eclass_indexes = eclass_indexes; + ukey->item_indexes = item_indexes; + ukey->opfamily_lists = opfamily_lists; ukey->relid = 0; ukey->use_for_distinct = useful_for_distinct; return ukey; @@ -321,7 +408,7 @@ print_uniquekey(PlannerInfo *root, RelOptInfo *rel) UniqueKey *ukey = lfirst_node(UniqueKey, lc); elog(INFO, "UNIQUEKEY{indexes=%s, singlerow_rels=%d, use_for_distinct=%d}", - bmsToString(ukey->eclass_indexes), + bmsToString(ukey->item_indexes), ukey->relid, ukey->use_for_distinct); } @@ -352,12 +439,12 @@ var_is_nullable(PlannerInfo *root, Var *var, RelOptInfo *rel) * * Check if the uniquekey contains nulls values. */ -static bool +bool uniquekey_contains_multinulls(PlannerInfo *root, RelOptInfo *rel, UniqueKey * ukey) { int i = -1; - while ((i = bms_next_member(ukey->eclass_indexes, i)) >= 0) + while ((i = bms_next_member(ukey->item_indexes, i)) >= 0) { EquivalenceClass *ec = list_nth_node(EquivalenceClass, root->eq_classes, i); ListCell *lc; @@ -421,7 +508,7 @@ relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *distinct_path { UniqueKey *ukey = lfirst_node(UniqueKey, lc); - if (bms_is_subset(ukey->eclass_indexes, pathkey_bm) && + if (bms_is_subset(ukey->item_indexes, pathkey_bm) && !uniquekey_contains_multinulls(root, rel, ukey)) return true; } @@ -611,6 +698,10 @@ populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel, foreach(lc2, innerrel->uniquekeys) { UniqueKey *inner_ukey = lfirst(lc2); + Bitmapset *indexes_new; + List *opfamily_lists_new = NIL; + int i; + ListCell *lo, *li; if (!is_uniquekey_useful_afterjoin(root, inner_ukey, joinrel)) continue; @@ -618,9 +709,35 @@ populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel, /* singlerow will make the outeruk_still_valid true */ Assert(!uniquekey_is_singlerow(inner_ukey)); + /* Assemble opfamily_lists_new in the correct order. */ + i = -1; + indexes_new = bms_union(outer_ukey->item_indexes, inner_ukey->item_indexes); + lo = list_head(outer_ukey->opfamily_lists); + li = list_head(inner_ukey->opfamily_lists); + while ((i = bms_next_member(indexes_new, i)) >= 0) + { + List *l; + + if (bms_is_member(i, outer_ukey->item_indexes)) + { + l = lfirst(lo); + lo = lnext(outer_ukey->opfamily_lists, lo); + } + else + { + Assert(bms_is_member(i, inner_ukey->item_indexes)); + + l = lfirst(li); + li = lnext(inner_ukey->opfamily_lists, li); + } + + opfamily_lists_new = lappend(opfamily_lists_new, l); + } + joinrel->uniquekeys = lappend(joinrel->uniquekeys, make_uniquekey( - bms_union(outer_ukey->eclass_indexes, inner_ukey->eclass_indexes), + indexes_new, + opfamily_lists_new, outer_ukey->use_for_distinct || inner_ukey->use_for_distinct)); } } @@ -640,7 +757,7 @@ uniquekey_contains_in(PlannerInfo *root, UniqueKey * ukey, Bitmapset *ecs, Relid return bms_is_member(ukey->relid, relids); } - return bms_is_subset(ukey->eclass_indexes, ecs); + return bms_is_subset(ukey->item_indexes, ecs); } @@ -656,7 +773,7 @@ uniquekey_useful_for_merging(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *re int i = -1; - while ((i = bms_next_member(ukey->eclass_indexes, i)) >= 0) + while ((i = bms_next_member(ukey->item_indexes, i)) >= 0) { EquivalenceClass *ec = list_nth(root->eq_classes, i); ListCell *j; @@ -709,3 +826,321 @@ is_uniquekey_useful_afterjoin(PlannerInfo *root, UniqueKey * ukey, RelOptInfo *j return uniquekey_useful_for_merging(root, ukey, joinrel); } + + +/* + * convert_unique_keys_for_rel + * + * Convert unique keys of 'input_relation' and assign them to 'rel'. The + * comments of UniqueKey explain the difference in the format. + * + * This function assumes that at least one of the relations is "upper + * relation". If 'input_rel' is upper and 'rel' is base/join rel, pass NULL + * for 'target'. + */ +void +convert_unique_keys_for_rel(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, RelOptInfo *input_rel, + PathTarget *input_target) +{ + ListCell *lc1; + + /* No unique keys, in case we return early. */ + rel->uniquekeys = NIL; + + /* + * Set-returning functions can break uniqueness. This does not necessarily + * affect all of the upper relations, but checking is not trivial and + * should be done by the caller rather than by this function. Make the + * logic simple for now. + */ + if (root->parse->hasTargetSRFs) + return; + + /* No keys to convert? */ + if (input_rel->uniquekeys == NIL) + return; + + /* + * If both relations are upper and if they have the same target, + * just copy the uniquekeys unmodified. + */ + if (IS_UPPER_REL(rel) && IS_UPPER_REL(input_rel) && + target == input_target) + { + rel->uniquekeys = copyObject(input_rel->uniquekeys); + return; + } + + foreach(lc1, input_rel->uniquekeys) + { + UniqueKey *key = lfirst_node(UniqueKey, lc1); + Bitmapset *item_indexes_new = NULL; + List *opfamily_lists_new = NIL; + int matched; + int i = -1; + ListCell *lc2; + + if (!IS_UPPER_REL(input_rel)) + { + /* Conversion between base/join rels is currently not needed. */ + IS_UPPER_REL(rel); + + /* + * For each key, check its ECs and find the EM present in + * 'target'. + */ + while ((i = bms_next_member(key->item_indexes, i)) >= 0) + { + EquivalenceClass *ec = list_nth(root->eq_classes, i); + + matched = -1; + foreach(lc2, ec->ec_members) + { + EquivalenceMember *em; + Expr *expr; + + em = lfirst_node(EquivalenceMember, lc2); + expr = em->em_expr; + + /* EMs can be wrapped in RelabelType. */ + while (expr && IsA(expr, RelabelType)) + expr = ((RelabelType *) expr)->arg; + + /* + * Currently we only accept Vars, for simplicity (and + * because the unique key shouldn't contain other + * nodes). In general, we look for expressions that map + * unique input value to unique output value. Not sure + * though how we can recognize them easily. + */ + if (!IsA(expr, Var)) + continue; + + matched = find_expr_pos_in_list(expr, target->exprs); + if (matched >= 0) + break; + } + + if (matched >= 0) + { + item_indexes_new = bms_add_member(item_indexes_new, + matched); + opfamily_lists_new = lappend(opfamily_lists_new, + ec->ec_opfamilies); + } + else + { + /* The caller does not want an incomplete unique key. */ + bms_free(item_indexes_new); + item_indexes_new = NULL; + list_free(opfamily_lists_new); + opfamily_lists_new = NIL; + break; + } + } + } + else + { + ListCell *lc2 = list_head(key->opfamily_lists); + + /* IS_UPPER_REL(input_rel) */ + + Assert(bms_num_members(key->item_indexes) == + list_length(key->opfamily_lists)); + + /* + * For an upper relation, 'index_items' point to expressions of an + * existing target. For each key, find its target expressions in + * 'target' or (if target==NULL) in the members of the ECs 'rel' + * belongs to. + */ + while ((i = bms_next_member(key->item_indexes, i)) >= 0) + { + Expr *expr; + + expr = (Expr *) list_nth(input_target->exprs, i); + + if (target) + { + Assert(IS_UPPER_REL(rel)); + + matched = find_expr_pos_in_list(expr, target->exprs); + } + else + { + ListCell *lc3; + Var *target_var = NULL; + + /* + * 'rel' should be a base relation, not upper or join + * relation. + */ + Assert(!IS_UPPER_REL(rel)); + Assert(rel->relid > 0); + + /* + * Base relation should use plain Var nodes to reference + * the subquery target. Find the Var in the base relation + * target that points to this expression. + */ + foreach(lc3, rel->reltarget->exprs) + { + Var *var; + + /* + * If a general expression happens to be there, give + * up because it can break uniqueness of the relation + * output. + */ + if (!IsA(lfirst(lc3), Var)) + return; + + var = lfirst_node(Var, lc3); + + if (var->varno == rel->relid && + var->varattno == (i + 1)) + { + target_var = var; + break; + } + } + if (target_var) + { + List *opfamilies = lfirst(lc2); + + matched = find_ec_position_matching_expr(root, rel, + (Expr *) target_var, + opfamilies); + lc2 = lnext(key->opfamily_lists, lc2); + } + else + matched = -1; + + if (matched >= 0) + { + /* + * By generating the unique key the subquery + * guarantees that the value of the output column + * cannot be NULL. + */ + rel->notnullattrs = bms_add_member(rel->notnullattrs, + target_var->varattno - FirstLowInvalidHeapAttributeNumber); + } + else + matched = -1; + } + + if (matched >= 0) + item_indexes_new = bms_add_member(item_indexes_new, + matched); + else + { + /* The caller does not want an incomplete unique key. */ + bms_free(item_indexes_new); + item_indexes_new = NULL; + break; + } + } + } + + if (item_indexes_new) + { + UniqueKey *key_new; + + if (!IS_UPPER_REL(input_rel)) + Assert(opfamily_lists_new); + else + { + Assert(opfamily_lists_new == NIL); + + opfamily_lists_new = copyObject(key->opfamily_lists); + } + + + key_new = make_uniquekey(item_indexes_new, opfamily_lists_new, + true); + rel->uniquekeys = lappend(rel->uniquekeys, key_new); + } + } +} + +List * +create_uniquekeys_for_sortop(SetOperationStmt *topop, List *targetlist) +{ + ListCell *l, *lg; + int i; + Bitmapset *key_idxs = NULL; + List *opfamily_lists = NIL; + UniqueKey *key; + + if (topop->all) + { + /* + * There is no easy way to combine unique keys of the individual + * queries. + */ + return NIL; + } + + /* Iterate like in query_is_distinct_for() */ + lg = list_head(topop->groupClauses); + i = 0; + foreach(l, targetlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + SortGroupClause *sgc; + List *sgc_opfamilies; + + if (tle->resjunk) + { + i++; + continue; /* ignore resjunk columns */ + } + + key_idxs = bms_add_member(key_idxs, i++); + + /* non-resjunk columns should have grouping clauses */ + Assert(lg != NULL); + sgc = (SortGroupClause *) lfirst(lg); + lg = lnext(topop->groupClauses, lg); + sgc_opfamilies = get_mergejoin_opfamilies(sgc->eqop); + if (sgc_opfamilies == NIL) + { + /* XXX Shouldn' this be ERROR? */ + bms_free(key_idxs); + key_idxs = NULL; + break; + } + opfamily_lists = lappend(opfamily_lists, sgc_opfamilies); + } + + if (key_idxs == NULL) + return NIL; + + key = make_uniquekey(key_idxs, opfamily_lists, true); + + return list_make1(key); +} + +/* + * Return a zero-based index of an expression in a list, or -1 if not found. + */ +static int +find_expr_pos_in_list(Expr *expr, List *list) +{ + ListCell *lc; + int idx = 0; + + foreach(lc, list) + { + Expr *lexpr = (Expr *) lfirst(lc); + + if (equal(expr, lexpr)) + return idx; + + idx++; + } + + return -1; +} diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 506fccd20c..3029d1a5b0 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -839,6 +839,13 @@ reduce_unique_semijoins(PlannerInfo *root) static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) { + /* + * If uniquekeys could already be setup, the relation definitely supports + * distinctness. + */ + if (rel->uniquekeys) + return true; + /* We only know about baserels ... */ if (rel->reloptkind != RELOPT_BASEREL) return false; @@ -899,6 +906,45 @@ static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list, List **extra_clauses) { + /* + * Use unique keys if already available, but do not skip filling of + * extra_clauses if caller is interested in it. + */ + if (rel->uniquekeys && extra_clauses == NULL) + { + Relids ecs = NULL; + ListCell *l; + + /* First, express the clauses in the form of EC indexes. */ + foreach(l, clause_list) + { + RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); + EquivalenceClass *ec; + + if (rinfo->outer_is_left) + ec = rinfo->right_ec; + else + ec = rinfo->left_ec; + + /* EC pointers might not have been updated yet. */ + while (ec->ec_merged) + ec = ec->ec_merged; + + ecs = bms_add_member(ecs, + list_member_ptr_pos(root->eq_classes, ec)); + } + + /* Now check if a clause exists for each unique key expression. */ + foreach(l, rel->uniquekeys) + { + UniqueKey *ukey = lfirst_node(UniqueKey, l); + + if (bms_is_subset(ukey->item_indexes, ecs) && + !uniquekey_contains_multinulls(root, rel, ukey)) + return true; + } + } + /* * We could skip a couple of tests here if we assume all callers checked * rel_supports_distinctness first, but it doesn't seem worth taking any diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index e40c729e7e..075f98ed16 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1318,6 +1318,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, RelOptInfo *final_rel; FinalPathExtraData extra; ListCell *lc; + PathTarget *sort_input_target = NULL; /* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */ if (parse->limitCount || parse->limitOffset) @@ -1363,6 +1364,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, /* Also extract the PathTarget form of the setop result tlist */ final_target = current_rel->cheapest_total_path->pathtarget; + /* + * If in a subquery, the target might be needed by the parent query, + * in order to convert unique keys. + */ + root->upper_targets[UPPERREL_FINAL] = final_target; + /* And check whether it's parallel safe */ final_target_parallel_safe = is_parallel_safe(root, (Node *) final_target->exprs); @@ -1395,7 +1402,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, else { /* No set operations, do regular planning */ - PathTarget *sort_input_target; List *sort_input_targets; List *sort_input_targets_contain_srfs; bool sort_input_target_parallel_safe; @@ -1645,10 +1651,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, /* * Save the various upper-rel PathTargets we just computed into - * root->upper_targets[]. The core code doesn't use this, but it - * provides a convenient place for extensions to get at the info. For - * consistency, we save all the intermediate targets, even though some - * of the corresponding upperrels might not be needed for this query. + * root->upper_targets[]. In the core we need it to convert unique + * keys of a subquery so the parent query can use them. Besides that + * it provides a convenient place for extensions to get at the info. + * For consistency, we save all the intermediate targets, even though + * some of the corresponding upperrels might not be needed for this + * query. */ root->upper_targets[UPPERREL_FINAL] = final_target; root->upper_targets[UPPERREL_ORDERED] = final_target; @@ -1720,6 +1728,9 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, */ if (parse->sortClause) { + RelOptInfo *input_rel = current_rel; + PathTarget *input_target; + current_rel = create_ordered_paths(root, current_rel, final_target, @@ -1731,13 +1742,53 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, adjust_paths_for_srfs(root, current_rel, final_targets, final_targets_contain_srfs); + + if (!parse->setOperations) + input_target = sort_input_target; + else + /* + * Unlike other upper rels, UPPERREL_SETOP should have the target + * initialized properly + */ + input_target = input_rel->reltarget; + + convert_unique_keys_for_rel(root, current_rel, final_target, + input_rel, input_target); } /* * Now we are prepared to build the final-output upperrel. */ final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); - /* simple_copy_uniquekeys(final_rel, current_rel); */ + + + if (!parse->setOperations) + { + /* + * Make sure unique keys are valid for final_rel. + */ + convert_unique_keys_for_rel(root, final_rel, final_target, current_rel, + final_target); + + } + /* + * TODO Consider if this is worth the effort. For set operations other + * than UNION ALL, the detection of uniqueness is simple enough to be + * handled by query_is_distinct_for(). (That function probably should not + * be removed because it can be useful before the unique keys have been + * initialized.) + */ + else + { + SetOperationStmt *topop; + + Assert(IS_UPPER_REL(current_rel)); + + /* For set operations, we can create the keys from scratch. */ + topop = castNode(SetOperationStmt, parse->setOperations); + final_rel->uniquekeys = create_uniquekeys_for_sortop(topop, + root->parse->targetList); + } /* * If the input rel is marked consider_parallel and there's nothing that's @@ -3680,6 +3731,14 @@ create_grouping_paths(PlannerInfo *root, grouped_rel = make_grouping_rel(root, input_rel, target, target_parallel_safe, parse->havingQual); + /* Try to initialize unique keys. */ + if (!root->parse->groupingSets) + convert_unique_keys_for_rel(root, grouped_rel, target, input_rel, + input_rel->reltarget); + else + /* Grouping sets can break uniqueness. */ + grouped_rel->uniquekeys = NIL; + /* * Create either paths for a degenerate grouping or paths for ordinary * grouping, as appropriate. @@ -4441,6 +4500,10 @@ create_window_paths(PlannerInfo *root, /* For now, do all work in the (WINDOW, NULL) upperrel */ window_rel = fetch_upper_rel(root, UPPERREL_WINDOW, NULL); + /* Make sure unique keys are valid for window_rel. */ + convert_unique_keys_for_rel(root, window_rel, output_target, input_rel, + input_target); + /* * If the input relation is not parallel-safe, then the window relation * can't be parallel-safe, either. Otherwise, we need to examine the @@ -4664,9 +4727,14 @@ create_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel, distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL); /* - * populate_uniquekeys_from_pathkeys(root, distinct_rel, - * root->distinct_pathkeys); + * Make sure unique keys are valid for distinct_rel. + * + * Both input and output relation should use the same target in this case, + * however the targets are not necessarily assigned to the relations, so + * pass them explicitly. */ + convert_unique_keys_for_rel(root, distinct_rel, target, input_rel, + target); /* * We don't compute anything at this level, so distinct_rel will be diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 4e879b0b7c..3b53b82ba1 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1487,13 +1487,33 @@ typedef struct UniqueKey pg_node_attr(no_read, no_query_jumble) NodeTag type; - Bitmapset *eclass_indexes; /* Indexes in PlannerInfo's eq_class list of - * ECs that is unique for a RelOptInfo. */ + + /* + * A subset of columns that is unique across the output rows of the + * underlying relation. + * + * For base relations and joins, the indexes point to equivalence classes + * the columns belong to. (Essentially it's a subset of + * RelOptInfo.eclass_indexes) + * + * For upper relations, the indexes point to the items of the relation + * target. + */ + Bitmapset *item_indexes; + + /* + * Here we store the operator families for each member of + * 'item_indexes'. (i-th item of the list corresponds to i-th member of + * 'item_indexes'). This is needed to create ECs in the parent query if + * the upper relation represents a subquery. + */ + List *opfamily_lists; + int relid; bool use_for_distinct; /* true if it is used in distinct-pathkey, * in this case we would never check if we * should discard it during join search. */ -} UniqueKey; +} UniqueKey; /* diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index b3fadd2f05..4b99ce6557 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -82,6 +82,8 @@ extern bool match_index_to_operand(Node *operand, int indexcol, IndexOptInfo *index); extern void check_index_predicates(PlannerInfo *root, RelOptInfo *rel); +extern bool IsBooleanOpfamily(Oid opfamily); + /* * tidpath.c * routines to generate tid paths @@ -138,6 +140,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root, Oid collation, Index sortref, Relids rel, + JoinDomain *jdomain, bool create_it); extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec, Expr *expr, @@ -277,4 +280,11 @@ extern void populate_baserel_uniquekeys(PlannerInfo *root, extern void populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *outerrel, RelOptInfo *innerrel, List *restrictlist, JoinType jointype); +extern void convert_unique_keys_for_rel(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, RelOptInfo *input_rel, + PathTarget *input_target); +extern List *create_uniquekeys_for_sortop(SetOperationStmt *topop, + List *targetlist); +extern bool uniquekey_contains_multinulls(PlannerInfo *root, RelOptInfo *rel, + UniqueKey * ukey); #endif /* PATHS_H */ diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 675b5b1484..571198a86a 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -5648,38 +5648,52 @@ select d.* from d left join (select distinct * from b) s Seq Scan on d (1 row) --- join removal is not possible when the GROUP BY contains a column that is --- not in the join condition. (Note: as of 9.6, we notice that b.id is a --- primary key and so drop b.c_id from the GROUP BY of the resulting plan; --- but this happens too late for join removal in the outer plan level.) -explain (costs off) +-- when the GROUP BY contains a column that is not in the join condition, join +-- removal takes place too, due to "unique keys" (Note: as of 9.6, we notice +-- that b.id is a primary key and so drop b.c_id from the GROUP BY of the +-- resulting plan; but this happens too late for join removal in the outer +-- plan level.) +explain (costs off, verbose on) select d.* from d left join (select * from b group by b.id, b.c_id) s on d.a = s.id; - QUERY PLAN ------------------------------------------- - Merge Right Join - Merge Cond: (b.id = d.a) - -> Group - Group Key: b.id - -> Index Scan using b_pkey on b - -> Sort - Sort Key: d.a - -> Seq Scan on d -(8 rows) + QUERY PLAN +------------------------------------------------ + Hash Left Join + Output: d.a, d.b + Inner Unique: true + Hash Cond: (d.a = s.id) + -> Seq Scan on pg_temp.d + Output: d.a, d.b + -> Hash + Output: s.id + -> Subquery Scan on s + Output: s.id + -> HashAggregate + Output: b.id, b.c_id + Group Key: b.id + -> Seq Scan on pg_temp.b + Output: b.id, b.c_id +(15 rows) -- similarly, but keying off a DISTINCT clause -explain (costs off) +explain (costs off, verbose on) select d.* from d left join (select distinct * from b) s on d.a = s.id; - QUERY PLAN ------------------------------------- - Merge Right Join - Merge Cond: (b.id = d.a) - -> Index Scan using b_pkey on b - -> Sort - Sort Key: d.a - -> Seq Scan on d -(6 rows) + QUERY PLAN +------------------------------------------ + Hash Left Join + Output: d.a, d.b + Inner Unique: true + Hash Cond: (d.a = s.id) + -> Seq Scan on pg_temp.d + Output: d.a, d.b + -> Hash + Output: s.id + -> Subquery Scan on s + Output: s.id + -> Seq Scan on pg_temp.b + Output: b.id, b.c_id +(12 rows) -- join removal is not possible here explain (costs off) diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out index 9eecdc1e92..b9147e8f1a 100644 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -2105,3 +2105,27 @@ ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd); Filter: (odd = b.odd) (16 rows) +-- Check that uniqueness of the view output is recognized. +begin; +create table tabx as select * from generate_series(1,100) idx; +create table taby as select * from generate_series(1,100) idy; +create unique index on taby using btree (idy); +create view viewy_barrier with (security_barrier=true) as select * from taby; +analyze tabx, taby; +explain (costs off, verbose on) +select * from tabx x left join viewy_barrier y on idy = idx; + QUERY PLAN +------------------------------------- + Hash Left Join + Output: x.idx, taby.idy + Inner Unique: true + Hash Cond: (x.idx = taby.idy) + -> Seq Scan on public.tabx x + Output: x.idx + -> Hash + Output: taby.idy + -> Seq Scan on public.taby + Output: taby.idy +(10 rows) + +rollback; diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index c4c6c7b8ba..bf8fb27072 100644 --- a/src/test/regress/sql/join.sql +++ b/src/test/regress/sql/join.sql @@ -2047,16 +2047,17 @@ explain (costs off) select d.* from d left join (select distinct * from b) s on d.a = s.id and d.b = s.c_id; --- join removal is not possible when the GROUP BY contains a column that is --- not in the join condition. (Note: as of 9.6, we notice that b.id is a --- primary key and so drop b.c_id from the GROUP BY of the resulting plan; --- but this happens too late for join removal in the outer plan level.) -explain (costs off) +-- when the GROUP BY contains a column that is not in the join condition, join +-- removal takes place too, due to "unique keys" (Note: as of 9.6, we notice +-- that b.id is a primary key and so drop b.c_id from the GROUP BY of the +-- resulting plan; but this happens too late for join removal in the outer +-- plan level.) +explain (costs off, verbose on) select d.* from d left join (select * from b group by b.id, b.c_id) s on d.a = s.id; -- similarly, but keying off a DISTINCT clause -explain (costs off) +explain (costs off, verbose on) select d.* from d left join (select distinct * from b) s on d.a = s.id; diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql index 75a9b718b2..7b82242221 100644 --- a/src/test/regress/sql/subselect.sql +++ b/src/test/regress/sql/subselect.sql @@ -1020,3 +1020,16 @@ WHERE a.thousand < 750; explain (costs off) SELECT * FROM tenk1 A LEFT JOIN tenk2 B ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd); + + +-- Check that uniqueness of the view output is recognized. +begin; +create table tabx as select * from generate_series(1,100) idx; +create table taby as select * from generate_series(1,100) idy; +create unique index on taby using btree (idy); +create view viewy_barrier with (security_barrier=true) as select * from taby; +analyze tabx, taby; +explain (costs off, verbose on) +select * from tabx x left join viewy_barrier y on idy = idx; + +rollback; -- 2.44.0
pgsql-hackers by date: