Re: UniqueKey v2 - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: UniqueKey v2 |
Date | |
Msg-id | 87h6f1kqr6.fsf@163.com Whole thread Raw |
In response to | Re: UniqueKey v2 (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: UniqueKey v2
|
List | pgsql-hackers |
Antonin Houska <ah@cybertec.at> writes: >> Could you make the reason clearer for adding 'List *opfamily_lists;' >> into UniqueKey? You said "This is needed to create ECs in the parent >> query if the upper relation represents a subquery." but I didn't get the >> it. Since we need to maintain the UniqueKey in the many places, I'd like >> to keep it as simple as possbile. Of course, anything essentical should >> be added for sure. > > If unique keys are generated for a subquery output, they also need to be > created for the corresponding relation in the upper query ("sub" in the > following example): OK. > > select * from tab1 left join (select * from tab2) sub; > > However, to create an unique key for "sub", you need an EC for each expression > of the key. OK. > And to create an EC, you in turn need the list of operator > families. I'm thinking if we need to "create" any EC. Can you find out a user case where the outer EC is missed and the UniqueKey is still interesting? I don't have an example now. convert_subquery_pathkeys has a similar sistuation and has the following codes: outer_ec = get_eclass_for_sort_expr(root, (Expr *) outer_var, sub_eclass->ec_opfamilies, sub_member->em_datatype, sub_eclass->ec_collation, 0, rel->relids, NULL, false); /* * If we don't find a matching EC, sub-pathkey isn't * interesting to the outer query */ if (outer_ec) best_pathkey = make_canonical_pathkey(root, outer_ec, sub_pathkey->pk_opfamily, sub_pathkey->pk_strategy, sub_pathkey->pk_nulls_first); } > Even if the parent query already had ECs for the columns of "sub" which are > contained in the unique key, you need to make sure that those ECs are > "compatible" with the ECs of the subquery which generated the unique key. That > is, if an EC of the subquery considers certain input values equal, the EC of > the parent query must also be able to determine if they are equal or not. > >> > * RelOptInfo.notnullattrs >> > >> > My understanding is that this field contains the set attributes whose >> > uniqueness is guaranteed by the unique key. They are acceptable because they >> > are either 1) not allowed to be NULL due to NOT NULL constraint or 2) NULL >> > value makes the containing row filtered out, so the row cannot break >> > uniqueness of the output. Am I right? >> > >> > If so, I suggest to change the field name to something less generic, maybe >> > 'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment that >> > more checks are needed before particular attribute can actually be used in >> > UniqueKey. >> >> I don't think so, UniqueKey is just one of the places to use this >> not-null property, see 3af704098 for the another user case of it. >> >> (Because of 3af704098, we should leverage notnullattnums somehow in this >> patch, which will be included in the next version as well). > > In your patch you modify 'notnullattrs' in add_base_clause_to_rel(), but that > does not happen to 'notnullattnums' in the current master branch. Thus I think > that 'notnullattrs' is specific to the unique keys feature, so the field name > should be less generic. OK. >> > >> > * uniquekey_useful_for_merging() >> > >> > How does uniqueness relate to merge join? In README.uniquekey you seem to >> > point out that a single row is always sorted, but I don't think this >> > function is related to that fact. (Instead, I'd expect that pathkeys are >> > added to all paths for a single-row relation, but I'm not sure you do that >> > in the current version of the patch.) >> >> The merging is for "mergejoinable join clauses", see function >> eclass_useful_for_merging. Usually I think it as operator "t1.a = t2.a"; > > My question is: why is the uniqueness important specifically to merge join? I > understand that join evaluation can be more efficient if we know that one > input relation is unique (i.e. we only scan that relation until we find the > first match), but this is not specific to merge join. So the answer is the "merging" in uniquekey_useful_for_merging() has nothing with merge join. >> > * is_uniquekey_useful_afterjoin() >> > >> > Now that my patch (0004) allows propagation of the unique keys from a >> > subquery to the upper query, I was wondering if the UniqueKey structure >> > needs the 'use_for_distinct field' I mean we should now propagate the unique >> > keys to the parent query whether the subquery has DISTINCT clause or not. I >> > noticed that the field is checked by is_uniquekey_useful_afterjoin(), so I >> > changed the function to always returned true. However nothing changed in the >> > output of regression tests (installcheck). Do you insist that the >> > 'use_for_distinct' field is needed? > > I miss your answer to this comment. After we considers the uniquekey from subquery, 'use_for_distinct' field is not needed. >> > ** What does the 'multi' word in the function name mean? >> >> multi means multiple, I thought we use this short name in the many >> places, for ex bt_multi_page_stats after a quick search. > > Why not simply uniquekey_contains_nulls() ? > Actually I wouldn't say that an instance of UniqueKey contains any value (NULL > or NOT NULL) because it describes the whole relation rather than particular > row. I consider UniqueKey to be a set of expressions. How about > uniquekey_expression_nullable() ? uniquekey_expression_nullable() is a better name, I will use it in the next version. IIUC, we have reached to the agreement based on your latest response for the most of the questions. Please point me if I missed anything. >> > Of course one problem is that the number of combinations can grow >> > exponentially as new relations are joined. >> >> Yes, that's why "rule 1" needed and "How to reduce the overhead" in >> UniqueKey.README is introduced. > > What if we are interested in unique keys of a subquery, but the subquery has > no DISTINCT clause? I agree we should remove the prerequisite of "use_for_distinct". -- Best Regards Andy Fan
pgsql-hackers by date: