Re: UniqueKey v2 - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: UniqueKey v2 |
Date | |
Msg-id | 4986.1715622135@antos Whole thread Raw |
In response to | Re: UniqueKey v2 (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: UniqueKey v2
|
List | pgsql-hackers |
Antonin Houska <ah@cybertec.at> wrote: > Andy Fan <zhihuifan1213@163.com> wrote: > > > > > > * Combining the UKs > > > > > > IMO this is the most problematic part of the patch. You call > > > populate_joinrel_uniquekeys() for the same join multiple times, > > > > Why do you think so? The below code is called in "make_join_rel" > > Consider join of tables "a", "b" and "c". My understanding is that > make_join_rel() is called once with rel1={a} and rel2={b join c}, then with > rel1={a join b} and rel2={c}, etc. I wanted to say that each call should > produce the same set of unique keys. > > I need to check this part more in detail. I think I understand now. By calling populate_joinrel_uniquekeys() for various orderings, you can find out that various input relation unique keys can represent the whole join. For example, if the ordering is A JOIN (B JOIN C) you can prove that the unique keys of A can be used for the whole join, while for the ordering B JOIN (A JOIN C) you can prove the same for the unique keys of B, and so on. > > Is your original question is about populate_joinrel_uniquekey_for_rel > > rather than populate_joinrel_uniquekeys? We have the below codes: > > > > outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel, > > innerrel, restrictlist); > > inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel, > > outerrel, restrictlist); > > > > This is mainly because of the following theory. Quoted from > > README.uniquekey. Let's called this as "rule 1". > > > > """ > > How to maintain the uniquekey? > > ------------------------------- > > .. From the join relation, it is maintained with two rules: > > > > - the uniquekey in one side is still unique if it can't be duplicated > > after the join. for example: > > > > SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk; > > UniqueKey on t1: t1.pk > > UniqueKey on t1 Join t2: t1.pk > > """ > > > > AND the blow codes: > > > > > > if (outeruk_still_valid || inneruk_still_valid) > > > > /* > > * the uniquekey on outers or inners have been added into joinrel so > > * the combined uniuqekey from both sides is not needed. > > */ > > return; > > > > > > We don't create the component uniquekey if any one side of the boths > > sides is unique already. For example: > > > > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is > > unique", there is no need to create component UniqueKey (t1.id, t2.id); > > ok, I need to check more in detail how this part works. This optimization makes sense to me. > > > > > > Of course one problem is that the number of combinations can grow > > > exponentially as new relations are joined. > > > > Yes, that's why "rule 1" needed and "How to reduce the overhead" in > > UniqueKey.README is introduced. I think there should yet be some guarantee that the number of unique keys does not grow exponentially. Perhaps a constant that allows a relation (base or join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3 or 4.) And when picking the "best N keys", one criterion could be the number of expressions in the key (the shorter key the better). > > > > > > 2) Check if the join relation is single-row > > > > > > I in order to get rid of the dependency on 'restrictlist', I think you can > > > use ECs. Consider a query from your regression tests: > > > > > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int); > > > > > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1; > > > > > > The idea here seems to be that no more than one row of t1 matches the query > > > clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures that > > > no more than one row of t2 matches the query (because t1 cannot provide the > > > clause with more than one input value of 'e'). And therefore, the join also > > > produces at most one row. > > > > You are correct and IMO my current code are able to tell it is a single > > row as well. > > > > 1. Since t1.id = 1, so t1 is single row, so t1.d is unqiuekey as a > > consequence. > > 2. Given t2.id is unique, t1.e = t2.id so t1's unqiuekey can be kept > > after the join because of rule 1 on joinrel. and t1 is singlerow, so the > > joinrel is singlerow as well. ok, I think I understand now. -- Antonin Houska Web: https://www.cybertec-postgresql.com
pgsql-hackers by date: