Re: UniqueKey v2 - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: UniqueKey v2
Date
Msg-id 4986.1715622135@antos
Whole thread Raw
In response to Re: UniqueKey v2  (Antonin Houska <ah@cybertec.at>)
Responses Re: UniqueKey v2
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> Andy Fan <zhihuifan1213@163.com> wrote:
> > >
> > > * Combining the UKs
> > >
> > >   IMO this is the most problematic part of the patch. You call
> > >   populate_joinrel_uniquekeys() for the same join multiple times,
> >
> > Why do you think so? The below code is called in "make_join_rel"
>
> Consider join of tables "a", "b" and "c". My understanding is that
> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
> produce the same set of unique keys.
>
> I need to check this part more in detail.

I think I understand now. By calling populate_joinrel_uniquekeys() for various
orderings, you can find out that various input relation unique keys can
represent the whole join. For example, if the ordering is

A JOIN (B JOIN C)

you can prove that the unique keys of A can be used for the whole join, while
for the ordering

B JOIN (A JOIN C)

you can prove the same for the unique keys of B, and so on.

> > Is your original question is about populate_joinrel_uniquekey_for_rel
> > rather than populate_joinrel_uniquekeys? We have the below codes:
> >
> >     outeruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, outerrel,
> >                                                              innerrel, restrictlist);
> >     inneruk_still_valid = populate_joinrel_uniquekey_for_rel(root, joinrel, innerrel,
> >                                                              outerrel, restrictlist);
> >
> > This is mainly because of the following theory. Quoted from
> > README.uniquekey. Let's called this as "rule 1".
> >
> > """
> > How to maintain the uniquekey?
> > -------------------------------
> > .. From the join relation, it is maintained with two rules:
> >
> > - the uniquekey in one side is still unique if it can't be duplicated
> >   after the join. for example:
> >
> >   SELECT t1.pk FROM t1 JOIN t2 ON t1.a = t2.pk;
> >   UniqueKey on t1:  t1.pk
> >   UniqueKey on t1 Join t2:  t1.pk
> > """
> >
> > AND the blow codes:
> >
> >
> >     if (outeruk_still_valid || inneruk_still_valid)
> >
> >         /*
> >          * the uniquekey on outers or inners have been added into joinrel so
> >          * the combined uniuqekey from both sides is not needed.
> >          */
> >         return;
> >
> >
> > We don't create the component uniquekey if any one side of the boths
> > sides is unique already. For example:
> >
> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
> > unique", there is no need to create component UniqueKey (t1.id, t2.id);
>
> ok, I need to check more in detail how this part works.

This optimization makes sense to me.

> > >
> > >   Of course one problem is that the number of combinations can grow
> > >   exponentially as new relations are joined.
> >
> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
> > UniqueKey.README is introduced.

I think there should yet be some guarantee that the number of unique keys does
not grow exponentially. Perhaps a constant that allows a relation (base or
join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
or 4.) And when picking the "best N keys", one criterion could be the number
of expressions in the key (the shorter key the better).

> > >
> > >   2) Check if the join relation is single-row
> > >
> > >   I in order to get rid of the dependency on 'restrictlist', I think you can
> > >   use ECs. Consider a query from your regression tests:
> > >
> > > CREATE TABLE uk_t (id int primary key, a int not null, b int not null, c int, d int, e int);
> > >
> > > SELECT distinct t1.d FROM uk_t t1 JOIN uk_t t2 ON t1.e = t2.id and t1.id = 1;
> > >
> > >   The idea here seems to be that no more than one row of t1 matches the query
> > >   clauses. Therefore, if t2(id) is unique, the clause t1.e=t2.id ensures that
> > >   no more than one row of t2 matches the query (because t1 cannot provide the
> > >   clause with more than one input value of 'e'). And therefore, the join also
> > >   produces at most one row.
> >
> > You are correct and IMO my current code are able to tell it is a single
> > row as well.
> >
> > 1. Since t1.id = 1, so t1 is single row, so t1.d is unqiuekey as a
> > consequence.
> > 2. Given t2.id is unique, t1.e = t2.id so t1's unqiuekey can be kept
> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
> > joinrel is singlerow as well.

ok, I think I understand now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: Direct SSL connection with ALPN and HBA rules
Next
From: Ranier Vilela
Date:
Subject: Re: Fix out-of-bounds in the function GetCommandTagName