Re: UniqueKey v2 - Mailing list pgsql-hackers

From Andy Fan
Subject Re: UniqueKey v2
Date
Msg-id 87a5ktkpun.fsf@163.com
Whole thread Raw
In response to Re: UniqueKey v2  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
>> Consider join of tables "a", "b" and "c". My understanding is that
>> make_join_rel() is called once with rel1={a} and rel2={b join c}, then with
>> rel1={a join b} and rel2={c}, etc. I wanted to say that each call should
>> produce the same set of unique keys.
>> 
>> I need to check this part more in detail.
>
> I think I understand now. By calling populate_joinrel_uniquekeys() for various
> orderings, you can find out that various input relation unique keys can
> represent the whole join. For example, if the ordering is
>
> A JOIN (B JOIN C)
>
> you can prove that the unique keys of A can be used for the whole join, while
> for the ordering
>
> B JOIN (A JOIN C)
>
> you can prove the same for the unique keys of B, and so on.

Yes.

>> > We don't create the component uniquekey if any one side of the boths
>> > sides is unique already. For example:
>> > 
>> > "(t1.id) in joinrel(t1, t2) is unique" OR "(t2.id) in joinrel is
>> > unique", there is no need to create component UniqueKey (t1.id, t2.id);  
>> 
>> ok, I need to check more in detail how this part works.
>
> This optimization makes sense to me.

OK.

>> > >
>> > >   Of course one problem is that the number of combinations can grow
>> > >   exponentially as new relations are joined.
>> > 
>> > Yes, that's why "rule 1" needed and "How to reduce the overhead" in
>> > UniqueKey.README is introduced. 
>
> I think there should yet be some guarantee that the number of unique keys does
> not grow exponentially. Perhaps a constant that allows a relation (base or
> join) to have at most N unique keys. (I imagine N to be rather small, e.g. 3
> or 4.) And when picking the "best N keys", one criterion could be the number
> of expressions in the key (the shorter key the better).

I don't want to introduce this complextity right now. I'm more
inerested with how to work with them effectivity. main effort includes: 

- the design of bitmapset which is memory usage friendly and easy for
combinations.
- Optimize the singlerow cases to reduce N UnqiueKeys to 1 UniqueKey.

I hope we can pay more attention to this optimization (at most N
UniqueKeys) when the major inforastruce has been done. 

>> > You are correct and IMO my current code are able to tell it is a single
>> > row as well.
>> > 
>> > 1. Since t1.id = 1, so t1 is single row, so t1.d is unqiuekey as a
>> > consequence.
>> > 2. Given t2.id is unique, t1.e = t2.id so t1's unqiuekey can be kept
>> > after the join because of rule 1 on joinrel. and t1 is singlerow, so the
>> > joinrel is singlerow as well.
>
> ok, I think I understand now.

OK.

At last, this probably is my first non-trival patchs which has multiple
authors, I don't want myself is the bottleneck for the coorperation, so
if you need something to do done sooner, please don't hesitate to ask me
for it explicitly.

Here is my schedule about this. I can provide the next version based
our discussion and your patches at the eariler of next week. and update
the UniqueKey.README to make sure the overall design clearer. What I
hope you to pay more attention is the UniqueKey.README besides the
code. I hope the UniqueKey.README can reduce the effort for others to
understand the overall design enormously.

-- 
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Erik Wienhold
Date:
Subject: Underscore in positional parameters?
Next
From: Peter Smith
Date:
Subject: Re: Slow catchup of 2PC (twophase) transactions on replica in LR