Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning |
Date | |
Msg-id | CAExHW5vQEDuuoUPrJX379T7wCn2kYPu4vaGMrciFaw4RzJoatw@mail.gmail.com Whole thread Raw |
In response to | Reducing memory consumed by RestrictInfo list translations in partitionwise join planning (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
|
List | pgsql-hackers |
I spent some time on 4th point below but also looked at other points. Here's what I have found so far On Thu, Jul 27, 2023 at 7:35 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > 1. The patch uses RestrictInfo::required_relids as the key for > searching child RelOptInfos. I am not sure which of the two viz. > required_relids and clause_relids is a better key. required_relids > seems to be a subset of clause_relids and from the description it > looks like that's the set that decides the applicability of a clause > in a join. But clause_relids is obtained from all the Vars that appear > in the clause, so may be that's the one that matters for the > translations. Can somebody guide me? I was wrong that required_relids is subset of clause_relids. The first can contain OJ relids which the other can not. OJ relids do not have any children, so they won't be translated. So clause_relids seems to be a better key. I haven't made a decision yet. > > 2. The patch adds two extra pointers per RestrictInfo. They will > remain unused when partitionwise join is not used. Right now, I do not > see any increase in memory consumed by planner because of those > pointers even in case of unpartitioned tables; maybe they are absorbed > in memory alignment. They may show up as extra memory in the future. I > am wondering whether we can instead save and track translations in > PlannerInfo as a hash table using <rinfo_serial, required_relids (or > whatever is the answer to above question) of parent and child > respectively> as key. That will just add one extra pointer in > PlannerInfo when partitionwise join is not used. Please let me know > your suggestions. I will go ahead with a pointer in PlannerInfo for now. > > 3. I have changed adjust_appendrel_attrs_mutator() to return a > translated RestrictInfo if it already exists. IOW, it won't always > return a deep copy of given RestrictInfo as it does today. This can be > fixed by writing wrappers around adjust_appendrel_attrs() to translate > RestrictInfo specifically. But maybe we don't always need deep copies. > Are there any cases when we need translated deep copies of > RestrictInfo? Those cases will require fixing callers of > adjust_appendrel_attrs() instead of the mutator. I think it's better to handle the tracking logic outside adjust_appendrel_attrs. That will be some code churn but it will be cleaner and won't affect anything other that partitionwise joins. > > 4. IIRC, when partitionwise join was implemented we had discussed > creating child RestrictInfos using a login similar to > build_joinrel_restrictlist(). That might be another way to build > RestrictInfo only once and use it multiple times. But we felt that it > was much harder problem to solve since it's not known which partitions > from joining partitioned tables will match and will be joined till we > enter try_partitionwise_join(), so the required RestrictInfos may not > be available in RelOptInfo::joininfo. Let me know your thoughts on > this. Here's some lengthy description of why I feel translations are better compared to computing restrictlist and joininfo for a child join from joining relation's joininfo Consider a query "select * from p, q, r where p.c1 = q.c1 and q.c1 = r.c1 and p.c2 + q.c2 < r.c2 and p.c3 != q.c3 and q.c3 != r.c3". The query has following clauses 1. p.c1 = q.c1 2. q.c1 = r.c1 3. p.c2 + q.c2 < r.c2 4. p.c3 != q.c3 5. q.c3 != r.c3 The first two clauses are added to EC machinery and do not appear in joininfo. They appear in restrictlist when we construct clauses in restrictlist from ECs. Let's ignore them for now. Assume that each table p, q, r has partitions (p1, p2, ...), (q1, q2, ...) and (r1, r2, ... ) respectively. Each triplet (pi, qi,ri) forms the set of matching partitions from p, q, r respectively for all "i". Consider join, p1q1r1. We will generate relations p1, q1, r1, p1q1, p1r1, q1r1 and p1q1r1 while building the last join. Below is description of how these clauses would look in each of these relations and the list they appear in when computing that join. Please notice the numeric suffixes carefully. p1. joininfo: p1.c2 + q.c2 < r.c2, p1.c3 != q.c3 restrictlist: <> q1 joininfo: p.c2 + q1.c2 < r.c2, p.c3 != q1.c3, q1.c3 != r.c3 restrictlist: <> r1 joininfo: p.c2 + q.c2 < r1.c2, q.c3 != r1.c3 restrictlist: <> p1q1 joininfo: p1.c2 + q1.c2 < r.c2, q1.c3 != r.c3 restrictlist: p1.c3 != q1.c3 q1r1 joininfo: p.c2 + q1.c2 < r1.c2, p.c3 != q1.c3 restrictlist: q1.c3 != r1.c3 p1r1 joininfo: p1.c2 + q.c2 < r1.c2, p1.c3 != q.c3, q.c3 != r1.c3 restrictlist: <> p1q1r1 joininfo: <> restrictlist for (p1q1)r1: p1.c2 + q1.c2 < r1.c2, q1.c3 != r1.c3 restrictlist for (p1r1)q1: p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3, q1.c3 != r1.c3 restrictlist for p1(q1r1): p1.c2 + q1.c2 < r1.c2, p1.c3 != q1.c3 If we translate the clauses when building join e.g. translate p1.c3 != q1.c3 when building p1q1 or p1q1r1, it would cause repeated translations. So the translations need to be saved in lower relations when we detect matching partitions and then use these translations. Something I have done in the attached patches. But the problem is the same clause reaches its final translation through different intermediate translations as the join search advances. E.g. the evolution of p.c2 + q.c2 < r.c2 to p1.c2 + q1.c2 < r1.c2 has three different intemediate translations at second level of join. Each of these intermediate translations conflict with each other and none of them can be saved in any of the second level joins as a candidate for the last stage translation. Extending the logic in the patches would make those more complicated. Another possibility is to avoid the same clause being translated multiple times when building the join using RestrictInfo::rinfo_serial. But simply that won't help avoiding repeated translations caused by different join orders. E.g. we won't be able to detect that p.c2 + q.c2 < r.c2 has been translated to p1.c2 + q1.c2 < r1.c2 already when we computed (p1r1)q1 or p1(q1r1) or (p1q1)r1 whichever was computed earlier. For that we need some tracking outside the join relations themselves like I did in my first patch. Coming back to the problem of generating child restrictlist clauses from equivalence classes, I think it's easier with some tweaks to pass child relids down to the minions when dealing with child joins. It seems to be working as is but I haven't tested it thoroughly. Obtaining child clauses from parent clauses by translation and tracking the translations is less complex and may be more efficient too. I will post a patch on those lines soon. -- Best Wishes, Ashutosh Bapat
Attachment
pgsql-hackers by date: