Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRcCH0Xm+Ae6jArpN2ychoX4NWseMji6o81uUM7Sbv6+_A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
(Robert Haas <robertmhaas@gmail.com>)
|
List | pgsql-hackers |
On Thu, Mar 16, 2017 at 7:10 AM, Robert Haas <robertmhaas@gmail.com> wrote: > So I am looking at this part of 0008: > > + /* > + * Do not copy parent_rinfo and child_rinfos because 1. they create a > + * circular dependency between child and parent RestrictInfo 2. dropping > + * those links just means that we loose some memory > optimizations. 3. There > + * is a possibility that the child and parent RestrictInfots > themselves may > + * have got copied and thus the old links may no longer be valid. The > + * caller may set up those links itself, if needed. > + */ > > I don't think that it's very clear whether or not this is safe. I > experimented with making _copyRestrictInfo PANIC, I am not able to understand how to make _copyRestrictInfo PANIC. Can you please share the patch or compiler flags or settings? I will look at the case below once I have that. > which, > interestingly, does not affect the core regression tests at all, but > does trip on this bit from the postgres_fdw tests: > > -- subquery using stable function (can't be sent to remote) > PREPARE st2(int) AS SELECT * FROM ft1 t1 WHERE t1.c1 < $2 AND t1.c3 IN > (SELECT c3 FROM ft2 t2 WHERE c1 > $1 AND date(c4) = > '1970-01-17'::date) ORDER BY c1; > EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st2(10, 20); > > I'm not sure why this particular case is affected when so many others > are not, and the comment doesn't help me very much in figuring it out. > > Why do we need this cache in the RestrictInfo, anyway? Aside from the > comment above, I looked at the comment in the RestrictInfo struct, and > I looked at the comment in build_child_restrictinfo, and I looked at > the comment in build_child_clauses, and I looked at the place where > build_child_clauses is called in set_append_rel_size, and none of > those places explain why we need this cache. I would assume we'd need > a separate translation of the RestrictInfo for every separate > child-join, so how does the cache help? > > Maybe the answer is that build_child_clauses() is also called from > try_partition_wise_join() and add_paths_to_child_joinrel(), and those > three call sights all end up producing the same set of translated > RestrictInfos. But if that's the case, somehow it seems like we ought > to be producing these in one place where we can get convenient access > to them from each child join, rather than having to search through > this cache to find it. I had explained this briefly in [1]. But forgot to add it as comments. There are multiple means by which a RestrictInfo gets translated multiple times for the same child. 1. Consider a join A J (B J C on B.b = C.c) ON (A.a = B.b) the clause A.a = B.b is part of the restrictlist for orders (AB)C and A(BC) (and (AC)B depending upon the type of join). So, the clause gets translated twice once for each of those join orders. 2. In the above example, A.a = B.b is part of joininfo list (if it happens to be an outer join) of A, B and BC. So, it should be part of joininfo list of children of A, B and BC. But the RestrictInfo which is part of joininfo of B and BC looks exactly same. Similarly param_info->clauses get translated multiple times each time with a different set of required_outer. In order to avoid multiple translations and spend memory in each translation it's better to cache the result and retrieve it. Updated prologue of build_child_restrictinfo with this explanation. > It's a pretty inefficient cache: it takes O(n) > time to search it, I think, where n is the number of partitions. Above explanation shows that it's worse than that. > And > you do O(n) searches. So it's an O(n^2) algorithm, which is a little > unfortunate. Can't we affix the translated RestrictInfos someplace > where they can be found more efficiently? Would a hash similar to root->join_rel_hash help? That will reduce the searches to O(1). I have added a separate patch (0008) for using hashtable to store child restrictinfos. If that patch looks good to you, I will merge it with the main patch supporting partition-wise join. > > Yet another thing that the comments don't explain is why the existing > adjust_appendrel_attrs call needs to be replaced with > build_child_clauses. The call to adjust_appendrel_attrs() used to translate joininfo for child has been replaced by build_child_clauses to take advantage of the RestrictInfo cache. As explained above a clause which is part of joininfo of a child, is also part of joininfo of the child-join in which it participates except the child-joins covering the clause. So, a cached copy of that RestrictInfo helps. I have added a patch (0010) to use build_child_clause() only for partitioned tables and use adjust_appendrel_attrs() for non-partitioned case. If this change looks good, I will merge it with the main patch. > > So I feel, overall, that the point of all of this is not explained well at all. Sorry for that. I should have added the explanation in the comments. Corrected this in this set of patches. [1] https://www.postgresql.org/message-id/CAFjFpRe66z%2Bw9%2BdnAkWGiaB1CU2CUQsLGsqzHzYBoA%3DKJFf%2BPQ%40mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: