Re: [HACKERS] Proposal : Parallel Merge Join - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Proposal : Parallel Merge Join |
Date | |
Msg-id | CAA4eK1+D7TN8PUMjfqHLn+ufUdZgo58OXidnrWmReLT-XqrQew@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Proposal : Parallel Merge Join (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Proposal : Parallel Merge Join
|
List | pgsql-hackers |
On Wed, Jan 4, 2017 at 9:27 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote: > On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> 3. >> + /* >> + * See comments in try_partial_nestloop_path(). >> + */ >> >> This same code exists in try_partial_nestloop_path() and >> try_partial_hashjoin_path() with minor difference of code in >> try_partial_hashjoin_path(). Can't we write a separate inline >> function for this code and call from all the three places. > > It's a small check, is it ok to make it the separate function. I have > also observed similar kind of duplicate code in try_mergejoin_path and > try_hashjoin_path. However, if you think that it will be better to > move that check to an inline function I can submit a seperate patch in > this thread as a refactoring patch? > Let's keep that check unchanged. Few review comments on the latest version of the patch: 1. - if (joinrel->consider_parallel && nestjoinOK && - save_jointype != JOIN_UNIQUE_OUTER && - bms_is_empty(joinrel->lateral_relids)) + if (outerrel->partial_pathlist == NIL || + !joinrel->consider_parallel || + save_jointype == JOIN_UNIQUE_OUTER || + !bms_is_empty(joinrel->lateral_relids) || + jointype == JOIN_FULL || + jointype == JOIN_RIGHT) + return; For the matter of consistency, I think the above checks should be in same order as they are in function hash_inner_and_outer(). I wonder why are you using jointype in above check instead of save_jointype, it seems to me we can use save_jointype for all the cases. 2. + generate_mergejoin_paths(root, joinrel, innerrel, outerpath, + jointype, extra, false, + inner_cheapest_total, merge_pathkeys, + true); IIUC, the reason of passing a 7th parameter (useallclauses) as false is that it can be true only for Right and Full joins and for both we don't generate partial merge join paths. If so, then I think it is better to add a comment about such an assumption, so that we don't forget to update this code in future if we need to useallclauses for something else. Another way to deal with this is that you can pass the value of useallclauses to consider_parallel_mergejoin() and then to generate_mergejoin_paths(). I feel second way is better, but if for some reason you don't find it appropriate then at least add a comment. 3. generate_mergejoin_paths() { .. .. - try_mergejoin_path(root, - joinrel, - outerpath, - innerpath, - merge_pathkeys, - newclauses, - NIL, - NIL, - jointype, - extra); + if (!is_partial) + try_mergejoin_path(root, + joinrel, + outerpath, + innerpath, + merge_pathkeys, + newclauses, + NIL, + NIL, + jointype, + extra); + /* Generate partial path only if innerpath is parallel safe. */ + else if (innerpath->parallel_safe) + try_partial_mergejoin_path(root, + joinrel, + outerpath, + innerpath, + merge_pathkeys, + newclauses, + NIL, + NIL, + jointype, + extra); .. } The above and similar changes in generate_mergejoin_paths() doesn't look good and in some cases when innerpath is *parallel-unsafe*, you need to perform some extra computation which is not required. How about writing a separate function generate_partial_mergejoin_paths()? I think you can save some extra computation and it will look neat. I understand that there will be some code duplication, but not sure if there is any other better way. How about adding one simple test case as I have kept for other parallel patches like parallel index scan, parallel subplan, etc.? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: