Re: [HACKERS] Proposal : Parallel Merge Join - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Proposal : Parallel Merge Join
Date
Msg-id CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Proposal : Parallel Merge Join  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] Proposal : Parallel Merge Join  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: [HACKERS] Proposal : Parallel Merge Join  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, Mar 1, 2017 at 12:01 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 11:24 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> I think for now we can keep the parallel safety check for cheapest
>>> inner path, though it will be of use only for the very first time we
>>> compare the paths in that loop.  I am not sure if there is any other
>>> better way to handle the same.
>>
>> Done that way.
>
> Thanks, your patch looks good to me.

I'm not happy with the way this patch can just happen to latch on to a
path that's not parallel-safe rather than one that is and then just
give up on a merge join in that case.  I already made this argument in
https://www.postgresql.org/message-id/CA+TgmobdW2au1Jq5L4ySA2ZhqFmA-qNvD7ZFaZbJWm3c0ysWyw@mail.gmail.com
and my opinion hasn't changed.  I've subsequently come to realize that
this problem is more widespread that I initially realized: not only do
sort_inner_and_outer() and generate_partial_mergejoin_paths() give up
without searching for the cheapest parallel-safe path, but the latter
later calls get_cheapest_path_for_pathkeys() and then just pretends it
didn't find anything unless the result is parallel-safe.  This makes
no sense to me: the cheapest parallel-safe path could be 1% more
expensive than the cheapest path of any kind, but because it's not the
cheapest we arbitrarily skip it, even though parallelizing more of the
tree might leave us way ahead overall.

I suggest that we add an additional "bool require_parallel_safe"
argument to get_cheapest_path_for_pathkeys(); when false, the function
will behave as at present, but when true, it will skip paths that are
not parallel-safe.  And similarly have a function to find the cheapest
parallel-safe path if the first one in the list isn't it.  See
attached draft patch.  Then this code can search for the correct path
instead of searching using the wrong criteria and then giving up if it
doesn't get the result it wants.

+    if (!(joinrel->consider_parallel &&
+          save_jointype != JOIN_UNIQUE_OUTER &&
+          save_jointype != JOIN_FULL &&
+          save_jointype != JOIN_RIGHT &&
+          outerrel->partial_pathlist != NIL &&
+          bms_is_empty(joinrel->lateral_relids)))

I would distribute the negation, so that this reads if
(!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER ||
save_jointype == JOIN_FULL || ...).  Or maybe better yet, just drop
the ! and the return that follows and put the
consider_parallel_nestloop and consider_parallel_mergejoin calls
inside the if-block.

You could Assert(nestjoinOK) instead of testing it, although I guess
it doesn't really matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] [BUG FIX] Removing NamedLWLockTrancheArray
Next
From: Yugo Nagata
Date:
Subject: Re: [HACKERS] [POC] hash partitioning