Re: [HACKERS] Proposal : Parallel Merge Join - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Proposal : Parallel Merge Join
Date
Msg-id CAFiTN-szCEcZrQm0i_w4xqSaRUTOUFstNu32Zn4rxxDcoa8gnA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Proposal : Parallel Merge Join  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Proposal : Parallel Merge Join  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Mar 7, 2017 at 5:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> +    /* Can't do anything else if inner path needs to be unique'd */
> +    if (save_jointype == JOIN_UNIQUE_INNER)
> +        return;
>
> Right after this, you should try_partial_mergejoin_path() with the
> result of get_cheapest_parallel_safe_total_inner(), just like
> sort_inner_and_outer() already does.
>
> I think with some work you could merge generate_mergejoin_paths with
> generate_partial_mergejoin_paths instead of duplicating all the code.
> They're not really that different, and you could reduce the
> differences further if you made try_mergejoin_path() take an
> additional is_partial argument and pass the call to
> try_partial_mergejoin_path() if it's true.  The various bits of
> special handling related to join types that aren't supported in
> parallel queries aren't going to help you in parallel mode, but they
> won't hurt either.  This bit:
>
> +    /* Generate partial path if inner is parallel safe. */
> +    if (inner_cheapest_total->parallel_safe)
> +        try_partial_mergejoin_path(root,
>
> ...is really not right.  That value is being passed down from
> consider_parallel_mergejoin(), which is getting it from
> match_unsorted_outer(), which really shouldn't be calling this code in
> the first place with a path that's not parallel-safe.  What it ought
> to do is (a) pass inner_cheapest_total if that's non-NULL and
> parallel-safe, or else (b) call
> get_cheapest_parallel_safe_total_inner() and pass that value, (c)
> unless that's NULL also in which case it should skip the call
> altogether.

I have tried to address above comments in the latest patch, But there
is one part of the patch which I am not quite sure yet. So still this
patch is not a final one.

+ *
+ * If inner_cheapest_total is not NULL or not a parallel safe path then
+ * find the cheapest total parallel safe path.
+ */
+ if (inner_cheapest_total == NULL ||
+ inner_cheapest_total->parallel_safe == false)
+ {
+ inner_cheapest_total =
+ get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
+ }
+
+ if (inner_cheapest_total)
+ consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
+ save_jointype, extra,
+ inner_cheapest_total);

I am confused about whether to call
"get_cheapest_parallel_safe_total_inner" with
innerrel->cheapest_parameterized_paths like we do in case of
hash_inner_and_outer or with
innerrel->pathlist.  The reason behind I am calling this with complete
pathlist (innerrel->pathlist) is that inside generate_mergejoin_paths
it calls "get_cheapest_path_for_pathkeys" for complete pathlist and if
we decide not to call consider_parallel_mergejoin if the cheapest
parallel safe path in innerrel->cheapest_parameterized_paths is NULL
then it will not be fair (we might have some parallel safe paths in
innerrel->pathlist).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Oleg Bartunov
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL
Next
From: Heikki Linnakangas
Date:
Subject: Re: [HACKERS] SCRAM authentication, take three