Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpRfsxGx-7jZED9ZcWxfUs+R_27K=c-fXHkyFXpNqVDyB6Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Mar 22, 2017 at 6:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 22, 2017 at 8:46 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> I have also completed reparameterize_path_by_child() for all the
>> required paths. There's no TODO there now. :) The function has grown
>> quite long now and might take some time to review. Given the size, I
>> am wondering whether we should separate that fix from the main
>> partition-wise join fix. That will make reviewing that function
>> easier, allowing a careful review. Here's the idea how that can be
>> done. As explained in the commit of 0009, the function is required in
>> case of lateral joins between partitioned relations. For a A LATERAL
>> JOIN B, B is the minimum required parameterization by A. Hence
>> children of A i.e. A1, A2 ... all require their paths to be
>> parameterized by B. When that comes to partition-wise joins, A1
>> requires its paths to be parameterized by B1 (matching partition from
>> B). Otherwise we can not create paths for A1B1. This means that we
>> require to reparameterize all A1's paths to be reparameterized by B1
>> using function reparameterize_paths_by_child(). So the function needs
>> to support reparameterization of all the paths; we do not know which
>> of those have survived add_path(). But if we disable partition-wise
>> join for lateral joins i.e. when direct_lateral_relids of one relation
>> contains the any subset of the relids in the other relation, we do not
>> need reparameterize_path_by_child(). Please let me know if this
>> strategy will help to make review and commit easier.
>
> In my testing last week, reparameterize_path_by_child() was essential
> for nested loops to work properly, even without LATERAL.  Without it,
> the parameterized path ends up containing vars that reference the
> parent varno instead of the child varno.  That confused later planner
> stages so that those Vars did not get replaced with Param during
> replace_nestloop_params(), eventually resulting in a crash at
> execution time.

I half-described the solution. Sorry. Along-with disabling
partition-wise lateral joins, we have to disable nested loop
child-joins where inner child is parameterized by the parent of the
outer one. We will still have nestloop join between parents where
inner relation is parameterized by the outer and every child of inner
is parameterized by the outer. But we won't create nest loop joins
where inner child is parameterized by the outer child, where we
require reparameterize_path_by_child. We will loose this optimization
only till we get reparameterize_path_by_child() committed. Basically,
in try_nestloop_path() (in the patch 0009), if
(PATH_PARAM_BY_PARENT(inner_path, outer_path->parent)), give up
creating nest loop path. That shouldn't create any problems.

Did you experiment with this change in try_nestloop_path()? Can you
please share the testcase? I will take a look at it.

> Based on that experiment, I think we could consider
> having reparameterize_path_by_child() handle only scan paths as
> reparameterize_path() does, and just give up on plans like this:
>
> Append
> -> Left Join
>    -> Scan on a
>    -> Inner Join
>       -> Index Scan on b
>       -> Index Scan on c
> [repeat for each partition]
>

I am assuming that a, b and c are partitions of A, B and C resp. which
are being joined and both or one of the scans on b and c are
parameteried by a or scan of c is parameterized by b.

I don't think we will get away by supporting just scan paths, since
the inner side of lateral join can be any paths not just scan path. Or
you are suggesting that we disable partition-wise lateral join and
support reparameterization of only scan paths?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: [HACKERS] perlcritic
Next
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix and simplify check forwhether we're running as Windows serv