Thread: Missing MaterialPath support in reparameterize_path_by_child

Missing MaterialPath support in reparameterize_path_by_child

From
Tom Lane
Date:
Whilst fooling with my outer-join-aware-Vars patch, I tripped
across a multi-way join query that failed with
  ERROR:  could not devise a query plan for the given query
when enable_partitionwise_join is on.

I traced that to the fact that reparameterize_path_by_child()
omits support for MaterialPath, so that if the only surviving
path(s) for a child join include materialization steps, we'll
fail outright to produce a plan for the parent join.

Unfortunately, I don't have an example that produces such a
failure against HEAD.  It seems certain to me that such cases
exist, though, so I'd like to apply and back-patch the attached.

I'm suspicious now that reparameterize_path() should be
extended likewise, but I don't really have any hard
evidence for that.

            regards, tom lane

commit f74ca5d8611af3306eb96719d5d1910c9b6a8db5
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Dec 1 21:04:49 2022 -0500

    Add missing MaterialPath case in reparameterize_path_by_child.

    Surprised we hadn't noticed this already.

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c77399ca92..224ba84107 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4209,6 +4209,16 @@ do { \
             }
             break;

+        case T_MaterialPath:
+            {
+                MaterialPath *mpath;
+
+                FLAT_COPY_PATH(mpath, path, MaterialPath);
+                REPARAMETERIZE_CHILD_PATH(mpath->subpath);
+                new_path = (Path *) mpath;
+            }
+            break;
+
         case T_MemoizePath:
             {
                 MemoizePath *mpath;

Re: Missing MaterialPath support in reparameterize_path_by_child

From
Richard Guo
Date:

On Fri, Dec 2, 2022 at 10:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I traced that to the fact that reparameterize_path_by_child()
omits support for MaterialPath, so that if the only surviving
path(s) for a child join include materialization steps, we'll
fail outright to produce a plan for the parent join.
 
Yeah, that's true.  It's weird we neglect MaterialPath here.
 
Unfortunately, I don't have an example that produces such a
failure against HEAD.  It seems certain to me that such cases
exist, though, so I'd like to apply and back-patch the attached.
 
I tried on HEAD and got one, which leverages sampled rel to generate the
MaterialPath and lateral reference to make it the only available path.

SET enable_partitionwise_join to true;

CREATE TABLE prt (a int, b int) PARTITION BY RANGE(a);
CREATE TABLE prt_p1 PARTITION OF prt FOR VALUES FROM (0) TO (10);

CREATE EXTENSION tsm_system_time;

explain (costs off)
select * from prt t1 left join lateral (select t1.a as t1a, t2.a as t2a from prt t2 TABLESAMPLE system_time (10)) ss on ss.t1a = ss.t2a;
ERROR:  could not devise a query plan for the given query

Thanks
Richard

Re: Missing MaterialPath support in reparameterize_path_by_child

From
Ashutosh Bapat
Date:
Hi Tom,

On Fri, Dec 2, 2022 at 8:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Whilst fooling with my outer-join-aware-Vars patch, I tripped
> across a multi-way join query that failed with
>   ERROR:  could not devise a query plan for the given query
> when enable_partitionwise_join is on.
>
> I traced that to the fact that reparameterize_path_by_child()
> omits support for MaterialPath, so that if the only surviving
> path(s) for a child join include materialization steps, we'll
> fail outright to produce a plan for the parent join.
>
> Unfortunately, I don't have an example that produces such a
> failure against HEAD.  It seems certain to me that such cases
> exist, though, so I'd like to apply and back-patch the attached.

From this comment, that I wrote back when I implemented that function,
I wonder if we thought MaterialPath wouldn't appear on the inner side
of nestloop join. But that can't be the case. Or probably we didn't
find MaterialPath being there from our tests.
     * This function is currently only applied to the inner side of a nestloop
     * join that is being partitioned by the partitionwise-join code.  Hence,
     * we need only support path types that plausibly arise in that context.
But I think it's good to have MaterialPath there.

>
> I'm suspicious now that reparameterize_path() should be
> extended likewise, but I don't really have any hard
> evidence for that.

I think we need it there since the scope of paths under appendrel has
certainly expanded a lot because of partitioned table optimizations.

The patch looks good to me.

-- 
Best Wishes,
Ashutosh Bapat



Re: Missing MaterialPath support in reparameterize_path_by_child

From
Richard Guo
Date:

On Fri, Dec 2, 2022 at 7:21 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> I'm suspicious now that reparameterize_path() should be
> extended likewise, but I don't really have any hard
> evidence for that.

I think we need it there since the scope of paths under appendrel has
certainly expanded a lot because of partitioned table optimizations.
 
I tried to see if the similar error can be triggered because of the lack
of MaterialPath support in reparameterize_path but didn't succeed.
Instead I see the optimization opportunity here if we can extend
reparameterize_path.  As an example, consider query

create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
create index on t(a);
analyze t;

explain (costs off)
select * from (select * from t t1 union all select * from t t2 TABLESAMPLE system_time (10)) s join (select * from t t3 limit 1) ss on s.a > ss.a;

Currently parameterized append path is not possible because MaterialPath
is not supported in reparameterize_path.  The current plan looks like

                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop
   Join Filter: (t1.a > t3.a)
   ->  Limit
         ->  Seq Scan on t t3
   ->  Append
         ->  Seq Scan on t t1
         ->  Materialize
               ->  Sample Scan on t t2
                     Sampling: system_time ('10'::double precision)
(9 rows)

If we extend reparameterize_path to support MaterialPath, we would have
the additional parameterized append path and generate a better plan as
below

                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop
   ->  Limit
         ->  Seq Scan on t t3
   ->  Append
         ->  Index Scan using t_a_idx on t t1
               Index Cond: (a > t3.a)
         ->  Materialize
               ->  Sample Scan on t t2
                     Sampling: system_time ('10'::double precision)
                     Filter: (a > t3.a)
(10 rows)

So I also agree it's worth doing.

BTW, the code changes I'm using:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3979,6 +3979,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
                                       apath->path.parallel_aware,
                                       -1);
            }
+       case T_Material:
+           {
+               MaterialPath *matpath = (MaterialPath *) path;
+               Path         *spath = matpath->subpath;
+
+               spath = reparameterize_path(root, spath,
+                                           required_outer,
+                                           loop_count);
+
+               return (Path *) create_material_path(rel, spath);
+           }

Thanks
Richard

Re: Missing MaterialPath support in reparameterize_path_by_child

From
Richard Guo
Date:

On Fri, Dec 2, 2022 at 8:49 PM Richard Guo <guofenglinux@gmail.com> wrote:
BTW, the code changes I'm using:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3979,6 +3979,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
                                       apath->path.parallel_aware,
                                       -1);
            }
+       case T_Material:
+           {
+               MaterialPath *matpath = (MaterialPath *) path;
+               Path         *spath = matpath->subpath;
+
+               spath = reparameterize_path(root, spath,
+                                           required_outer,
+                                           loop_count);
+
+               return (Path *) create_material_path(rel, spath);
+           }
 
BTW, the subpath needs to be checked if it is null after being
reparameterized, since it might be a path type that is not supported
yet.

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3979,6 +3979,19 @@ reparameterize_path(PlannerInfo *root, Path *path,
                                       apath->path.parallel_aware,
                                       -1);
            }
+       case T_Material:
+           {
+               MaterialPath *matpath = (MaterialPath *) path;
+               Path         *spath = matpath->subpath;
+
+               spath = reparameterize_path(root, spath,
+                                           required_outer,
+                                           loop_count);
+               if (spath == NULL)
+                   return NULL;
+
+               return (Path *) create_material_path(rel, spath);
+           }

Thanks
Richard

Re: Missing MaterialPath support in reparameterize_path_by_child

From
Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> On Fri, Dec 2, 2022 at 8:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Unfortunately, I don't have an example that produces such a
>> failure against HEAD.  It seems certain to me that such cases
>> exist, though, so I'd like to apply and back-patch the attached.

> From this comment, that I wrote back when I implemented that function,
> I wonder if we thought MaterialPath wouldn't appear on the inner side
> of nestloop join. But that can't be the case. Or probably we didn't
> find MaterialPath being there from our tests.
>      * This function is currently only applied to the inner side of a nestloop
>      * join that is being partitioned by the partitionwise-join code.  Hence,
>      * we need only support path types that plausibly arise in that context.
> But I think it's good to have MaterialPath there.

So thinking about this a bit: the reason it is okay if reparameterize_path
fails is that it's not fatal.  We just go on our way without making
a parameterized path for that appendrel.  However, if
reparameterize_path_by_child fails for every available child path,
we end up with "could not devise a query plan", because the
partitionwise-join code is brittle and won't tolerate failure
to build a parent-join path.  Seems like we should be willing to
fall back to a non-partitionwise join in that case.

            regards, tom lane



Re: Missing MaterialPath support in reparameterize_path_by_child

From
Ashutosh Bapat
Date:
On Fri, Dec 2, 2022 at 9:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> > On Fri, Dec 2, 2022 at 8:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Unfortunately, I don't have an example that produces such a
> >> failure against HEAD.  It seems certain to me that such cases
> >> exist, though, so I'd like to apply and back-patch the attached.
>
> > From this comment, that I wrote back when I implemented that function,
> > I wonder if we thought MaterialPath wouldn't appear on the inner side
> > of nestloop join. But that can't be the case. Or probably we didn't
> > find MaterialPath being there from our tests.
> >      * This function is currently only applied to the inner side of a nestloop
> >      * join that is being partitioned by the partitionwise-join code.  Hence,
> >      * we need only support path types that plausibly arise in that context.
> > But I think it's good to have MaterialPath there.
>
> So thinking about this a bit: the reason it is okay if reparameterize_path
> fails is that it's not fatal.  We just go on our way without making
> a parameterized path for that appendrel.  However, if
> reparameterize_path_by_child fails for every available child path,
> we end up with "could not devise a query plan", because the
> partitionwise-join code is brittle and won't tolerate failure
> to build a parent-join path.  Seems like we should be willing to
> fall back to a non-partitionwise join in that case.
>
>                         regards, tom lane

partition-wise join should be willing to fallback to non-partitionwise
join in such a case. After spending a few minutes with the code, I
think generate_partitionwise_join_paths() should not call
set_cheapest() is the pathlist of the child is NULL and should just
wind up and avoid adding any path.

-- 
Best Wishes,
Ashutosh Bapat



Re: Missing MaterialPath support in reparameterize_path_by_child

From
Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> partition-wise join should be willing to fallback to non-partitionwise
> join in such a case. After spending a few minutes with the code, I
> think generate_partitionwise_join_paths() should not call
> set_cheapest() is the pathlist of the child is NULL and should just
> wind up and avoid adding any path.

We clearly need to not call set_cheapest(), but that's not sufficient;
we still fail at higher levels, as you'll see if you try the example
Richard found.  I ended up making fe12f2f8f to fix this.

I don't especially like "rel->nparts = 0" as a way of disabling
partitionwise join; ISTM it'd be clearer and more flexible to reset
consider_partitionwise_join instead of destroying the data structure.
But that's the way it's being done elsewhere, and I didn't want to
tamper with it in a bug fix.  I see various assertions about parent
and child consider_partitionwise_join flags being equal, which we
might have to revisit if we try to make it work that way.

            regards, tom lane



Re: Missing MaterialPath support in reparameterize_path_by_child

From
Ashutosh Bapat
Date:
On Mon, Dec 5, 2022 at 8:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> > partition-wise join should be willing to fallback to non-partitionwise
> > join in such a case. After spending a few minutes with the code, I
> > think generate_partitionwise_join_paths() should not call
> > set_cheapest() is the pathlist of the child is NULL and should just
> > wind up and avoid adding any path.
>
> We clearly need to not call set_cheapest(), but that's not sufficient;
> we still fail at higher levels, as you'll see if you try the example
> Richard found.  I ended up making fe12f2f8f to fix this.

Thanks. That looks good.

>
> I don't especially like "rel->nparts = 0" as a way of disabling
> partitionwise join; ISTM it'd be clearer and more flexible to reset
> consider_partitionwise_join instead of destroying the data structure.
> But that's the way it's being done elsewhere, and I didn't want to
> tamper with it in a bug fix.  I see various assertions about parent
> and child consider_partitionwise_join flags being equal, which we
> might have to revisit if we try to make it work that way.
>

AFAIR, consider_partitionwise_join tells whether a given partitioned
relation (join, higher or base) can be considered for partitionwise
join. set_append_rel_size() decides that based on some properties. But
rel->nparts is indicator of whether the relation (join, higher or
base) is partitioned or not. If we can not generate AppendPath for a
join relation, it means there is no way to compute child join
relations and thus the relation is not partitioned. So setting
rel->nparts = 0 is right. Probably we should add macros similar to
dummy relation for marking and checking partitioned relation. I see
IS_PARTITIONED_RELATION() is defined already. Maybe we could add
mark_(un)partitioned_rel().

-- 
Best Wishes,
Ashutosh Bapat



Re: Missing MaterialPath support in reparameterize_path_by_child

From
Tom Lane
Date:
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> writes:
> On Mon, Dec 5, 2022 at 8:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't especially like "rel->nparts = 0" as a way of disabling
>> partitionwise join ...

> ... If we can not generate AppendPath for a
> join relation, it means there is no way to compute child join
> relations and thus the relation is not partitioned. So setting
> rel->nparts = 0 is right.

If we had nparts > 0 before, then it is partitioned for some value
of "partitioned", so I don't entirely buy this argument.

> Probably we should add macros similar to
> dummy relation for marking and checking partitioned relation. I see
> IS_PARTITIONED_RELATION() is defined already. Maybe we could add
> mark_(un)partitioned_rel().

Hiding it behind a macro with an explanatory name would be an
improvement, for sure.

            regards, tom lane