Re: Missing MaterialPath support in reparameterize_path_by_child - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Missing MaterialPath support in reparameterize_path_by_child
Date
Msg-id CAMbWs4_0Rk3QN9bUWCvKF7S_OMzg0jsE5OsNM7KZZ5NkO5trMg@mail.gmail.com
Whole thread Raw
In response to Re: Missing MaterialPath support in reparameterize_path_by_child  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Missing MaterialPath support in reparameterize_path_by_child
List pgsql-hackers

On Fri, Dec 2, 2022 at 7:21 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> I'm suspicious now that reparameterize_path() should be
> extended likewise, but I don't really have any hard
> evidence for that.

I think we need it there since the scope of paths under appendrel has
certainly expanded a lot because of partitioned table optimizations.
 
I tried to see if the similar error can be triggered because of the lack
of MaterialPath support in reparameterize_path but didn't succeed.
Instead I see the optimization opportunity here if we can extend
reparameterize_path.  As an example, consider query

create table t (a int, b int);
insert into t select i, i from generate_series(1,10000)i;
create index on t(a);
analyze t;

explain (costs off)
select * from (select * from t t1 union all select * from t t2 TABLESAMPLE system_time (10)) s join (select * from t t3 limit 1) ss on s.a > ss.a;

Currently parameterized append path is not possible because MaterialPath
is not supported in reparameterize_path.  The current plan looks like

                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop
   Join Filter: (t1.a > t3.a)
   ->  Limit
         ->  Seq Scan on t t3
   ->  Append
         ->  Seq Scan on t t1
         ->  Materialize
               ->  Sample Scan on t t2
                     Sampling: system_time ('10'::double precision)
(9 rows)

If we extend reparameterize_path to support MaterialPath, we would have
the additional parameterized append path and generate a better plan as
below

                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop
   ->  Limit
         ->  Seq Scan on t t3
   ->  Append
         ->  Index Scan using t_a_idx on t t1
               Index Cond: (a > t3.a)
         ->  Materialize
               ->  Sample Scan on t t2
                     Sampling: system_time ('10'::double precision)
                     Filter: (a > t3.a)
(10 rows)

So I also agree it's worth doing.

BTW, the code changes I'm using:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3979,6 +3979,17 @@ reparameterize_path(PlannerInfo *root, Path *path,
                                       apath->path.parallel_aware,
                                       -1);
            }
+       case T_Material:
+           {
+               MaterialPath *matpath = (MaterialPath *) path;
+               Path         *spath = matpath->subpath;
+
+               spath = reparameterize_path(root, spath,
+                                           required_outer,
+                                           loop_count);
+
+               return (Path *) create_material_path(rel, spath);
+           }

Thanks
Richard

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Using AF_UNIX sockets always for tests on Windows
Next
From: Ronan Dunklau
Date:
Subject: Re: Fix gin index cost estimation