Re: Oversight in reparameterize_path_by_child leading to executor crash - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Oversight in reparameterize_path_by_child leading to executor crash |
Date | |
Msg-id | CAMbWs4-CSR4VnZCDep3ReSoHGTA7E+3tnjF_LmHcX7yiGrkVfQ@mail.gmail.com Whole thread Raw |
In response to | Re: Oversight in reparameterize_path_by_child leading to executor crash (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Oversight in reparameterize_path_by_child leading to executor crash
Re: Oversight in reparameterize_path_by_child leading to executor crash Re: Oversight in reparameterize_path_by_child leading to executor crash Re: Oversight in reparameterize_path_by_child leading to executor crash |
List | pgsql-hackers |
On Tue, Aug 22, 2023 at 10:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> I'm wondering if we can instead adjust the 'inner_req_outer' in
> create_nestloop_path before we perform the check to work around this
> issue for the back branches, something like
> + /*
> + * Adjust the parameterization information, which refers to the topmost
> + * parent.
> + */
> + inner_req_outer =
> + adjust_child_relids_multilevel(root, inner_req_outer,
> + outer_path->parent->relids,
> + outer_path->parent->top_parent_relids);
Mmm ... at the very least you'd need to not do that when top_parent_relids
is unset.
Hmm, adjust_child_relids_multilevel would just return the given relids
unchanged when top_parent_relids is unset, so I think it should be safe
to call it even for non-other rel.
unchanged when top_parent_relids is unset, so I think it should be safe
to call it even for non-other rel.
... But I think the risk/reward ratio for messing with this in the
back branches is unattractive in any case: to fix a corner case that
apparently nobody uses in the field, we risk breaking any number of
mainstream parameterized-path cases. I'm content to commit the v5 patch
(or a successor) into HEAD, but at this point I'm not sure I even want
to risk it in v16, let alone perform delicate surgery to get it to work
in older branches. I think we ought to go with the "tablesample scans
can't be reparameterized" approach in v16 and before.
If we go with the "tablesample scans can't be reparameterized" approach
in the back branches, I'm a little concerned that what if we find more
cases in the futrue where we need modify RTEs for reparameterization.
So I spent some time seeking and have managed to find one: there might
be lateral references in a scan path's restriction clauses, and
currently reparameterize_path_by_child fails to adjust them.
regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
QUERY PLAN
----------------------------------------------------------------------
Aggregate
Output: count(*)
-> Append
-> Nested Loop
-> Seq Scan on public.prt1_p1 t1_1
Output: t1_1.a, t1_1.b
-> Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
Output: t2_1.b
Index Cond: (t2_1.b = t1_1.a)
Filter: (t1.b = t2_1.a)
-> Nested Loop
-> Seq Scan on public.prt1_p2 t1_2
Output: t1_2.a, t1_2.b
-> Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
Output: t2_2.b
Index Cond: (t2_2.b = t1_2.a)
Filter: (t1.b = t2_2.a)
-> Nested Loop
-> Seq Scan on public.prt1_p3 t1_3
Output: t1_3.a, t1_3.b
-> Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
Output: t2_3.b
Index Cond: (t2_3.b = t1_3.a)
Filter: (t1.b = t2_3.a)
(24 rows)
The clauses in 'Filter' are not adjusted correctly. Var 't1.b' still
refers to the top parent.
For this query it would just give wrong results.
regression=# set enable_partitionwise_join to off;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
100
(1 row)
regression=# set enable_partitionwise_join to on;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
5
(1 row)
For another query it would error out during planning.
regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.b;
ERROR: variable not found in subplan target list
in the back branches, I'm a little concerned that what if we find more
cases in the futrue where we need modify RTEs for reparameterization.
So I spent some time seeking and have managed to find one: there might
be lateral references in a scan path's restriction clauses, and
currently reparameterize_path_by_child fails to adjust them.
regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
QUERY PLAN
----------------------------------------------------------------------
Aggregate
Output: count(*)
-> Append
-> Nested Loop
-> Seq Scan on public.prt1_p1 t1_1
Output: t1_1.a, t1_1.b
-> Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
Output: t2_1.b
Index Cond: (t2_1.b = t1_1.a)
Filter: (t1.b = t2_1.a)
-> Nested Loop
-> Seq Scan on public.prt1_p2 t1_2
Output: t1_2.a, t1_2.b
-> Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
Output: t2_2.b
Index Cond: (t2_2.b = t1_2.a)
Filter: (t1.b = t2_2.a)
-> Nested Loop
-> Seq Scan on public.prt1_p3 t1_3
Output: t1_3.a, t1_3.b
-> Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
Output: t2_3.b
Index Cond: (t2_3.b = t1_3.a)
Filter: (t1.b = t2_3.a)
(24 rows)
The clauses in 'Filter' are not adjusted correctly. Var 't1.b' still
refers to the top parent.
For this query it would just give wrong results.
regression=# set enable_partitionwise_join to off;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
100
(1 row)
regression=# set enable_partitionwise_join to on;
SET
regression=# select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;
count
-------
5
(1 row)
For another query it would error out during planning.
regression=# explain (verbose, costs off)
select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.b;
ERROR: variable not found in subplan target list
To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath. It
seems that that is not easily done without postponing reparameterization
of paths until createplan.c.
Attached is a patch which is v5 + fix for this new issue.
Thanks
Richard
ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath. It
seems that that is not easily done without postponing reparameterization
of paths until createplan.c.
Attached is a patch which is v5 + fix for this new issue.
Thanks
Richard
Attachment
pgsql-hackers by date: