Re: Oversight in reparameterize_path_by_child leading to executor crash - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: Oversight in reparameterize_path_by_child leading to executor crash |
Date | |
Msg-id | CAMbWs4-pXDTs_w+LMEmGUCYSAGxMRwdndaHiRSZHRA72qqzRBA@mail.gmail.com Whole thread Raw |
In response to | Re: Oversight in reparameterize_path_by_child leading to executor crash (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Oversight in reparameterize_path_by_child leading to executor crash
|
List | pgsql-hackers |
On Mon, Aug 7, 2023 at 9:29 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
Thanks. The patch looks good overall. I like it because of its potential to reduce memory consumption in reparameterization. That's cake on top of cream :)
Thanks for reviewing this patch!
Some comments here.+ {
+ FLAT_COPY_PATH(new_path, path, Path);
+
+ if (path->pathtype == T_SampleScan)
+ {
+ Index scan_relid = path->parent->relid;
+ RangeTblEntry *rte;
+
+ /* it should be a base rel with a tablesample clause... */
+ Assert(scan_relid > 0);
+ rte = planner_rt_fetch(scan_relid, root);
+ Assert(rte->rtekind == RTE_RELATION);
+ Assert(rte->tablesample != NULL);
+
+ ADJUST_CHILD_EXPRS(rte->tablesample, TableSampleClause *);
+ }
+ }
This change makes this function usable only after the final plan has been
selected. If some code accidently uses it earlier, it would corrupt
rte->tablesample without getting detected easily. I think we should mention
this in the function prologue and move it somewhere else. Other option is to a.
leave rte->tablesample unchanged here with a comment as to why we aren't
changing it b. reparameterize tablesample expression in
create_samplescan_plan() if the path is parameterized by the parent. We call
reparameterize_path_by_child() in create_nestloop_plan() as this patch does
right now. I like that we are delaying reparameterization. It saves a bunch of
memory. I haven't measured it but from my recent experiments I know it will be
a lot.
I agree that we should mention in the function's comment that
reparameterize_path_by_child can only be used after the best path has
been selected because the RTE might be modified by this function. I'm
not sure if it's a good idea to move the reparameterization of
tablesample to create_samplescan_plan(). That would make the
reparameterization work separated in different places. And in the
future we may find more cases where we need modify RTEs or RELs for
reparameterization. It seems better to me that we keep all the work in
one place.
reparameterize_path_by_child can only be used after the best path has
been selected because the RTE might be modified by this function. I'm
not sure if it's a good idea to move the reparameterization of
tablesample to create_samplescan_plan(). That would make the
reparameterization work separated in different places. And in the
future we may find more cases where we need modify RTEs or RELs for
reparameterization. It seems better to me that we keep all the work in
one place.
* If the inner path is parameterized, it is parameterized by the
- * topmost parent of the outer rel, not the outer rel itself. Fix
- * that.
+ * topmost parent of the outer rel, not the outer rel itself. We will
There's something wrong with the original sentence (which was probably written
by me back then :)). I think it should have read "If the inner path is
parameterized by the topmost parent of the outer rel instead of the outer rel
itself, fix that.". If you agree, the new comment should change it accordingly.
Right. Will do that.
@@ -2430,6 +2430,16 @@ create_nestloop_path(PlannerInfo *root,
{
NestPath *pathnode = makeNode(NestPath);
Relids inner_req_outer = PATH_REQ_OUTER(inner_path);
+ Relids outerrelids;
+
+ /*
+ * Paths are parameterized by top-level parents, so run parameterization
+ * tests on the parent relids.
+ */
+ if (outer_path->parent->top_parent_relids)
+ outerrelids = outer_path->parent->top_parent_relids;
+ else
+ outerrelids = outer_path->parent->relids;
This looks like an existing bug. Did partitionwise join paths ended up with
extra restrict clauses without this fix? I am not able to see why this change
is needed by rest of the changes in the patch?
This is not an existing bug. Our current code (without this patch)
would always call create_nestloop_path() after the reparameterization
work has been done for the inner path. So we can safely check against
outer rel (not its topmost parent) in create_nestloop_path(). But now
with this patch, the reparameterization is postponed until createplan.c,
so we have to check against the topmost parent of outer rel in
create_nestloop_path(), otherwise we'd have duplicate clauses in the
final plan. For instance, without this change you'd get a plan like
regression=# explain (costs off)
select * from prt1 t1 left join lateral
(select t1.a as t1a, t2.* from prt1 t2) s on t1.a = s.a;
QUERY PLAN
---------------------------------------------------------
Append
-> Nested Loop Left Join
Join Filter: (t1_1.a = t2_1.a)
-> Seq Scan on prt1_p1 t1_1
-> Index Scan using iprt1_p1_a on prt1_p1 t2_1
Index Cond: (a = t1_1.a)
-> Nested Loop Left Join
Join Filter: (t1_2.a = t2_2.a)
-> Seq Scan on prt1_p2 t1_2
-> Index Scan using iprt1_p2_a on prt1_p2 t2_2
Index Cond: (a = t1_2.a)
-> Nested Loop Left Join
Join Filter: (t1_3.a = t2_3.a)
-> Seq Scan on prt1_p3 t1_3
-> Index Scan using iprt1_p3_a on prt1_p3 t2_3
Index Cond: (a = t1_3.a)
(16 rows)
Note that the clauses in Join Filter are duplicate.
would always call create_nestloop_path() after the reparameterization
work has been done for the inner path. So we can safely check against
outer rel (not its topmost parent) in create_nestloop_path(). But now
with this patch, the reparameterization is postponed until createplan.c,
so we have to check against the topmost parent of outer rel in
create_nestloop_path(), otherwise we'd have duplicate clauses in the
final plan. For instance, without this change you'd get a plan like
regression=# explain (costs off)
select * from prt1 t1 left join lateral
(select t1.a as t1a, t2.* from prt1 t2) s on t1.a = s.a;
QUERY PLAN
---------------------------------------------------------
Append
-> Nested Loop Left Join
Join Filter: (t1_1.a = t2_1.a)
-> Seq Scan on prt1_p1 t1_1
-> Index Scan using iprt1_p1_a on prt1_p1 t2_1
Index Cond: (a = t1_1.a)
-> Nested Loop Left Join
Join Filter: (t1_2.a = t2_2.a)
-> Seq Scan on prt1_p2 t1_2
-> Index Scan using iprt1_p2_a on prt1_p2 t2_2
Index Cond: (a = t1_2.a)
-> Nested Loop Left Join
Join Filter: (t1_3.a = t2_3.a)
-> Seq Scan on prt1_p3 t1_3
-> Index Scan using iprt1_p3_a on prt1_p3 t2_3
Index Cond: (a = t1_3.a)
(16 rows)
Note that the clauses in Join Filter are duplicate.
Anyway, let's rename outerrelids to something else e.g. outer_param_relids to reflect
its true meaning.
Hmm, I'm not sure this is a good idea. Here the 'outerrelids' is just
the relids of the outer rel or outer rel's topmost parent. I think it's
better to keep it as-is, and this is consistent with 'outerrelids' in
try_nestloop_path().
the relids of the outer rel or outer rel's topmost parent. I think it's
better to keep it as-is, and this is consistent with 'outerrelids' in
try_nestloop_path().
+bool
+path_is_reparameterizable_by_child(Path *path)
+{
+ switch (nodeTag(path))
+ {
+ case T_Path:
... snip ...
+ case T_GatherPath:
+ return true;
+ default:
+
+ /* We don't know how to reparameterize this path. */
+ return false;
+ }
+
+ return false;
+}
How do we make sure that any changes to reparameterize_path_by_child() are
reflected here? One way is to call this function from
reparameterize_path_by_child() the first thing and return if the path can not
be reparameterized.
Good question. But I don't think calling this function at the beginning
of reparameterize_path_by_child() can solve this problem. Even if we do
that, if we find that the path cannot be reparameterized in
reparameterize_path_by_child(), it would be too late for us to take any
measures. So we need to make sure that such situation cannot happen. I
think we can emphasize this point in the comments of the two functions,
and meanwhile add an Assert in reparameterize_path_by_child() that the
path must be reparameterizable.
of reparameterize_path_by_child() can solve this problem. Even if we do
that, if we find that the path cannot be reparameterized in
reparameterize_path_by_child(), it would be too late for us to take any
measures. So we need to make sure that such situation cannot happen. I
think we can emphasize this point in the comments of the two functions,
and meanwhile add an Assert in reparameterize_path_by_child() that the
path must be reparameterizable.
-#define ADJUST_CHILD_ATTRS(node) \
+#define ADJUST_CHILD_EXPRS(node, fieldtype) \
((node) = \
- (List *) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
- child_rel, \
- child_rel->top_parent))
+ (fieldtype) adjust_appendrel_attrs_multilevel(root, (Node *) (node), \
+ child_rel, \
+ child_rel->top_parent))
+
+#define ADJUST_CHILD_ATTRS(node) ADJUST_CHILD_EXPRS(node, List *)
IIRC, ATTRS here is taken from the function being called. I think we should
just change the macro definition, not its name. If you would like to have a
separate macro for List nodes, create ADJUST_CHILD_ATTRS_IN_LIST or some such.
Agreed. I introduced the new macro ADJUST_CHILD_EXPRS just to keep
macro ADJUST_CHILD_ATTRS as-is, so callers to ADJUST_CHILD_ATTRS can
keep unchanged. It seems better to just change ADJUST_CHILD_ATTRS as
well as its callers. Will do that.
Attaching v3 patch to address all the reviews above.
Thanks
Richard
macro ADJUST_CHILD_ATTRS as-is, so callers to ADJUST_CHILD_ATTRS can
keep unchanged. It seems better to just change ADJUST_CHILD_ATTRS as
well as its callers. Will do that.
Attaching v3 patch to address all the reviews above.
Thanks
Richard
Attachment
pgsql-hackers by date: