Thread: Fix bogus Asserts in calc_non_nestloop_required_outer
As stated in [1], all paths arriving here are parameterized by top
parents, so we should check against top_parent_relids if it exists in
the two Asserts.
Attached is a patch fixing that.
[1] https://www.postgresql.org/message-id/CAMbWs4_UoVcCwkVMfi9TjSC%3Do5U6BRHUNZiVhrvSbDfU2HaeDA%40mail.gmail.com
Thanks
Richard
parents, so we should check against top_parent_relids if it exists in
the two Asserts.
Attached is a patch fixing that.
[1] https://www.postgresql.org/message-id/CAMbWs4_UoVcCwkVMfi9TjSC%3Do5U6BRHUNZiVhrvSbDfU2HaeDA%40mail.gmail.com
Thanks
Richard
Attachment
Hi Richard, > As stated in [1], all paths arriving here are parameterized by top > parents, so we should check against top_parent_relids if it exists in > the two Asserts. > > Attached is a patch fixing that. Probably it's just because of my limited experience with the optimizer but I don't find the proposed change particularly straightforward. I would suggest adding a comment before the Assert's and/or a detailed commit message would be helpful. Other than that I can confirm that both branches in the Assert's are executed and the tests pass in different test environments. -- Best regards, Aleksander Alekseev
On Thu, Sep 7, 2023 at 11:09 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Probably it's just because of my limited experience with the optimizer
but I don't find the proposed change particularly straightforward. I
would suggest adding a comment before the Assert's and/or a detailed
commit message would be helpful.
Comment is now added above the Asserts. Thanks for taking an interest
in this.
Thanks
Richard
in this.
Thanks
Richard
Attachment
I have reviewed the changes and it looks fine. The new status of this patch is: Ready for Committer
On Fri, Nov 3, 2023 at 2:47 AM Richard Guo <guofenglinux@gmail.com> wrote: > Comment is now added above the Asserts. Thanks for taking an interest > in this. I'd like to suggest rewording this comment a little more. Here's my proposal: Both of the paths passed to this function are still parameterized by the topmost parent, because reparameterize_path_by_child() hasn't been called yet. So, these sanity checks use the parent relids. What do you think? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 3, 2023 at 2:47 AM Richard Guo <guofenglinux@gmail.com> wrote: >> Comment is now added above the Asserts. Thanks for taking an interest >> in this. > I'd like to suggest rewording this comment a little more. Here's my proposal: > Both of the paths passed to this function are still parameterized by > the topmost parent, because reparameterize_path_by_child() hasn't been > called yet. So, these sanity checks use the parent relids. > What do you think? I don't think I believe this code change, let alone any of the explanations for it. The point of these Asserts is to be sure that we don't form an alleged parameterization set that includes any rels that are included in the new join, because that would be obviously wrong. They do check that as they stand. I'm not sure what invariant the proposed revision would be enforcing. There might be an argument for leaving the existing Asserts in place and *also* checking Assert(!bms_overlap(outer_paramrels, inner_path->parent->top_parent_relids)); Assert(!bms_overlap(inner_paramrels, outer_path->parent->top_parent_relids)); but I'm not quite convinced what the point of that is. regards, tom lane
On Fri, Jan 5, 2024 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't think I believe this code change, let alone any of the > explanations for it. The point of these Asserts is to be sure that > we don't form an alleged parameterization set that includes any rels > that are included in the new join, because that would be obviously > wrong. They do check that as they stand. I'm not sure what invariant > the proposed revision would be enforcing. Well, this explanation made sense to me: https://www.postgresql.org/message-id/CAMbWs4-%2BGs0HJ9ouBUb%3DqwHsGCXxG%2B92eJzLOpCkedvgtOWQ%3DQ%40mail.gmail.com -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 5, 2024 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't think I believe this code change, let alone any of the >> explanations for it. The point of these Asserts is to be sure that >> we don't form an alleged parameterization set that includes any rels >> that are included in the new join, because that would be obviously >> wrong. They do check that as they stand. I'm not sure what invariant >> the proposed revision would be enforcing. > Well, this explanation made sense to me: > https://www.postgresql.org/message-id/CAMbWs4-%2BGs0HJ9ouBUb%3DqwHsGCXxG%2B92eJzLOpCkedvgtOWQ%3DQ%40mail.gmail.com The argument for the patch as proposed is that we should make the mergejoin and hashjoin code paths do what the nestloop path is doing. However, as I replied further down in that other thread, I'm not exactly convinced that the nestloop path is doing the right thing. I've not had time to look closer at that, though. regards, tom lane
On Sat, Jan 6, 2024 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Well, this explanation made sense to me: > > https://www.postgresql.org/message-id/CAMbWs4-%2BGs0HJ9ouBUb%3DqwHsGCXxG%2B92eJzLOpCkedvgtOWQ%3DQ%40mail.gmail.com > > The argument for the patch as proposed is that we should make the > mergejoin and hashjoin code paths do what the nestloop path is doing. > However, as I replied further down in that other thread, I'm not > exactly convinced that the nestloop path is doing the right thing. > I've not had time to look closer at that, though. I don't really understand what you were saying in your response there, or what you're saying here. It seems to me that if the path is parameterized by top relids, and the assertion is verifying that a certain set of non-toprelids i.e. otherrels isn't present, then obviously the assertion is going to pass, but it's not checking what it purports to be checking. The ostensible purpose of the assertion is to check that neither side is parameterized by the other side, because a non-nestloop can't satisfy a parameterization. But with the assertion as currently formulated, it would pass even if one side *were* parameterized by the other side, because outer_paramrels and inner_paramrels would contain child relations, and the set that it was being compared to would contain only top-parents, and so they wouldn't overlap. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, Jan 6, 2024 at 4:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The argument for the patch as proposed is that we should make the >> mergejoin and hashjoin code paths do what the nestloop path is doing. >> However, as I replied further down in that other thread, I'm not >> exactly convinced that the nestloop path is doing the right thing. >> I've not had time to look closer at that, though. > I don't really understand what you were saying in your response there, > or what you're saying here. It seems to me that if the path is > parameterized by top relids, and the assertion is verifying that a > certain set of non-toprelids i.e. otherrels isn't present, then > obviously the assertion is going to pass, but it's not checking what > it purports to be checking. I think we're talking at cross-purposes. What I was wondering about (at least further down in the thread) was whether we shouldn't be checking *both* the "real" and the "parent" relids to make sure they don't overlap the parameterization sets. But it's probably overkill. After reading the code again I agree that the parent relids are more useful to check. However, I still don't like Richard's patch too much as-is, because the Asserts are difficult to read/understand and even more difficult to compare to the other code path. I think we want to keep the nestloop and not-nestloop paths as visually similar as possible, so I propose we do it more like the attached (where I also borrowed some of your wording for the comments). A variant of this could be to ifdef out all the added code in non-Assert builds: Relids outer_paramrels = PATH_REQ_OUTER(outer_path); Relids inner_paramrels = PATH_REQ_OUTER(inner_path); Relids required_outer; #ifdef USE_ASSERT_CHECKING Relids innerrelids; Relids outerrelids; /* * Any parameterization of the input paths refers to topmost parents of * the relevant relations, because reparameterize_path_by_child() hasn't * been called yet. So we must consider topmost parents of the relations * being joined, too, while checking for disallowed parameterization * cases. */ if (inner_path->parent->top_parent_relids) innerrelids = inner_path->parent->top_parent_relids; else innerrelids = inner_path->parent->relids; if (outer_path->parent->top_parent_relids) outerrelids = outer_path->parent->top_parent_relids; else outerrelids = outer_path->parent->relids; /* neither path can require rels from the other */ Assert(!bms_overlap(outer_paramrels, innerrelids)); Assert(!bms_overlap(inner_paramrels, outerrelids)); #endif /* form the union ... */ but I'm not sure that's better. Probably any reasonable compiler will throw away the whole calculation upon noticing the outputs are unused. regards, tom lane diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 3fd5a24fad..e9def9d540 100644 --- a/src/backend/optimizer/path/joinpath.c +++ b/src/backend/optimizer/path/joinpath.c @@ -730,8 +730,11 @@ try_nestloop_path(PlannerInfo *root, return; /* - * Paths are parameterized by top-level parents, so run parameterization - * tests on the parent relids. + * Any parameterization of the input paths refers to topmost parents of + * the relevant relations, because reparameterize_path_by_child() hasn't + * been called yet. So we must consider topmost parents of the relations + * being joined, too, while determining parameterization of the result and + * checking for disallowed parameterization cases. */ if (innerrel->top_parent_relids) innerrelids = innerrel->top_parent_relids; diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index f7ac71087a..8dbf790e89 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2360,6 +2360,9 @@ create_foreign_upper_path(PlannerInfo *root, RelOptInfo *rel, * calc_nestloop_required_outer * Compute the required_outer set for a nestloop join path * + * Note: when considering a child join, the inputs nonetheless use top-level + * parent relids + * * Note: result must not share storage with either input */ Relids @@ -2394,11 +2397,30 @@ calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path) { Relids outer_paramrels = PATH_REQ_OUTER(outer_path); Relids inner_paramrels = PATH_REQ_OUTER(inner_path); + Relids innerrelids PG_USED_FOR_ASSERTS_ONLY; + Relids outerrelids PG_USED_FOR_ASSERTS_ONLY; Relids required_outer; + /* + * Any parameterization of the input paths refers to topmost parents of + * the relevant relations, because reparameterize_path_by_child() hasn't + * been called yet. So we must consider topmost parents of the relations + * being joined, too, while checking for disallowed parameterization + * cases. + */ + if (inner_path->parent->top_parent_relids) + innerrelids = inner_path->parent->top_parent_relids; + else + innerrelids = inner_path->parent->relids; + + if (outer_path->parent->top_parent_relids) + outerrelids = outer_path->parent->top_parent_relids; + else + outerrelids = outer_path->parent->relids; + /* neither path can require rels from the other */ - Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids)); - Assert(!bms_overlap(inner_paramrels, outer_path->parent->relids)); + Assert(!bms_overlap(outer_paramrels, innerrelids)); + Assert(!bms_overlap(inner_paramrels, outerrelids)); /* form the union ... */ required_outer = bms_union(outer_paramrels, inner_paramrels); /* we do not need an explicit test for empty; bms_union gets it right */
On Mon, Jan 8, 2024 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think we're talking at cross-purposes. What I was wondering about > (at least further down in the thread) was whether we shouldn't be > checking *both* the "real" and the "parent" relids to make sure they > don't overlap the parameterization sets. But it's probably overkill. > After reading the code again I agree that the parent relids are more > useful to check. Hmm, OK. But we could also find some way to assert that the parameterization sets contain only top-most rels. That would be strictly stronger than asserting that they don't contain any non-topmost rels that exist on the other side. Trying to say that more clearly, I think the rules here are: 1. outer_paramrels can't contain ANY non-top rels 2. outer_paramrels can't contain top rels that appear on the other side of the join What you're proposing to assert here is (2) plus a weaker version of (1), namely "no non-top rels that appear on the other side of the join". That's not wrong, but I don't see why it's better than checking only (2) or checking exactly (1) and (2). > However, I still don't like Richard's patch too much as-is, because > the Asserts are difficult to read/understand and even more difficult > to compare to the other code path. I think we want to keep the > nestloop and not-nestloop paths as visually similar as possible, > so I propose we do it more like the attached (where I also borrowed > some of your wording for the comments). I don't understand what this is parallel to; calc_nestloop_required_outer does no similar dance. If we don't set top_relids when it's the same as relids, then I agree that doing this dance is warranted, but otherwise it's just clutter. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 8, 2024 at 5:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think we're talking at cross-purposes. What I was wondering about >> (at least further down in the thread) was whether we shouldn't be >> checking *both* the "real" and the "parent" relids to make sure they >> don't overlap the parameterization sets. But it's probably overkill. > But we could also find some way to assert that the parameterization > sets contain only top-most rels. Hmm ... perhaps worth doing. I think bms_is_subset against all_baserels would work. >> However, I still don't like Richard's patch too much as-is, because >> the Asserts are difficult to read/understand and even more difficult >> to compare to the other code path. I think we want to keep the >> nestloop and not-nestloop paths as visually similar as possible, >> so I propose we do it more like the attached (where I also borrowed >> some of your wording for the comments). > I don't understand what this is parallel to; > calc_nestloop_required_outer does no similar dance. No, because the dance is done in its caller. The fact that these two functions don't have more-similar argument lists is a bit of a wart, perhaps. regards, tom lane
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> But we could also find some way to assert that the parameterization >> sets contain only top-most rels. > Hmm ... perhaps worth doing. I think bms_is_subset against > all_baserels would work. [ tries that ... ] Nope, it needs to be all_query_rels. I plead ENOCAFFEINE. However, while we could easily add + Assert(bms_is_subset(inner_paramrels, root->all_query_rels)); + Assert(bms_is_subset(outer_paramrels, root->all_query_rels)); to try_nestloop_path, that doesn't work as-is in calc_non_nestloop_required_outer because we don't pass "root" to it. We could add that, or perhaps contemplate some more-extensive rearrangement to make the division of labor between calc_nestloop_required_outer and its caller more like that between calc_non_nestloop_required_outer and its callers. But I'm feeling kind of unenthused about that, because (1) I don't see how to rearrange things without duplicating code: try_nestloop_path needs access to some of these values, while moving any work out of calc_non_nestloop_required_outer would mean two copies in its two callers. So there are reasons why that's not perfectly symmetric. (But I still maintain that we should try to make the code look similar between these two code paths, when considering both the calc_xxx functions and their callers.) (2) joinpath.c already knows that parameterization sets mention only top-level relids, specifically at the definition and uses of PATH_PARAM_BY_PARENT, and I bet there are more dependencies elsewhere. So I'm not sure about the value of asserting it only here. In short I'm inclined to apply v3 as-is. regards, tom lane