Thread: Re: Inconsistent RestrictInfo serial numbers
On Tue, Oct 8, 2024 at 4:50 PM Richard Guo <guofenglinux@gmail.com> wrote: > > I ran into an "ERROR: variable not found in subplan target lists" > error, which can be reproduced with the following query. > > create table t (a int primary key, b int); > insert into t select i, i from generate_series(1, 10)i; > analyze t; > > explain (costs off) > select 1 from t t1 > left join > (t t2 left join t t3 on t2.a = t3.a) on true > left join t t4 on t1.a is null and t1.b = 1 > right join t t5 on t1.b = t5.b; > ERROR: variable not found in subplan target lists > > The first bad commit is a3179ab69. > > commit a3179ab692be4314d5ee5cd56598976c487d5ef2 > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Fri Sep 27 16:04:04 2024 -0400 > > Recalculate where-needed data accurately after a join removal. > > However, after digging into it further, it seems to me that the issue > originated in b262ad440, and the changes in a3179ab69 made it easier > to surface. > > commit b262ad440edecda0b1aba81d967ab560a83acb8a > Author: David Rowley <drowley@postgresql.org> > Date: Tue Jan 23 18:09:18 2024 +1300 > > Add better handling of redundant IS [NOT] NULL quals > > When we generate multiple clones of the same qual condition to cope > with outer join identity 3, we need to ensure that all the clones get > the same serial number. To achieve this, we reset the > root->last_rinfo_serial counter each time we produce RestrictInfo(s) > from the qual (see deconstruct_distribute_oj_quals). This approach > works only if we ensure that we are not changing the qual list in > any way that'd affect the number of RestrictInfos built from it. > > However, with b262ad440, an IS NULL qual on a NOT NULL column might > result in an additional constant-FALSE RestrictInfo. This can > unexpectedly increase root->last_rinfo_serial, causing inconsistent > RestrictInfo serial numbers across multiple clones of the same qual, > which can confuse users of 'rinfo_serial', such as > rebuild_joinclause_attr_needed, and lead to planner errors. > > In the query above, the has_clone version of qual 't1.a is null' would > be reduced to constant-FALSE, while the is_clone version would not. > This results in differing serial numbers for subsequent quals (such as > qual 't1.b = 1') across different versions. > > To fix, I think we can reset the root->last_rinfo_serial counter after > generating the additional constant-FALSE RestrictInfo. Please see > attached. A naive question: Why don't we just reuse the same RestrictInfo instead of creating a new one. I understand that the original RestrictInfo was passed from somewhere else to add it to a given relation. But I don't see any relation specific information being considered while deciding whether the clause is constant false. So may be we should do this processing elsewhere and replace the original clause itself? -- Best Wishes, Ashutosh Bapat
On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > But I don't see any relation specific information being > considered while deciding whether the clause is constant false. Isn't rel->notnullattnums relation specific information? We need this information to decide if a Var cannot be NULL. > So may > be we should do this processing elsewhere and replace the original > clause itself? I’m not sure about this. Different versions of the same qual clause can lead to different conclusions about whether it can be reduced to constant-FALSE. I don't think it is possible to replace the original clause; we need to do this processing on a version-by-version basis. Thanks Richard
On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote: > > On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > But I don't see any relation specific information being > > considered while deciding whether the clause is constant false. > > Isn't rel->notnullattnums relation specific information? We need this > information to decide if a Var cannot be NULL. I am talking about restriction_is_always_false() being called from add_base_clause_to_rel(). The later doesn't pass the given relid or the corresponding rel to the first. So restriction_is_always_false() is independent of relid passed to add_base_clause_to_rel() and thus restriction_is_always_false() being called from outside (say caller of add_base_clause_to_rel()) will have same result for the same RestrictInfo. > > > So may > > be we should do this processing elsewhere and replace the original > > clause itself? > > I’m not sure about this. Different versions of the same qual clause > can lead to different conclusions about whether it can be reduced to > constant-FALSE. Can you please provide an example? > I don't think it is possible to replace the original > clause; we need to do this processing on a version-by-version basis. > Either way, each version will be represented by only one RestrictInfo. Why can't we replace that version instead of creating a new RestrictInfo? -- Best Wishes, Ashutosh Bapat
On Wed, Oct 9, 2024 at 9:30 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote: > > On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > But I don't see any relation specific information being > > > considered while deciding whether the clause is constant false. > > > > Isn't rel->notnullattnums relation specific information? We need this > > information to decide if a Var cannot be NULL. > > I am talking about restriction_is_always_false() being called from > add_base_clause_to_rel(). The later doesn't pass the given relid or > the corresponding rel to the first. So restriction_is_always_false() > is independent of relid passed to add_base_clause_to_rel() and thus > restriction_is_always_false() being called from outside (say caller of > add_base_clause_to_rel()) will have same result for the same > RestrictInfo. I see what you meant now. You're right that the result of restriction_is_always_true/false is independent of the relid. However, I still don't think it's a good idea to move it elsewhere because: 1) we need the relid to determine whether the relation is an inheritance parent table, as we must avoid this type of simplification for those tables; 2) I recall there was a discussion during the work on commit b262ad440 about potentially extending the simplification logic in the future to consider the relation's constraints. > > > So may > > > be we should do this processing elsewhere and replace the original > > > clause itself? > > > > I’m not sure about this. Different versions of the same qual clause > > can lead to different conclusions about whether it can be reduced to > > constant-FALSE. > > Can you please provide an example? I think the query I provided in my initial email serves as an example. To quote what I said there: " In the query above, the has_clone version of qual 't1.a is null' would be reduced to constant-FALSE, while the is_clone version would not. " > > I don't think it is possible to replace the original > > clause; we need to do this processing on a version-by-version basis. > > > > Either way, each version will be represented by only one RestrictInfo. > Why can't we replace that version instead of creating a new > RestrictInfo? Sure we can do that. However, I find that manually altering a well-formed RestrictInfo's clause (along with its left_relids, right_relids, clause_relids, num_base_rels, etc.) seems too hacky to me. Thanks Richard
On Thu, Oct 10, 2024 at 7:37 AM Richard Guo <guofenglinux@gmail.com> wrote: > > > > > > So may > > > > be we should do this processing elsewhere and replace the original > > > > clause itself? > > > > > > I’m not sure about this. Different versions of the same qual clause > > > can lead to different conclusions about whether it can be reduced to > > > constant-FALSE. > > > > Can you please provide an example? > > I think the query I provided in my initial email serves as an example. > To quote what I said there: > > " > In the query above, the has_clone version of qual 't1.a is null' would > be reduced to constant-FALSE, while the is_clone version would not. > " > > > > I don't think it is possible to replace the original > > > clause; we need to do this processing on a version-by-version basis. > > > > > > > Either way, each version will be represented by only one RestrictInfo. > > Why can't we replace that version instead of creating a new > > RestrictInfo? > > Sure we can do that. However, I find that manually altering a > well-formed RestrictInfo's clause (along with its left_relids, > right_relids, clause_relids, num_base_rels, etc.) seems too > hacky to me. That's fair. 1. What strikes me as odd in your patch is that the last_rinfo_serial is reset so far away from the new clause that will be created with the next last_rinfo_serial OR the clause whose clones are being created. It either indicates another site where we are missing (possibly future) to restore rinfo_serial in a clone OR reset of last_serial_rinfo needs to happen somewhere else. Why isn't resetting last_rinfo_serial in deconstruct_distribute_oj_quals() enough? 2. This change would also prevent add_base_clause_to_rel() and add_join_clause_to_rels() from being used anywhere outside the context of deconstruct_distribute_oj_quals() since these functions would reset last_rinfo_serial when it's expected to be incremented. Moreover another make_restrictinfo() call somewhere in the minions of distribute_qual_to_rels() or distribute_quals_to_rels() would cause a similar bug. 3. Do we need to reset last_serial_rinfo everywhere we reset rinfo_serial e.g. create_join_clause()? I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do anything wrong. It's effectively copying rinfo_serial of original clause into the derived clause. I would have liked it better if it would have used the same method as create_join_clause(). Some other commti failed to notice the some minion of distribute_quals_to_rels()->distribute_qual_to_rels() may increment last_rinfo_serial while creating an alternate RestrictInfo for the qual passed to distribute_qual_to_rels(). I think a robust fix is to reset last_rinfo_serial in distribute_quals_to_rels() for every qual and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial + list_length(quals)) before resetting last_rinfo_serial in deconstruct_distribute_oj_quals() to catch any future digressions. -- Best Wishes, Ashutosh Bapat
On Fri, Oct 11, 2024 at 3:02 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > 1. What strikes me as odd in your patch is that the last_rinfo_serial > is reset so far away from the new clause that will be created with the > next last_rinfo_serial OR the clause whose clones are being created. > It either indicates another site where we are missing (possibly > future) to restore rinfo_serial in a clone OR reset of > last_serial_rinfo needs to happen somewhere else. Why isn't resetting > last_rinfo_serial in deconstruct_distribute_oj_quals() enough? Well, the resetting of the last_rinfo_serial counter in deconstruct_distribute_oj_quals is to ensure that multiple clones of the same qual receive the same serial number. As the comment there explains, it works only if we ensure that we are not changing the qual list in any way that'd affect the number of RestrictInfos built from it. However, in b262ad440, we did not get this memo promptly. As I explained earlier, the has_clone version of qual 't1.a is null' would be reduced to constant-FALSE, while the is_clone version would not. That is to say, for the has_clone version, we'd build three RestrictInfos, whereas for the is_clone version, we'd build only two. This discrepancy in numbers is the root cause of the issue. So, resetting last_rinfo_serial in deconstruct_distribute_oj_quals() is not enough. > 2. This change would also prevent add_base_clause_to_rel() and > add_join_clause_to_rels() from being used anywhere outside the context > of deconstruct_distribute_oj_quals() since these functions would reset > last_rinfo_serial when it's expected to be incremented. Moreover > another make_restrictinfo() call somewhere in the minions of > distribute_qual_to_rels() or distribute_quals_to_rels() would cause a > similar bug. I don't see how this change would prevent add_base_clause_to_rel and add_join_clause_to_rels from being used in other context than deconstruct_distribute_oj_quals. Could you please provide an example? > 3. Do we need to reset last_serial_rinfo everywhere we reset > rinfo_serial e.g. create_join_clause()? Hmm, I don't think that's necessary. As long as we'd build the same number of RestrictInfos from the qual list for different versions, we're good. > I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do > anything wrong. It's effectively copying rinfo_serial of original > clause into the derived clause. I would have liked it better if it > would have used the same method as create_join_clause(). Some other > commti failed to notice the some minion of > distribute_quals_to_rels()->distribute_qual_to_rels() may increment > last_rinfo_serial while creating an alternate RestrictInfo for the > qual passed to distribute_qual_to_rels(). I think a robust fix is to > reset last_rinfo_serial in distribute_quals_to_rels() for every qual > and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial + > list_length(quals)) before resetting last_rinfo_serial in > deconstruct_distribute_oj_quals() to catch any future digressions. Would you mind proposing a patch for this method so we can discuss the details? Thanks Richard