Thread: Re: Inconsistent RestrictInfo serial numbers

Re: Inconsistent RestrictInfo serial numbers

From
Ashutosh Bapat
Date:
On Tue, Oct 8, 2024 at 4:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
> I ran into an "ERROR:  variable not found in subplan target lists"
> error, which can be reproduced with the following query.
>
> create table t (a int primary key, b int);
> insert into t select i, i from generate_series(1, 10)i;
> analyze t;
>
> explain (costs off)
> select 1 from t t1
>     left join
>         (t t2 left join t t3 on t2.a = t3.a) on true
>     left join t t4 on t1.a is null and t1.b = 1
>     right join t t5 on t1.b = t5.b;
> ERROR:  variable not found in subplan target lists
>
> The first bad commit is a3179ab69.
>
> commit a3179ab692be4314d5ee5cd56598976c487d5ef2
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Fri Sep 27 16:04:04 2024 -0400
>
>     Recalculate where-needed data accurately after a join removal.
>
> However, after digging into it further, it seems to me that the issue
> originated in b262ad440, and the changes in a3179ab69 made it easier
> to surface.
>
> commit b262ad440edecda0b1aba81d967ab560a83acb8a
> Author: David Rowley <drowley@postgresql.org>
> Date:   Tue Jan 23 18:09:18 2024 +1300
>
>     Add better handling of redundant IS [NOT] NULL quals
>
> When we generate multiple clones of the same qual condition to cope
> with outer join identity 3, we need to ensure that all the clones get
> the same serial number.  To achieve this, we reset the
> root->last_rinfo_serial counter each time we produce RestrictInfo(s)
> from the qual (see deconstruct_distribute_oj_quals).  This approach
> works only if we ensure that we are not changing the qual list in
> any way that'd affect the number of RestrictInfos built from it.
>
> However, with b262ad440, an IS NULL qual on a NOT NULL column might
> result in an additional constant-FALSE RestrictInfo.  This can
> unexpectedly increase root->last_rinfo_serial, causing inconsistent
> RestrictInfo serial numbers across multiple clones of the same qual,
> which can confuse users of 'rinfo_serial', such as
> rebuild_joinclause_attr_needed, and lead to planner errors.
>
> In the query above, the has_clone version of qual 't1.a is null' would
> be reduced to constant-FALSE, while the is_clone version would not.
> This results in differing serial numbers for subsequent quals (such as
> qual 't1.b = 1') across different versions.
>
> To fix, I think we can reset the root->last_rinfo_serial counter after
> generating the additional constant-FALSE RestrictInfo.  Please see
> attached.

A naive question: Why don't we just reuse the same RestrictInfo
instead of creating a new one. I understand that the original
RestrictInfo was passed from somewhere else to add it to a given
relation. But I don't see any relation specific information being
considered while deciding whether the clause is constant false. So may
be we should do this processing elsewhere and replace the original
clause itself?

--
Best Wishes,
Ashutosh Bapat



Re: Inconsistent RestrictInfo serial numbers

From
Richard Guo
Date:
On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> But I don't see any relation specific information being
> considered while deciding whether the clause is constant false.

Isn't rel->notnullattnums relation specific information?  We need this
information to decide if a Var cannot be NULL.

> So may
> be we should do this processing elsewhere and replace the original
> clause itself?

I’m not sure about this.  Different versions of the same qual clause
can lead to different conclusions about whether it can be reduced to
constant-FALSE.  I don't think it is possible to replace the original
clause; we need to do this processing on a version-by-version basis.

Thanks
Richard



Re: Inconsistent RestrictInfo serial numbers

From
Ashutosh Bapat
Date:
On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > But I don't see any relation specific information being
> > considered while deciding whether the clause is constant false.
>
> Isn't rel->notnullattnums relation specific information?  We need this
> information to decide if a Var cannot be NULL.

I am talking about restriction_is_always_false() being called from
add_base_clause_to_rel(). The later doesn't pass the given relid or
the corresponding rel to the first. So restriction_is_always_false()
is independent of relid passed to add_base_clause_to_rel() and thus
restriction_is_always_false() being called from outside (say caller of
add_base_clause_to_rel()) will have same result for the same
RestrictInfo.

>
> > So may
> > be we should do this processing elsewhere and replace the original
> > clause itself?
>
> I’m not sure about this.  Different versions of the same qual clause
> can lead to different conclusions about whether it can be reduced to
> constant-FALSE.

Can you please provide an example?

>  I don't think it is possible to replace the original
> clause; we need to do this processing on a version-by-version basis.
>

Either way, each version will be represented by only one RestrictInfo.
Why can't we replace that version instead of creating a new
RestrictInfo?

--
Best Wishes,
Ashutosh Bapat



Re: Inconsistent RestrictInfo serial numbers

From
Richard Guo
Date:
On Wed, Oct 9, 2024 at 9:30 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> On Wed, Oct 9, 2024 at 7:45 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > On Tue, Oct 8, 2024 at 9:02 PM Ashutosh Bapat
> > <ashutosh.bapat.oss@gmail.com> wrote:
> > > But I don't see any relation specific information being
> > > considered while deciding whether the clause is constant false.
> >
> > Isn't rel->notnullattnums relation specific information?  We need this
> > information to decide if a Var cannot be NULL.
>
> I am talking about restriction_is_always_false() being called from
> add_base_clause_to_rel(). The later doesn't pass the given relid or
> the corresponding rel to the first. So restriction_is_always_false()
> is independent of relid passed to add_base_clause_to_rel() and thus
> restriction_is_always_false() being called from outside (say caller of
> add_base_clause_to_rel()) will have same result for the same
> RestrictInfo.

I see what you meant now.  You're right that the result of
restriction_is_always_true/false is independent of the relid.
However, I still don't think it's a good idea to move it elsewhere
because: 1) we need the relid to determine whether the relation is an
inheritance parent table, as we must avoid this type of simplification
for those tables; 2) I recall there was a discussion during the work
on commit b262ad440 about potentially extending the simplification
logic in the future to consider the relation's constraints.

> > > So may
> > > be we should do this processing elsewhere and replace the original
> > > clause itself?
> >
> > I’m not sure about this.  Different versions of the same qual clause
> > can lead to different conclusions about whether it can be reduced to
> > constant-FALSE.
>
> Can you please provide an example?

I think the query I provided in my initial email serves as an example.
To quote what I said there:

"
In the query above, the has_clone version of qual 't1.a is null' would
be reduced to constant-FALSE, while the is_clone version would not.
"

> >  I don't think it is possible to replace the original
> > clause; we need to do this processing on a version-by-version basis.
> >
>
> Either way, each version will be represented by only one RestrictInfo.
> Why can't we replace that version instead of creating a new
> RestrictInfo?

Sure we can do that.  However, I find that manually altering a
well-formed RestrictInfo's clause (along with its left_relids,
right_relids, clause_relids, num_base_rels, etc.) seems too
hacky to me.

Thanks
Richard



Re: Inconsistent RestrictInfo serial numbers

From
Ashutosh Bapat
Date:
On Thu, Oct 10, 2024 at 7:37 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> > > > So may
> > > > be we should do this processing elsewhere and replace the original
> > > > clause itself?
> > >
> > > I’m not sure about this.  Different versions of the same qual clause
> > > can lead to different conclusions about whether it can be reduced to
> > > constant-FALSE.
> >
> > Can you please provide an example?
>
> I think the query I provided in my initial email serves as an example.
> To quote what I said there:
>
> "
> In the query above, the has_clone version of qual 't1.a is null' would
> be reduced to constant-FALSE, while the is_clone version would not.
> "
>
> > >  I don't think it is possible to replace the original
> > > clause; we need to do this processing on a version-by-version basis.
> > >
> >
> > Either way, each version will be represented by only one RestrictInfo.
> > Why can't we replace that version instead of creating a new
> > RestrictInfo?
>
> Sure we can do that.  However, I find that manually altering a
> well-formed RestrictInfo's clause (along with its left_relids,
> right_relids, clause_relids, num_base_rels, etc.) seems too
> hacky to me.

That's fair.

1. What strikes me as odd in your patch is that the last_rinfo_serial
is reset so far away from the new clause that will be created with the
next last_rinfo_serial OR the clause whose clones are being created.
It either indicates another site where we are missing (possibly
future) to restore rinfo_serial in a clone OR reset of
last_serial_rinfo needs to happen somewhere else. Why isn't resetting
last_rinfo_serial in deconstruct_distribute_oj_quals() enough?

2. This change would also prevent add_base_clause_to_rel() and
add_join_clause_to_rels() from being used anywhere outside the context
of deconstruct_distribute_oj_quals() since these functions would reset
last_rinfo_serial when it's expected to be incremented. Moreover
another make_restrictinfo() call somewhere in the minions of
distribute_qual_to_rels() or distribute_quals_to_rels() would cause a
similar bug.

3. Do we need to reset last_serial_rinfo everywhere we reset
rinfo_serial e.g. create_join_clause()?

I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do
anything wrong. It's effectively copying rinfo_serial of original
clause into the derived clause. I would have liked it better if it
would have used the same method as create_join_clause(). Some other
commti failed to notice the some minion of
distribute_quals_to_rels()->distribute_qual_to_rels() may increment
last_rinfo_serial while creating an alternate RestrictInfo for the
qual passed to distribute_qual_to_rels(). I think a robust fix is to
reset last_rinfo_serial in distribute_quals_to_rels() for every qual
and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial +
list_length(quals)) before resetting last_rinfo_serial in
deconstruct_distribute_oj_quals() to catch any future digressions.

--
Best Wishes,
Ashutosh Bapat



Re: Inconsistent RestrictInfo serial numbers

From
Richard Guo
Date:
On Fri, Oct 11, 2024 at 3:02 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> 1. What strikes me as odd in your patch is that the last_rinfo_serial
> is reset so far away from the new clause that will be created with the
> next last_rinfo_serial OR the clause whose clones are being created.
> It either indicates another site where we are missing (possibly
> future) to restore rinfo_serial in a clone OR reset of
> last_serial_rinfo needs to happen somewhere else. Why isn't resetting
> last_rinfo_serial in deconstruct_distribute_oj_quals() enough?

Well, the resetting of the last_rinfo_serial counter in
deconstruct_distribute_oj_quals is to ensure that multiple clones of
the same qual receive the same serial number.  As the comment there
explains, it works only if we ensure that we are not changing the qual
list in any way that'd affect the number of RestrictInfos built from
it.

However, in b262ad440, we did not get this memo promptly.  As I
explained earlier, the has_clone version of qual 't1.a is null'
would be reduced to constant-FALSE, while the is_clone version would
not.  That is to say, for the has_clone version, we'd build three
RestrictInfos, whereas for the is_clone version, we'd build only two.
This discrepancy in numbers is the root cause of the issue.

So, resetting last_rinfo_serial in deconstruct_distribute_oj_quals()
is not enough.

> 2. This change would also prevent add_base_clause_to_rel() and
> add_join_clause_to_rels() from being used anywhere outside the context
> of deconstruct_distribute_oj_quals() since these functions would reset
> last_rinfo_serial when it's expected to be incremented. Moreover
> another make_restrictinfo() call somewhere in the minions of
> distribute_qual_to_rels() or distribute_quals_to_rels() would cause a
> similar bug.

I don't see how this change would prevent add_base_clause_to_rel and
add_join_clause_to_rels from being used in other context than
deconstruct_distribute_oj_quals.  Could you please provide an example?

> 3. Do we need to reset last_serial_rinfo everywhere we reset
> rinfo_serial e.g. create_join_clause()?

Hmm, I don't think that's necessary.  As long as we'd build the same
number of RestrictInfos from the qual list for different versions,
we're good.

> I suspect that b262ad440edecda0b1aba81d967ab560a83acb8a didn't do
> anything wrong. It's effectively copying rinfo_serial of original
> clause into the derived clause. I would have liked it better if it
> would have used the same method as create_join_clause(). Some other
> commti failed to notice the some minion of
> distribute_quals_to_rels()->distribute_qual_to_rels() may increment
> last_rinfo_serial while creating an alternate RestrictInfo for the
> qual passed to distribute_qual_to_rels(). I think a robust fix is to
> reset last_rinfo_serial in distribute_quals_to_rels() for every qual
> and add an Assert(root->last_rinfo_serial == saved_last_rinfo_serial +
> list_length(quals)) before resetting last_rinfo_serial in
> deconstruct_distribute_oj_quals() to catch any future digressions.

Would you mind proposing a patch for this method so we can discuss the
details?

Thanks
Richard