Thread: Assert failure of the cross-check for nullingrels

Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:
While looking into issue [1], I came across $subject on master.  Below
is how to reproduce it.

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

explain (costs off)
select * from t1 left join (t2 left join t3 on t2.x) on t2.x left join t4 on t3.x and t2.x where t1.x = coalesce(t2.x,true);

I've looked into this a little bit.  For the join of t2/t3 to t4, since
it can commute with the join of t1 to t2/t3 according to identity 3, we
would generate multiple versions for its joinquals.  In particular, the
qual 't3.x' would have two versions, one with varnullingrels as {t2/t3,
t1/t2}, the other one with varnullingrels as {t2/t3}.  So far so good.

Assume we've determined to build the join of t2/t3 to t4 after we've
built t1/t2 and t2/t3, then we'd find that both versions of qual 't3.x'
would be accepted by clause_is_computable_at.  This is not correct.  We
are supposed to accept only the one marked as {t2/t3, t1/t2}.  The other
one is not rejected mainly because we found that the qual 't3.x' does
not mention any nullable Vars of outer join t1/t2.

I wonder if we should consider syn_xxxhand rather than min_xxxhand in
clause_is_computable_at when we check if clause mentions any nullable
Vars.  But I'm not sure about that.

[1] https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Fri, Mar 10, 2023 at 4:13 PM Richard Guo <guofenglinux@gmail.com> wrote:
I wonder if we should consider syn_xxxhand rather than min_xxxhand in
clause_is_computable_at when we check if clause mentions any nullable
Vars.  But I'm not sure about that.
 
No, considering syn_xxxhand is not right.  After some join order
commutation we may form the join with only its min_lefthand and
min_righthand.  In this case if we check against syn_xxxhand rather than
min_xxxhand in clause_is_computable_at, we may end up with being unable
to find a proper place for some quals.  I can see this problem in below
query.

select * from t1 left join ((select t2.x from t2 left join t3 on t2.x where t3.x is null) s left join t4 on s.x) on s.x = t1.x;

Suppose we've formed join t1/t2 and go ahead to form the join of t1/t2
to t3.  If we consider t1/t2 join's syn_xxxhand, then the pushed down
qual 't3.x is null' would not be computable at this level because it
mentions nullable Vars from t1/t2 join's syn_righthand and meanwhile is
not marked with t1/t2 join.  This is not correct and would trigger an
Assert.

Back to the original issue, if a join has more than one quals, actually
we treat them as a whole when we check if identity 3 applies as well as
when we adjust them to be suitable for commutation according to identity
3.  So when we check if a qual is computable at a given level, I think
we should also consider the join's quals as a whole.  I'm thinking that
we use a 'group' notion for RestrictInfos and then use the clause_relids
of the 'group' in clause_is_computable_at.  Does this make sense?

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Mon, Mar 13, 2023 at 5:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
Back to the original issue, if a join has more than one quals, actually
we treat them as a whole when we check if identity 3 applies as well as
when we adjust them to be suitable for commutation according to identity
3.  So when we check if a qual is computable at a given level, I think
we should also consider the join's quals as a whole.  I'm thinking that
we use a 'group' notion for RestrictInfos and then use the clause_relids
of the 'group' in clause_is_computable_at.  Does this make sense?
 
I'm imagining something like attached (no comments and test cases yet).

Thanks
Richard
Attachment

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Mon, Mar 13, 2023 at 5:44 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Mar 13, 2023 at 5:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
Back to the original issue, if a join has more than one quals, actually
we treat them as a whole when we check if identity 3 applies as well as
when we adjust them to be suitable for commutation according to identity
3.  So when we check if a qual is computable at a given level, I think
we should also consider the join's quals as a whole.  I'm thinking that
we use a 'group' notion for RestrictInfos and then use the clause_relids
of the 'group' in clause_is_computable_at.  Does this make sense?
 
I'm imagining something like attached (no comments and test cases yet).

Here is an updated patch with comments and test case.  I also change the
code to store 'group_clause_relids' directly in RestrictInfo.

Thanks
Richard
Attachment

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Fri, Mar 17, 2023 at 11:05 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is an updated patch with comments and test case.  I also change the
code to store 'group_clause_relids' directly in RestrictInfo.

BTW, I've added an open item for this issue.

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
"Jonathan S. Katz"
Date:
On 5/12/23 3:02 AM, Richard Guo wrote:
> 
> On Fri, Mar 17, 2023 at 11:05 AM Richard Guo <guofenglinux@gmail.com 
> <mailto:guofenglinux@gmail.com>> wrote:
> 
>     Here is an updated patch with comments and test case.  I also change the
>     code to store 'group_clause_relids' directly in RestrictInfo.
> 
> 
> BTW, I've added an open item for this issue.

[RMT hat]

Is there a specific commit targeted for v16 that introduced this issue? 
Does it only affect v16 or does it affect backbranches?

Thanks,

Jonathan


Attachment

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
"Jonathan S. Katz" <jkatz@postgresql.org> writes:
> Is there a specific commit targeted for v16 that introduced this issue? 
> Does it only affect v16 or does it affect backbranches?

It's part of the outer-join-aware-Vars stuff, so it's my fault ...
and v16 only.

            regards, tom lane



Re: Assert failure of the cross-check for nullingrels

From
"Jonathan S. Katz"
Date:
On 5/16/23 9:49 AM, Tom Lane wrote:
> "Jonathan S. Katz" <jkatz@postgresql.org> writes:
>> Is there a specific commit targeted for v16 that introduced this issue?
>> Does it only affect v16 or does it affect backbranches?
> 
> It's part of the outer-join-aware-Vars stuff, so it's my fault ...
> and v16 only.

*nods* thanks. I updated the Open Items page accordingly (doing RMT 
housecleaning today in advance of Beta 1).

Jonathan


Attachment

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> Here is an updated patch with comments and test case.  I also change the
> code to store 'group_clause_relids' directly in RestrictInfo.

Hmm ... I don't like this patch terribly much.  It's not obvious why
(or if) it works, and it's squirreling bits of semantic knowledge
into places they don't belong.  ISTM the fundamental problem is that
clause_is_computable_at() is accepting clauses it shouldn't, and we
should try to solve it there.

After some poking at it I hit on what seems like a really simple
solution: we should be checking syn_righthand not min_righthand
to see whether a Var should be considered nullable by a given OJ.
Maybe that's still not quite right, but it seems like it might be
right given that the last fix reaffirmed our conviction that Vars
should be marked according to the syntactic structure.

If we don't want to do it like this, another way is to consider
the transitive closure of commutable outer joins, along similar
lines to your fixes to my earlier patch.  But that seems like it
might just be adding complication.

            regards, tom lane

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index c44bd2f815..23ab4177fa 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -563,9 +563,9 @@ clause_is_computable_at(PlannerInfo *root,
             continue;

         /* Else, trouble if clause mentions any nullable Vars. */
-        if (bms_overlap(clause_relids, sjinfo->min_righthand) ||
+        if (bms_overlap(clause_relids, sjinfo->syn_righthand) ||
             (sjinfo->jointype == JOIN_FULL &&
-             bms_overlap(clause_relids, sjinfo->min_lefthand)))
+             bms_overlap(clause_relids, sjinfo->syn_lefthand)))
             return false;        /* doesn't work */
     }

diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9bafadde66..af68567561 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2500,6 +2500,28 @@ select * from int4_tbl t1
                      ->  Seq Scan on int4_tbl t4
 (12 rows)

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+                     QUERY PLAN
+----------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
+   ->  Nested Loop Left Join
+         Join Filter: (t2.f1 > 0)
+         ->  Nested Loop Left Join
+               Filter: (t1.f1 = COALESCE(t2.f1, 1))
+               ->  Seq Scan on int4_tbl t1
+               ->  Materialize
+                     ->  Seq Scan on int4_tbl t2
+                           Filter: (f1 > 1)
+         ->  Seq Scan on int4_tbl t3
+   ->  Materialize
+         ->  Seq Scan on int4_tbl t4
+(13 rows)
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a44234b0af..c7a39abd23 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -488,6 +488,12 @@ select * from int4_tbl t1
   left join int4_tbl t3 on t2.f1 = t3.f1
   left join int4_tbl t4 on t3.f1 != t4.f1;

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
... BTW, something I'd considered in an earlier attempt at fixing this
was to change clause_is_computable_at's API to pass the clause's
RestrictInfo not just the clause_relids, along the lines of

@@ -541,9 +547,10 @@ extract_actual_join_clauses(List *restrictinfo_list,
  */
 bool
 clause_is_computable_at(PlannerInfo *root,
-                        Relids clause_relids,
+                        RestrictInfo *rinfo,
                         Relids eval_relids)
 {
+    Relids        clause_relids = rinfo->clause_relids;
     ListCell   *lc;
 
     /* Nothing to do if no outer joins have been performed yet. */

with corresponding simplifications at the call sites.  That was with
a view to examining has_clone/is_clone inside this function.  My
current proposal doesn't require that, but I'm somewhat tempted
to make this API change anyway for future-proofing purposes.
Thoughts?

            regards, tom lane



Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Thu, May 18, 2023 at 3:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
After some poking at it I hit on what seems like a really simple
solution: we should be checking syn_righthand not min_righthand
to see whether a Var should be considered nullable by a given OJ.
Maybe that's still not quite right, but it seems like it might be
right given that the last fix reaffirmed our conviction that Vars
should be marked according to the syntactic structure.

I thought about this solution before but proved it was not right in
https://www.postgresql.org/message-id/CAMbWs48fObJJ%3DYVb4ip8tnwxwixUNKUThfnA1eGfPzJxJRRgZQ%40mail.gmail.com

I checked the query shown there and it still fails with v3 patch.

explain (costs off)
select * from t1
    left join (select t2.x from t2
                      left join t3 on t2.x where t3.x is null) s
    left join t4 on s.x
on s.x = t1.x;
server closed the connection unexpectedly

The failure happens when we are forming the join of (t1/t2) to t3.
Consider qual 't3.x is null'.  It's a non-clone filter clause so
clause_is_computable_at is supposed to think it's applicable here.  We
have an Assert for that.  However, when checking outer join t1/t2, which
has been performed but is not listed in the qual's nullingrels,
clause_is_computable_at would think it'd null vars of the qual if we
check syn_righthand not min_righthand, and get a conclusion that the
qual is not applicable here.  This is how the Assert is triggered.

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Thu, May 18, 2023 at 3:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
... BTW, something I'd considered in an earlier attempt at fixing this
was to change clause_is_computable_at's API to pass the clause's
RestrictInfo not just the clause_relids, along the lines of

@@ -541,9 +547,10 @@ extract_actual_join_clauses(List *restrictinfo_list,
  */
 bool
 clause_is_computable_at(PlannerInfo *root,
-                        Relids clause_relids,
+                        RestrictInfo *rinfo,
                         Relids eval_relids)
 {
+    Relids        clause_relids = rinfo->clause_relids;
     ListCell   *lc;

     /* Nothing to do if no outer joins have been performed yet. */

with corresponding simplifications at the call sites.  That was with
a view to examining has_clone/is_clone inside this function.  My
current proposal doesn't require that, but I'm somewhat tempted
to make this API change anyway for future-proofing purposes.
Thoughts?

This change looks good to me.

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, May 18, 2023 at 3:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... BTW, something I'd considered in an earlier attempt at fixing this
>> was to change clause_is_computable_at's API to pass the clause's
>> RestrictInfo not just the clause_relids, along the lines of

> This change looks good to me.

Did that part.

            regards, tom lane



Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> On Thu, May 18, 2023 at 3:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After some poking at it I hit on what seems like a really simple
>> solution: we should be checking syn_righthand not min_righthand
>> to see whether a Var should be considered nullable by a given OJ.

> I thought about this solution before but proved it was not right in
> https://www.postgresql.org/message-id/CAMbWs48fObJJ%3DYVb4ip8tnwxwixUNKUThfnA1eGfPzJxJRRgZQ%40mail.gmail.com
> I checked the query shown there and it still fails with v3 patch.

Bleah.  The other solution I'd been poking at involved adding an
extra check for clone clauses, as attached (note this requires
8a2523ff3).  This survives your example, but I wonder if it might
reject all the clones in some cases.  It seems a bit expensive
too, although as I said before, I don't think the clone cases get
traversed all that often.

Perhaps another answer could be to compare against syn_righthand
for clone clauses and min_righthand for non-clones?  That seems
mighty unprincipled though.

            regards, tom lane

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index d2bc096e1c..2071783987 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -532,6 +532,12 @@ extract_actual_join_clauses(List *restrictinfo_list,
  * This function checks the second condition; we assume the caller already
  * saw to the first one.
  *
+ * For has_clone and is_clone clauses, there is a third condition: the
+ * clause_relids and eval_relids must list the same subset of relevant
+ * commutating outer joins.  This ensures we don't select the wrong one
+ * of the clones.  (We do not check this for non-cloned clauses, as it
+ * might improperly reject them.)
+ *
  * For speed reasons, we don't individually examine each Var/PHV of the
  * expression, but just look at the overall clause_relids (the union of the
  * varnos and varnullingrels).  This could give a misleading answer if the
@@ -561,7 +567,27 @@ clause_is_computable_at(PlannerInfo *root,

         /* OK if clause lists it (we assume all Vars in it agree). */
         if (bms_is_member(sjinfo->ojrelid, clause_relids))
-            continue;
+        {
+            if (rinfo->has_clone || rinfo->is_clone)
+            {
+                /* Must check whether it's the right one of the clones */
+                Relids        commutators = bms_union(sjinfo->commute_below_l,
+                                                    sjinfo->commute_above_r);
+                Relids        clause_commutators = bms_intersect(clause_relids,
+                                                               commutators);
+                Relids        eval_commutators = bms_intersect(eval_relids,
+                                                             commutators);
+                bool        match = bms_equal(clause_commutators,
+                                              eval_commutators);
+
+                bms_free(commutators);    /* be tidy */
+                bms_free(clause_commutators);
+                bms_free(eval_commutators);
+                if (!match)
+                    return false;    /* wrong clone */
+            }
+            continue;            /* OK, clause expects this OJ to be done */
+        }

         /* Else, trouble if clause mentions any nullable Vars. */
         if (bms_overlap(clause_relids, sjinfo->min_righthand) ||
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9bafadde66..b254a1b649 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2500,6 +2500,52 @@ select * from int4_tbl t1
                      ->  Seq Scan on int4_tbl t4
 (12 rows)

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+                     QUERY PLAN
+----------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
+   ->  Nested Loop Left Join
+         Join Filter: (t2.f1 > 0)
+         ->  Nested Loop Left Join
+               Filter: (t1.f1 = COALESCE(t2.f1, 1))
+               ->  Seq Scan on int4_tbl t1
+               ->  Materialize
+                     ->  Seq Scan on int4_tbl t2
+                           Filter: (f1 > 1)
+         ->  Seq Scan on int4_tbl t3
+   ->  Materialize
+         ->  Seq Scan on int4_tbl t4
+(13 rows)
+
+explain (costs off)
+select * from int4_tbl t1
+  left join ((select t2.f1 from int4_tbl t2
+                left join int4_tbl t3 on t2.f1 > 0
+                where t3.f1 is null) s
+             left join tenk1 t4 on s.f1 > 1)
+    on s.f1 = t1.f1;
+                   QUERY PLAN
+-------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (t2.f1 > 1)
+   ->  Hash Right Join
+         Hash Cond: (t2.f1 = t1.f1)
+         ->  Nested Loop Left Join
+               Join Filter: (t2.f1 > 0)
+               Filter: (t3.f1 IS NULL)
+               ->  Seq Scan on int4_tbl t2
+               ->  Materialize
+                     ->  Seq Scan on int4_tbl t3
+         ->  Hash
+               ->  Seq Scan on int4_tbl t1
+   ->  Seq Scan on tenk1 t4
+(13 rows)
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a44234b0af..a9352bf4e0 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -488,6 +488,20 @@ select * from int4_tbl t1
   left join int4_tbl t3 on t2.f1 = t3.f1
   left join int4_tbl t4 on t3.f1 != t4.f1;

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+
+explain (costs off)
+select * from int4_tbl t1
+  left join ((select t2.f1 from int4_tbl t2
+                left join int4_tbl t3 on t2.f1 > 0
+                where t3.f1 is null) s
+             left join tenk1 t4 on s.f1 > 1)
+    on s.f1 = t1.f1;
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys

Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Fri, May 19, 2023 at 12:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bleah.  The other solution I'd been poking at involved adding an
extra check for clone clauses, as attached (note this requires
8a2523ff3).  This survives your example, but I wonder if it might
reject all the clones in some cases.  It seems a bit expensive
too, although as I said before, I don't think the clone cases get
traversed all that often.

I tried with v4 patch and find that, as you predicted, it might reject
all the clones in some cases.  Check the query below

explain (costs off)
select * from t t1
    left join t t2 on t1.a = t2.a
    left join t t3 on t2.a = t3.a
    left join t t4 on t3.a = t4.a and t2.b = t4.b;
                QUERY PLAN
------------------------------------------
 Hash Left Join
   Hash Cond: (t2.b = t4.b)
   ->  Hash Left Join
         Hash Cond: (t2.a = t3.a)
         ->  Hash Left Join
               Hash Cond: (t1.a = t2.a)
               ->  Seq Scan on t t1
               ->  Hash
                     ->  Seq Scan on t t2
         ->  Hash
               ->  Seq Scan on t t3
   ->  Hash
         ->  Seq Scan on t t4
(13 rows)

So the qual 't3.a = t4.a' is missing in this plan shape.
 
Perhaps another answer could be to compare against syn_righthand
for clone clauses and min_righthand for non-clones?  That seems
mighty unprincipled though.

I also checked this solution with the same query.

explain (costs off)
select * from t t1
    left join t t2 on t1.a = t2.a
    left join t t3 on t2.a = t3.a
    left join t t4 on t3.a = t4.a and t2.b = t4.b;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Left Join
   Hash Cond: ((t3.a = t4.a) AND (t3.a = t4.a) AND (t2.b = t4.b))
   ->  Hash Left Join
         Hash Cond: (t2.a = t3.a)
         ->  Hash Left Join
               Hash Cond: (t1.a = t2.a)
               ->  Seq Scan on t t1
               ->  Hash
                     ->  Seq Scan on t t2
         ->  Hash
               ->  Seq Scan on t t3
   ->  Hash
         ->  Seq Scan on t t4
(13 rows)

This time the qual 't3.a = t4.a' is back, but twice.

I keep thinking about my proposal in v2 patch.  It seems more natural to
me to fix this issue, because an outer join's quals are always treated
as a whole when we check if identity 3 applies in make_outerjoininfo, as
well as when we adjust the outer join's quals for commutation in
deconstruct_distribute_oj_quals.  So when it comes to check if quals are
computable at a join level, they should be still treated as a whole.
This should have the same effect regarding qual placement if the quals
of an outer join are in form of 'qual1 OR qual2 OR ...' rather than
'qual1 AND qual2 AND ...'.

Thanks
Richard

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> I keep thinking about my proposal in v2 patch.  It seems more natural to
> me to fix this issue, because an outer join's quals are always treated
> as a whole when we check if identity 3 applies in make_outerjoininfo, as
> well as when we adjust the outer join's quals for commutation in
> deconstruct_distribute_oj_quals.

No, I doubt that that patch works properly.  If the join condition
contains independent quals on different relations, say

    select ... from t1 left join t2 on (t1.a = 1 and t2.b = 2)

then it may be that those quals need to be pushed to different levels.
I don't believe that considering the union of the rels mentioned in
any qual is a reasonable thing to do here.

            regards, tom lane



Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> I tried with v4 patch and find that, as you predicted, it might reject
> all the clones in some cases.  Check the query below
> ...
> So the qual 't3.a = t4.a' is missing in this plan shape.

I poked into that more closely and realized that the reason that
clause_is_computable_at() misbehaves is that both clones of the
"t3.a = t4.a" qual have the same clause_relids: (4 5 6) which is
t3, the left join to t3, and t4.  This is unsurprising because
the difference in these clones is whether they are expected to be
evaluated above or below outer join 3 (the left join to t2), and
t2 doesn't appear in the qual.  (It does appear in "t2.b = t4.b",
which is why there's no similar misbehavior for that qual.)

If they have the same clause_relids, then clearly in its current
form clause_is_computable_at() cannot distinguish them.  So what
I now think we should do is have clause_is_computable_at() examine
their required_relids instead.  Those will be different, by
construction in deconstruct_distribute_oj_quals(), ensuring that
we select only one of the group of clones.

BTW, while I've not tried it, I suspect your v2 patch also fails
on this example for the same reason: it cannot distinguish the
clones of this qual.

            regards, tom lane

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index d2bc096e1c..760d24bebf 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -544,13 +544,24 @@ clause_is_computable_at(PlannerInfo *root,
                         RestrictInfo *rinfo,
                         Relids eval_relids)
 {
-    Relids        clause_relids = rinfo->clause_relids;
+    Relids        clause_relids;
     ListCell   *lc;

     /* Nothing to do if no outer joins have been performed yet. */
     if (!bms_overlap(eval_relids, root->outer_join_rels))
         return true;

+    /*
+     * For an ordinary qual clause, we consider the actual clause_relids as
+     * explained above.  However, it's possible for multiple members of a
+     * group of clone quals to have the same clause_relids, so for clones use
+     * the required_relids instead to ensure we select just one of them.
+     */
+    if (rinfo->has_clone || rinfo->is_clone)
+        clause_relids = rinfo->required_relids;
+    else
+        clause_relids = rinfo->clause_relids;
+
     foreach(lc, root->join_info_list)
     {
         SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8fa2b376f3..1fd75c8f58 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2500,6 +2500,74 @@ select * from int4_tbl t1
                      ->  Seq Scan on int4_tbl t4
 (12 rows)

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+                     QUERY PLAN
+----------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: ((t2.f1 > 2) AND (t3.f1 > 3))
+   ->  Nested Loop Left Join
+         Join Filter: (t2.f1 > 0)
+         ->  Nested Loop Left Join
+               Filter: (t1.f1 = COALESCE(t2.f1, 1))
+               ->  Seq Scan on int4_tbl t1
+               ->  Materialize
+                     ->  Seq Scan on int4_tbl t2
+                           Filter: (f1 > 1)
+         ->  Seq Scan on int4_tbl t3
+   ->  Materialize
+         ->  Seq Scan on int4_tbl t4
+(13 rows)
+
+explain (costs off)
+select * from int4_tbl t1
+  left join ((select t2.f1 from int4_tbl t2
+                left join int4_tbl t3 on t2.f1 > 0
+                where t3.f1 is null) s
+             left join tenk1 t4 on s.f1 > 1)
+    on s.f1 = t1.f1;
+                   QUERY PLAN
+-------------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (t2.f1 > 1)
+   ->  Hash Right Join
+         Hash Cond: (t2.f1 = t1.f1)
+         ->  Nested Loop Left Join
+               Join Filter: (t2.f1 > 0)
+               Filter: (t3.f1 IS NULL)
+               ->  Seq Scan on int4_tbl t2
+               ->  Materialize
+                     ->  Seq Scan on int4_tbl t3
+         ->  Hash
+               ->  Seq Scan on int4_tbl t1
+   ->  Seq Scan on tenk1 t4
+(13 rows)
+
+explain (costs off)
+select * from onek t1
+    left join onek t2 on t1.unique1 = t2.unique1
+    left join onek t3 on t2.unique1 = t3.unique1
+    left join onek t4 on t3.unique1 = t4.unique1 and t2.unique2 = t4.unique2;
+                               QUERY PLAN
+------------------------------------------------------------------------
+ Hash Left Join
+   Hash Cond: ((t3.unique1 = t4.unique1) AND (t2.unique2 = t4.unique2))
+   ->  Hash Left Join
+         Hash Cond: (t2.unique1 = t3.unique1)
+         ->  Hash Left Join
+               Hash Cond: (t1.unique1 = t2.unique1)
+               ->  Seq Scan on onek t1
+               ->  Hash
+                     ->  Seq Scan on onek t2
+         ->  Hash
+               ->  Seq Scan on onek t3
+   ->  Hash
+         ->  Seq Scan on onek t4
+(13 rows)
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 641fd1a21b..84547c7dff 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -488,6 +488,26 @@ select * from int4_tbl t1
   left join int4_tbl t3 on t2.f1 = t3.f1
   left join int4_tbl t4 on t3.f1 != t4.f1;

+explain (costs off)
+select * from int4_tbl t1
+  left join (int4_tbl t2 left join int4_tbl t3 on t2.f1 > 0) on t2.f1 > 1
+  left join int4_tbl t4 on t2.f1 > 2 and t3.f1 > 3
+where t1.f1 = coalesce(t2.f1, 1);
+
+explain (costs off)
+select * from int4_tbl t1
+  left join ((select t2.f1 from int4_tbl t2
+                left join int4_tbl t3 on t2.f1 > 0
+                where t3.f1 is null) s
+             left join tenk1 t4 on s.f1 > 1)
+    on s.f1 = t1.f1;
+
+explain (costs off)
+select * from onek t1
+    left join onek t2 on t1.unique1 = t2.unique1
+    left join onek t3 on t2.unique1 = t3.unique1
+    left join onek t4 on t3.unique1 = t4.unique1 and t2.unique2 = t4.unique2;
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys

Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
I wrote:
> If they have the same clause_relids, then clearly in its current
> form clause_is_computable_at() cannot distinguish them.  So what
> I now think we should do is have clause_is_computable_at() examine
> their required_relids instead.  Those will be different, by
> construction in deconstruct_distribute_oj_quals(), ensuring that
> we select only one of the group of clones.

Since we're hard up against the beta1 wrap deadline, I went ahead
and pushed the v5 patch.  I doubt that it's perfect yet, but it's
a small change and demonstrably fixes the cases we know about.

As I said in the commit message, the main knock I'd lay on v5
is "why not use required_relids all the time?".  I tried to do
that and soon found that the answer is that we're not maintaining
required_relids very accurately.  I found two causes so far:

1. equivclass.c sometimes generates placeholder constant-true
join clauses, and it's being sloppy about that.  It copies
the required_relids of the original clause, but fails to copy
is_pushed_down, making the clause look like it's been assigned
to the wrong side of the join-clause-vs-filter-clause divide.
I found that we need to copy has_clone/is_clone as well.  The
attached quick-hack patch avoids the bugs, but now I feel like
it was a mistake to not add has_clone/is_clone as full-fledged
arguments of make_restrictinfo.  I'm inclined to change that,
but not right before beta1 when we have no evidence of a reachable
bug.  (Mind you, there might *be* a reachable bug, but ...)

2. When distribute_qual_to_rels forces a qual up to a particular
syntactic level, it applies a relid set that very possibly refers
to rels the clause doesn't actually depend on.  This is problematic
because if the clause gets postponed to above some outer join that
nulls those rels, then it looks like it's being evaluated in an
unsafe location.  I think that when we detect commutability of two
outer joins, we need to adjust the relevant min_xxxhand sets more
thoroughly than we do now.  I've not managed to write a patch
for that yet.  One problem is that if we insist on removing all
unreferenced rels from required_relids, we might end up with a
set that mentions *none* of the RHS and therefore fails to keep
the clause from dropping into the LHS where it must not go.
Not sure about a nice way to handle that.

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 2db1bf6448..e0077aa1e4 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1993,20 +1993,24 @@ reconsider_outer_join_clauses(PlannerInfo *root)
             if (reconsider_outer_join_clause(root, ojcinfo, true))
             {
                 RestrictInfo *rinfo = ojcinfo->rinfo;
+                RestrictInfo *ninfo;

                 found = true;
                 /* remove it from the list */
                 root->left_join_clauses =
                     foreach_delete_current(root->left_join_clauses, cell);
                 /* throw back a dummy replacement clause (see notes above) */
-                rinfo = make_restrictinfo(root,
+                ninfo = make_restrictinfo(root,
                                           (Expr *) makeBoolConst(true, false),
-                                          true, /* is_pushed_down */
-                                          false,    /* pseudoconstant */
+                                          rinfo->is_pushed_down,
+                                          false,    /* !pseudoconstant */
                                           0,    /* security_level */
                                           rinfo->required_relids,
                                           rinfo->outer_relids);
-                distribute_restrictinfo_to_rels(root, rinfo);
+                /* copy clone marking, too */
+                ninfo->has_clone = rinfo->has_clone;
+                ninfo->is_clone = rinfo->is_clone;
+                distribute_restrictinfo_to_rels(root, ninfo);
             }
         }

@@ -2018,20 +2022,24 @@ reconsider_outer_join_clauses(PlannerInfo *root)
             if (reconsider_outer_join_clause(root, ojcinfo, false))
             {
                 RestrictInfo *rinfo = ojcinfo->rinfo;
+                RestrictInfo *ninfo;

                 found = true;
                 /* remove it from the list */
                 root->right_join_clauses =
                     foreach_delete_current(root->right_join_clauses, cell);
                 /* throw back a dummy replacement clause (see notes above) */
-                rinfo = make_restrictinfo(root,
+                ninfo = make_restrictinfo(root,
                                           (Expr *) makeBoolConst(true, false),
-                                          true, /* is_pushed_down */
-                                          false,    /* pseudoconstant */
+                                          rinfo->is_pushed_down,
+                                          false,    /* !pseudoconstant */
                                           0,    /* security_level */
                                           rinfo->required_relids,
                                           rinfo->outer_relids);
-                distribute_restrictinfo_to_rels(root, rinfo);
+                /* copy clone marking, too */
+                ninfo->has_clone = rinfo->has_clone;
+                ninfo->is_clone = rinfo->is_clone;
+                distribute_restrictinfo_to_rels(root, ninfo);
             }
         }

@@ -2043,20 +2051,24 @@ reconsider_outer_join_clauses(PlannerInfo *root)
             if (reconsider_full_join_clause(root, ojcinfo))
             {
                 RestrictInfo *rinfo = ojcinfo->rinfo;
+                RestrictInfo *ninfo;

                 found = true;
                 /* remove it from the list */
                 root->full_join_clauses =
                     foreach_delete_current(root->full_join_clauses, cell);
                 /* throw back a dummy replacement clause (see notes above) */
-                rinfo = make_restrictinfo(root,
+                ninfo = make_restrictinfo(root,
                                           (Expr *) makeBoolConst(true, false),
-                                          true, /* is_pushed_down */
-                                          false,    /* pseudoconstant */
+                                          rinfo->is_pushed_down,
+                                          false,    /* !pseudoconstant */
                                           0,    /* security_level */
                                           rinfo->required_relids,
                                           rinfo->outer_relids);
-                distribute_restrictinfo_to_rels(root, rinfo);
+                /* copy clone marking, too */
+                ninfo->has_clone = rinfo->has_clone;
+                ninfo->is_clone = rinfo->is_clone;
+                distribute_restrictinfo_to_rels(root, ninfo);
             }
         }
     } while (found);
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index 760d24bebf..8558c8acd4 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -513,8 +513,8 @@ extract_actual_join_clauses(List *restrictinfo_list,
         {
             /* joinquals shouldn't have been marked pseudoconstant */
             Assert(!rinfo->pseudoconstant);
-            Assert(!rinfo_is_constant_true(rinfo));
-            *joinquals = lappend(*joinquals, rinfo->clause);
+            if (!rinfo_is_constant_true(rinfo))
+                *joinquals = lappend(*joinquals, rinfo->clause);
         }
     }
 }

Re: Assert failure of the cross-check for nullingrels

From
Alvaro Herrera
Date:
On 2023-May-21, Tom Lane wrote:

> Since we're hard up against the beta1 wrap deadline, I went ahead
> and pushed the v5 patch.  I doubt that it's perfect yet, but it's
> a small change and demonstrably fixes the cases we know about.
> 
> As I said in the commit message, the main knock I'd lay on v5
> is "why not use required_relids all the time?".

So, is this done?  I see that you made other commits fixing related code
several days after this email, but none seems to match the changes you
posted in this patch; and also it's not clear to me that there's any
test case where this patch is expected to change behavior.  (So there's
also a question of whether this is a bug fix or rather some icying on
cake.)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Assert failure of the cross-check for nullingrels

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> So, is this done?  I see that you made other commits fixing related code
> several days after this email, but none seems to match the changes you
> posted in this patch; and also it's not clear to me that there's any
> test case where this patch is expected to change behavior.  (So there's
> also a question of whether this is a bug fix or rather some icying on
> cake.)

Well, the bugs I was aware of ahead of PGCon are all fixed, but there
are some new reports I still have to deal with.  I left the existing
open issue open, but maybe it'd be better to close it and start a new
one?

            regards, tom lane



Re: Assert failure of the cross-check for nullingrels

From
Richard Guo
Date:

On Wed, Jun 7, 2023 at 4:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> So, is this done?  I see that you made other commits fixing related code
> several days after this email, but none seems to match the changes you
> posted in this patch; and also it's not clear to me that there's any
> test case where this patch is expected to change behavior.  (So there's
> also a question of whether this is a bug fix or rather some icying on
> cake.)

This issue is fixed at 991a3df22.
 
Well, the bugs I was aware of ahead of PGCon are all fixed, but there
are some new reports I still have to deal with.  I left the existing
open issue open, but maybe it'd be better to close it and start a new
one?

I went ahead and closed it, and then started two new open items for the
two new issues --- one is about assert failure and wrong query results
due to incorrectly removing PHVs, the other is about inconsistent
nulling bitmap in nestloop parameters.

Thanks
Richard