Thread: Check SubPlan clause for nonnullable rels/Vars

Check SubPlan clause for nonnullable rels/Vars

From
Richard Guo
Date:
Hi hackers,

While wandering around the codes of reducing outer joins, I noticed that
when determining which base rels/Vars are forced nonnullable by given
clause, we don't take SubPlan into consideration. Does anyone happen to
know what is the concern behind that?

IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
additional nonnullable rels/Vars by descending through their testexpr.
As we know, ALL_SUBLINK/ANY_SUBLINK combine results across tuples
produced by the subplan using AND/OR semantics. ROWCOMPARE_SUBLINK
doesn't allow multiple tuples from the subplan. So we can tell whether
the subplan is strict or not by checking its testexpr, leveraging the
existing codes in find_nonnullable_rels/vars_walker. Below is an
example:

# explain (costs off)
select * from a left join b on a.i = b.i where b.i = ANY (select i from c where c.j = b.j);
            QUERY PLAN
-----------------------------------
 Hash Join
   Hash Cond: (b.i = a.i)
   ->  Seq Scan on b
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on c
                 Filter: (j = b.j)
   ->  Hash
         ->  Seq Scan on a
(9 rows)

BTW, this change would also have impact on SpecialJoinInfo, especially
for the case of identity 3, because find_nonnullable_rels() is also used
to determine strict_relids from the clause. As an example, consider

    select * from a left join b on a.i = b.i
        left join c on b.j = ANY (select j from c);

Now we can know the SubPlan is strict for 'b'. Thus the b/c join would
be considered to be legal.

Thanks
Richard
Attachment

Re: Check SubPlan clause for nonnullable rels/Vars

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> While wandering around the codes of reducing outer joins, I noticed that
> when determining which base rels/Vars are forced nonnullable by given
> clause, we don't take SubPlan into consideration. Does anyone happen to
> know what is the concern behind that?

Probably just didn't bother with the case at the time.

> IMO, for SubPlans of type ALL/ANY/ROWCOMPARE, we should be able to find
> additional nonnullable rels/Vars by descending through their testexpr.

I think you can make something of this, but you need to be a lot more
paranoid than this patch is.

* I don't believe you can prove anything from an ALL_SUBLINK SubPlan,
because it will return true if the sub-query returns zero rows, no
matter what the testexpr is.  (Maybe if you could prove the sub-query
does return a row, but I doubt it's worth going there.)

* You need to explicitly check the subLinkType; as written this'll
consider EXPR_SUBLINK and so on.  I'm not really on board with
assuming that nothing bad will happen with sublink types other than
the ones the code is expecting.

* It's not apparent to me that it's okay to pass down "top_level"
rather than "false".  Maybe it's all right, but it could do with
a comment.

            regards, tom lane



Re: Check SubPlan clause for nonnullable rels/Vars

From
Richard Guo
Date:

On Thu, Nov 3, 2022 at 4:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* I don't believe you can prove anything from an ALL_SUBLINK SubPlan,
because it will return true if the sub-query returns zero rows, no
matter what the testexpr is.  (Maybe if you could prove the sub-query
does return a row, but I doubt it's worth going there.)

Thanks for pointing this out. You're right. I didn't consider the case
that the subplan produces zero rows.  In this case ALL_SUBLINK will
always return true, and ANY_SUBLINK will always return false.  That
makes ALL_SUBLINK not strict, and ANY_SUBLINK can be strict only at top
level. 

* You need to explicitly check the subLinkType; as written this'll
consider EXPR_SUBLINK and so on.  I'm not really on board with
assuming that nothing bad will happen with sublink types other than
the ones the code is expecting.

Yes, I need to check for ANY_SUBLINK and ROWCOMPARE_SUBLINK here.  The
testexpr is only meaningful for ALL/ANY/ROWCOMPARE, and ALL_SUBLINK has
been proven not strict. 

* It's not apparent to me that it's okay to pass down "top_level"
rather than "false".  Maybe it's all right, but it could do with
a comment.
 
The 'top_level' param is one point that I'm not very confident about.
I've added comments in the v2 patch.

Thanks for reviewing this patch!

Thanks
Richard
Attachment

Re: Check SubPlan clause for nonnullable rels/Vars

From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes:
> [ v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch ]

Pushed with cosmetic changes:

* I don't believe in "add at the end" as a principle for placement
of new code.  There's usually some other logic that will give more
consistent results.  In cases like this, ordering the treatment of
Node types in the same way as they appear in the include/nodes/
headers is the standard answer.  (Not that everybody's been totally
consistent about that :-( ... but that's not an argument for
introducing even more entropy.)

* I rewrote the comments a bit.

* I didn't like the test case too much: spinning up a whole new set
of tables seems like a lot of useless cycles.  Plus it makes it
harder to experiment with the test query manually.  I usually like
to write such queries using the regression database's standard tables,
so I rewrote this example that way.

            regards, tom lane



Re: Check SubPlan clause for nonnullable rels/Vars

From
Richard Guo
Date:

On Sun, Nov 6, 2022 at 3:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> [ v2-0001-Check-SubPlan-clause-for-nonnullable-rels-Vars.patch ]

Pushed with cosmetic changes:

* I don't believe in "add at the end" as a principle for placement
of new code.  There's usually some other logic that will give more
consistent results.  In cases like this, ordering the treatment of
Node types in the same way as they appear in the include/nodes/
headers is the standard answer.  (Not that everybody's been totally
consistent about that :-( ... but that's not an argument for
introducing even more entropy.)

* I rewrote the comments a bit.

* I didn't like the test case too much: spinning up a whole new set
of tables seems like a lot of useless cycles.  Plus it makes it
harder to experiment with the test query manually.  I usually like
to write such queries using the regression database's standard tables,
so I rewrote this example that way.
 
Thanks for the changes. They make the patch look better.  And thanks for
pushing it.

Thanks
Richard