Thread: Support "Right Semi Join" plan shapes

Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Richard Guo 

      I see that the test samples are all (exists)  subqueries ,I think semi join should also support ( in) and ( any) subqueries. would you do more test on  ( in) and ( any) subqueries?




Best whish 

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Richard Guo
     I did a simple test ,Subqueries of type (in) can be supported, There is a test sql that doesn't support it, and I think that's because it can't  pull up  the subqueries.
```
test=# explain (costs off) SELECT  t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
                      QUERY PLAN
------------------------------------------------------
 Sort
   Sort Key: t1.a
   ->  Hash Right Semi Join
         Hash Cond: (t2.b = t1.a)
         ->  Append
               ->  Seq Scan on prt2_adv_p1 t2_1
               ->  Seq Scan on prt2_adv_p2 t2_2
               ->  Seq Scan on prt2_adv_p3 t2_3
         ->  Hash
               ->  Append
                     ->  Seq Scan on prt1_adv_p1 t1_1
                           Filter: (b = 0)
                     ->  Seq Scan on prt1_adv_p2 t1_2
                           Filter: (b = 0)
                     ->  Seq Scan on prt1_adv_p3 t1_3
                           Filter: (b = 0)
(16 rows)

test=# explain (costs off) SELECT t1.* FROM prt1_adv t1 WHERE t1.a IN (SELECT t2.b FROM prt2_adv t2) AND t1.b = 0 ORDER BY t1.a;
                      QUERY PLAN
------------------------------------------------------
 Sort
   Sort Key: t1.a
   ->  Hash Right Semi Join
         Hash Cond: (t2.b = t1.a)
         ->  Append
               ->  Seq Scan on prt2_adv_p1 t2_1
               ->  Seq Scan on prt2_adv_p2 t2_2
               ->  Seq Scan on prt2_adv_p3 t2_3
         ->  Hash
               ->  Append
                     ->  Seq Scan on prt1_adv_p1 t1_1
                           Filter: (b = 0)
                     ->  Seq Scan on prt1_adv_p2 t1_2
                           Filter: (b = 0)
                     ->  Seq Scan on prt1_adv_p3 t1_3
                           Filter: (b = 0)
(16 rows)

test=#

test=# explain (costs off) SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
                      QUERY PLAN
------------------------------------------------------
 Sort
   Sort Key: t1.a
   ->  Hash Right Semi Join
         Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
         ->  Append
               ->  Seq Scan on plt2_adv_p1 t2_1
               ->  Seq Scan on plt2_adv_p2 t2_2
               ->  Seq Scan on plt2_adv_p3 t2_3
         ->  Hash
               ->  Append
                     ->  Seq Scan on plt1_adv_p1 t1_1
                           Filter: (b < 10)
                     ->  Seq Scan on plt1_adv_p2 t1_2
                           Filter: (b < 10)
                     ->  Seq Scan on plt1_adv_p3 t1_3
                           Filter: (b < 10)
(16 rows)

test=#
test=# explain (costs off) SELECT t1.* FROM plt1_adv t1 WHERE (t1.a, t1.c) IN (SELECT t2.a, t2.c FROM plt2_adv t2) AND t1.b < 10 ORDER BY t1.a;
                      QUERY PLAN
------------------------------------------------------
 Sort
   Sort Key: t1.a
   ->  Hash Right Semi Join
         Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
         ->  Append
               ->  Seq Scan on plt2_adv_p1 t2_1
               ->  Seq Scan on plt2_adv_p2 t2_2
               ->  Seq Scan on plt2_adv_p3 t2_3
         ->  Hash
               ->  Append
                     ->  Seq Scan on plt1_adv_p1 t1_1
                           Filter: (b < 10)
                     ->  Seq Scan on plt1_adv_p2 t1_2
                           Filter: (b < 10)
                     ->  Seq Scan on plt1_adv_p3 t1_3
                           Filter: (b < 10)
(16 rows)


```

```
test=# explain (costs off) select * from int4_tbl i4, tenk1 a
 where exists(select * from tenk1 b
              where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
       and i4.f1 = a.tenthous;
                   QUERY PLAN
-------------------------------------------------
 Hash Right Semi Join
   Hash Cond: (b.twothousand = a.twothousand)
   Join Filter: (a.fivethous <> b.fivethous)
   ->  Seq Scan on tenk1 b
   ->  Hash
         ->  Hash Join
               Hash Cond: (a.tenthous = i4.f1)
               ->  Seq Scan on tenk1 a
               ->  Hash
                     ->  Seq Scan on int4_tbl i4
(10 rows)

test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
    SELECT b.twothousand, b.fivethous
    FROM tenk1 b
    WHERE a.twothousand = b.twothousand  and  a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Nested Loop
   Join Filter: (i4.f1 = a.tenthous)
   ->  Seq Scan on tenk1 a
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on tenk1 b
                 Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
   ->  Materialize
         ->  Seq Scan on int4_tbl i4
(9 rows)
test=# set enable_nestloop =off;
SET
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
    SELECT b.twothousand, b.fivethous
    FROM tenk1 b
    WHERE a.twothousand = b.twothousand  and  a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Hash Join
   Hash Cond: (a.tenthous = i4.f1)
   ->  Seq Scan on tenk1 a
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Seq Scan on tenk1 b
                 Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
   ->  Hash
         ->  Seq Scan on int4_tbl i4
(9 rows)


```

wenhui qiu <qiuwenhuifx@gmail.com> 于2023年12月15日周五 14:40写道:
Hi Richard Guo 

      I see that the test samples are all (exists)  subqueries ,I think semi join should also support ( in) and ( any) subqueries. would you do more test on  ( in) and ( any) subqueries?




Best whish 

Re: Support "Right Semi Join" plan shapes

From
Japin Li
Date:
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote:
> Thank you for reviewing.
>
> On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
>> +       /*
>> +        * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
>> +        * join.
>> +        */
>> +       if (jointype == JOIN_RIGHT_SEMI)
>> +               return;
>> +
>>
>> How about adding some reasons here?
>
> I've included a brief explanation in select_mergejoin_clauses.
>

Thank you for updating the patch.

>> + * this is a right-semi join, or this is a right/right-anti/full join and
>> + * there are nonmergejoinable join clauses.  The executor's mergejoin
>>
>> Maybe we can put the right-semi join together with the right/right-anti/full
>> join.  Is there any other significance by putting it separately?
>
> I don't think so.  The logic is different: for right-semi join we will
> always set *mergejoin_allowed to false, while for right/right-anti/full
> join it is set to false only if there are nonmergejoinable join clauses.
>

Got it.  Thanks for the explanation.

>> Maybe the following comments also should be updated. Right?
>
> Correct.  And there are a few more places where we need to mention
> JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
> for SpecialJoinInfo.
>
>
> I noticed that this patch changes the plan of a query in join.sql from
> a semi join to right semi join, compromising the original purpose of
> this query, which was to test the fix for neqjoinsel's behavior for
> semijoins (see commit 7ca25b7d).
>
> --
> -- semijoin selectivity for <>
> --
> explain (costs off)
> select * from int4_tbl i4, tenk1 a
> where exists(select * from tenk1 b
>              where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
>       and i4.f1 = a.tenthous;
>
> So I've changed this test case a bit so that it is still testing what it
> is supposed to test.
>
> In passing, I've also updated the commit message to clarify that this
> patch does not address the support of "Right Semi Join" for merge joins.
>

Tested and looks good to me!

--
Regrads,
Japin Li



Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Japin Li 
     Thank you for your reviewing ,This way the notes are more accurate and complete. Thanks also to the author for updating the patch ,I also tested the new patch ,It looks good to me 


Regrads

Japin Li <japinli@hotmail.com> 于2024年6月25日周二 08:51写道:
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote:
> Thank you for reviewing.
>
> On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
>> +       /*
>> +        * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
>> +        * join.
>> +        */
>> +       if (jointype == JOIN_RIGHT_SEMI)
>> +               return;
>> +
>>
>> How about adding some reasons here?
>
> I've included a brief explanation in select_mergejoin_clauses.
>

Thank you for updating the patch.

>> + * this is a right-semi join, or this is a right/right-anti/full join and
>> + * there are nonmergejoinable join clauses.  The executor's mergejoin
>>
>> Maybe we can put the right-semi join together with the right/right-anti/full
>> join.  Is there any other significance by putting it separately?
>
> I don't think so.  The logic is different: for right-semi join we will
> always set *mergejoin_allowed to false, while for right/right-anti/full
> join it is set to false only if there are nonmergejoinable join clauses.
>

Got it.  Thanks for the explanation.

>> Maybe the following comments also should be updated. Right?
>
> Correct.  And there are a few more places where we need to mention
> JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
> for SpecialJoinInfo.
>
>
> I noticed that this patch changes the plan of a query in join.sql from
> a semi join to right semi join, compromising the original purpose of
> this query, which was to test the fix for neqjoinsel's behavior for
> semijoins (see commit 7ca25b7d).
>
> --
> -- semijoin selectivity for <>
> --
> explain (costs off)
> select * from int4_tbl i4, tenk1 a
> where exists(select * from tenk1 b
>              where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
>       and i4.f1 = a.tenthous;
>
> So I've changed this test case a bit so that it is still testing what it
> is supposed to test.
>
> In passing, I've also updated the commit message to clarify that this
> patch does not address the support of "Right Semi Join" for merge joins.
>

Tested and looks good to me!

--
Regrads,
Japin Li