Thread: Support "Right Semi Join" plan shapes

Support "Right Semi Join" plan shapes

From
Richard Guo
Date:
In thread [1] which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'.  After a preliminary investigation I think it is
beneficial and can be implemented with very short change.  With 'Right
Semi Join', what we want to do is to just have the first match for each
inner tuple.  For HashJoin, after scanning the hash bucket for matches
to current outer, we just need to check whether the inner tuple has been
set match and skip it if so.  For MergeJoin, we can do it by avoiding
restoring inner scan to the marked tuple in EXEC_MJ_TESTOUTER, in the
case when new outer tuple == marked tuple.

As that thread is already too long, fork a new thread and attach a patch
used for discussion.  The patch implements 'Right Semi Join' for
HashJoin.

[1] https://www.postgresql.org/message-id/CAMbWs4_eChX1bN%3Dvj0Uzg_7iz9Uivan%2BWjjor-X87L-V27A%2Brw%40mail.gmail.com

Thanks
Richard
Attachment

Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:

On Tue, Apr 18, 2023 at 5:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
In thread [1] which discussed 'Right Anti Join', Tom once mentioned
'Right Semi Join'.  After a preliminary investigation I think it is
beneficial and can be implemented with very short change.  With 'Right
Semi Join', what we want to do is to just have the first match for each
inner tuple.  For HashJoin, after scanning the hash bucket for matches
to current outer, we just need to check whether the inner tuple has been
set match and skip it if so.  For MergeJoin, we can do it by avoiding
restoring inner scan to the marked tuple in EXEC_MJ_TESTOUTER, in the
case when new outer tuple == marked tuple.

As that thread is already too long, fork a new thread and attach a patch
used for discussion.  The patch implements 'Right Semi Join' for
HashJoin.

The cfbot reminds that this patch does not apply any more, so rebase it
to v2.

Thanks
Richard
Attachment

Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:

On Thu, Aug 10, 2023 at 3:24 PM Richard Guo <guofenglinux@gmail.com> wrote:
The cfbot reminds that this patch does not apply any more, so rebase it
to v2.

Attached is another rebase over the latest master.  Any feedback is
appreciated.

Thanks
Richard
Attachment

Re: Support "Right Semi Join" plan shapes

From
vignesh C
Date:
On Wed, 1 Nov 2023 at 11:25, Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Thu, Aug 10, 2023 at 3:24 PM Richard Guo <guofenglinux@gmail.com> wrote:
>>
>> The cfbot reminds that this patch does not apply any more, so rebase it
>> to v2.
>
>
> Attached is another rebase over the latest master.  Any feedback is
> appreciated.

One of the tests in CFBot has failed at [1] with:
-   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
(SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
r1."C 1" ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.c1
+   ->  Foreign Scan
+         Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+         Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)

More details are available at [2].

[1] - https://cirrus-ci.com/task/4868751326183424
[2] -
https://api.cirrus-ci.com/v1/artifact/task/4868751326183424/testrun/build/testrun/postgres_fdw/regress/regression.diffs

Regards,
Vignesh



Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:

On Sun, Jan 7, 2024 at 3:03 PM vignesh C <vignesh21@gmail.com> wrote:
One of the tests in CFBot has failed at [1] with:
-   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
(SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
r1."C 1" ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.c1
+   ->  Foreign Scan
+         Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+         Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)

Thanks.  I looked into it and have figured out why the plan differs.
With this patch the SEMI JOIN that is pushed down to the remote server
is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
implemented using JOIN_SEMI.  Consequently, this leads to changes in the
costs of the paths: path with the sort pushed down to remote server, and
path with the sort added atop the foreign join.  And at last the latter
one wins by a slim margin.

I think we can simply update the expected file to fix this plan diff, as
attached.

Thanks
Richard
Attachment

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi  vignesh C    I saw this path has been passed (https://cirrus-ci.com/build/6109321080078336),can we push it?

Best wish

Richard Guo <guofenglinux@gmail.com> 于2024年1月9日周二 18:49写道:

On Sun, Jan 7, 2024 at 3:03 PM vignesh C <vignesh21@gmail.com> wrote:
One of the tests in CFBot has failed at [1] with:
-   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
-   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
(SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
r1."C 1" ASC NULLS LAST
-(4 rows)
+   Sort Key: t1.c1
+   ->  Foreign Scan
+         Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
+         Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
+(7 rows)

Thanks.  I looked into it and have figured out why the plan differs.
With this patch the SEMI JOIN that is pushed down to the remote server
is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
implemented using JOIN_SEMI.  Consequently, this leads to changes in the
costs of the paths: path with the sort pushed down to remote server, and
path with the sort added atop the foreign join.  And at last the latter
one wins by a slim margin.

I think we can simply update the expected file to fix this plan diff, as
attached.

Thanks
Richard

Re: Support "Right Semi Join" plan shapes

From
vignesh C
Date:
On Mon, 22 Jan 2024 at 11:27, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> Hi  vignesh C    I saw this path has been passed (https://cirrus-ci.com/build/6109321080078336),can we push it?

If you have found no comments from your review and testing, let's mark
it as "ready for committer".

Regards,
Vignesh



Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi  vignesh C
   Many thanks, I have marked it to  "ready for committer"

Best wish

vignesh C <vignesh21@gmail.com> 于2024年1月23日周二 10:56写道:
On Mon, 22 Jan 2024 at 11:27, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>
> Hi  vignesh C    I saw this path has been passed (https://cirrus-ci.com/build/6109321080078336),can we push it?

If you have found no comments from your review and testing, let's mark
it as "ready for committer".

Regards,
Vignesh

Re: Support "Right Semi Join" plan shapes

From
Alena Rybakina
Date:
Hi! Thank you for your work on this subject.

I have reviewed your patch and I think it is better to add an Assert for 
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to 
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop 
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook 
function, which is in the add_paths_to_joinrel function, which allows 
you to create a custom node. What do you think?

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Alena Rybakina
 I saw this code snippet also disable mergejoin ,I think it same  effect 

+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+

Regards


Alena Rybakina <lena.ribackina@yandex.ru> 于2024年1月30日周二 14:51写道:
Hi! Thank you for your work on this subject.

I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook
function, which is in the add_paths_to_joinrel function, which allows
you to create a custom node. What do you think?

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
HI Richard 
     Now it is starting the last commitfest for v17, can you respond to Alena Rybakina points?


Regards

On Thu, 8 Feb 2024 at 13:50, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
Hi Alena Rybakina
 I saw this code snippet also disable mergejoin ,I think it same  effect 

+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+

Regards


Alena Rybakina <lena.ribackina@yandex.ru> 于2024年1月30日周二 14:51写道:
Hi! Thank you for your work on this subject.

I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).
Mostly I'm suggesting this because of the set_join_pathlist_hook
function, which is in the add_paths_to_joinrel function, which allows
you to create a custom node. What do you think?

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:

On Mon, Mar 4, 2024 at 10:33 AM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
HI Richard 
     Now it is starting the last commitfest for v17, can you respond to Alena Rybakina points?

Thanks for reminding.  Will do that soon.

Thanks
Richard

Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:

On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).

Hmm, I don't see why this is necessary.  The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.

Thanks
Richard

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Richard
     Agree +1 ,I think can push now.

Richard

On Tue, 5 Mar 2024 at 10:44, Richard Guo <guofenglinux@gmail.com> wrote:

On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).

Hmm, I don't see why this is necessary.  The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.

Thanks
Richard

Re: Support "Right Semi Join" plan shapes

From
Alena Rybakina
Date:

To be honest, I didn't see it in the code, could you tell me where they are, please?

On 05.03.2024 05:44, Richard Guo wrote:

On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).

Hmm, I don't see why this is necessary.  The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.

Thanks
Richard
-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:

Hi Alena Rybakina
For merge join
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
Tanks


On Wed, 6 Mar 2024 at 04:10, Alena Rybakina <lena.ribackina@yandex.ru> wrote:

To be honest, I didn't see it in the code, could you tell me where they are, please?

On 05.03.2024 05:44, Richard Guo wrote:

On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
I have reviewed your patch and I think it is better to add an Assert for
JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
and MergeJoin).

Hmm, I don't see why this is necessary.  The planner should already
guarantee that we won't have nestloops/mergejoins with right-semi joins.

Thanks
Richard
-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support "Right Semi Join" plan shapes

From
Alena Rybakina
Date:


On 06.03.2024 05:23, wenhui qiu wrote:

Hi Alena Rybakina
For merge join
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
Tanks


Yes, I see it, thank you. Sorry for the noise.
-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:
Here is another rebase with a commit message to help review.  I also
tweaked some comments.

Thanks
Richard
Attachment

Re: Support "Right Semi Join" plan shapes

From
wenhui qiu
Date:
Hi Richard 
     Thank you so much for your tireless work on this,I see the new version of the patch improves some of the comments .I think it can commit in July


Thanks

On Thu, 25 Apr 2024 at 11:28, Richard Guo <guofenglinux@gmail.com> wrote:
Here is another rebase with a commit message to help review.  I also
tweaked some comments.

Thanks
Richard

Re: Support "Right Semi Join" plan shapes

From
Li Japin
Date:
Hi, Richard

> On Apr 25, 2024, at 11:28, Richard Guo <guofenglinux@gmail.com> wrote:
>
> Here is another rebase with a commit message to help review.  I also
> tweaked some comments.

Thank you for updating the patch, here are some comments on the v5 patch.

+    /*
+     * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
+     * join.
+     */
+    if (jointype == JOIN_RIGHT_SEMI)
+        return;
+

How about adding some reasons here?

+ * this is a right-semi join, or this is a right/right-anti/full join and
+ * there are nonmergejoinable join clauses.  The executor's mergejoin

Maybe we can put the right-semi join together with the right/right-anti/full
join.  Is there any other significance by putting it separately?


Maybe the following comments also should be updated. Right?

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 5482ab85a7..791cbc551e 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -455,8 +455,8 @@ pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
          * point of the available_rels machinations is to ensure that we only
          * pull up quals for which that's okay.
          *
-         * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI, or
-         * JOIN_RIGHT_ANTI jointypes here.
+         * We don't expect to see any pre-existing JOIN_SEMI, JOIN_ANTI,
+         * JOIN_RIGHT_SEMI, or JOIN_RIGHT_ANTI jointypes here.
          */
         switch (j->jointype)
         {
@@ -2951,6 +2951,7 @@ reduce_outer_joins_pass2(Node *jtnode,
                  * so there's no way that upper quals could refer to their
                  * righthand sides, and no point in checking.  We don't expect
                  * to see JOIN_RIGHT_ANTI yet.
+                 * Does JOIN_RIGHT_SEMI is expected here?
                  */
                 break;
             default:


Re: Support "Right Semi Join" plan shapes

From
Richard Guo
Date:
Thank you for reviewing.

On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
> +       /*
> +        * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
> +        * join.
> +        */
> +       if (jointype == JOIN_RIGHT_SEMI)
> +               return;
> +
>
> How about adding some reasons here?

I've included a brief explanation in select_mergejoin_clauses.

> + * this is a right-semi join, or this is a right/right-anti/full join and
> + * there are nonmergejoinable join clauses.  The executor's mergejoin
>
> Maybe we can put the right-semi join together with the right/right-anti/full
> join.  Is there any other significance by putting it separately?

I don't think so.  The logic is different: for right-semi join we will
always set *mergejoin_allowed to false, while for right/right-anti/full
join it is set to false only if there are nonmergejoinable join clauses.

> Maybe the following comments also should be updated. Right?

Correct.  And there are a few more places where we need to mention
JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
for SpecialJoinInfo.


I noticed that this patch changes the plan of a query in join.sql from
a semi join to right semi join, compromising the original purpose of
this query, which was to test the fix for neqjoinsel's behavior for
semijoins (see commit 7ca25b7d).

--
-- semijoin selectivity for <>
--
explain (costs off)
select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
             where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
      and i4.f1 = a.tenthous;

So I've changed this test case a bit so that it is still testing what it
is supposed to test.

In passing, I've also updated the commit message to clarify that this
patch does not address the support of "Right Semi Join" for merge joins.

Thanks
Richard

Attachment