Thread: Re: Wrong results with right-semi-joins

Re: Wrong results with right-semi-joins

From
Melanie Plageman
Date:
On Tue, Dec 3, 2024 at 3:56 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
> I ran into $subject and it can be reproduced with the query below.
>
> create temp table tbl_rs(a int, b int);
> insert into tbl_rs select i, i from generate_series(1,10)i;
> analyze tbl_rs;
>
> set enable_nestloop to off;
> set enable_hashagg to off;
>
> select * from tbl_rs t1
> where (select a from tbl_rs t2
>        where exists (select 1 from
>                      (select (b in (select b from tbl_rs t3)) as c
>                       from tbl_rs t4 where t4.a = 1) s
>                      where c in
>                       (select t1.a = 1 from tbl_rs t5 union all select true))
>        order by a limit 1) >= 0;
>  a | b
> ---+---
>  1 | 1
> (1 row)
>
> The expected output should be 10 rows, not 1.

Thanks for finding and fixing this. Just for my own benefit, could you
explain more about the minimal repro? Specifically, if you just need a
subplan in the hash side of a right semi-join, why do you also need
the outer part of the query that produces the initplan?

 Seq Scan on tbl_rs t1
   Filter: ((SubPlan 3) >= 0)
   SubPlan 3
     ->  Limit
           InitPlan 2
             ->  Hash Right Semi Join

- Melanie



Re: Wrong results with right-semi-joins

From
Richard Guo
Date:
On Mon, Dec 9, 2024 at 11:01 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Thanks for finding and fixing this. Just for my own benefit, could you
> explain more about the minimal repro? Specifically, if you just need a
> subplan in the hash side of a right semi-join, why do you also need
> the outer part of the query that produces the initplan?
>
>  Seq Scan on tbl_rs t1
>    Filter: ((SubPlan 3) >= 0)
>    SubPlan 3
>      ->  Limit
>            InitPlan 2
>              ->  Hash Right Semi Join

Upon further consideration, I believe the initplan is unnecessary.
What we really want from the plan is to reuse the hash table during
hash-right-semi-join rescans.  To achieve this, we just need to ensure
that it's a single-batch join and that there are no parameter changes
on the inner side.

I spent some time on this and came up with a simpler query to
reproduce the issue.

explain (costs off)
select * from tbl_rs t1 join
  lateral (select * from tbl_rs t2 where t2.a in
            (select t1.a+t3.a from tbl_rs t3) and t2.a < 5)
  on true;
                QUERY PLAN
-------------------------------------------
 Nested Loop
   ->  Seq Scan on tbl_rs t1
   ->  Hash Right Semi Join
         Hash Cond: ((t1.a + t3.a) = t2.a)
         ->  Seq Scan on tbl_rs t3
         ->  Hash
               ->  Seq Scan on tbl_rs t2
                     Filter: (a < 5)
(8 rows)

Without the fix, this query returns 3 rows rather than the expected 6.

Maybe I should update the test case introduced in 5668a857d to this
one.

Thanks
Richard



Re: Wrong results with right-semi-joins

From
Richard Guo
Date:
On Wed, Dec 11, 2024 at 11:27 AM Richard Guo <guofenglinux@gmail.com> wrote:
> I spent some time on this and came up with a simpler query to
> reproduce the issue.
>
> explain (costs off)
> select * from tbl_rs t1 join
>   lateral (select * from tbl_rs t2 where t2.a in
>             (select t1.a+t3.a from tbl_rs t3) and t2.a < 5)
>   on true;
>                 QUERY PLAN
> -------------------------------------------
>  Nested Loop
>    ->  Seq Scan on tbl_rs t1
>    ->  Hash Right Semi Join
>          Hash Cond: ((t1.a + t3.a) = t2.a)
>          ->  Seq Scan on tbl_rs t3
>          ->  Hash
>                ->  Seq Scan on tbl_rs t2
>                      Filter: (a < 5)
> (8 rows)
>
> Without the fix, this query returns 3 rows rather than the expected 6.
>
> Maybe I should update the test case introduced in 5668a857d to this
> one.

Done.

Thanks
Richard



Re: Wrong results with right-semi-joins

From
Melanie Plageman
Date:
On Wed, Dec 11, 2024 at 9:44 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Wed, Dec 11, 2024 at 11:27 AM Richard Guo <guofenglinux@gmail.com> wrote:
>
> > Maybe I should update the test case introduced in 5668a857d to this
> > one.
>
> Done.

Great, thanks. I think the new example is more clear.

- Melanie