Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table - Mailing list pgsql-hackers

From Илья Жарков
Subject Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
Date
Msg-id CAKE=rqQS_hT6DV54Vq8bRAFV=A-Q3PdM9pmqCR_D61PPEcjv8g@mail.gmail.com
In response to Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
Regarding merge joins, I suppose that in some edge cases the scan of the inner set might not even be started.

FROM parent p
LEFT JOIN child c
  ON p.id = c.id
  AND p.dtype = 'B'

        ┌───┐ ┌───┐
parent  │1,A│ │2,A│
        └───┘ └───┘
       ^    
        ┌───┐ ┌───┐
child   │ 1 │ │ 2 │
        └───┘ └───┘
       ^

If p.dtype = 'B' were evaluated early, the outer pointer could keep advancing through the outer set for as long as the condition evaluates to false.
In the above example, it reaches the end of the outer set without ever accessing the inner set.

        ┌───┐ ┌───┐
parent  │1,A│ │2,A│
        └───┘ └───┘
                   ^
        ┌───┐ ┌───┐
child   │ 1 │ │ 2 │
        └───┘ └───┘
       ^

In the opposite scenario:

        ┌───┐ ┌───┐
parent  │1,B│ │2,A│
        └───┘ └───┘
       ^    
        ┌───┐ ┌───┐
child   │ 1 │ │ 2 │
        └───┘ └───┘
       ^

it would need to start moving the inner pointer at some point.

        ┌───┐ ┌───┐
parent  │1,B│ │2,A│
        └───┘ └───┘
                   ^
        ┌───┐ ┌───┐
child   │ 1 │ │ 2 │
        └───┘ └───┘
             ^    

But even in this case, the inner pointer may not reach the end of the inner set.

Though this depends heavily on how merge join is implemented in the Postgres code. I have to admit that I have only a very vague idea of that...
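
To make the two scenarios above concrete, here is a toy simulation in Python. It is only a sketch of the idea, not of how the Postgres executor actually implements merge join: it assumes both inputs are sorted on id with unique ids (so no mark/restore is needed), and the names CountingScan, left_merge_join and outer_pred are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple


class CountingScan:
    """Hypothetical inner scan over sorted child ids that counts fetched rows."""

    def __init__(self, ids: List[int]) -> None:
        self._it = iter(ids)
        self.fetched = 0  # number of inner rows actually read

    def next(self) -> Optional[int]:
        row = next(self._it, None)
        if row is not None:
            self.fetched += 1
        return row


def left_merge_join(
    parent: List[Tuple[int, str]],
    child: CountingScan,
    outer_pred: Callable[[str], bool],
) -> List[Tuple[int, str, Optional[int]]]:
    """Toy LEFT merge join on id, evaluating the outer-only clause first.

    Assumes parent and child are sorted by id and ids are unique.
    """
    result = []
    current: Optional[int] = None  # current inner row, fetched lazily
    exhausted = False
    for pid, dtype in parent:
        if not outer_pred(dtype):
            # The outer-only clause is false: emit a NULL-extended row
            # without touching the inner side at all.
            result.append((pid, dtype, None))
            continue
        # Advance the inner pointer only when a match is actually possible.
        while not exhausted and (current is None or current < pid):
            current = child.next()
            if current is None:
                exhausted = True
        matched = (not exhausted) and current == pid
        result.append((pid, dtype, pid if matched else None))
    return result


def is_b(dtype: str) -> bool:
    return dtype == "B"


# First scenario: no parent row has dtype = 'B'.
scan_a = CountingScan([1, 2])
rows_a = left_merge_join([(1, "A"), (2, "A")], scan_a, is_b)

# Opposite scenario: the first parent row has dtype = 'B'.
scan_b = CountingScan([1, 2])
rows_b = left_merge_join([(1, "B"), (2, "A")], scan_b, is_b)

print(rows_a, scan_a.fetched)
print(rows_b, scan_b.fetched)
```

In this model the first scenario never reads a child row, and the second one reads only the first child row, so the inner pointer stops before the end of the inner set.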

On Sun, 8 Dec 2024 at 11:44, Andrei Lepikhov <lepihov@gmail.com> wrote:
On 8/12/2024 09:52, Andres Freund wrote:
>> I think avoiding touching a hash table and an index under MergeJoin can also
>> be beneficial.
>
> How would you get significant wins for mergejoins? You need to go through both
> inner and outer anyway?
In my mind, this trick can be designed for specific cases like sales
tables, as illustrated before and used by well-rounded developers. I'm
not sure that such optimisation would be profitable in general. My point
is that the sales database has lots of categories, and when requesting
product descriptions, we will not necessarily touch all the categories -
in that case, the one-sided clause could allow us to avoid scanning some
tables at all. Am I wrong?
BTW, may it be used in SEMI JOIN cases?

--
regards, Andrei Lepikhov
