Thread: Pulling up sublink may break join-removal logic

Pulling up sublink may break join-removal logic

From
Richard Guo
Date:
Hi hackers,

I happened to notice $subject and not sure if it's an issue or not. When
we're trying to remove a LEFT JOIN, one of the requirements is the inner
side needs to be a single baserel. If there is a join qual that is a
sublink and can be converted to a semi join with the inner side rel, the
inner side would no longer be a single baserel and as a result the LEFT
JOIN can no longer be removed.

Here is an example to illustrate this behavior:

create table a(i int, j int);
create table b(i int UNIQUE, j int);
create table c(i int, j int);

# explain (costs off) select a.i from a left join b on a.i = b.i and
    b.j in (select j from c where b.i = c.i);
  QUERY PLAN
---------------
 Seq Scan on a
(1 row)

For the query above, we do not pull up the sublink and the LEFT JOIN is
removed.


# explain (costs off) select a.i from a left join b on a.i = b.i and
    b.j in (select j from c);
              QUERY PLAN
---------------------------------------
 Hash Left Join
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a
   ->  Hash
         ->  Hash Semi Join
               Hash Cond: (b.j = c.j)
               ->  Seq Scan on b
               ->  Hash
                     ->  Seq Scan on c
(9 rows)

Now for this above query, the sublink is pulled up to be a semi-join
with inner side rel 'b', which makes the inner side no longer a single
baserel. That causes the LEFT JOIN failing to be removed.

That is to say, pulling up sublink sometimes breaks join-removal logic.
Is this an issue that bothers you too?

Thanks
Richard

Re: Pulling up sublink may break join-removal logic

From
David Rowley
Date:
On Tue, 28 Apr 2020 at 19:04, Richard Guo <guofenglinux@gmail.com> wrote:
> I happened to notice $subject and not sure if it's an issue or not. When
> we're trying to remove a LEFT JOIN, one of the requirements is the inner
> side needs to be a single baserel. If there is a join qual that is a
> sublink and can be converted to a semi join with the inner side rel, the
> inner side would no longer be a single baserel and as a result the LEFT
> JOIN can no longer be removed.

I think, in theory at least, that can be fixed by [1], where we no
longer rely on looking to see if the RelOptInfo has a unique index to
determine if the relation can duplicate outer side rows during the
join. Of course, they'll only exist on base relations, so hence the
check you're talking about. Instead, the patch's idea is to propagate
uniqueness down the join tree in the form of UniqueKeys.

A quick glance shows there are a few implementation details of join
removals of why the removal still won't work with [1].  For example,
the singleton rel check causes it to abort both on the pre-check and
the final join removal check.  There's also the removal itself that
assumes we're just removing a single relation. I'd guess that would
need to loop over the min_righthand relids with a bms_next_member loop
and remove each base rel one by one.  I'd need to look in more detail
to know if there are any other limiting factors there.

I'll link back here over on that thread to see if Andy would like to
take a quick look at it.

David

[1] https://www.postgresql.org/message-id/CAKU4AWoymsjbm5KDYbsko13GUfG57pX1QyC3Y8sDHyrfoQeyQQ@mail.gmail.com



Re: Pulling up sublink may break join-removal logic

From
Richard Guo
Date:

On Wed, Apr 29, 2020 at 8:23 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 28 Apr 2020 at 19:04, Richard Guo <guofenglinux@gmail.com> wrote:
> I happened to notice $subject and not sure if it's an issue or not. When
> we're trying to remove a LEFT JOIN, one of the requirements is the inner
> side needs to be a single baserel. If there is a join qual that is a
> sublink and can be converted to a semi join with the inner side rel, the
> inner side would no longer be a single baserel and as a result the LEFT
> JOIN can no longer be removed.

I think, in theory at least, that can be fixed by [1], where we no
longer rely on looking to see if the RelOptInfo has a unique index to
determine if the relation can duplicate outer side rows during the
join. Of course, they'll only exist on base relations, so hence the
check you're talking about. Instead, the patch's idea is to propagate
uniqueness down the join tree in the form of UniqueKeys.

Do you mean we're tracking the uniqueness of each RelOptInfo, baserel or
joinrel, with UniqueKeys? I like the idea!
 

A quick glance shows there are a few implementation details of join
removals of why the removal still won't work with [1].  For example,
the singleton rel check causes it to abort both on the pre-check and
the final join removal check.  There's also the removal itself that
assumes we're just removing a single relation. I'd guess that would
need to loop over the min_righthand relids with a bms_next_member loop
and remove each base rel one by one.  I'd need to look in more detail
to know if there are any other limiting factors there.

Yeah, we'll have to teach remove_useless_joins to work with multiple
relids.

Thanks
Richard

Re: Pulling up sublink may break join-removal logic

From
Andy Fan
Date:


On Wed, Apr 29, 2020 at 10:37 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Apr 29, 2020 at 8:23 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 28 Apr 2020 at 19:04, Richard Guo <guofenglinux@gmail.com> wrote:
> I happened to notice $subject and not sure if it's an issue or not. When
> we're trying to remove a LEFT JOIN, one of the requirements is the inner
> side needs to be a single baserel. If there is a join qual that is a
> sublink and can be converted to a semi join with the inner side rel, the
> inner side would no longer be a single baserel and as a result the LEFT
> JOIN can no longer be removed.

I think, in theory at least, that can be fixed by [1], where we no
longer rely on looking to see if the RelOptInfo has a unique index to
determine if the relation can duplicate outer side rows during the
join. Of course, they'll only exist on base relations, so hence the
check you're talking about. Instead, the patch's idea is to propagate
uniqueness down the join tree in the form of UniqueKeys.

Do you mean we're tracking the uniqueness of each RelOptInfo, baserel or
joinrel, with UniqueKeys? I like the idea!

Yes, it is and welcome the for review for that patch:) 
 

A quick glance shows there are a few implementation details of join
removals of why the removal still won't work with [1].  For example,
the singleton rel check causes it to abort both on the pre-check and
the final join removal check.  There's also the removal itself that
assumes we're just removing a single relation. I'd guess that would
need to loop over the min_righthand relids with a bms_next_member loop
and remove each base rel one by one.  I'd need to look in more detail
to know if there are any other limiting factors there.

Yeah, we'll have to teach remove_useless_joins to work with multiple
relids.

You can see [1] for the discuss for this issue with UniqueKey respect.  search
"In the past we have some limited ability to detect the unqiueness after join, 
 so that's would be ok.  Since we have such ability now,  this may be another 
opportunity to improve the join_is_removable function"

I'm checking it today and will have a feedback soon. 


Best Regards
Andy Fan