Thread: Unify "In" Sublink to EXIST Sublink for better optimize opportunity

Unify "In" Sublink to EXIST Sublink for better optimize opportunity

From
Andy Fan
Date:

Due to the implementation of convert_ANY_sublink_to_join,  we have
limitations below, which has been discussed at [1] [2].

    if (contain_vars_of_level((Node *) subselect, 1))
        return NULL; 

I'm thinking if we can do the ${subject}. If so,  the query like

SELECT * FROM t1 WHERE 
a IN (SELECT * FROM t2 WHERE t2.b > t1.b);

can be converted to

SELECT * FROM t1 WHERE 
EXISTS (SELECT * FROM t2 WHERE t2.b > t1.b AND t1.a = t2.a);

Then the sublink can be removed with existing logic (the NOT-IN format
will not be touched since they have different meanings). 

Any ideas? 


--
Best Regards
Andy Fan

Re: Unify "In" Sublink to EXIST Sublink for better optimize opportunity

From
Andy Fan
Date:
Hi:

On Thu, Oct 6, 2022 at 3:24 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Due to the implementation of convert_ANY_sublink_to_join,  we have
limitations below, which has been discussed at [1] [2].

    if (contain_vars_of_level((Node *) subselect, 1))
        return NULL; 

I'm thinking if we can do the ${subject}. If so,  the query like

SELECT * FROM t1 WHERE 
a IN (SELECT * FROM t2 WHERE t2.b > t1.b);

can be converted to

SELECT * FROM t1 WHERE 
EXISTS (SELECT * FROM t2 WHERE t2.b > t1.b AND t1.a = t2.a);

I have coded this and tested my idea, here are some new findings: 1). Not all the 
TargetEntry->expr can be used as qual, for example: WindowFunc, AggFunc, SRFs. 
2). For simple correlated EXISTS query, the current master code also tries to transform it
to IN format and implement it by hashing (make_subplan). So there is no need to
convert an IN query to EXISTS query if the sublink can be pulled up already, 
which means  this patch should only take care of !contain_vars_of_level((Node *) subselect, 1).

Note the changes of  postgres_fdw.out are expected. The 'a' in foreign_tbl has varlevelsup = 1;
SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);

Here is some performance testing for this patch:

select * from tenk1 t1
where hundred in (select hundred from tenk2 t2
                  where t2.odd = t1.odd
                  and even in (select even from tenk1 t3
                               where t3.fivethous = t2.fivethous))
and even > 0;

master:   892.902 ms
patched:  56.08 ms

Patch attached, any feedback is welcome.

--
Best Regards
Andy Fan
Attachment