Thread: [POC] Allow flattening of subquery with a link to upper query

[POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
Hi,

One of the most annoying things in the planner for me is unnesting of 
correlated queries [1]. A number papers on this subject were published 
starting 1980s, but only trivial optimizations exists in the Core. It 
means a lack of performance, especially when we use foreign tables in 
subquery.
In the patch I'm trying to propose a sort of sketch of solution.

Before flattening procedure we just look through the quals of subquery, 
pull to the upper level OpExpr's containing variables from the upper 
relation and replace their positions in the quals with true expression.
Further, the flattening machinery works as usual.

This patch is dedicated to simplest variant of correlated queries - 
without aggregate functions in the target list. It passes regression 
tests and contains some additional tests to demonstrate achievements.

I'd like to get critics on the approach.

[1]  Kim, Won. “On optimizing an SQL-like nested query.” ACM Trans. 
Database Syst. 7 (1982): 443-469.

-- 
Regards
Andrey Lepikhov
Postgres Professional
Attachment

Re: [POC] Allow flattening of subquery with a link to upper query

From
Richard Guo
Date:

On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
Before flattening procedure we just look through the quals of subquery,
pull to the upper level OpExpr's containing variables from the upper
relation and replace their positions in the quals with true expression.
Further, the flattening machinery works as usual.
 
Hmm, I'm not sure this patch works correctly in all cases. It seems to
me this patch pulls up the subquery without checking the constraints
imposed by lateral references. If its quals contain any lateral
references to rels outside a higher outer join, we would need to
postpone quals from below an outer join to above it, which is probably
incorrect. As an example, consider

    select * from a left join b on b.i in
        (select c.i from c where c.j = a.j);

If we pull up the ANY SubLink into parent query and pull up its qual
into upper level, as what the patch does, then its qual 'c.j = a.j'
would have to be postponed past the B/C semi join, which is totally
wrong. Doing this would firstly trigger the assertion failure in
distribute_qual_to_rels

  Assert(root->hasLateralRTEs);   /* shouldn't happen otherwise */
  Assert(jointype == JOIN_INNER); /* mustn't postpone past outer join */

Even if we ignore these assertion checks, in the final plan we would
have to access the RHS of the B/C semi join, i.e. C, to evaluate qual
'c.j = a.j' at the join level of A/BC join, which is wrong.

Thanks
Richard

Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
On 9/1/22 17:24, Richard Guo wrote:
> 
> On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov 
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> 
>     Before flattening procedure we just look through the quals of subquery,
>     pull to the upper level OpExpr's containing variables from the upper
>     relation and replace their positions in the quals with true expression.
>     Further, the flattening machinery works as usual.
> 
> Hmm, I'm not sure this patch works correctly in all cases. It seems to
> me this patch pulls up the subquery without checking the constraints
> imposed by lateral references. If its quals contain any lateral
> references to rels outside a higher outer join, we would need to
> postpone quals from below an outer join to above it, which is probably
> incorrect.Yeah, it's not easy-to-solve problem. If I correctly understand the 
code, to fix this problem we must implement the same logic, as 
pull_up_subqueries (lowest_outer_join/safe_upper_varnos). It looks ugly. 
But, more important, does this feature deserve such efforts/changes?

-- 
Regards
Andrey Lepikhov
Postgres Professional




Re: [POC] Allow flattening of subquery with a link to upper query

From
Richard Guo
Date:

On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 9/1/22 17:24, Richard Guo wrote:
> On Wed, Aug 31, 2022 at 2:35 PM Andrey Lepikhov
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
>     Before flattening procedure we just look through the quals of subquery,
>     pull to the upper level OpExpr's containing variables from the upper
>     relation and replace their positions in the quals with true expression.
>     Further, the flattening machinery works as usual.
>
> Hmm, I'm not sure this patch works correctly in all cases. It seems to
> me this patch pulls up the subquery without checking the constraints
> imposed by lateral references. If its quals contain any lateral
> references to rels outside a higher outer join, we would need to
> postpone quals from below an outer join to above it, which is probably
> incorrect.
 
Yeah, it's not easy-to-solve problem. If I correctly understand the
code, to fix this problem we must implement the same logic, as
pull_up_subqueries (lowest_outer_join/safe_upper_varnos). 
 
Yeah, I think we'd have to consider the restrictions from lateral
references to guarantee correctness when we pull up subqueries. We need
to avoid the situation where quals need to be postponed past outer join.

However, even if we have taken care of that, there may be other issues
with flattening direct-correlated ANY SubLink. The constraints imposed
by LATERAL references may make it impossible for us to find any legal
join orders, as discussed in [1].

Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
On 9/5/22 12:22, Richard Guo wrote:
> 
> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov 
>     Yeah, it's not easy-to-solve problem. If I correctly understand the
>     code, to fix this problem we must implement the same logic, as
>     pull_up_subqueries (lowest_outer_join/safe_upper_varnos). 
> 
> Yeah, I think we'd have to consider the restrictions from lateral
> references to guarantee correctness when we pull up subqueries. We need
> to avoid the situation where quals need to be postponed past outer join.
> 
> However, even if we have taken care of that, there may be other issues
> with flattening direct-correlated ANY SubLink. The constraints imposed
> by LATERAL references may make it impossible for us to find any legal
> join orders, as discussed in [1].
> 
> [1] 
> https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com
<https://www.postgresql.org/message-id/CAMbWs49cvkF9akbomz_fCCKS=D5TY=4KGHEQcfHPZCXS1GVhkA@mail.gmail.com>

The problem you mentioned under this link is about ineffective query 
plan - as I understand it.
This is a problem, especially if we would think about more complex 
pull-ups of subqueries - with aggregate functions in the target list.
I think about that problem as about next step - we already have an 
example - machinery of alternative plans. This problem may be solved in 
this way, or by a GUC, as usual.

-- 
Regards
Andrey Lepikhov
Postgres Professional




Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
On 5/9/2022 12:22, Richard Guo wrote:
> 
> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov 
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
>      > Hmm, I'm not sure this patch works correctly in all cases. It
>     seems to
>      > me this patch pulls up the subquery without checking the constraints
>      > imposed by lateral references. If its quals contain any lateral
>      > references to rels outside a higher outer join, we would need to
>      > postpone quals from below an outer join to above it, which is
>     probably
>      > incorrect.
> 
>     Yeah, it's not easy-to-solve problem. If I correctly understand the
>     code, to fix this problem we must implement the same logic, as
>     pull_up_subqueries (lowest_outer_join/safe_upper_varnos). 
> 
> Yeah, I think we'd have to consider the restrictions from lateral
> references to guarantee correctness when we pull up subqueries. We need
> to avoid the situation where quals need to be postponed past outer join.
> 
> However, even if we have taken care of that, there may be other issues
> with flattening direct-correlated ANY SubLink. The constraints imposed
> by LATERAL references may make it impossible for us to find any legal
> join orders, as discussed in [1].
To resolve both issues, lower outer join passes through pull_sublinks_* 
into flattening routine (see attachment).
I've added these cases into subselect.sql

-- 
regards,
Andrey Lepikhov
Postgres Professional

Attachment

Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
On 9/13/22 16:40, Andrey Lepikhov wrote:
> On 5/9/2022 12:22, Richard Guo wrote:
>> On Fri, Sep 2, 2022 at 7:09 PM Andrey Lepikhov 
>> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> To resolve both issues, lower outer join passes through pull_sublinks_* 
> into flattening routine (see attachment).
> I've added these cases into subselect.sql
In attachment - new version of the patch, rebased onto current master.

-- 
Regards
Andrey Lepikhov
Postgres Professional

Attachment

Re: [POC] Allow flattening of subquery with a link to upper query

From
Zhihong Yu
Date:
Hi,
For contain_placeholders():

+   if (IsA(node, Query))
+       return query_tree_walker((Query *) node, contain_placeholders, context, 0);
+   else if (IsA(node, PlaceHolderVar))

The `else` is not needed.

For correlated_t struct, it would be better if the fields have comments.

+                    * (for grouping, as an example). So, revert its status to
+                    * a full valued entry.

full valued -> fully valued

Cheers

Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrey Lepikhov
Date:
On 5/10/2022 02:45, Zhihong Yu wrote:
> Hi,
> For contain_placeholders():
> 
> +   if (IsA(node, Query))
> +       return query_tree_walker((Query *) node, contain_placeholders, 
> context, 0);
> +   else if (IsA(node, PlaceHolderVar))
Fixed
> 
> The `else` is not needed.
> 
> For correlated_t struct, it would be better if the fields have comments.
Ok, I've added some comments.
> 
> +                    * (for grouping, as an example). So, revert its 
> status to
> +                    * a full valued entry.
> 
> full valued -> fully valued
Fixed

-- 
regards,
Andrey Lepikhov
Postgres Professional

Attachment

Re: [POC] Allow flattening of subquery with a link to upper query

From
Zhihong Yu
Date:


On Wed, Oct 5, 2022 at 4:38 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 5/10/2022 02:45, Zhihong Yu wrote:
> Hi,
> For contain_placeholders():
>
> +   if (IsA(node, Query))
> +       return query_tree_walker((Query *) node, contain_placeholders,
> context, 0);
> +   else if (IsA(node, PlaceHolderVar))
Fixed
>
> The `else` is not needed.
>
> For correlated_t struct, it would be better if the fields have comments.
Ok, I've added some comments.
>
> +                    * (for grouping, as an example). So, revert its
> status to
> +                    * a full valued entry.
>
> full valued -> fully valued
Fixed

--
regards,
Andrey Lepikhov
Postgres Professional
Hi,

+   List   *pulling_quals; /* List of expressions contained pulled expressions */

contained -> containing

+           /* Does the var already exists in the target list? */

 exists -> exist

+       {"optimize_correlated_subqueries", PGC_USERSET, QUERY_TUNING_METHOD,

Is it possible that in the future there would be other optimization for correlated subqueries ?
If so, either rename the guc or, make the guc a string which represents an enum.

Cheers

Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrei Lepikhov
Date:
On 1/9/2022 19:24, Richard Guo wrote:
> Even if we ignore these assertion checks, in the final plan we would
> have to access the RHS of the B/C semi join, i.e. C, to evaluate qual
> 'c.j = a.j' at the join level of A/BC join, which is wrong.
Having committed 9f13376396 recently, we did a lot of work in this area. 
By applying regression tests from my last patch [1] to the master, I 
compared these two implementations.
As I see, using the LATERAL trick allowed us to simplify the code 
drastically. But because we know just a fact of the lateral link, not 
its place, in the master we do less when in the patch proposed in that 
thread. For example, having query:

explain (costs off)
SELECT relname FROM pg_class c1
WHERE relname = ANY (
   SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
);

We see on master:
  Nested Loop
    ->  Seq Scan on pg_class c1
    ->  Subquery Scan on "ANY_subquery"
          Filter: (c1.relname = "ANY_subquery".amname)
          ->  Group
                Group Key: a.amname
                ->  Sort
                      Sort Key: a.amname
                      ->  Seq Scan on pg_am a
                            Filter: (oid = c1.oid)

And with this patch:
  Hash Join
    Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
    ->  Seq Scan on pg_class c1
    ->  Hash
          ->  HashAggregate
                Group Key: a.amname
                ->  Seq Scan on pg_am a

Also, we attempted to fix links from a non-parent query block.
So, in my opinion, the reason for this patch still exists, and we can 
continue this work further, maybe elaborating on flattening LATERAL 
references - this needs some research.

[1] 
https://www.postgresql.org/message-id/35c8a3e8-d080-dfa8-2be3-cf5fe702010a%40postgrespro.ru

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: [POC] Allow flattening of subquery with a link to upper query

From
David Rowley
Date:
On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> explain (costs off)
> SELECT relname FROM pg_class c1
> WHERE relname = ANY (
>    SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
> );
>
> We see on master:
>   Nested Loop
>     ->  Seq Scan on pg_class c1
>     ->  Subquery Scan on "ANY_subquery"
>           Filter: (c1.relname = "ANY_subquery".amname)
>           ->  Group
>                 Group Key: a.amname
>                 ->  Sort
>                       Sort Key: a.amname
>                       ->  Seq Scan on pg_am a
>                             Filter: (oid = c1.oid)
>
> And with this patch:
>   Hash Join
>     Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
>     ->  Seq Scan on pg_class c1
>     ->  Hash
>           ->  HashAggregate
>                 Group Key: a.amname
>                 ->  Seq Scan on pg_am a

I've only glanced at the patch just so I could determine if you're
making a cost-based decision and doing this transformation only if the
de-correlation of the subquery is deemed the cheaper option. It looks
like since you're doing this in the same location that we do the other
semi / anti join transformations that there's no costing.

I agree that it would be nice to teach the planner how to do this, but
I think it just has to be a cost-based decision.  Imagine how the
transformed query would perform of pg_am had a billion rows and
pg_class had 1 row. That's quite a costly hash table build to be
probing it just once.

I didn't follow the patch, but there was a patch to push aggregate
function evaluation down [1].  I imagine this has the same problem as
if you just blindly pushed and aggregate function evaluation as deep
as you could evaluate all the aggregate's parameters and group by vars
then you may end up aggregating far more than you need to as some join
could eliminate the majority of the groups.  I think we'd need to come
up with some way to have the planner consider these types of
optimisations as alternatives to what happens today and only apply
them when we estimate that they're cheaper.  Right now a Path has no
ability to describe that it's performed GROUP BY.

David

[1] https://commitfest.postgresql.org/46/4019/



Re: [POC] Allow flattening of subquery with a link to upper query

From
Andrei Lepikhov
Date:
On 20/2/2024 17:43, David Rowley wrote:
> On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: 
> I agree that it would be nice to teach the planner how to do this, but
> I think it just has to be a cost-based decision.  Imagine how the
> transformed query would perform of pg_am had a billion rows and
> pg_class had 1 row. That's quite a costly hash table build to be
> probing it just once.
True, the origins of this work lie in foreign tables where such a query 
generates an even worse situation.

> I didn't follow the patch, but there was a patch to push aggregate
> function evaluation down [1].  I imagine this has the same problem as
> if you just blindly pushed and aggregate function evaluation as deep
> as you could evaluate all the aggregate's parameters and group by vars
> then you may end up aggregating far more than you need to as some join
> could eliminate the majority of the groups.  I think we'd need to come
> up with some way to have the planner consider these types of
> optimisations as alternatives to what happens today and only apply
> them when we estimate that they're cheaper.  Right now a Path has no
> ability to describe that it's performed GROUP BY.
Thanks for the link. We also ended up with the idea of an alternative 
subtree (inspired by the approach of AlternativeSubplan). Here, we just 
explain the current state of the pull-up sublink technique.

-- 
regards,
Andrei Lepikhov
Postgres Professional