Re: why doesn't optimizer can pull up where a > ( ... ) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: why doesn't optimizer can pull up where a > ( ... )
Date
Msg-id 20191120172001.gri67zdnwycsobxy@development
Whole thread Raw
In response to why doesn't optimizer can pull up where a > ( ... )  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers
On Wed, Nov 20, 2019 at 08:15:19PM +0800, Andy Fan wrote:
>Hi Hackers:
>
> First I found the following queries running bad on pg.
>
> select count(*) from part2 p1 where p_size > 40 and p_retailprice >
>(select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand);
>
>the plan is
>                                     QUERY PLAN
>------------------------------------------------------------------------------------
> Aggregate  (cost=1899310537.28..1899310537.29 rows=1 width=8)
>   ->  Seq Scan on part2 p1  (cost=0.00..1899310456.00 rows=32513 width=0)
>         Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1)))
>         SubPlan 1
>           ->  Aggregate  (cost=6331.00..6331.01 rows=1 width=32)
>                 ->  Seq Scan on part2 p2  (cost=0.00..5956.00 rows=150000
>width=4)
>                       Filter: (p_brand = p1.p_brand)
>
>however if we change it to the following format, it runs pretty quick.
>
>select count(*) from part2,
>(select p_brand, avg(p_retailprice) as avg_price from part2 where p_size >
>40 group by p_brand) p2
>where p_retailprice > p2.avg_price
>and p_size > 40
>and part2.p_brand = p2.p_brand;
>
>The above example comes from
>https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with
>a litter modification.
>
>1.  why pg can't translate the query 1 to query 2.  after some checking
>on pull_up_sublinks_qual_recurse,  I still doesn't get the idea.
>2.  why pg can't do it,  while greenplum can?
>

I don't know the exact place(s) in the optimizer that would need to
consider this optimization, but the primary difference between the two
queries is that the first one is correlated subquery, while the second
one is not.

So I guess our optimizer is not smart enough to recognize this pattern,
and can't do the transformation from

    ... FROM p WHERE x > (SELECT avg(x) FROM q WHERE p.id = q.id) ...

to

    ... FROM p, (SELECT q.id, avg(x) x FROM q) q2 WHERE q2.id = p.id
                                                    AND q2.x < p.x

I.e. we don't have the code to consider this optimization, because no
one considered it interesting enough to submit a patch.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)
Next
From: Tomas Vondra
Date:
Subject: Re: why doesn't optimizer can pull up where a > ( ... )