Re: why doesn't optimizer can pull up where a > ( ... ) - Mailing list pgsql-hackers

From Andy Fan
Subject Re: why doesn't optimizer can pull up where a > ( ... )
Date
Msg-id CAKU4AWrLa4y-t5O10x4w1aDog7TRm5O7YnGZ8FmU7BS9qFTxWQ@mail.gmail.com
Whole thread Raw
In response to why doesn't optimizer can pull up where a > ( ... )  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers


On Wed, Nov 20, 2019 at 8:15 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi Hackers:

 First I found the following queries running bad on pg. 

 select count(*) from part2 p1 where p_size > 40 and p_retailprice > (select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand);

the plan is
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=1899310537.28..1899310537.29 rows=1 width=8)
   ->  Seq Scan on part2 p1  (cost=0.00..1899310456.00 rows=32513 width=0)
         Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1)))
         SubPlan 1
           ->  Aggregate  (cost=6331.00..6331.01 rows=1 width=32)
                 ->  Seq Scan on part2 p2  (cost=0.00..5956.00 rows=150000 width=4)
                       Filter: (p_brand = p1.p_brand)

however if we change it to the following format, it runs pretty quick. 

select count(*) from part2,
(select p_brand, avg(p_retailprice) as avg_price from part2 where p_size > 40 group by p_brand) p2
where p_retailprice > p2.avg_price
and p_size > 40
and part2.p_brand = p2.p_brand;

The above example comes from https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with a litter modification. 

1.  why pg can't translate the query 1 to query 2.  after some checking on pull_up_sublinks_qual_recurse,  I still doesn't get the idea. 
2.  why pg can't do it,  while greenplum can? 

Thanks


add the sql I used for testing for reference. 

CREATE TABLE part2 (
   p_partkey integer NOT NULL,
   p_brand character(10) NOT NULL,
   p_size integer NOT NULL,
   p_retailprice numeric(15,2) NOT NULL
);
insert into part2 select 1, 'brand1',  random_between(0, 40),  random_between(0, 40)  from generate_series(1, 100000);
insert into part2 select 2, 'brand2',  random_between(40, 80),  random_between(0, 40)  from generate_series(1, 100000);
insert into part2 select 3, 'brand1',  random_between(0, 40),  random_between(0, 40)  from generate_series(1, 100000);

 

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: why doesn't optimizer can pull up where a > ( ... )
Next
From: Daniel Gustafsson
Date:
Subject: Re: why doesn't optimizer can pull up where a > ( ... )