Thread: BUG #15699: PostgreSQL query rewrite don's use the same rewrite rule for the JOIN QUERY.

BUG #15699: PostgreSQL query rewrite don's use the same rewrite rule for the JOIN QUERY.

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15699
Logged by:          Zhou Digoal
Email address:      digoal@126.com
PostgreSQL version: 11.2
Operating system:   centos 7.x x64
Description:

why PostgreSQL query rewrite don's use the same rewrite rule for the JOIN
QUERY.

for exp:

1、when using hash join or merge join

query rewrite don't  add this cond:  t2.i<10000000

```
postgres=# explain select count(*) from table5 t1 join table5 t2 using (i)
where t1.i<10000000;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5211686.12..5211686.13 rows=1 width=8)
   ->  Gather  (cost=5211686.06..5211686.07 rows=20 width=8)
         Workers Planned: 20
         ->  Partial Aggregate  (cost=5211686.06..5211686.07 rows=1
width=8)
               ->  Parallel Hash Join  (cost=98142.42..5210632.23
rows=421532 width=0)
                     Hash Cond: (t2.i = t1.i)
                     ->  Parallel Seq Scan on table5 t2
(cost=0.00..4924779.03 rows=50000003 width=4)
                     ->  Parallel Hash  (cost=92873.27..92873.27 rows=421532
width=4)
                           ->  Parallel Index Only Scan using idx_table5_2
on table5 t1  (cost=0.57..92873.27 rows=421532 width=4)
                                 Index Cond: (i < 10000000)
(10 rows)

postgres=# explain select count(*) from table5 t1 join table5 t2 using (i)
where t1.i<10000000;
                                                           QUERY PLAN
                                                    

---------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=8840398.86..8840398.87 rows=1 width=8)
   ->  Gather  (cost=8840398.80..8840398.81 rows=20 width=8)
         Workers Planned: 20
         ->  Partial Aggregate  (cost=8840398.80..8840398.81 rows=1
width=8)
               ->  Merge Join  (cost=1.15..8839344.97 rows=421532 width=0)
                     Merge Cond: (t2.i = t1.i)
                     ->  Parallel Index Only Scan using idx_table5_2 on
table5 t2  (cost=0.57..8516088.73 rows=50000003 width=4)
                     ->  Index Only Scan using idx_table5_2 on table5 t1
(cost=0.57..172964.32 rows=8430637 width=4)
                           Index Cond: (i < 10000000)
(9 rows)
```

2、when use nestloop join ,

query rewrite do add this cond:  t2.i<10000000

```
postgres=# explain select count(*) from table5 t1 join table5 t2 on
(t1.i=t2.i and t1.i<10000000 and t2.i<10000000);
                                                QUERY PLAN
                             
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10014131078.70..10014131078.71 rows=1 width=8)
   ->  Nested Loop  (cost=10000000001.15..10014130901.01 rows=71076
width=0)
         ->  Index Only Scan using idx_table5_2 on table5 t1
(cost=0.57..172964.32 rows=8430637 width=4)
               Index Cond: (i < 10000000)
         ->  Index Only Scan using idx_table5_2 on table5 t2
(cost=0.57..1.65 rows=1 width=4)
               Index Cond: ((i = t1.i) AND (i < 10000000))
(6 rows)
```


sorry, it's my fault.  

FIX:  all join method don't rewrite this cond.

```
postgres=# explain select count(*) from table5 t1 join table5 t2 using (i) where t1.i<10000000;     
                                                          QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=1843938.99..1843939.00 rows=1 width=8)  
   ->  Gather  (cost=1843938.96..1843938.97 rows=8 width=8)  
         Workers Planned: 8  
         ->  Partial Aggregate  (cost=1843938.96..1843938.97 rows=1 width=8)  
               ->  Nested Loop  (cost=1.15..1841304.38 rows=1053830 width=0)  
                     ->  Parallel Index Only Scan using idx_table5_2 on table5 t1  (cost=0.57..99196.25 rows=1053830 width=4)  
                           Index Cond: (i < 10000000)  
                     ->  Index Only Scan using idx_table5_2 on table5 t2  (cost=0.57..1.64 rows=1 width=4)  
                           Index Cond: (i = t1.i)  
(9 rows)
```

------------------------------------------------------------------
发件人:PG Bug reporting form <noreply@postgresql.org>
发送时间:2019年3月17日(星期日) 18:21
收件人:pgsql-bugs <pgsql-bugs@lists.postgresql.org>
抄 送:digoal <digoal@126.com>
主 题:BUG #15699: PostgreSQL query rewrite don's use the same rewrite rule for the JOIN QUERY.

The following bug has been logged on the website:

Bug reference:      15699
Logged by:          Zhou Digoal
Email address:      digoal@126.com
PostgreSQL version: 11.2
Operating system:   centos 7.x x64
Description:        

why PostgreSQL query rewrite don's use the same rewrite rule for the JOIN
QUERY.

for exp:

1、when using hash join or merge join

query rewrite don't  add this cond:  t2.i<10000000

```
postgres=# explain select count(*) from table5 t1 join table5 t2 using (i)
where t1.i<10000000;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=5211686.12..5211686.13 rows=1 width=8)
   ->  Gather  (cost=5211686.06..5211686.07 rows=20 width=8)
         Workers Planned: 20
         ->  Partial Aggregate  (cost=5211686.06..5211686.07 rows=1
width=8)
               ->  Parallel Hash Join  (cost=98142.42..5210632.23
rows=421532 width=0)
                     Hash Cond: (t2.i = t1.i)
                     ->  Parallel Seq Scan on table5 t2 
(cost=0.00..4924779.03 rows=50000003 width=4)
                     ->  Parallel Hash  (cost=92873.27..92873.27 rows=421532
width=4)
                           ->  Parallel Index Only Scan using idx_table5_2
on table5 t1  (cost=0.57..92873.27 rows=421532 width=4)
                                 Index Cond: (i < 10000000)
(10 rows)

postgres=# explain select count(*) from table5 t1 join table5 t2 using (i)
where t1.i<10000000;
                                                           QUERY PLAN       
                                                    
---------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=8840398.86..8840398.87 rows=1 width=8)
   ->  Gather  (cost=8840398.80..8840398.81 rows=20 width=8)
         Workers Planned: 20
         ->  Partial Aggregate  (cost=8840398.80..8840398.81 rows=1
width=8)
               ->  Merge Join  (cost=1.15..8839344.97 rows=421532 width=0)
                     Merge Cond: (t2.i = t1.i)
                     ->  Parallel Index Only Scan using idx_table5_2 on
table5 t2  (cost=0.57..8516088.73 rows=50000003 width=4)
                     ->  Index Only Scan using idx_table5_2 on table5 t1 
(cost=0.57..172964.32 rows=8430637 width=4)
                           Index Cond: (i < 10000000)
(9 rows)
```

2、when use nestloop join ,

query rewrite do add this cond:  t2.i<10000000

```
postgres=# explain select count(*) from table5 t1 join table5 t2 on
(t1.i=t2.i and t1.i<10000000 and t2.i<10000000);
                                                QUERY PLAN                  
                             
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10014131078.70..10014131078.71 rows=1 width=8)
   ->  Nested Loop  (cost=10000000001.15..10014130901.01 rows=71076
width=0)
         ->  Index Only Scan using idx_table5_2 on table5 t1 
(cost=0.57..172964.32 rows=8430637 width=4)
               Index Cond: (i < 10000000)
         ->  Index Only Scan using idx_table5_2 on table5 t2 
(cost=0.57..1.65 rows=1 width=4)
               Index Cond: ((i = t1.i) AND (i < 10000000))
(6 rows)
```