Thread: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

Wrong rows estimations with joins of CTEs slows queries by more than factor 500

From
Hans Buschmann
Date:

During data refactoring of our Application I encountered $subject when joining 4 CTEs with left join or inner join.


1. Background

PG 15.1 on Windows x64 (OS seems no to have no meening here)


I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping certain data (4 CTEs qup,qli,qin,qou)

The grouping of the data in the CTEs gives estimated row counts of about 1000 (1 tenth of the real value) This is OK for estimation.


These 4 CTEs are then used to combine the data by joining them.


2. Problem

The 4 CTEs are joined by left joins as shown below:


from qup
left join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
left join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
left join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21

The plan first retrieves qup and qli, taking the estimated row counts of 1163 and 1147 respectively


BUT the result is then hashed and the row count is estimated as 33!


In a Left join the row count stays always the same as the one of left table (here qup with 1163 rows)


The same algorithm which reduces the row estimation from 1163 to 33 is used in the next step to give an estimation of 1 row.

This is totally wrong.


Here is the execution plan of the query:

(search the plan for rows=33)


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual time=168.307..222.670 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.466..68.131 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.454..36.819 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.148..10.687 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.005..1.972 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.140..0.140 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.007..0.103 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.446..27.388 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.440..9.811 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.045..2.438 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.008..0.470 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.034..0.034 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.024 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.424..31.508 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.416..11.908 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.051..3.108 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.006..0.606 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.042..0.043 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.032 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.198..41.812 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.187..18.967 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.046..5.132 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.010..1.015 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.032..0.032 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.010..0.025 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Hash Join  (cost=1015.85..1319.50 rows=1 width=104) (actual time=168.307..215.513 rows=8548 loops=1)
         Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
         Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) | qou.ibitmask) IS NOT NULL)
         ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76) (actual time=18.200..45.188 rows=10275 loops=1)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
               Rows Removed by Filter: 424
         ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual time=150.094..150.095 rows=8845 loops=1)
               Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1899kB
               ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228) (actual time=121.898..147.726 rows=8845 loops=1)
                     Hash Cond: ((qin.curr_season = qli.curr_season) AND ((qin.curr_code)::text = (qli.curr_code)::text))
                     ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=76) (actual time=11.425..34.674 rows=10197 loops=1)
                           Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                           Rows Removed by Filter: 481
                     ->  Hash  (cost=706.86..706.86 rows=33 width=152) (actual time=110.470..110.470 rows=9007 loops=1)
                           Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1473kB
                           ->  Merge Join  (cost=689.20..706.86 rows=33 width=152) (actual time=105.862..108.925 rows=9007 loops=1)
                                 Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                                 ->  Sort  (cost=342.09..344.96 rows=1147 width=76) (actual time=73.419..73.653 rows=9320 loops=1)
                                       Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1391kB
                                       ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=76) (actual time=35.467..71.904 rows=9320 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                             Rows Removed by Filter: 1415
                                 ->  Sort  (cost=347.12..350.02 rows=1163 width=76) (actual time=32.440..32.697 rows=10289 loops=1)
                                       Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1349kB
                                       ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=76) (actual time=9.447..30.666 rows=10289 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                             Rows Removed by Filter: 180
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.597..6.700 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.427..3.863 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.321..2.556 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.286..1.324 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.009..1.093 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.033..1.038 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.055..1.007 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.104..1.117 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.038 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.163..1.174 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.068 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 2.297 ms
 Execution Time: 224.759 ms
(118 Zeilen)

3. Slow query from wrong plan as result on similar case with inner join

When the 3 left joins above are changed to inner joins like:

from qup
join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21

The same rows estimation takes place as with the left joins, but the planner now decides to use a nested loop for the last join, which results in a 500fold execution time:

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13365.31..17472.18 rows=5734 width=104) (actual time=139.037..13403.310 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.399..67.102 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.382..36.743 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.157..10.715 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.008..2.001 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.146..0.146 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.105 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.541..27.419 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.534..9.908 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.049..2.451 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.010..0.462 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.035..0.035 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.649..30.910 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.642..12.115 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.056..3.144 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.008..0.594 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.045..0.046 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.034 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.163..51.151 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.150..20.000 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.036..5.106 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.008..1.005 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.024..0.024 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.007..0.018 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Nested Loop  (cost=707.35..1328.37 rows=1 width=104) (actual time=139.036..13395.820 rows=8548 loops=1)
         Join Filter: ((qli.curr_season = qin.curr_season) AND ((qli.curr_code)::text = (qin.curr_code)::text))
         Rows Removed by Join Filter: 88552397
         ->  Hash Join  (cost=707.35..1016.45 rows=1 width=216) (actual time=127.374..168.249 rows=8685 loops=1)
               Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
               ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=72) (actual time=18.165..54.968 rows=10275 loops=1)
                     Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
                     Rows Removed by Filter: 424
               ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual time=109.205..109.207 rows=9007 loops=1)
                     Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1369kB
                     ->  Merge Join  (cost=689.20..706.86 rows=33 width=144) (actual time=104.785..107.748 rows=9007 loops=1)
                           Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                           ->  Sort  (cost=342.09..344.96 rows=1147 width=72) (actual time=72.320..72.559 rows=9320 loops=1)
                                 Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1357kB
                                 ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=72) (actual time=35.401..70.834 rows=9320 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                       Rows Removed by Filter: 1415
                           ->  Sort  (cost=347.12..350.02 rows=1163 width=72) (actual time=32.461..32.719 rows=10289 loops=1)
                                 Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1269kB
                                 ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=72) (actual time=9.543..30.696 rows=10289 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                       Rows Removed by Filter: 180
         ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=72) (actual time=0.001..1.159 rows=10197 loops=8685)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
               Rows Removed by Filter: 481
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.606..6.733 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.479..3.930 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.368..2.610 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.296..1.335 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.010..1.119 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.069..1.075 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.057..1.026 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.110..1.124 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.046 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.119..1.128 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.056 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 1.746 ms
 Execution Time: 13405.503 ms
(116 Zeilen)

This case really brought me to detect the problem!

The original query and data are not shown here, but the principle should be clear from the execution plans.

I think the planner shouldn't change the row estimations on further steps after left joins at all, and be a bit more conservative on inner joins.
This may be related to the fact that this case has 2 join-conditions (xx_season an xx_code).

Thanks for looking

Hans Buschmann




On 2/8/23 14:55, Hans Buschmann wrote:
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
> 
> 
> 1. Background
> 
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
> 
> 
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
> 
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (1 tenth of the real value) This is OK for estimation.
> 
> 
> These 4 CTEs are then used to combine the data by joining them.
> 
> 
> 2. Problem
> 
> The 4 CTEs are joined by left joins as shown below:
>
...
> 
> This case really brought me to detect the problem!
> 
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
> 
> I think the planner shouldn't change the row estimations on further
> steps after left joins at all, and be a bit more conservative on inner
> joins.

But the code should alredy do exactly that, see:


https://github.com/postgres/postgres/blob/dbe8a1726cfd5a09cf1ef99e76f5f89e2efada71/src/backend/optimizer/path/costsize.c#L5212

And in fact, the second part of the plains shows it's doing the trick:

 ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104)
(actual time=2.321..2.556 rows=1415 loops=1)
     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND
                 ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
     ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
     ->  Sort  (cost=651.57..666.11 rows=5816 width=72)

But notice the first join (with rows=33) doesn't say "Left". And I see
there's Append on top, so presumably the query is much more complex, and
there's a regular join of these CTEs in some other part.

We'll need to se the whole query, not just one chunk of it.

FWIW it seems you're using materialized CTEs - that's likely pretty bad
for the estimates, because we don't propagate statistics from the CTE.
So a join on CTEs can't see statistics from the underlying tables, and
that can easily produce really bad estimates.

I'm assuming you're not using AS MATERIALIZED explicitly, so I'd bet
this happens because the "cardinality" function is marked as volatile.
Perhaps it can be redefined as stable/immutable.

> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).

That shouldn't affect outer join estimates this way (but as I explained
above, the join does not seem to be "left" per the explain).
Multi-column joins can cause issues, no doubt about it - but CTEs make
it worse because we can't e.g. see foreign keys.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



AW: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

From
Hans Buschmann
Date:

Hello Tomas,


Thank you for looking at.
First, I miscalculated the factor which should be about 50, not 500. Sorry.


Then I want to show you the table definitions (simple, very similar, ommited child_tables and additional indexes, here using always "ONLY"):

cpsdb_matcol=# \d sa_upper;
                                       Tabelle ╗public.sa_upper½
    Spalte    |          Typ          | Sortierfolge | NULL erlaubt? |           Vorgabewert
--------------+-----------------------+--------------+---------------+----------------------------------
 id_sup       | integer               |              | not null      | generated by default as identity
 sup_season   | smallint              |              |               |
 sup_sa_code  | character varying(10) | C            |               |
 sup_mat_code | character varying(4)  | C            |               |
 sup_clr_code | character varying(3)  | C            |               |
Indexe:
    "sa_upper_active_pkey" PRIMARY KEY, btree (id_sup)
 

cpsdb_matcol=# \d sa_lining+;
                                       Tabelle ╗public.sa_lining½
    Spalte    |          Typ          | Sortierfolge | NULL erlaubt? |           Vorgabewert
--------------+-----------------------+--------------+---------------+----------------------------------
 id_sli       | integer               |              | not null      | generated by default as identity
 sli_season   | smallint              |              |               |
 sli_sa_code  | character varying(10) | C            |               |
 sli_mat_code | character varying(4)  | C            |               |
 sli_clr_code | character varying(3)  | C            |               |
Indexe:
    "sa_lining_active_pkey" PRIMARY KEY, btree (id_sli)
 

cpsdb_matcol=# \d sa_insole;
                                       Tabelle ╗public.sa_insole½
    Spalte    |          Typ          | Sortierfolge | NULL erlaubt? |           Vorgabewert
--------------+-----------------------+--------------+---------------+----------------------------------
 id_sin       | integer               |              | not null      | generated by default as identity
 sin_season   | smallint              |              |               |
 sin_sa_code  | character varying(10) | C            |               |
 sin_mat_code | character varying(4)  | C            |               |
 sin_clr_code | character varying(3)  | C            |               |
Indexe:
    "sa_insole_active_pkey" PRIMARY KEY, btree (id_sin)
 

cpsdb_matcol=# \d sa_outsole;
                                      Tabelle ╗public.sa_outsole½
    Spalte    |          Typ          | Sortierfolge | NULL erlaubt? |           Vorgabewert
--------------+-----------------------+--------------+---------------+----------------------------------
 id_sou       | integer               |              | not null      | generated by default as identity
 sou_season   | smallint              |              |               |
 sou_sa_code  | character varying(10) | C            |               |
 sou_mat_code | character varying(4)  | C            |               |
 sou_clr_code | character varying(3)  | C            |               |
Indexe:
    "sa_outsole_active_pkey" PRIMARY KEY, btree (id_sou)
 
The xxx_target tables are very similiar, here the upper one as an example:
They are count_aggregates of the whole dataset, where up_mat_code=sup_mat_code etc.

cpsdb_matcol=# \d upper_target
                    Tabelle ╗admin.upper_target½
   Spalte    |   Typ    | Sortierfolge | NULL erlaubt? | Vorgabewert
-------------+----------+--------------+---------------+-------------
 id_up       | smallint |              |               |
 nup         | integer  |              |               |
 up_mat_code | text     | C            |               |



I have reworked the two queries to show their complete explain plans:

1. query with left join in the qupd CTE:

\set only 'ONLY'

cpsdb_matcol=# explain analyze -- explain analyze verbose -- explain -- select * from ( -- select count(*) from ( -- select length(sel) from (
cpsdb_matcol-# with
cpsdb_matcol-# qup as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season -- all xxx_seasosn are always smallint
cpsdb_matcol(# ,curr_code-- all xx_code are always varchar(10)
cpsdb_matcol(# ,array_agg(id_up order by id_up)||array_fill(0::smallint,array[10]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_up) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sup_season as curr_season
cpsdb_matcol(# ,sup_sa_code as curr_code
cpsdb_matcol(# ,sup_mat_code as curr_mat_code
cpsdb_matcol(# ,sup_clr_code as curr_clr_code
cpsdb_matcol(# ,id_up
cpsdb_matcol(# ,coalesce(id_up,-1) as imask
cpsdb_matcol(# from :only sa_upper
cpsdb_matcol(# left join upper_target on up_mat_code=sup_mat_code and id_up <= (512-1-16)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qli as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_li order by id_li)||array_fill(0::smallint,array[4]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_li) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sli_season as curr_season
cpsdb_matcol(# ,sli_sa_code as curr_code
cpsdb_matcol(# ,sli_mat_code as curr_mat_code
cpsdb_matcol(# ,sli_clr_code as curr_clr_code
cpsdb_matcol(# ,id_li
cpsdb_matcol(# ,coalesce(id_li,-1) as imask
cpsdb_matcol(# from :only sa_lining
cpsdb_matcol(# left join lining_target on li_mat_code=sli_mat_code and id_li <= (128-1-8)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qin as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_in order by id_in)||array_fill(0::smallint,array[4]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_in) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sin_season as curr_season
cpsdb_matcol(# ,sin_sa_code as curr_code
cpsdb_matcol(# ,sin_mat_code as curr_mat_code
cpsdb_matcol(# ,sin_clr_code as curr_clr_code
cpsdb_matcol(# ,id_in
cpsdb_matcol(# ,coalesce(id_in,-1) as imask
cpsdb_matcol(# from :only sa_insole
cpsdb_matcol(# left join insole_target on in_mat_code=sin_mat_code and id_in <= (128-1-8)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qou as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_ou order by id_ou)||array_fill(0::smallint,array[6]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_ou) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sou_season as curr_season
cpsdb_matcol(# ,sou_sa_code as curr_code
cpsdb_matcol(# ,sou_mat_code as curr_mat_code
cpsdb_matcol(# ,sou_clr_code as curr_clr_code
cpsdb_matcol(# ,id_ou
cpsdb_matcol(# ,coalesce(id_ou,-1) as imask
cpsdb_matcol(# from :only sa_outsole
cpsdb_matcol(# left join outsole_target on ou_mat_code=sou_mat_code and id_ou <= (32-1-2)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qupd as (
cpsdb_matcol(# select * from (
cpsdb_matcol(# select
cpsdb_matcol(#  qup.curr_season
cpsdb_matcol(# ,qup.curr_code
cpsdb_matcol(# ,qup.ibitmask|qin.ibitmask|qli.ibitmask|qou.ibitmask as ibitmask
cpsdb_matcol(# -- the calculations of new_mat_x are simplified here
cpsdb_matcol(# -- in the production version they are a more complex combination of bit masks, bit shifts and bit or of different elements of the arrays
cpsdb_matcol(# ,(qup.mat_arr[1]|qli.mat_arr[1]|qin.mat_arr[1]|qou.mat_arr[1])::bigint as new_mat_1
cpsdb_matcol(#
cpsdb_matcol(# ,(qup.mat_arr[2]|qli.mat_arr[2]|qin.mat_arr[2]|qou.mat_arr[2])::bigint as new_mat_2
cpsdb_matcol(#
cpsdb_matcol(# ,(qup.mat_arr[3]|qli.mat_arr[3]|qin.mat_arr[3]|qou.mat_arr[3])::bigint as new_mat_3
cpsdb_matcol(#
cpsdb_matcol(# from qup
cpsdb_matcol(# left join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
cpsdb_matcol(# left join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
cpsdb_matcol(# left join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
cpsdb_matcol(# where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
cpsdb_matcol(# )qj
cpsdb_matcol(# where ibitmask is not null
cpsdb_matcol(# )
cpsdb_matcol-# ,qupda as (
cpsdb_matcol(# select
cpsdb_matcol(#  qup.curr_season
cpsdb_matcol(# ,qup.curr_code
cpsdb_matcol(# ,repeat('0',64)||
cpsdb_matcol(# repeat('11',coalesce(cardinality(qou.matcode_arr),0))||repeat('10',coalesce(cardinality(qin.matcode_arr),0))||
cpsdb_matcol(# repeat('01',coalesce(cardinality(qou.matcode_arr),0))||repeat('00',coalesce(cardinality(qup.matcode_arr),0))||
cpsdb_matcol(# '00' as curr_mattype_bitmask
cpsdb_matcol(# ,qup.matcode_arr||qli.matcode_arr||qin.matcode_arr||qou.matcode_arr as curr_matcode_arr
cpsdb_matcol(# from qup
cpsdb_matcol(# left join qli on qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and (qli.ibitmask<0 or cardinality(qli.mat_arr) >8)
cpsdb_matcol(# left join qin on qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and (qin.ibitmask<0 or cardinality(qin.mat_arr) >8)
cpsdb_matcol(# left join qou on qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and (qou.ibitmask<0 or cardinality(qou.mat_arr) >11)
cpsdb_matcol(# where qup.ibitmask<0 or cardinality(qup.mat_arr) >21
cpsdb_matcol(# )
cpsdb_matcol-# select
cpsdb_matcol-#  curr_season
cpsdb_matcol-# ,curr_code
cpsdb_matcol-# ,new_mat_1
cpsdb_matcol-# ,new_mat_2
cpsdb_matcol-# ,new_mat_3
cpsdb_matcol-# ,NULL::bigint as new_mattype_bitmask
cpsdb_matcol-# ,NULL as new_mat_codes
cpsdb_matcol-# from qupd
cpsdb_matcol-# union all
cpsdb_matcol-# select
cpsdb_matcol-#  curr_season
cpsdb_matcol-# ,curr_code
cpsdb_matcol-# ,NULL::bigint as new_mat_1
cpsdb_matcol-# ,NULL::bigint as new_mat_2
cpsdb_matcol-# ,NULL::bigint as new_mat_3
cpsdb_matcol-# ,substr(curr_mattype_bitmask,length(curr_mattype_bitmask)-63)::bit(64)::bigint as new_mattype_bitmask
cpsdb_matcol-# ,curr_matcode_arr as new_mat_codes
cpsdb_matcol-# from qupda
cpsdb_matcol-# ;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13673.81..17462.84 rows=5734 width=104) (actual time=169.382..210.799 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.064..68.308 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.053..36.412 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.165..10.562 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.006..1.990 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.157..0.157 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.115 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.354..28.199 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.347..9.711 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.049..2.397 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.009..0.469 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.037..0.037 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.453..32.317 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.444..11.943 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.051..3.098 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.007..0.608 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.041..0.041 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.007..0.031 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.055..42.079 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.043..18.798 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.037..5.017 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.008..0.998 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.025..0.025 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.009..0.020 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Hash Join  (cost=1015.85..1319.04 rows=1 width=104) (actual time=169.382..203.707 rows=8548 loops=1)
         Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
         Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) | qou.ibitmask) IS NOT NULL)
         ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76) (actual time=18.057..45.448 rows=10275 loops=1)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
               Rows Removed by Filter: 424
         ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual time=151.316..151.317 rows=8845 loops=1)
               Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1899kB
               ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228) (actual time=122.483..149.030 rows=8845 loops=1)
                     Hash Cond: ((qin.curr_season = qli.curr_season) AND ((qin.curr_code)::text = (qli.curr_code)::text))
                     ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=76) (actual time=11.454..35.456 rows=10197 loops=1)
                           Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                           Rows Removed by Filter: 481
                     ->  Hash  (cost=706.86..706.86 rows=33 width=152) (actual time=111.026..111.027 rows=9007 loops=1)
                           Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1473kB
                           ->  Merge Join  (cost=689.20..706.86 rows=33 width=152) (actual time=106.441..109.505 rows=9007 loops=1)
                                 Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                                 ->  Sort  (cost=342.09..344.96 rows=1147 width=76) (actual time=73.200..73.429 rows=9320 loops=1)
                                       Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1391kB
                                       ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=76) (actual time=35.067..71.872 rows=9320 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                             Rows Removed by Filter: 1415
                                 ->  Sort  (cost=347.12..350.02 rows=1163 width=76) (actual time=33.239..33.490 rows=10289 loops=1)
                                       Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1349kB
                                       ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=76) (actual time=9.355..31.457 rows=10289 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                             Rows Removed by Filter: 180
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.529..6.645 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.388..3.833 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.297..2.534 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.278..1.315 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.009..1.081 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.017..1.022 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.054..0.994 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.089..1.103 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.022 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.134..1.145 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.038 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 1.055 ms
 Execution Time: 212.800 ms
(118 Zeilen)

As seen in the line of the qupd CTE

                           ->  Merge Join  (cost=689.20..706.86 rows=33 width=152) (actual time=106.441..109.505 rows=9007 loops=1)

the row count of the second join round drops to 33 and for the third round it drops to 1

               ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228) (actual time=122.483..149.030 rows=8845 loops=1)

BTW, I don't know, why the second join group (part of qupda) gets a complete different plan.


--------------------------------------------


Here is the second question, different from the first only by replacing the left join to inner join in the join group of qupd:

\set only 'ONLY'

cpsdb_matcol=# explain analyze -- explain analyze verbose -- explain -- select * from ( -- select count(*) from ( -- select length(sel) from (
cpsdb_matcol-# with
cpsdb_matcol-# qup as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season -- all xxx_seasosn are always smallint
cpsdb_matcol(# ,curr_code-- all xx_code are always varchar(10)
cpsdb_matcol(# ,array_agg(id_up order by id_up)||array_fill(0::smallint,array[10]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_up) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sup_season as curr_season
cpsdb_matcol(# ,sup_sa_code as curr_code
cpsdb_matcol(# ,sup_mat_code as curr_mat_code
cpsdb_matcol(# ,sup_clr_code as curr_clr_code
cpsdb_matcol(# ,id_up
cpsdb_matcol(# ,coalesce(id_up,-1) as imask
cpsdb_matcol(# from :only sa_upper
cpsdb_matcol(# left join upper_target on up_mat_code=sup_mat_code and id_up <= (512-1-16)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qli as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_li order by id_li)||array_fill(0::smallint,array[4]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_li) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sli_season as curr_season
cpsdb_matcol(# ,sli_sa_code as curr_code
cpsdb_matcol(# ,sli_mat_code as curr_mat_code
cpsdb_matcol(# ,sli_clr_code as curr_clr_code
cpsdb_matcol(# ,id_li
cpsdb_matcol(# ,coalesce(id_li,-1) as imask
cpsdb_matcol(# from :only sa_lining
cpsdb_matcol(# left join lining_target on li_mat_code=sli_mat_code and id_li <= (128-1-8)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qin as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_in order by id_in)||array_fill(0::smallint,array[4]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_in) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sin_season as curr_season
cpsdb_matcol(# ,sin_sa_code as curr_code
cpsdb_matcol(# ,sin_mat_code as curr_mat_code
cpsdb_matcol(# ,sin_clr_code as curr_clr_code
cpsdb_matcol(# ,id_in
cpsdb_matcol(# ,coalesce(id_in,-1) as imask
cpsdb_matcol(# from :only sa_insole
cpsdb_matcol(# left join insole_target on in_mat_code=sin_mat_code and id_in <= (128-1-8)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qou as (
cpsdb_matcol(# select
cpsdb_matcol(#  curr_season
cpsdb_matcol(# ,curr_code
cpsdb_matcol(# ,array_agg(id_ou order by id_ou)||array_fill(0::smallint,array[6]) as mat_arr
cpsdb_matcol(# ,array_agg(curr_mat_code order by id_ou) as matcode_arr
cpsdb_matcol(# ,bit_or(imask) as ibitmask
cpsdb_matcol(# from(
cpsdb_matcol(# select
cpsdb_matcol(#  sou_season as curr_season
cpsdb_matcol(# ,sou_sa_code as curr_code
cpsdb_matcol(# ,sou_mat_code as curr_mat_code
cpsdb_matcol(# ,sou_clr_code as curr_clr_code
cpsdb_matcol(# ,id_ou
cpsdb_matcol(# ,coalesce(id_ou,-1) as imask
cpsdb_matcol(# from :only sa_outsole
cpsdb_matcol(# left join outsole_target on ou_mat_code=sou_mat_code and id_ou <= (32-1-2)
cpsdb_matcol(# )qr
cpsdb_matcol(# group by 1,2
cpsdb_matcol(# )
cpsdb_matcol-# ,qupd as (
cpsdb_matcol(# select
cpsdb_matcol(#  qup.curr_season
cpsdb_matcol(# ,qup.curr_code
cpsdb_matcol(# ,qup.ibitmask|qin.ibitmask|qli.ibitmask|qou.ibitmask as ibitmask
cpsdb_matcol(# -- the calculations of new_mat_x are simplified here
cpsdb_matcol(# -- in the production version they are a more complex combination of bit masks, bit shifts and bit or of different elements of the arrays
cpsdb_matcol(# ,(qup.mat_arr[1]|qli.mat_arr[1]|qin.mat_arr[1]|qou.mat_arr[1])::bigint as new_mat_1
cpsdb_matcol(#
cpsdb_matcol(# ,(qup.mat_arr[2]|qli.mat_arr[2]|qin.mat_arr[2]|qou.mat_arr[2])::bigint as new_mat_2
cpsdb_matcol(#
cpsdb_matcol(# ,(qup.mat_arr[3]|qli.mat_arr[3]|qin.mat_arr[3]|qou.mat_arr[3])::bigint as new_mat_3
cpsdb_matcol(#
cpsdb_matcol(# from qup
cpsdb_matcol(# join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
cpsdb_matcol(# join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
cpsdb_matcol(# join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
cpsdb_matcol(# where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
cpsdb_matcol(# )
cpsdb_matcol-# ,qupda as (
cpsdb_matcol(# select
cpsdb_matcol(#  qup.curr_season
cpsdb_matcol(# ,qup.curr_code
cpsdb_matcol(# ,repeat('0',64)||
cpsdb_matcol(# repeat('11',coalesce(cardinality(qou.matcode_arr),0))||repeat('10',coalesce(cardinality(qin.matcode_arr),0))||
cpsdb_matcol(# repeat('01',coalesce(cardinality(qou.matcode_arr),0))||repeat('00',coalesce(cardinality(qup.matcode_arr),0))||
cpsdb_matcol(# '00' as curr_mattype_bitmask
cpsdb_matcol(# ,qup.matcode_arr||qli.matcode_arr||qin.matcode_arr||qou.matcode_arr as curr_matcode_arr
cpsdb_matcol(# from qup
cpsdb_matcol(# left join qli on qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and (qli.ibitmask<0 or cardinality(qli.mat_arr) >8)
cpsdb_matcol(# left join qin on qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and (qin.ibitmask<0 or cardinality(qin.mat_arr) >8)
cpsdb_matcol(# left join qou on qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and (qou.ibitmask<0 or cardinality(qou.mat_arr) >11)
cpsdb_matcol(# where qup.ibitmask<0 or cardinality(qup.mat_arr) >21
cpsdb_matcol(# )
cpsdb_matcol-# select
cpsdb_matcol-#  curr_season
cpsdb_matcol-# ,curr_code
cpsdb_matcol-# ,new_mat_1
cpsdb_matcol-# ,new_mat_2
cpsdb_matcol-# ,new_mat_3
cpsdb_matcol-# ,NULL::bigint as new_mattype_bitmask
cpsdb_matcol-# ,NULL as new_mat_codes
cpsdb_matcol-# from qupd
cpsdb_matcol-# union all
cpsdb_matcol-# select
cpsdb_matcol-#  curr_season
cpsdb_matcol-# ,curr_code
cpsdb_matcol-# ,NULL::bigint as new_mat_1
cpsdb_matcol-# ,NULL::bigint as new_mat_2
cpsdb_matcol-# ,NULL::bigint as new_mat_3
cpsdb_matcol-# ,substr(curr_mattype_bitmask,length(curr_mattype_bitmask)-63)::bit(64)::bigint as new_mattype_bitmask
cpsdb_matcol-# ,curr_matcode_arr as new_mat_codes
cpsdb_matcol-# from qupda
cpsdb_matcol-# ;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13365.31..17471.72 rows=5734 width=104) (actual time=139.730..13430.641 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.337..67.779 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.326..36.704 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.179..10.787 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.009..1.990 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.164..0.164 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.128 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.434..27.620 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.424..9.796 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.049..2.444 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.009..0.476 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.036..0.036 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.026 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.578..31.510 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.572..12.044 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.056..3.120 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.008..0.609 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.044..0.045 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.033 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.295..51.236 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.281..20.157 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.036..5.080 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.009..1.017 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.024..0.025 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.007..0.018 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Nested Loop  (cost=707.35..1327.91 rows=1 width=104) (actual time=139.729..13423.084 rows=8548 loops=1)
         Join Filter: ((qli.curr_season = qin.curr_season) AND ((qli.curr_code)::text = (qin.curr_code)::text))
         Rows Removed by Join Filter: 88552397
         ->  Hash Join  (cost=707.35..1016.45 rows=1 width=216) (actual time=128.145..169.287 rows=8685 loops=1)
               Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
               ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=72) (actual time=18.297..55.085 rows=10275 loops=1)
                     Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
                     Rows Removed by Filter: 424
               ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual time=109.843..109.845 rows=9007 loops=1)
                     Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1369kB
                     ->  Merge Join  (cost=689.20..706.86 rows=33 width=144) (actual time=105.294..108.377 rows=9007 loops=1)
                           Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                           ->  Sort  (cost=342.09..344.96 rows=1147 width=72) (actual time=72.693..72.923 rows=9320 loops=1)
                                 Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1357kB
                                 ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=72) (actual time=35.339..71.419 rows=9320 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                       Rows Removed by Filter: 1415
                           ->  Sort  (cost=347.12..350.02 rows=1163 width=72) (actual time=32.598..32.861 rows=10289 loops=1)
                                 Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1269kB
                                 ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=72) (actual time=9.436..30.852 rows=10289 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                       Rows Removed by Filter: 180
         ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=72) (actual time=0.001..1.163 rows=10197 loops=8685)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
               Rows Removed by Filter: 481
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.622..6.715 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.489..3.937 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.376..2.614 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.300..1.337 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.010..1.119 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.073..1.078 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.057..1.029 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.111..1.124 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.045 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.125..1.135 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.063 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 0.969 ms
 Execution Time: 13432.726 ms
(116 Zeilen)

(All plans are unchanged, cut/pasted from psql window)

In qupd we find the same rows estimations as above, as shown in the lines

              ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual time=109.843..109.845 rows=9007 loops=1)

 ->  Nested Loop  (cost=707.35..1327.91 rows=1 width=104) (actual time=139.729..13423.084 rows=8548 loops=1)

---------

In both queries I haven't used materialized CTEs explicitely, but the first 4 CTE's are used in 2 different subsequent CTE's.

This query is not fully optimized for frequent use, it is only used for refactoring old data, but finally it will use a 10fold bigger dataset.
(Optimizing could eleminate the cardinality function in join conditions, eliminate materialized CTEs etc).

I only encountered the long execution time in the second query (with inner joins), which let me analyze and dig to the root cause.
The use of the nested loop in the third inner join round took very long and eliminated about 9 million rows (on a quad join with 4 datasets of about 10000 tuples).

I wanted to draw attention on my accidently findings, but I am not able to fully understand or investigate in the source code :-(.

I conclude that the row estimation in this example seems wrong ((left) outer join case) or too strict (inner join case, only 1/33 estimated from the previous step!)

I Hope this updated information may help you

Hans Buschmann







Von: Tomas Vondra <tomas.vondra@enterprisedb.com>
Gesendet: Mittwoch, 8. Februar 2023 22:27
An: Hans Buschmann; pgsql-hackers@lists.postgresql.org
Betreff: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
 
On 2/8/23 14:55, Hans Buschmann wrote:
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
>
>
> 1. Background
>
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
>
>
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
>
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (1 tenth of the real value) This is OK for estimation.
>
>
> These 4 CTEs are then used to combine the data by joining them.
>
>
> 2. Problem
>
> The 4 CTEs are joined by left joins as shown below:
>
...
>
> This case really brought me to detect the problem!
>
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
>
> I think the planner shouldn't change the row estimations on further
> steps after left joins at all, and be a bit more conservative on inner
> joins.

But the code should alredy do exactly that, see:

https://github.com/postgres/postgres/blob/dbe8a1726cfd5a09cf1ef99e76f5f89e2efada71/src/backend/optimizer/path/costsize.c#L5212

And in fact, the second part of the plains shows it's doing the trick:

 ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104)
(actual time=2.321..2.556 rows=1415 loops=1)
     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND
                 ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
     ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
     ->  Sort  (cost=651.57..666.11 rows=5816 width=72)

But notice the first join (with rows=33) doesn't say "Left". And I see
there's Append on top, so presumably the query is much more complex, and
there's a regular join of these CTEs in some other part.

We'll need to se the whole query, not just one chunk of it.

FWIW it seems you're using materialized CTEs - that's likely pretty bad
for the estimates, because we don't propagate statistics from the CTE.
So a join on CTEs can't see statistics from the underlying tables, and
that can easily produce really bad estimates.

I'm assuming you're not using AS MATERIALIZED explicitly, so I'd bet
this happens because the "cardinality" function is marked as volatile.
Perhaps it can be redefined as stable/immutable.

> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).

That shouldn't affect outer join estimates this way (but as I explained
above, the join does not seem to be "left" per the explain).
Multi-column joins can cause issues, no doubt about it - but CTEs make
it worse because we can't e.g. see foreign keys.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On 2/9/23 10:03, Hans Buschmann wrote:
> Hello Tomas,
> 
> 
> Thank you for looking at.
> 
> 
> First, I miscalculated the factor which should be about 50, not 500. Sorry.
> 
> Then I want to show you the table definitions (simple, very similar,
> ommited child_tables and additional indexes, here using always "ONLY"):
> 
> cpsdb_matcol=# \d sa_upper;
>                                        Tabelle ╗public.sa_upper½
>     Spalte    |          Typ          | Sortierfolge | NULL erlaubt? | 
>          Vorgabewert
> --------------+-----------------------+--------------+---------------+----------------------------------
>  id_sup       | integer               |              | not null      |
> generated by default as identity
>  sup_season   | smallint              |              |               |
>  sup_sa_code  | character varying(10) | C            |               |
>  sup_mat_code | character varying(4)  | C            |               |
>  sup_clr_code | character varying(3)  | C            |               |
> Indexe:
>     "sa_upper_active_pkey" PRIMARY KEY, btree (id_sup)
>  
> 
> cpsdb_matcol=# \d sa_lining+;
>                                        Tabelle ╗public.sa_lining½
>     Spalte    |          Typ          | Sortierfolge | NULL erlaubt? | 
>          Vorgabewert
> --------------+-----------------------+--------------+---------------+----------------------------------
>  id_sli       | integer               |              | not null      |
> generated by default as identity
>  sli_season   | smallint              |              |               |
>  sli_sa_code  | character varying(10) | C            |               |
>  sli_mat_code | character varying(4)  | C            |               |
>  sli_clr_code | character varying(3)  | C            |               |
> Indexe:
>     "sa_lining_active_pkey" PRIMARY KEY, btree (id_sli)
>  
> 
> cpsdb_matcol=# \d sa_insole;
>                                        Tabelle ╗public.sa_insole½
>     Spalte    |          Typ          | Sortierfolge | NULL erlaubt? | 
>          Vorgabewert
> --------------+-----------------------+--------------+---------------+----------------------------------
>  id_sin       | integer               |              | not null      |
> generated by default as identity
>  sin_season   | smallint              |              |               |
>  sin_sa_code  | character varying(10) | C            |               |
>  sin_mat_code | character varying(4)  | C            |               |
>  sin_clr_code | character varying(3)  | C            |               |
> Indexe:
>     "sa_insole_active_pkey" PRIMARY KEY, btree (id_sin)
>  
> 
> cpsdb_matcol=# \d sa_outsole;
>                                       Tabelle ╗public.sa_outsole½
>     Spalte    |          Typ          | Sortierfolge | NULL erlaubt? | 
>          Vorgabewert
> --------------+-----------------------+--------------+---------------+----------------------------------
>  id_sou       | integer               |              | not null      |
> generated by default as identity
>  sou_season   | smallint              |              |               |
>  sou_sa_code  | character varying(10) | C            |               |
>  sou_mat_code | character varying(4)  | C            |               |
>  sou_clr_code | character varying(3)  | C            |               |
> Indexe:
>     "sa_outsole_active_pkey" PRIMARY KEY, btree (id_sou)
>  
> The xxx_target tables are very similiar, here the upper one as an example:
> They are count_aggregates of the whole dataset, where
> up_mat_code=sup_mat_code etc.
> 
> cpsdb_matcol=# \d upper_target
>                     Tabelle ╗admin.upper_target½
>    Spalte    |   Typ    | Sortierfolge | NULL erlaubt? | Vorgabewert
> -------------+----------+--------------+---------------+-------------
>  id_up       | smallint |              |               |
>  nup         | integer  |              |               |
>  up_mat_code | text     | C            |               |
> 
> 
> 
> I have reworked the two queries to show their complete explain plans:
> 
> 1. query with left join in the qupd CTE:
> 
> \set only 'ONLY'
> 
> cpsdb_matcol=# explain analyze -- explain analyze verbose -- explain --
> select * from ( -- select count(*) from ( -- select length(sel) from (
> cpsdb_matcol-# with
> cpsdb_matcol-# qup as (
> cpsdb_matcol(# select
> cpsdb_matcol(#  curr_season -- all xxx_seasosn are always smallint
> cpsdb_matcol(# ,curr_code-- all xx_code are always varchar(10)
> cpsdb_matcol(# ,array_agg(id_up order by
> id_up)||array_fill(0::smallint,array[10]) as mat_arr
> cpsdb_matcol(# ,array_agg(curr_mat_code order by id_up) as matcode_arr
> cpsdb_matcol(# ,bit_or(imask) as ibitmask
> cpsdb_matcol(# from(
> cpsdb_matcol(# select
> cpsdb_matcol(#  sup_season as curr_season
> cpsdb_matcol(# ,sup_sa_code as curr_code
> cpsdb_matcol(# ,sup_mat_code as curr_mat_code
> cpsdb_matcol(# ,sup_clr_code as curr_clr_code
> cpsdb_matcol(# ,id_up
> cpsdb_matcol(# ,coalesce(id_up,-1) as imask
> cpsdb_matcol(# from :only sa_upper
> cpsdb_matcol(# left join upper_target on up_mat_code=sup_mat_code and
> id_up <= (512-1-16)
> cpsdb_matcol(# )qr
> cpsdb_matcol(# group by 1,2
> cpsdb_matcol(# )
> cpsdb_matcol-# ,qli as (
> cpsdb_matcol(# select
> cpsdb_matcol(#  curr_season
> cpsdb_matcol(# ,curr_code
> cpsdb_matcol(# ,array_agg(id_li order by
> id_li)||array_fill(0::smallint,array[4]) as mat_arr
> cpsdb_matcol(# ,array_agg(curr_mat_code order by id_li) as matcode_arr
> cpsdb_matcol(# ,bit_or(imask) as ibitmask
> cpsdb_matcol(# from(
> cpsdb_matcol(# select
> cpsdb_matcol(#  sli_season as curr_season
> cpsdb_matcol(# ,sli_sa_code as curr_code
> cpsdb_matcol(# ,sli_mat_code as curr_mat_code
> cpsdb_matcol(# ,sli_clr_code as curr_clr_code
> cpsdb_matcol(# ,id_li
> cpsdb_matcol(# ,coalesce(id_li,-1) as imask
> cpsdb_matcol(# from :only sa_lining
> cpsdb_matcol(# left join lining_target on li_mat_code=sli_mat_code and
> id_li <= (128-1-8)
> cpsdb_matcol(# )qr
> cpsdb_matcol(# group by 1,2
> cpsdb_matcol(# )
> cpsdb_matcol-# ,qin as (
> cpsdb_matcol(# select
> cpsdb_matcol(#  curr_season
> cpsdb_matcol(# ,curr_code
> cpsdb_matcol(# ,array_agg(id_in order by
> id_in)||array_fill(0::smallint,array[4]) as mat_arr
> cpsdb_matcol(# ,array_agg(curr_mat_code order by id_in) as matcode_arr
> cpsdb_matcol(# ,bit_or(imask) as ibitmask
> cpsdb_matcol(# from(
> cpsdb_matcol(# select
> cpsdb_matcol(#  sin_season as curr_season
> cpsdb_matcol(# ,sin_sa_code as curr_code
> cpsdb_matcol(# ,sin_mat_code as curr_mat_code
> cpsdb_matcol(# ,sin_clr_code as curr_clr_code
> cpsdb_matcol(# ,id_in
> cpsdb_matcol(# ,coalesce(id_in,-1) as imask
> cpsdb_matcol(# from :only sa_insole
> cpsdb_matcol(# left join insole_target on in_mat_code=sin_mat_code and
> id_in <= (128-1-8)
> cpsdb_matcol(# )qr
> cpsdb_matcol(# group by 1,2
> cpsdb_matcol(# )
> cpsdb_matcol-# ,qou as (
> cpsdb_matcol(# select
> cpsdb_matcol(#  curr_season
> cpsdb_matcol(# ,curr_code
> cpsdb_matcol(# ,array_agg(id_ou order by
> id_ou)||array_fill(0::smallint,array[6]) as mat_arr
> cpsdb_matcol(# ,array_agg(curr_mat_code order by id_ou) as matcode_arr
> cpsdb_matcol(# ,bit_or(imask) as ibitmask
> cpsdb_matcol(# from(
> cpsdb_matcol(# select
> cpsdb_matcol(#  sou_season as curr_season
> cpsdb_matcol(# ,sou_sa_code as curr_code
> cpsdb_matcol(# ,sou_mat_code as curr_mat_code
> cpsdb_matcol(# ,sou_clr_code as curr_clr_code
> cpsdb_matcol(# ,id_ou
> cpsdb_matcol(# ,coalesce(id_ou,-1) as imask
> cpsdb_matcol(# from :only sa_outsole
> cpsdb_matcol(# left join outsole_target on ou_mat_code=sou_mat_code and
> id_ou <= (32-1-2)
> cpsdb_matcol(# )qr
> cpsdb_matcol(# group by 1,2
> cpsdb_matcol(# )
> cpsdb_matcol-# ,qupd as (
> cpsdb_matcol(# select * from (
> cpsdb_matcol(# select
> cpsdb_matcol(#  qup.curr_season
> cpsdb_matcol(# ,qup.curr_code
> cpsdb_matcol(# ,qup.ibitmask|qin.ibitmask|qli.ibitmask|qou.ibitmask as
> ibitmask
> cpsdb_matcol(# -- the calculations of new_mat_x are simplified here
> cpsdb_matcol(# -- in the production version they are a more complex
> combination of bit masks, bit shifts and bit or of different elements of
> the arrays
> cpsdb_matcol(#
> ,(qup.mat_arr[1]|qli.mat_arr[1]|qin.mat_arr[1]|qou.mat_arr[1])::bigint
> as new_mat_1
> cpsdb_matcol(#
> cpsdb_matcol(#
> ,(qup.mat_arr[2]|qli.mat_arr[2]|qin.mat_arr[2]|qou.mat_arr[2])::bigint
> as new_mat_2
> cpsdb_matcol(#
> cpsdb_matcol(#
> ,(qup.mat_arr[3]|qli.mat_arr[3]|qin.mat_arr[3]|qou.mat_arr[3])::bigint
> as new_mat_3
> cpsdb_matcol(#
> cpsdb_matcol(# from qup
> cpsdb_matcol(# left join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> cpsdb_matcol(# left join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> cpsdb_matcol(# left join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> cpsdb_matcol(# where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
> cpsdb_matcol(# )qj
> cpsdb_matcol(# where ibitmask is not null
> cpsdb_matcol(# )
> cpsdb_matcol-# ,qupda as (
> cpsdb_matcol(# select
> cpsdb_matcol(#  qup.curr_season
> cpsdb_matcol(# ,qup.curr_code
> cpsdb_matcol(# ,repeat('0',64)||
> cpsdb_matcol(#
> repeat('11',coalesce(cardinality(qou.matcode_arr),0))||repeat('10',coalesce(cardinality(qin.matcode_arr),0))||
> cpsdb_matcol(#
> repeat('01',coalesce(cardinality(qou.matcode_arr),0))||repeat('00',coalesce(cardinality(qup.matcode_arr),0))||
> cpsdb_matcol(# '00' as curr_mattype_bitmask
> cpsdb_matcol(#
> ,qup.matcode_arr||qli.matcode_arr||qin.matcode_arr||qou.matcode_arr as
> curr_matcode_arr
> cpsdb_matcol(# from qup
> cpsdb_matcol(# left join qli on qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and (qli.ibitmask<0 or
> cardinality(qli.mat_arr) >8)
> cpsdb_matcol(# left join qin on qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and (qin.ibitmask<0 or
> cardinality(qin.mat_arr) >8)
> cpsdb_matcol(# left join qou on qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and (qou.ibitmask<0 or
> cardinality(qou.mat_arr) >11)
> cpsdb_matcol(# where qup.ibitmask<0 or cardinality(qup.mat_arr) >21
> cpsdb_matcol(# )
> cpsdb_matcol-# select
> cpsdb_matcol-#  curr_season
> cpsdb_matcol-# ,curr_code
> cpsdb_matcol-# ,new_mat_1
> cpsdb_matcol-# ,new_mat_2
> cpsdb_matcol-# ,new_mat_3
> cpsdb_matcol-# ,NULL::bigint as new_mattype_bitmask
> cpsdb_matcol-# ,NULL as new_mat_codes
> cpsdb_matcol-# from qupd
> cpsdb_matcol-# union all
> cpsdb_matcol-# select
> cpsdb_matcol-#  curr_season
> cpsdb_matcol-# ,curr_code
> cpsdb_matcol-# ,NULL::bigint as new_mat_1
> cpsdb_matcol-# ,NULL::bigint as new_mat_2
> cpsdb_matcol-# ,NULL::bigint as new_mat_3
> cpsdb_matcol-#
> ,substr(curr_mattype_bitmask,length(curr_mattype_bitmask)-63)::bit(64)::bigint as new_mattype_bitmask
> cpsdb_matcol-# ,curr_matcode_arr as new_mat_codes
> cpsdb_matcol-# from qupda,qup.ibitmask|qin.ibitmask|qli.ibitmask|qou.ibitmask as ibitmask
> cpsdb_matcol-# ;
>                                                                    
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=13673.81..17462.84 rows=5734 width=104) (actual
> time=169.382..210.799 rows=9963 loops=1)
>    CTE qup
>      ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.064..68.308 rows=10735 loops=1)
>            Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>            ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.053..36.412 rows=50969 loops=1)
>                  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 4722kB
>                  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.165..10.562 rows=50969 loops=1)
>                        Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>                        ->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.006..1.990 rows=50969 loops=1)
>                        ->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (actual time=0.157..0.157 rows=495 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 27kB
>                              ->  Seq Scan on upper_target 
> (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.115 rows=495
> loops=1)
>                                    Filter: (id_up <= 495)
>                                    Rows Removed by Filter: 1467
>    CTE qli
>      ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80)
> (actual time=9.354..28.199 rows=10469 loops=1)
>            Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
>            ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual
> time=9.347..9.711 rows=11774 loops=1)
>                  Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1120kB
>                  ->  Hash Left Join  (cost=7.34..301.19 rows=11774
> width=18) (actual time=0.049..2.397 rows=11774 loops=1)
>                        Hash Cond: ((sa_lining.sli_mat_code)::text =
> lining_target.li_mat_code)
>                        ->  Seq Scan on sa_lining  (cost=0.00..204.74
> rows=11774 width=16) (actual time=0.009..0.469 rows=11774 loops=1)
>                        ->  Hash  (cost=5.86..5.86 rows=118 width=6)
> (actual time=0.037..0.037 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on lining_target 
> (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119
> loops=1)
>                                    Filter: (id_li <= 119)
>                                    Rows Removed by Filter: 190
>    CTE qin
>      ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80)
> (actual time=11.453..32.317 rows=10678 loops=1)
>            Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
>            ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual
> time=11.444..11.943 rows=15230 loops=1)
>                  Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1336kB
>                  ->  Hash Left Join  (cost=10.49..369.26 rows=15230
> width=18) (actual time=0.051..3.098 rows=15230 loops=1)
>                        Hash Cond: ((sa_insole.sin_mat_code)::text =
> insole_target.in_mat_code)
>                        ->  Seq Scan on sa_insole  (cost=0.00..264.30
> rows=15230 width=16) (actual time=0.007..0.608 rows=15230 loops=1)
>                        ->  Hash  (cost=9.01..9.01 rows=118 width=6)
> (actual time=0.041..0.041 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on insole_target 
> (cost=0.00..9.01 rows=118 width=6) (actual time=0.007..0.031 rows=119
> loops=1)
>                                    Filter: (id_in <= 119)
>                                    Rows Removed by Filter: 362
>    CTE qou
>      ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80)
> (actual time=18.055..42.079 rows=10699 loops=1)
>            Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
>            ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual
> time=18.043..18.798 rows=24768 loops=1)
>                  Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 2317kB
>                  ->  Hash Left Join  (cost=5.39..558.63 rows=24768
> width=18) (actual time=0.037..5.017 rows=24768 loops=1)
>                        Hash Cond: ((sa_outsole.sou_mat_code)::text =
> outsole_target.ou_mat_code)
>                        ->  Seq Scan on sa_outsole  (cost=0.00..430.68
> rows=24768 width=16) (actual time=0.008..0.998 rows=24768 loops=1)
>                        ->  Hash  (cost=5.03..5.03 rows=29 width=6)
> (actual time=0.025..0.025 rows=29 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                              ->  Seq Scan on outsole_target 
> (cost=0.00..5.03 rows=29 width=6) (actual time=0.009..0.020 rows=29 loops=1)
>                                    Filter: (id_ou <= 29)
>                                    Rows Removed by Filter: 213
>    ->  Hash Join  (cost=1015.85..1319.04 rows=1 width=104) (actual
> time=169.382..203.707 rows=8548 loops=1)
>          Hash Cond: ((qou.curr_season = qli.curr_season) AND
> ((qou.curr_code)::text = (qli.curr_code)::text))
>          Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) |
> qou.ibitmask) IS NOT NULL)
>          ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76)
> (actual time=18.057..45.448 rows=10275 loops=1)
>                Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
>                Rows Removed by Filter: 424
>          ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual
> time=151.316..151.317 rows=8845 loops=1)
>                Buckets: 16384 (originally 1024)  Batches: 1 (originally
> 1)  Memory Usage: 1899kB
>                ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228)
> (actual time=122.483..149.030 rows=8845 loops=1)
>                      Hash Cond: ((qin.curr_season = qli.curr_season) AND
> ((qin.curr_code)::text = (qli.curr_code)::text))
>                      ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186
> width=76) (actual time=11.454..35.456 rows=10197 loops=1)
>                            Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                            Rows Removed by Filter: 481
>                      ->  Hash  (cost=706.86..706.86 rows=33 width=152)
> (actual time=111.026..111.027 rows=9007 loops=1)
>                            Buckets: 16384 (originally 1024)  Batches: 1
> (originally 1)  Memory Usage: 1473kB
>                            ->  Merge Join  (cost=689.20..706.86 rows=33
> width=152) (actual time=106.441..109.505 rows=9007 loops=1)
>                                  Merge Cond: ((qup.curr_season =
> qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
>                                  ->  Sort  (cost=342.09..344.96
> rows=1147 width=76) (actual time=73.200..73.429 rows=9320 loops=1)
>                                        Sort Key: qup.curr_season,
> qup.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1391kB
>                                        ->  CTE Scan on qup 
> (cost=0.00..283.80 rows=1147 width=76) (actual time=35.067..71.872
> rows=9320 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 21))
>                                              Rows Removed by Filter: 1415
>                                  ->  Sort  (cost=347.12..350.02
> rows=1163 width=76) (actual time=33.239..33.490 rows=10289 loops=1)
>                                        Sort Key: qli.curr_season,
> qli.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1349kB
>                                        ->  CTE Scan on qli 
> (cost=0.00..287.90 rows=1163 width=76) (actual time=9.355..31.457
> rows=10289 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                                              Rows Removed by Filter: 180
>    ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104)
> (actual time=4.529..6.645 rows=1415 loops=1)
>          Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND
> ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
>          ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733
> width=136) (actual time=3.388..3.833 rows=1415 loops=1)
>                Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND
> ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
>                ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733
> width=104) (actual time=2.297..2.534 rows=1415 loops=1)
>                      Merge Cond: ((qup_1.curr_season =
> qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
>                      ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
> (actual time=1.278..1.315 rows=1415 loops=1)
>                            Sort Key: qup_1.curr_season, qup_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 204kB
>                            ->  CTE Scan on qup qup_1  (cost=0.00..283.80
> rows=5733 width=72) (actual time=0.009..1.081 rows=1415 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 21))
>                                  Rows Removed by Filter: 9320
>                      ->  Sort  (cost=651.57..666.11 rows=5816 width=72)
> (actual time=1.017..1.022 rows=180 loops=1)
>                            Sort Key: qli_1.curr_season, qli_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 41kB
>                            ->  CTE Scan on qli qli_1  (cost=0.00..287.90
> rows=5816 width=72) (actual time=0.054..0.994 rows=180 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                                  Rows Removed by Filter: 10289
>                ->  Sort  (cost=665.41..680.24 rows=5932 width=72)
> (actual time=1.089..1.103 rows=481 loops=1)
>                      Sort Key: qin_1.curr_season, qin_1.curr_code
> COLLATE "C"
>                      Sort Method: quicksort  Memory: 68kB
>                      ->  CTE Scan on qin qin_1  (cost=0.00..293.65
> rows=5932 width=72) (actual time=0.016..1.022 rows=481 loops=1)
>                            Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                            Rows Removed by Filter: 10197
>          ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual
> time=1.134..1.145 rows=417 loops=1)
>                Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
>                Sort Method: quicksort  Memory: 68kB
>                ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944
> width=72) (actual time=0.029..1.038 rows=424 loops=1)
>                      Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
>                      Rows Removed by Filter: 10275
>  Planning Time: 1.055 ms
>  Execution Time: 212.800 ms
> (118 Zeilen)
> 
> As seen in the line of the qupd CTE
> 
>                            ->  Merge Join  (cost=689.20..706.86 rows=33
> width=152) (actual time=106.441..109.505 rows=9007 loops=1)
> 
> the row count of the second join round drops to 33 and for the third
> round it drops to 1
> 
>                ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228)
> (actual time=122.483..149.030 rows=8845 loops=1)
> 
> BTW, I don't know, why the second join group (part of qupda) gets a
> complete different plan.
> 

It gets a different plan because the "qupd" CTE does this:

  SELECT
   ...
   ,qup.ibitmask|qin.ibitmask|qli.ibitmask|qou.ibitmask as ibitmask
   ...
  FROM ... left join of the CTEs
  WHERE qup.ibitmask>0 AND ..

Which means all the inputs must be non-NULL, hence the optimizer changes
the plan to inner join (and that seems to be perfectly correct).

I think this suggests this join cardinality estimation is not the real
issue. The estimates are off, but there's an order of magnitude
difference for the scans, like here:

    ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=72)
                 (actual time=35.339..71.419 rows=9320 loops=1)

and this tends to "snowball" in the join estimation (it amplifies the
issue - it can't really improve them, except by chance).

FWIW the UNION ALL also explains why we materialize the CTEs, because by
default we fold CTEs into the query only when there's a single
reference. And here both "qupd" and "qupda" reference them.

I'd suggest adding AS NOT MATERIALIZED to the CTEs, to fold them into
the main query despite multiple references. That might improve the
estimate, with a bit of luck.

If not, you'll need to look into improving the scan estimates first,
it's pointless to try to make join estimates better when the input
estimates are this off. This however depends on the conditions, and as
the CTEs do aggregations that may not be possible.

FWIW I suggest you provide the data in a form that's easier to use (like
a working SQL script). More people are likely to look and help than when
they have to extract stuff from an e-mail, fill in missing pieces etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



> 
> FWIW I suggest you provide the data in a form that's easier to use (like
> a working SQL script). More people are likely to look and help than when
> they have to extract stuff from an e-mail, fill in missing pieces etc.
> 

BTW if anyone wants to play with this, here are the SQL scripts I used
to create the tables and the queries. There's no data, but it's enough
to see how the plans change.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
Hi hackers,

I have written a patch to add stats info for Vars in CTEs. With this patch, the join size estimation on the upper of CTE scans became more accurate.

In the function selfuncs.c:eqjoinsel it uses the number of the distinct values of the two join variables to estimate join size, and in the function selfuncs.c:get_variable_numdistinct return a default value DEFAULT_NUM_DISTINCT (200 in Postgres and 1000 in Greenplum), with the default value, you can never expect a good plan.

Thanks if anyone could give a review.

Regards,
Jian


From: Hans Buschmann <buschmann@nidsa.net>
Sent: Wednesday, February 8, 2023 21:55
To: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
 
!! External Email

During data refactoring of our Application I encountered $subject when joining 4 CTEs with left join or inner join.


1. Background

PG 15.1 on Windows x64 (OS seems no to have no meening here)


I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping certain data (4 CTEs qup,qli,qin,qou)

The grouping of the data in the CTEs gives estimated row counts of about 1000 (1 tenth of the real value) This is OK for estimation.


These 4 CTEs are then used to combine the data by joining them.


2. Problem

The 4 CTEs are joined by left joins as shown below:


from qup
left join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
left join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
left join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21

The plan first retrieves qup and qli, taking the estimated row counts of 1163 and 1147 respectively


BUT the result is then hashed and the row count is estimated as 33!


In a Left join the row count stays always the same as the one of left table (here qup with 1163 rows)


The same algorithm which reduces the row estimation from 1163 to 33 is used in the next step to give an estimation of 1 row.

This is totally wrong.


Here is the execution plan of the query:

(search the plan for rows=33)


                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual time=168.307..222.670 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.466..68.131 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.454..36.819 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.148..10.687 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.005..1.972 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.140..0.140 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.007..0.103 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.446..27.388 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.440..9.811 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.045..2.438 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.008..0.470 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.034..0.034 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.024 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.424..31.508 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.416..11.908 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.051..3.108 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.006..0.606 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.042..0.043 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.032 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.198..41.812 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.187..18.967 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.046..5.132 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.010..1.015 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.032..0.032 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.010..0.025 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Hash Join  (cost=1015.85..1319.50 rows=1 width=104) (actual time=168.307..215.513 rows=8548 loops=1)
         Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
         Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) | qou.ibitmask) IS NOT NULL)
         ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76) (actual time=18.200..45.188 rows=10275 loops=1)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
               Rows Removed by Filter: 424
         ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual time=150.094..150.095 rows=8845 loops=1)
               Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1899kB
               ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228) (actual time=121.898..147.726 rows=8845 loops=1)
                     Hash Cond: ((qin.curr_season = qli.curr_season) AND ((qin.curr_code)::text = (qli.curr_code)::text))
                     ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=76) (actual time=11.425..34.674 rows=10197 loops=1)
                           Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                           Rows Removed by Filter: 481
                     ->  Hash  (cost=706.86..706.86 rows=33 width=152) (actual time=110.470..110.470 rows=9007 loops=1)
                           Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1473kB
                           ->  Merge Join  (cost=689.20..706.86 rows=33 width=152) (actual time=105.862..108.925 rows=9007 loops=1)
                                 Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                                 ->  Sort  (cost=342.09..344.96 rows=1147 width=76) (actual time=73.419..73.653 rows=9320 loops=1)
                                       Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1391kB
                                       ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=76) (actual time=35.467..71.904 rows=9320 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                             Rows Removed by Filter: 1415
                                 ->  Sort  (cost=347.12..350.02 rows=1163 width=76) (actual time=32.440..32.697 rows=10289 loops=1)
                                       Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                       Sort Method: quicksort  Memory: 1349kB
                                       ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=76) (actual time=9.447..30.666 rows=10289 loops=1)
                                             Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                             Rows Removed by Filter: 180
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.597..6.700 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.427..3.863 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.321..2.556 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.286..1.324 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.009..1.093 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.033..1.038 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.055..1.007 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.104..1.117 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.038 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.163..1.174 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.068 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 2.297 ms
 Execution Time: 224.759 ms
(118 Zeilen)

3. Slow query from wrong plan as result on similar case with inner join

When the 3 left joins above are changed to inner joins like:

from qup
join qli on (qli.curr_season=qup.curr_season and qli.curr_code=qup.curr_code and qli.ibitmask>0 and cardinality(qli.mat_arr) <=8)
join qin on (qin.curr_season=qup.curr_season and qin.curr_code=qup.curr_code and qin.ibitmask>0 and cardinality(qin.mat_arr) <=8)
join qou on (qou.curr_season=qup.curr_season and qou.curr_code=qup.curr_code and qou.ibitmask>0 and cardinality(qou.mat_arr) <=11)
where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21

The same rows estimation takes place as with the left joins, but the planner now decides to use a nested loop for the last join, which results in a 500fold execution time:

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=13365.31..17472.18 rows=5734 width=104) (actual time=139.037..13403.310 rows=9963 loops=1)
   CTE qup
     ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80) (actual time=35.399..67.102 rows=10735 loops=1)
           Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
           ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual time=35.382..36.743 rows=50969 loops=1)
                 Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 4722kB
                 ->  Hash Left Join  (cost=41.71..1246.13 rows=50969 width=18) (actual time=0.157..10.715 rows=50969 loops=1)
                       Hash Cond: ((sa_upper.sup_mat_code)::text = upper_target.up_mat_code)
                       ->  Seq Scan on sa_upper  (cost=0.00..884.69 rows=50969 width=16) (actual time=0.008..2.001 rows=50969 loops=1)
                       ->  Hash  (cost=35.53..35.53 rows=495 width=6) (actual time=0.146..0.146 rows=495 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 27kB
                             ->  Seq Scan on upper_target  (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.105 rows=495 loops=1)
                                   Filter: (id_up <= 495)
                                   Rows Removed by Filter: 1467
   CTE qli
     ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80) (actual time=9.541..27.419 rows=10469 loops=1)
           Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
           ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual time=9.534..9.908 rows=11774 loops=1)
                 Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1120kB
                 ->  Hash Left Join  (cost=7.34..301.19 rows=11774 width=18) (actual time=0.049..2.451 rows=11774 loops=1)
                       Hash Cond: ((sa_lining.sli_mat_code)::text = lining_target.li_mat_code)
                       ->  Seq Scan on sa_lining  (cost=0.00..204.74 rows=11774 width=16) (actual time=0.010..0.462 rows=11774 loops=1)
                       ->  Hash  (cost=5.86..5.86 rows=118 width=6) (actual time=0.035..0.035 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on lining_target  (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119 loops=1)
                                   Filter: (id_li <= 119)
                                   Rows Removed by Filter: 190
   CTE qin
     ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80) (actual time=11.649..30.910 rows=10678 loops=1)
           Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
           ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual time=11.642..12.115 rows=15230 loops=1)
                 Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 1336kB
                 ->  Hash Left Join  (cost=10.49..369.26 rows=15230 width=18) (actual time=0.056..3.144 rows=15230 loops=1)
                       Hash Cond: ((sa_insole.sin_mat_code)::text = insole_target.in_mat_code)
                       ->  Seq Scan on sa_insole  (cost=0.00..264.30 rows=15230 width=16) (actual time=0.008..0.594 rows=15230 loops=1)
                       ->  Hash  (cost=9.01..9.01 rows=118 width=6) (actual time=0.045..0.046 rows=119 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 13kB
                             ->  Seq Scan on insole_target  (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.034 rows=119 loops=1)
                                   Filter: (id_in <= 119)
                                   Rows Removed by Filter: 362
   CTE qou
     ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80) (actual time=18.163..51.151 rows=10699 loops=1)
           Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
           ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual time=18.150..20.000 rows=24768 loops=1)
                 Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code COLLATE "C"
                 Sort Method: quicksort  Memory: 2317kB
                 ->  Hash Left Join  (cost=5.39..558.63 rows=24768 width=18) (actual time=0.036..5.106 rows=24768 loops=1)
                       Hash Cond: ((sa_outsole.sou_mat_code)::text = outsole_target.ou_mat_code)
                       ->  Seq Scan on sa_outsole  (cost=0.00..430.68 rows=24768 width=16) (actual time=0.008..1.005 rows=24768 loops=1)
                       ->  Hash  (cost=5.03..5.03 rows=29 width=6) (actual time=0.024..0.024 rows=29 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on outsole_target  (cost=0.00..5.03 rows=29 width=6) (actual time=0.007..0.018 rows=29 loops=1)
                                   Filter: (id_ou <= 29)
                                   Rows Removed by Filter: 213
   ->  Nested Loop  (cost=707.35..1328.37 rows=1 width=104) (actual time=139.036..13395.820 rows=8548 loops=1)
         Join Filter: ((qli.curr_season = qin.curr_season) AND ((qli.curr_code)::text = (qin.curr_code)::text))
         Rows Removed by Join Filter: 88552397
         ->  Hash Join  (cost=707.35..1016.45 rows=1 width=216) (actual time=127.374..168.249 rows=8685 loops=1)
               Hash Cond: ((qou.curr_season = qli.curr_season) AND ((qou.curr_code)::text = (qli.curr_code)::text))
               ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=72) (actual time=18.165..54.968 rows=10275 loops=1)
                     Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
                     Rows Removed by Filter: 424
               ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual time=109.205..109.207 rows=9007 loops=1)
                     Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1369kB
                     ->  Merge Join  (cost=689.20..706.86 rows=33 width=144) (actual time=104.785..107.748 rows=9007 loops=1)
                           Merge Cond: ((qup.curr_season = qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
                           ->  Sort  (cost=342.09..344.96 rows=1147 width=72) (actual time=72.320..72.559 rows=9320 loops=1)
                                 Sort Key: qup.curr_season, qup.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1357kB
                                 ->  CTE Scan on qup  (cost=0.00..283.80 rows=1147 width=72) (actual time=35.401..70.834 rows=9320 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 21))
                                       Rows Removed by Filter: 1415
                           ->  Sort  (cost=347.12..350.02 rows=1163 width=72) (actual time=32.461..32.719 rows=10289 loops=1)
                                 Sort Key: qli.curr_season, qli.curr_code COLLATE "C"
                                 Sort Method: quicksort  Memory: 1269kB
                                 ->  CTE Scan on qli  (cost=0.00..287.90 rows=1163 width=72) (actual time=9.543..30.696 rows=10289 loops=1)
                                       Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
                                       Rows Removed by Filter: 180
         ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=72) (actual time=0.001..1.159 rows=10197 loops=8685)
               Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
               Rows Removed by Filter: 481
   ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104) (actual time=4.606..6.733 rows=1415 loops=1)
         Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
         ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733 width=136) (actual time=3.479..3.930 rows=1415 loops=1)
               Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
               ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733 width=104) (actual time=2.368..2.610 rows=1415 loops=1)
                     Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
                     ->  Sort  (cost=641.68..656.02 rows=5733 width=72) (actual time=1.296..1.335 rows=1415 loops=1)
                           Sort Key: qup_1.curr_season, qup_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 204kB
                           ->  CTE Scan on qup qup_1  (cost=0.00..283.80 rows=5733 width=72) (actual time=0.010..1.119 rows=1415 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 21))
                                 Rows Removed by Filter: 9320
                     ->  Sort  (cost=651.57..666.11 rows=5816 width=72) (actual time=1.069..1.075 rows=180 loops=1)
                           Sort Key: qli_1.curr_season, qli_1.curr_code COLLATE "C"
                           Sort Method: quicksort  Memory: 41kB
                           ->  CTE Scan on qli qli_1  (cost=0.00..287.90 rows=5816 width=72) (actual time=0.057..1.026 rows=180 loops=1)
                                 Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                                 Rows Removed by Filter: 10289
               ->  Sort  (cost=665.41..680.24 rows=5932 width=72) (actual time=1.110..1.124 rows=481 loops=1)
                     Sort Key: qin_1.curr_season, qin_1.curr_code COLLATE "C"
                     Sort Method: quicksort  Memory: 68kB
                     ->  CTE Scan on qin qin_1  (cost=0.00..293.65 rows=5932 width=72) (actual time=0.016..1.046 rows=481 loops=1)
                           Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 8))
                           Rows Removed by Filter: 10197
         ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual time=1.119..1.128 rows=417 loops=1)
               Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
               Sort Method: quicksort  Memory: 68kB
               ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944 width=72) (actual time=0.029..1.056 rows=424 loops=1)
                     Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
                     Rows Removed by Filter: 10275
 Planning Time: 1.746 ms
 Execution Time: 13405.503 ms
(116 Zeilen)

This case really brought me to detect the problem!

The original query and data are not shown here, but the principle should be clear from the execution plans.

I think the planner shouldn't change the row estimations on further steps after left joins at all, and be a bit more conservative on inner joins.
This may be related to the fact that this case has 2 join-conditions (xx_season an xx_code).

Thanks for looking

Hans Buschmann





!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
Attachment
Hi,

I haven't looked at the patch, but please add the patch to the next
commit fest (2023-09), so that we don't lose track of it.

See https://commitfest.postgresql.org


regards

Tomas

On 8/14/23 13:12, Jian Guo wrote:
> Hi hackers,
> 
> I have written a patch to add stats info for Vars in CTEs. With this
> patch, the join size estimation on the upper of CTE scans became more
> accurate.
> 
> In the function |selfuncs.c:eqjoinsel| it uses the number of the
> distinct values of the two join variables to estimate join size, and in
> the function |selfuncs.c:get_variable_numdistinct| return a default
> value |DEFAULT_NUM_DISTINCT| (200 in Postgres and 1000 in Greenplum),
> with the default value, you can never expect a good plan.
> 
> Thanks if anyone could give a review.
> 
> Regards,
> Jian
> 
> ------------------------------------------------------------------------
> *From:* Hans Buschmann <buschmann@nidsa.net>
> *Sent:* Wednesday, February 8, 2023 21:55
> *To:* pgsql-hackers@lists.postgresql.org
> <pgsql-hackers@lists.postgresql.org>
> *Subject:* Wrong rows estimations with joins of CTEs slows queries by
> more than factor 500
>  
>     
> !! External Email
> 
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
> 
> 
> 1. Background
> 
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
> 
> 
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
> 
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (1 tenth of the real value) This is OK for estimation.
> 
> 
> These 4 CTEs are then used to combine the data by joining them.
> 
> 
> 2. Problem
> 
> The 4 CTEs are joined by left joins as shown below:
> 
> 
> from qup
> left join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> left join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> left join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
> 
> The plan first retrieves qup and qli, taking the estimated row counts of
> 1163 and 1147 respectively
> 
> 
> BUT the result is then hashed and the row count is estimated as 33!
> 
> 
> In a Left join the row count stays always the same as the one of left
> table (here qup with 1163 rows)
> 
> 
> The same algorithm which reduces the row estimation from 1163 to 33 is
> used in the next step to give an estimation of 1 row.
> 
> This is totally wrong.
> 
> 
> Here is the execution plan of the query:
> 
> (search the plan for rows=33)
> 
> 
>                                                                    
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual
> time=168.307..222.670 rows=9963 loops=1)
>    CTE qup
>      ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.466..68.131 rows=10735 loops=1)
>            Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>            ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.454..36.819 rows=50969 loops=1)
>                  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 4722kB
>                  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.148..10.687 rows=50969 loops=1)
>                        Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>                        ->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.005..1.972 rows=50969 loops=1)
>                        ->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (actual time=0.140..0.140 rows=495 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 27kB
>                              ->  Seq Scan on upper_target 
> (cost=0.00..35.53 rows=495 width=6) (actual time=0.007..0.103 rows=495
> loops=1)
>                                    Filter: (id_up <= 495)
>                                    Rows Removed by Filter: 1467
>    CTE qli
>      ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80)
> (actual time=9.446..27.388 rows=10469 loops=1)
>            Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
>            ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual
> time=9.440..9.811 rows=11774 loops=1)
>                  Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1120kB
>                  ->  Hash Left Join  (cost=7.34..301.19 rows=11774
> width=18) (actual time=0.045..2.438 rows=11774 loops=1)
>                        Hash Cond: ((sa_lining.sli_mat_code)::text =
> lining_target.li_mat_code)
>                        ->  Seq Scan on sa_lining  (cost=0.00..204.74
> rows=11774 width=16) (actual time=0.008..0.470 rows=11774 loops=1)
>                        ->  Hash  (cost=5.86..5.86 rows=118 width=6)
> (actual time=0.034..0.034 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on lining_target 
> (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.024 rows=119
> loops=1)
>                                    Filter: (id_li <= 119)
>                                    Rows Removed by Filter: 190
>    CTE qin
>      ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80)
> (actual time=11.424..31.508 rows=10678 loops=1)
>            Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
>            ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual
> time=11.416..11.908 rows=15230 loops=1)
>                  Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1336kB
>                  ->  Hash Left Join  (cost=10.49..369.26 rows=15230
> width=18) (actual time=0.051..3.108 rows=15230 loops=1)
>                        Hash Cond: ((sa_insole.sin_mat_code)::text =
> insole_target.in_mat_code)
>                        ->  Seq Scan on sa_insole  (cost=0.00..264.30
> rows=15230 width=16) (actual time=0.006..0.606 rows=15230 loops=1)
>                        ->  Hash  (cost=9.01..9.01 rows=118 width=6)
> (actual time=0.042..0.043 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on insole_target 
> (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.032 rows=119
> loops=1)
>                                    Filter: (id_in <= 119)
>                                    Rows Removed by Filter: 362
>    CTE qou
>      ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80)
> (actual time=18.198..41.812 rows=10699 loops=1)
>            Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
>            ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual
> time=18.187..18.967 rows=24768 loops=1)
>                  Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 2317kB
>                  ->  Hash Left Join  (cost=5.39..558.63 rows=24768
> width=18) (actual time=0.046..5.132 rows=24768 loops=1)
>                        Hash Cond: ((sa_outsole.sou_mat_code)::text =
> outsole_target.ou_mat_code)
>                        ->  Seq Scan on sa_outsole  (cost=0.00..430.68
> rows=24768 width=16) (actual time=0.010..1.015 rows=24768 loops=1)
>                        ->  Hash  (cost=5.03..5.03 rows=29 width=6)
> (actual time=0.032..0.032 rows=29 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                              ->  Seq Scan on outsole_target 
> (cost=0.00..5.03 rows=29 width=6) (actual time=0.010..0.025 rows=29 loops=1)
>                                    Filter: (id_ou <= 29)
>                                    Rows Removed by Filter: 213
>    ->  Hash Join  (cost=1015.85..1319.50 rows=1 width=104) (actual
> time=168.307..215.513 rows=8548 loops=1)
>          Hash Cond: ((qou.curr_season = qli.curr_season) AND
> ((qou.curr_code)::text = (qli.curr_code)::text))
>          Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) |
> qou.ibitmask) IS NOT NULL)
>          ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76)
> (actual time=18.200..45.188 rows=10275 loops=1)
>                Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
>                Rows Removed by Filter: 424
>          ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual
> time=150.094..150.095 rows=8845 loops=1)
>                Buckets: 16384 (originally 1024)  Batches: 1 (originally
> 1)  Memory Usage: 1899kB
>                ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228)
> (actual time=121.898..147.726 rows=8845 loops=1)
>                      Hash Cond: ((qin.curr_season = qli.curr_season) AND
> ((qin.curr_code)::text = (qli.curr_code)::text))
>                      ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186
> width=76) (actual time=11.425..34.674 rows=10197 loops=1)
>                            Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                            Rows Removed by Filter: 481
>                      ->  Hash  (cost=706.86..706.86 rows=33 width=152)
> (actual time=110.470..110.470 rows=9007 loops=1)
>                            Buckets: 16384 (originally 1024)  Batches: 1
> (originally 1)  Memory Usage: 1473kB
>                            ->  Merge Join  (cost=689.20..706.86 rows=33
> width=152) (actual time=105.862..108.925 rows=9007 loops=1)
>                                  Merge Cond: ((qup.curr_season =
> qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
>                                  ->  Sort  (cost=342.09..344.96
> rows=1147 width=76) (actual time=73.419..73.653 rows=9320 loops=1)
>                                        Sort Key: qup.curr_season,
> qup.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1391kB
>                                        ->  CTE Scan on qup 
> (cost=0.00..283.80 rows=1147 width=76) (actual time=35.467..71.904
> rows=9320 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 21))
>                                              Rows Removed by Filter: 1415
>                                  ->  Sort  (cost=347.12..350.02
> rows=1163 width=76) (actual time=32.440..32.697 rows=10289 loops=1)
>                                        Sort Key: qli.curr_season,
> qli.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1349kB
>                                        ->  CTE Scan on qli 
> (cost=0.00..287.90 rows=1163 width=76) (actual time=9.447..30.666
> rows=10289 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                                              Rows Removed by Filter: 180
>    ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104)
> (actual time=4.597..6.700 rows=1415 loops=1)
>          Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND
> ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
>          ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733
> width=136) (actual time=3.427..3.863 rows=1415 loops=1)
>                Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND
> ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
>                ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733
> width=104) (actual time=2.321..2.556 rows=1415 loops=1)
>                      Merge Cond: ((qup_1.curr_season =
> qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
>                      ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
> (actual time=1.286..1.324 rows=1415 loops=1)
>                            Sort Key: qup_1.curr_season, qup_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 204kB
>                            ->  CTE Scan on qup qup_1  (cost=0.00..283.80
> rows=5733 width=72) (actual time=0.009..1.093 rows=1415 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 21))
>                                  Rows Removed by Filter: 9320
>                      ->  Sort  (cost=651.57..666.11 rows=5816 width=72)
> (actual time=1.033..1.038 rows=180 loops=1)
>                            Sort Key: qli_1.curr_season, qli_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 41kB
>                            ->  CTE Scan on qli qli_1  (cost=0.00..287.90
> rows=5816 width=72) (actual time=0.055..1.007 rows=180 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                                  Rows Removed by Filter: 10289
>                ->  Sort  (cost=665.41..680.24 rows=5932 width=72)
> (actual time=1.104..1.117 rows=481 loops=1)
>                      Sort Key: qin_1.curr_season, qin_1.curr_code
> COLLATE "C"
>                      Sort Method: quicksort  Memory: 68kB
>                      ->  CTE Scan on qin qin_1  (cost=0.00..293.65
> rows=5932 width=72) (actual time=0.016..1.038 rows=481 loops=1)
>                            Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                            Rows Removed by Filter: 10197
>          ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual
> time=1.163..1.174 rows=417 loops=1)
>                Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
>                Sort Method: quicksort  Memory: 68kB
>                ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944
> width=72) (actual time=0.029..1.068 rows=424 loops=1)
>                      Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
>                      Rows Removed by Filter: 10275
>  Planning Time: 2.297 ms
>  Execution Time: 224.759 ms
> (118 Zeilen)
> 
> 3. Slow query from wrong plan as result on similar case with inner join
> 
> When the 3 left joins above are changed to inner joins like:
> 
> from qup
> join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
> 
> The same rows estimation takes place as with the left joins, but the
> planner now decides to use a nested loop for the last join, which
> results in a 500fold execution time:
> 
>                                                                  QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=13365.31..17472.18 rows=5734 width=104) (actual
> time=139.037..13403.310 rows=9963 loops=1)
>    CTE qup
>      ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.399..67.102 rows=10735 loops=1)
>            Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>            ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.382..36.743 rows=50969 loops=1)
>                  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 4722kB
>                  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.157..10.715 rows=50969 loops=1)
>                        Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>                        ->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.008..2.001 rows=50969 loops=1)
>                        ->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (actual time=0.146..0.146 rows=495 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 27kB
>                              ->  Seq Scan on upper_target 
> (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.105 rows=495
> loops=1)
>                                    Filter: (id_up <= 495)
>                                    Rows Removed by Filter: 1467
>    CTE qli
>      ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80)
> (actual time=9.541..27.419 rows=10469 loops=1)
>            Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
>            ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual
> time=9.534..9.908 rows=11774 loops=1)
>                  Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1120kB
>                  ->  Hash Left Join  (cost=7.34..301.19 rows=11774
> width=18) (actual time=0.049..2.451 rows=11774 loops=1)
>                        Hash Cond: ((sa_lining.sli_mat_code)::text =
> lining_target.li_mat_code)
>                        ->  Seq Scan on sa_lining  (cost=0.00..204.74
> rows=11774 width=16) (actual time=0.010..0.462 rows=11774 loops=1)
>                        ->  Hash  (cost=5.86..5.86 rows=118 width=6)
> (actual time=0.035..0.035 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on lining_target 
> (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119
> loops=1)
>                                    Filter: (id_li <= 119)
>                                    Rows Removed by Filter: 190
>    CTE qin
>      ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80)
> (actual time=11.649..30.910 rows=10678 loops=1)
>            Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
>            ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual
> time=11.642..12.115 rows=15230 loops=1)
>                  Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1336kB
>                  ->  Hash Left Join  (cost=10.49..369.26 rows=15230
> width=18) (actual time=0.056..3.144 rows=15230 loops=1)
>                        Hash Cond: ((sa_insole.sin_mat_code)::text =
> insole_target.in_mat_code)
>                        ->  Seq Scan on sa_insole  (cost=0.00..264.30
> rows=15230 width=16) (actual time=0.008..0.594 rows=15230 loops=1)
>                        ->  Hash  (cost=9.01..9.01 rows=118 width=6)
> (actual time=0.045..0.046 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on insole_target 
> (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.034 rows=119
> loops=1)
>                                    Filter: (id_in <= 119)
>                                    Rows Removed by Filter: 362
>    CTE qou
>      ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80)
> (actual time=18.163..51.151 rows=10699 loops=1)
>            Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
>            ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual
> time=18.150..20.000 rows=24768 loops=1)
>                  Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 2317kB
>                  ->  Hash Left Join  (cost=5.39..558.63 rows=24768
> width=18) (actual time=0.036..5.106 rows=24768 loops=1)
>                        Hash Cond: ((sa_outsole.sou_mat_code)::text =
> outsole_target.ou_mat_code)
>                        ->  Seq Scan on sa_outsole  (cost=0.00..430.68
> rows=24768 width=16) (actual time=0.008..1.005 rows=24768 loops=1)
>                        ->  Hash  (cost=5.03..5.03 rows=29 width=6)
> (actual time=0.024..0.024 rows=29 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                              ->  Seq Scan on outsole_target 
> (cost=0.00..5.03 rows=29 width=6) (actual time=0.007..0.018 rows=29 loops=1)
>                                    Filter: (id_ou <= 29)
>                                    Rows Removed by Filter: 213
>    ->  Nested Loop  (cost=707.35..1328.37 rows=1 width=104) (actual
> time=139.036..13395.820 rows=8548 loops=1)
>          Join Filter: ((qli.curr_season = qin.curr_season) AND
> ((qli.curr_code)::text = (qin.curr_code)::text))
>          Rows Removed by Join Filter: 88552397
>          ->  Hash Join  (cost=707.35..1016.45 rows=1 width=216) (actual
> time=127.374..168.249 rows=8685 loops=1)
>                Hash Cond: ((qou.curr_season = qli.curr_season) AND
> ((qou.curr_code)::text = (qli.curr_code)::text))
>                ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189
> width=72) (actual time=18.165..54.968 rows=10275 loops=1)
>                      Filter: ((ibitmask > 0) AND (cardinality(mat_arr)
> <= 11))
>                      Rows Removed by Filter: 424
>                ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual
> time=109.205..109.207 rows=9007 loops=1)
>                      Buckets: 16384 (originally 1024)  Batches: 1
> (originally 1)  Memory Usage: 1369kB
>                      ->  Merge Join  (cost=689.20..706.86 rows=33
> width=144) (actual time=104.785..107.748 rows=9007 loops=1)
>                            Merge Cond: ((qup.curr_season =
> qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
>                            ->  Sort  (cost=342.09..344.96 rows=1147
> width=72) (actual time=72.320..72.559 rows=9320 loops=1)
>                                  Sort Key: qup.curr_season,
> qup.curr_code COLLATE "C"
>                                  Sort Method: quicksort  Memory: 1357kB
>                                  ->  CTE Scan on qup  (cost=0.00..283.80
> rows=1147 width=72) (actual time=35.401..70.834 rows=9320 loops=1)
>                                        Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 21))
>                                        Rows Removed by Filter: 1415
>                            ->  Sort  (cost=347.12..350.02 rows=1163
> width=72) (actual time=32.461..32.719 rows=10289 loops=1)
>                                  Sort Key: qli.curr_season,
> qli.curr_code COLLATE "C"
>                                  Sort Method: quicksort  Memory: 1269kB
>                                  ->  CTE Scan on qli  (cost=0.00..287.90
> rows=1163 width=72) (actual time=9.543..30.696 rows=10289 loops=1)
>                                        Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                                        Rows Removed by Filter: 180
>          ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=72)
> (actual time=0.001..1.159 rows=10197 loops=8685)
>                Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
>                Rows Removed by Filter: 481
>    ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104)
> (actual time=4.606..6.733 rows=1415 loops=1)
>          Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND
> ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
>          ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733
> width=136) (actual time=3.479..3.930 rows=1415 loops=1)
>                Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND
> ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
>                ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733
> width=104) (actual time=2.368..2.610 rows=1415 loops=1)
>                      Merge Cond: ((qup_1.curr_season =
> qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
>                      ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
> (actual time=1.296..1.335 rows=1415 loops=1)
>                            Sort Key: qup_1.curr_season, qup_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 204kB
>                            ->  CTE Scan on qup qup_1  (cost=0.00..283.80
> rows=5733 width=72) (actual time=0.010..1.119 rows=1415 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 21))
>                                  Rows Removed by Filter: 9320
>                      ->  Sort  (cost=651.57..666.11 rows=5816 width=72)
> (actual time=1.069..1.075 rows=180 loops=1)
>                            Sort Key: qli_1.curr_season, qli_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 41kB
>                            ->  CTE Scan on qli qli_1  (cost=0.00..287.90
> rows=5816 width=72) (actual time=0.057..1.026 rows=180 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                                  Rows Removed by Filter: 10289
>                ->  Sort  (cost=665.41..680.24 rows=5932 width=72)
> (actual time=1.110..1.124 rows=481 loops=1)
>                      Sort Key: qin_1.curr_season, qin_1.curr_code
> COLLATE "C"
>                      Sort Method: quicksort  Memory: 68kB
>                      ->  CTE Scan on qin qin_1  (cost=0.00..293.65
> rows=5932 width=72) (actual time=0.016..1.046 rows=481 loops=1)
>                            Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                            Rows Removed by Filter: 10197
>          ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual
> time=1.119..1.128 rows=417 loops=1)
>                Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
>                Sort Method: quicksort  Memory: 68kB
>                ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944
> width=72) (actual time=0.029..1.056 rows=424 loops=1)
>                      Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
>                      Rows Removed by Filter: 10275
>  Planning Time: 1.746 ms
>  Execution Time: 13405.503 ms
> (116 Zeilen)
> 
> This case really brought me to detect the problem!
> 
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
> 
> I think the planner shouldn't change the row estimations on further
> steps after left joins at all, and be a bit more conservative on inner
> joins.
> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).
> 
> Thanks for looking
> 
> Hans Buschmann
> 
> 
> 
> 
> 
>     
> !! External Email: This email originated from outside of the
> organization. Do not click links or open attachments unless you
> recognize the sender.
> 

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Hi hackers,

I found a new approach to fix this issue, which seems better, so I would like to post another version of the patch here. The origin patch made the assumption of the values of Vars from CTE must be unique, which could be very wrong. This patch examines variables for Vars inside CTE, which avoided the bad assumption, so the results could be much more accurate.

Regards,
Jian


From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Monday, August 14, 2023 20:58
To: Jian Guo <gjian@vmware.com>; Hans Buschmann <buschmann@nidsa.net>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
 
!! External Email

Hi,

I haven't looked at the patch, but please add the patch to the next
commit fest (2023-09), so that we don't lose track of it.

See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcommitfest.postgresql.org%2F&data=05%7C01%7Cgjian%40vmware.com%7C9d40e84af2c946f3517a08db9cc61ee2%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638276146959658928%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=EUlMgo%2BU4Oi%2BWf0cS%2FKnTmhHzZrYzu26PzfxYnZIDFs%3D&reserved=0


regards

Tomas

On 8/14/23 13:12, Jian Guo wrote:
> Hi hackers,
>
> I have written a patch to add stats info for Vars in CTEs. With this
> patch, the join size estimation on the upper of CTE scans became more
> accurate.
>
> In the function |selfuncs.c:eqjoinsel| it uses the number of the
> distinct values of the two join variables to estimate join size, and in
> the function |selfuncs.c:get_variable_numdistinct| return a default
> value |DEFAULT_NUM_DISTINCT| (200 in Postgres and 1000 in Greenplum),
> with the default value, you can never expect a good plan.
>
> Thanks if anyone could give a review.
>
> Regards,
> Jian
>
> ------------------------------------------------------------------------
> *From:* Hans Buschmann <buschmann@nidsa.net>
> *Sent:* Wednesday, February 8, 2023 21:55
> *To:* pgsql-hackers@lists.postgresql.org
> <pgsql-hackers@lists.postgresql.org>
> *Subject:* Wrong rows estimations with joins of CTEs slows queries by
> more than factor 500
>
>
> !! External Email
>
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
>
>
> 1. Background
>
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
>
>
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
>
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (1 tenth of the real value) This is OK for estimation.
>
>
> These 4 CTEs are then used to combine the data by joining them.
>
>
> 2. Problem
>
> The 4 CTEs are joined by left joins as shown below:
>
>
> from qup
> left join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> left join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> left join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
>
> The plan first retrieves qup and qli, taking the estimated row counts of
> 1163 and 1147 respectively
>
>
> BUT the result is then hashed and the row count is estimated as 33!
>
>
> In a Left join the row count stays always the same as the one of left
> table (here qup with 1163 rows)
>
>
> The same algorithm which reduces the row estimation from 1163 to 33 is
> used in the next step to give an estimation of 1 row.
>
> This is totally wrong.
>
>
> Here is the execution plan of the query:
>
> (search the plan for rows=33)
>
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual
> time=168.307..222.670 rows=9963 loops=1)
>    CTE qup
>      ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.466..68.131 rows=10735 loops=1)
>            Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>            ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.454..36.819 rows=50969 loops=1)
>                  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 4722kB
>                  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.148..10.687 rows=50969 loops=1)
>                        Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>                        ->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.005..1.972 rows=50969 loops=1)
>                        ->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (actual time=0.140..0.140 rows=495 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 27kB
>                              ->  Seq Scan on upper_target
> (cost=0.00..35.53 rows=495 width=6) (actual time=0.007..0.103 rows=495
> loops=1)
>                                    Filter: (id_up <= 495)
>                                    Rows Removed by Filter: 1467
>    CTE qli
>      ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80)
> (actual time=9.446..27.388 rows=10469 loops=1)
>            Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
>            ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual
> time=9.440..9.811 rows=11774 loops=1)
>                  Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1120kB
>                  ->  Hash Left Join  (cost=7.34..301.19 rows=11774
> width=18) (actual time=0.045..2.438 rows=11774 loops=1)
>                        Hash Cond: ((sa_lining.sli_mat_code)::text =
> lining_target.li_mat_code)
>                        ->  Seq Scan on sa_lining  (cost=0.00..204.74
> rows=11774 width=16) (actual time=0.008..0.470 rows=11774 loops=1)
>                        ->  Hash  (cost=5.86..5.86 rows=118 width=6)
> (actual time=0.034..0.034 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on lining_target
> (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.024 rows=119
> loops=1)
>                                    Filter: (id_li <= 119)
>                                    Rows Removed by Filter: 190
>    CTE qin
>      ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80)
> (actual time=11.424..31.508 rows=10678 loops=1)
>            Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
>            ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual
> time=11.416..11.908 rows=15230 loops=1)
>                  Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1336kB
>                  ->  Hash Left Join  (cost=10.49..369.26 rows=15230
> width=18) (actual time=0.051..3.108 rows=15230 loops=1)
>                        Hash Cond: ((sa_insole.sin_mat_code)::text =
> insole_target.in_mat_code)
>                        ->  Seq Scan on sa_insole  (cost=0.00..264.30
> rows=15230 width=16) (actual time=0.006..0.606 rows=15230 loops=1)
>                        ->  Hash  (cost=9.01..9.01 rows=118 width=6)
> (actual time=0.042..0.043 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on insole_target
> (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.032 rows=119
> loops=1)
>                                    Filter: (id_in <= 119)
>                                    Rows Removed by Filter: 362
>    CTE qou
>      ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80)
> (actual time=18.198..41.812 rows=10699 loops=1)
>            Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
>            ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual
> time=18.187..18.967 rows=24768 loops=1)
>                  Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 2317kB
>                  ->  Hash Left Join  (cost=5.39..558.63 rows=24768
> width=18) (actual time=0.046..5.132 rows=24768 loops=1)
>                        Hash Cond: ((sa_outsole.sou_mat_code)::text =
> outsole_target.ou_mat_code)
>                        ->  Seq Scan on sa_outsole  (cost=0.00..430.68
> rows=24768 width=16) (actual time=0.010..1.015 rows=24768 loops=1)
>                        ->  Hash  (cost=5.03..5.03 rows=29 width=6)
> (actual time=0.032..0.032 rows=29 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                              ->  Seq Scan on outsole_target
> (cost=0.00..5.03 rows=29 width=6) (actual time=0.010..0.025 rows=29 loops=1)
>                                    Filter: (id_ou <= 29)
>                                    Rows Removed by Filter: 213
>    ->  Hash Join  (cost=1015.85..1319.50 rows=1 width=104) (actual
> time=168.307..215.513 rows=8548 loops=1)
>          Hash Cond: ((qou.curr_season = qli.curr_season) AND
> ((qou.curr_code)::text = (qli.curr_code)::text))
>          Join Filter: ((((qup.ibitmask | qin.ibitmask) | qli.ibitmask) |
> qou.ibitmask) IS NOT NULL)
>          ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189 width=76)
> (actual time=18.200..45.188 rows=10275 loops=1)
>                Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 11))
>                Rows Removed by Filter: 424
>          ->  Hash  (cost=1015.83..1015.83 rows=1 width=228) (actual
> time=150.094..150.095 rows=8845 loops=1)
>                Buckets: 16384 (originally 1024)  Batches: 1 (originally
> 1)  Memory Usage: 1899kB
>                ->  Hash Join  (cost=707.35..1015.83 rows=1 width=228)
> (actual time=121.898..147.726 rows=8845 loops=1)
>                      Hash Cond: ((qin.curr_season = qli.curr_season) AND
> ((qin.curr_code)::text = (qli.curr_code)::text))
>                      ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186
> width=76) (actual time=11.425..34.674 rows=10197 loops=1)
>                            Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                            Rows Removed by Filter: 481
>                      ->  Hash  (cost=706.86..706.86 rows=33 width=152)
> (actual time=110.470..110.470 rows=9007 loops=1)
>                            Buckets: 16384 (originally 1024)  Batches: 1
> (originally 1)  Memory Usage: 1473kB
>                            ->  Merge Join  (cost=689.20..706.86 rows=33
> width=152) (actual time=105.862..108.925 rows=9007 loops=1)
>                                  Merge Cond: ((qup.curr_season =
> qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
>                                  ->  Sort  (cost=342.09..344.96
> rows=1147 width=76) (actual time=73.419..73.653 rows=9320 loops=1)
>                                        Sort Key: qup.curr_season,
> qup.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1391kB
>                                        ->  CTE Scan on qup
> (cost=0.00..283.80 rows=1147 width=76) (actual time=35.467..71.904
> rows=9320 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 21))
>                                              Rows Removed by Filter: 1415
>                                  ->  Sort  (cost=347.12..350.02
> rows=1163 width=76) (actual time=32.440..32.697 rows=10289 loops=1)
>                                        Sort Key: qli.curr_season,
> qli.curr_code COLLATE "C"
>                                        Sort Method: quicksort  Memory:
> 1349kB
>                                        ->  CTE Scan on qli
> (cost=0.00..287.90 rows=1163 width=76) (actual time=9.447..30.666
> rows=10289 loops=1)
>                                              Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                                              Rows Removed by Filter: 180
>    ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104)
> (actual time=4.597..6.700 rows=1415 loops=1)
>          Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND
> ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
>          ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733
> width=136) (actual time=3.427..3.863 rows=1415 loops=1)
>                Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND
> ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
>                ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733
> width=104) (actual time=2.321..2.556 rows=1415 loops=1)
>                      Merge Cond: ((qup_1.curr_season =
> qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
>                      ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
> (actual time=1.286..1.324 rows=1415 loops=1)
>                            Sort Key: qup_1.curr_season, qup_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 204kB
>                            ->  CTE Scan on qup qup_1  (cost=0.00..283.80
> rows=5733 width=72) (actual time=0.009..1.093 rows=1415 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 21))
>                                  Rows Removed by Filter: 9320
>                      ->  Sort  (cost=651.57..666.11 rows=5816 width=72)
> (actual time=1.033..1.038 rows=180 loops=1)
>                            Sort Key: qli_1.curr_season, qli_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 41kB
>                            ->  CTE Scan on qli qli_1  (cost=0.00..287.90
> rows=5816 width=72) (actual time=0.055..1.007 rows=180 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                                  Rows Removed by Filter: 10289
>                ->  Sort  (cost=665.41..680.24 rows=5932 width=72)
> (actual time=1.104..1.117 rows=481 loops=1)
>                      Sort Key: qin_1.curr_season, qin_1.curr_code
> COLLATE "C"
>                      Sort Method: quicksort  Memory: 68kB
>                      ->  CTE Scan on qin qin_1  (cost=0.00..293.65
> rows=5932 width=72) (actual time=0.016..1.038 rows=481 loops=1)
>                            Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                            Rows Removed by Filter: 10197
>          ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual
> time=1.163..1.174 rows=417 loops=1)
>                Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
>                Sort Method: quicksort  Memory: 68kB
>                ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944
> width=72) (actual time=0.029..1.068 rows=424 loops=1)
>                      Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
>                      Rows Removed by Filter: 10275
>  Planning Time: 2.297 ms
>  Execution Time: 224.759 ms
> (118 Zeilen)
>
> 3. Slow query from wrong plan as result on similar case with inner join
>
> When the 3 left joins above are changed to inner joins like:
>
> from qup
> join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
>
> The same rows estimation takes place as with the left joins, but the
> planner now decides to use a nested loop for the last join, which
> results in a 500fold execution time:
>
>                                                                  QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Append  (cost=13365.31..17472.18 rows=5734 width=104) (actual
> time=139.037..13403.310 rows=9963 loops=1)
>    CTE qup
>      ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.399..67.102 rows=10735 loops=1)
>            Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>            ->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.382..36.743 rows=50969 loops=1)
>                  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 4722kB
>                  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.157..10.715 rows=50969 loops=1)
>                        Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>                        ->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.008..2.001 rows=50969 loops=1)
>                        ->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (actual time=0.146..0.146 rows=495 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 27kB
>                              ->  Seq Scan on upper_target
> (cost=0.00..35.53 rows=495 width=6) (actual time=0.006..0.105 rows=495
> loops=1)
>                                    Filter: (id_up <= 495)
>                                    Rows Removed by Filter: 1467
>    CTE qli
>      ->  GroupAggregate  (cost=1097.31..1486.56 rows=10469 width=80)
> (actual time=9.541..27.419 rows=10469 loops=1)
>            Group Key: sa_lining.sli_season, sa_lining.sli_sa_code
>            ->  Sort  (cost=1097.31..1126.74 rows=11774 width=18) (actual
> time=9.534..9.908 rows=11774 loops=1)
>                  Sort Key: sa_lining.sli_season, sa_lining.sli_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1120kB
>                  ->  Hash Left Join  (cost=7.34..301.19 rows=11774
> width=18) (actual time=0.049..2.451 rows=11774 loops=1)
>                        Hash Cond: ((sa_lining.sli_mat_code)::text =
> lining_target.li_mat_code)
>                        ->  Seq Scan on sa_lining  (cost=0.00..204.74
> rows=11774 width=16) (actual time=0.010..0.462 rows=11774 loops=1)
>                        ->  Hash  (cost=5.86..5.86 rows=118 width=6)
> (actual time=0.035..0.035 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on lining_target
> (cost=0.00..5.86 rows=118 width=6) (actual time=0.008..0.025 rows=119
> loops=1)
>                                    Filter: (id_li <= 119)
>                                    Rows Removed by Filter: 190
>    CTE qin
>      ->  GroupAggregate  (cost=1427.34..1880.73 rows=10678 width=80)
> (actual time=11.649..30.910 rows=10678 loops=1)
>            Group Key: sa_insole.sin_season, sa_insole.sin_sa_code
>            ->  Sort  (cost=1427.34..1465.41 rows=15230 width=18) (actual
> time=11.642..12.115 rows=15230 loops=1)
>                  Sort Key: sa_insole.sin_season, sa_insole.sin_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 1336kB
>                  ->  Hash Left Join  (cost=10.49..369.26 rows=15230
> width=18) (actual time=0.056..3.144 rows=15230 loops=1)
>                        Hash Cond: ((sa_insole.sin_mat_code)::text =
> insole_target.in_mat_code)
>                        ->  Seq Scan on sa_insole  (cost=0.00..264.30
> rows=15230 width=16) (actual time=0.008..0.594 rows=15230 loops=1)
>                        ->  Hash  (cost=9.01..9.01 rows=118 width=6)
> (actual time=0.045..0.046 rows=119 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 13kB
>                              ->  Seq Scan on insole_target
> (cost=0.00..9.01 rows=118 width=6) (actual time=0.008..0.034 rows=119
> loops=1)
>                                    Filter: (id_in <= 119)
>                                    Rows Removed by Filter: 362
>    CTE qou
>      ->  GroupAggregate  (cost=2366.22..2986.89 rows=10699 width=80)
> (actual time=18.163..51.151 rows=10699 loops=1)
>            Group Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
>            ->  Sort  (cost=2366.22..2428.14 rows=24768 width=18) (actual
> time=18.150..20.000 rows=24768 loops=1)
>                  Sort Key: sa_outsole.sou_season, sa_outsole.sou_sa_code
> COLLATE "C"
>                  Sort Method: quicksort  Memory: 2317kB
>                  ->  Hash Left Join  (cost=5.39..558.63 rows=24768
> width=18) (actual time=0.036..5.106 rows=24768 loops=1)
>                        Hash Cond: ((sa_outsole.sou_mat_code)::text =
> outsole_target.ou_mat_code)
>                        ->  Seq Scan on sa_outsole  (cost=0.00..430.68
> rows=24768 width=16) (actual time=0.008..1.005 rows=24768 loops=1)
>                        ->  Hash  (cost=5.03..5.03 rows=29 width=6)
> (actual time=0.024..0.024 rows=29 loops=1)
>                              Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                              ->  Seq Scan on outsole_target
> (cost=0.00..5.03 rows=29 width=6) (actual time=0.007..0.018 rows=29 loops=1)
>                                    Filter: (id_ou <= 29)
>                                    Rows Removed by Filter: 213
>    ->  Nested Loop  (cost=707.35..1328.37 rows=1 width=104) (actual
> time=139.036..13395.820 rows=8548 loops=1)
>          Join Filter: ((qli.curr_season = qin.curr_season) AND
> ((qli.curr_code)::text = (qin.curr_code)::text))
>          Rows Removed by Join Filter: 88552397
>          ->  Hash Join  (cost=707.35..1016.45 rows=1 width=216) (actual
> time=127.374..168.249 rows=8685 loops=1)
>                Hash Cond: ((qou.curr_season = qli.curr_season) AND
> ((qou.curr_code)::text = (qli.curr_code)::text))
>                ->  CTE Scan on qou  (cost=0.00..294.22 rows=1189
> width=72) (actual time=18.165..54.968 rows=10275 loops=1)
>                      Filter: ((ibitmask > 0) AND (cardinality(mat_arr)
> <= 11))
>                      Rows Removed by Filter: 424
>                ->  Hash  (cost=706.86..706.86 rows=33 width=144) (actual
> time=109.205..109.207 rows=9007 loops=1)
>                      Buckets: 16384 (originally 1024)  Batches: 1
> (originally 1)  Memory Usage: 1369kB
>                      ->  Merge Join  (cost=689.20..706.86 rows=33
> width=144) (actual time=104.785..107.748 rows=9007 loops=1)
>                            Merge Cond: ((qup.curr_season =
> qli.curr_season) AND ((qup.curr_code)::text = (qli.curr_code)::text))
>                            ->  Sort  (cost=342.09..344.96 rows=1147
> width=72) (actual time=72.320..72.559 rows=9320 loops=1)
>                                  Sort Key: qup.curr_season,
> qup.curr_code COLLATE "C"
>                                  Sort Method: quicksort  Memory: 1357kB
>                                  ->  CTE Scan on qup  (cost=0.00..283.80
> rows=1147 width=72) (actual time=35.401..70.834 rows=9320 loops=1)
>                                        Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 21))
>                                        Rows Removed by Filter: 1415
>                            ->  Sort  (cost=347.12..350.02 rows=1163
> width=72) (actual time=32.461..32.719 rows=10289 loops=1)
>                                  Sort Key: qli.curr_season,
> qli.curr_code COLLATE "C"
>                                  Sort Method: quicksort  Memory: 1269kB
>                                  ->  CTE Scan on qli  (cost=0.00..287.90
> rows=1163 width=72) (actual time=9.543..30.696 rows=10289 loops=1)
>                                        Filter: ((ibitmask > 0) AND
> (cardinality(mat_arr) <= 8))
>                                        Rows Removed by Filter: 180
>          ->  CTE Scan on qin  (cost=0.00..293.65 rows=1186 width=72)
> (actual time=0.001..1.159 rows=10197 loops=8685)
>                Filter: ((ibitmask > 0) AND (cardinality(mat_arr) <= 8))
>                Rows Removed by Filter: 481
>    ->  Merge Left Join  (cost=2625.49..3399.84 rows=5733 width=104)
> (actual time=4.606..6.733 rows=1415 loops=1)
>          Merge Cond: ((qup_1.curr_season = qou_1.curr_season) AND
> ((qup_1.curr_code)::text = (qou_1.curr_code)::text))
>          ->  Merge Left Join  (cost=1958.66..2135.28 rows=5733
> width=136) (actual time=3.479..3.930 rows=1415 loops=1)
>                Merge Cond: ((qup_1.curr_season = qin_1.curr_season) AND
> ((qup_1.curr_code)::text = (qin_1.curr_code)::text))
>                ->  Merge Left Join  (cost=1293.25..1388.21 rows=5733
> width=104) (actual time=2.368..2.610 rows=1415 loops=1)
>                      Merge Cond: ((qup_1.curr_season =
> qli_1.curr_season) AND ((qup_1.curr_code)::text = (qli_1.curr_code)::text))
>                      ->  Sort  (cost=641.68..656.02 rows=5733 width=72)
> (actual time=1.296..1.335 rows=1415 loops=1)
>                            Sort Key: qup_1.curr_season, qup_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 204kB
>                            ->  CTE Scan on qup qup_1  (cost=0.00..283.80
> rows=5733 width=72) (actual time=0.010..1.119 rows=1415 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 21))
>                                  Rows Removed by Filter: 9320
>                      ->  Sort  (cost=651.57..666.11 rows=5816 width=72)
> (actual time=1.069..1.075 rows=180 loops=1)
>                            Sort Key: qli_1.curr_season, qli_1.curr_code
> COLLATE "C"
>                            Sort Method: quicksort  Memory: 41kB
>                            ->  CTE Scan on qli qli_1  (cost=0.00..287.90
> rows=5816 width=72) (actual time=0.057..1.026 rows=180 loops=1)
>                                  Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                                  Rows Removed by Filter: 10289
>                ->  Sort  (cost=665.41..680.24 rows=5932 width=72)
> (actual time=1.110..1.124 rows=481 loops=1)
>                      Sort Key: qin_1.curr_season, qin_1.curr_code
> COLLATE "C"
>                      Sort Method: quicksort  Memory: 68kB
>                      ->  CTE Scan on qin qin_1  (cost=0.00..293.65
> rows=5932 width=72) (actual time=0.016..1.046 rows=481 loops=1)
>                            Filter: ((ibitmask < 0) OR
> (cardinality(mat_arr) > 8))
>                            Rows Removed by Filter: 10197
>          ->  Sort  (cost=666.83..681.69 rows=5944 width=72) (actual
> time=1.119..1.128 rows=417 loops=1)
>                Sort Key: qou_1.curr_season, qou_1.curr_code COLLATE "C"
>                Sort Method: quicksort  Memory: 68kB
>                ->  CTE Scan on qou qou_1  (cost=0.00..294.22 rows=5944
> width=72) (actual time=0.029..1.056 rows=424 loops=1)
>                      Filter: ((ibitmask < 0) OR (cardinality(mat_arr) > 11))
>                      Rows Removed by Filter: 10275
>  Planning Time: 1.746 ms
>  Execution Time: 13405.503 ms
> (116 Zeilen)
>
> This case really brought me to detect the problem!
>
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
>
> I think the planner shouldn't change the row estimations on further
> steps after left joins at all, and be a bit more conservative on inner
> joins.
> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).
>
> Thanks for looking
>
> Hans Buschmann
>
>
>
>
>
>
> !! External Email: This email originated from outside of the
> organization. Do not click links or open attachments unless you
> recognize the sender.
>

--
Tomas Vondra
EnterpriseDB: https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com%2F&data=05%7C01%7Cgjian%40vmware.com%7C9d40e84af2c946f3517a08db9cc61ee2%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638276146959658928%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=VzTmxC6ay28C8%2BaA3Dsi%2BDDWxGEgh9UVaPfc%2BMiL5Mo%3D&reserved=0
The Enterprise PostgreSQL Company

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
Attachment
On 8/21/23 10:16, Jian Guo wrote:
> Hi hackers,
> 
> I found a new approach to fix this issue, which seems better, so I would
> like to post another version of the patch here. The origin patch made
> the assumption of the values of Vars from CTE must be unique, which
> could be very wrong. This patch examines variables for Vars inside CTE,
> which avoided the bad assumption, so the results could be much more
> accurate.
> 

No problem with posting a reworked patch to the same thread, but I'll
repeat my suggestion to register this in the CF app [1]. The benefit is
that people are more likely to notice the patch and also cfbot [2] will
run regression tests.

[1] https://commitfest.postgresql.org
[2] http://cfbot.cputube.org/


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Sure, Tomas. 

Here is the PG Commitfest link: https://commitfest.postgresql.org/44/4510/

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Monday, August 21, 2023 18:56
To: Jian Guo <gjian@vmware.com>; Hans Buschmann <buschmann@nidsa.net>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Cc: Zhenghua Lyu <zlyu@vmware.com>
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
 
!! External Email

On 8/21/23 10:16, Jian Guo wrote:
> Hi hackers,
>
> I found a new approach to fix this issue, which seems better, so I would
> like to post another version of the patch here. The origin patch made
> the assumption of the values of Vars from CTE must be unique, which
> could be very wrong. This patch examines variables for Vars inside CTE,
> which avoided the bad assumption, so the results could be much more
> accurate.
>

No problem with posting a reworked patch to the same thread, but I'll
repeat my suggestion to register this in the CF app [1]. The benefit is
that people are more likely to notice the patch and also cfbot [2] will
run regression tests.

[1] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcommitfest.postgresql.org%2F&data=05%7C01%7Cgjian%40vmware.com%7C4562125966b248a1e18308dba2353d8f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638282121775872407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=OmMo0lQtSvDFWu8VbI0ZorDpZ3BuxsmkTjagGfnryEc%3D&reserved=0
[2] https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcfbot.cputube.org%2F&data=05%7C01%7Cgjian%40vmware.com%7C4562125966b248a1e18308dba2353d8f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638282121775872407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=xTYDRybLm0AYyvRNqtN85fZWeUJREshIq7PYhz8bMgU%3D&reserved=0


--
Tomas Vondra
EnterpriseDB: https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com%2F&data=05%7C01%7Cgjian%40vmware.com%7C4562125966b248a1e18308dba2353d8f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638282121775872407%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Zn4W8nPFmKxCLQ3XM555UlnM%2F9q1XLkJU5PRxT1VSig%3D&reserved=0
The Enterprise PostgreSQL Company

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
On Tue, Aug 22, 2023 at 10:35 AM Jian Guo <gjian@vmware.com> wrote:
>
> Sure, Tomas.
>
> Here is the PG Commitfest link: https://commitfest.postgresql.org/44/4510/
> ________________________________

hi.
wondering around http://cfbot.cputube.org/
there is a compiler warning: https://cirrus-ci.com/task/6052087599988736

I slightly edited the code, making the compiler warning out.

I am not sure if the following duplicate comment from (rte->rtekind ==
RTE_SUBQUERY && !rte->inh) branch is correct.
/*
* OK, recurse into the subquery.  Note that the original setting
* of vardata->isunique (which will surely be false) is left
* unchanged in this situation.  That's what we want, since even
* if the underlying column is unique, the subquery may have
* joined to other tables in a way that creates duplicates.
*/

Index varnoSaved = var->varno;
here varnoSaved should be int?

image attached is the coverage report
if I  understand coverage report correctly,
`
if (rel->subroot) examine_simple_variable(rel->subroot, var, vardata);
`
the above never actually executed?

Attachment
Hi Jian He,

Thanks for fixing the compiler warnings, seems the CI used a little old compiler and complained:

 ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

But later C standard have relaxed the requirements for this, ISO C99 and later standard allow declarations and code to be freely mixed within compound statements: https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html


From: jian he <jian.universality@gmail.com>
Sent: Wednesday, September 6, 2023 14:00
To: Jian Guo <gjian@vmware.com>
Cc: Tomas Vondra <tomas.vondra@enterprisedb.com>; Hans Buschmann <buschmann@nidsa.net>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
 
!! External Email

On Tue, Aug 22, 2023 at 10:35 AM Jian Guo <gjian@vmware.com> wrote:
>
> Sure, Tomas.
>
> Here is the PG Commitfest link: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcommitfest.postgresql.org%2F44%2F4510%2F&data=05%7C01%7Cgjian%40vmware.com%7C711eddbb381e4e5ed2cb08dbae9ea0cf%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638295768555223775%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2FuPl5rS1rFaQRnNevIVxKZCNA2Bbmr2rg%2BRoX5yUE9s%3D&reserved=0
> ________________________________

hi.
wondering around https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fcfbot.cputube.org%2F&data=05%7C01%7Cgjian%40vmware.com%7C711eddbb381e4e5ed2cb08dbae9ea0cf%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638295768555223775%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=t%2B8JrNQQAibe3Hdeico06U3HhLx70B17kzPMERY39os%3D&reserved=0
there is a compiler warning: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcirrus-ci.com%2Ftask%2F6052087599988736&data=05%7C01%7Cgjian%40vmware.com%7C711eddbb381e4e5ed2cb08dbae9ea0cf%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C638295768555223775%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8WbXadRi7MhO0AiHjtJOs4y5mqCP8VHBdcQao%2FXPpM8%3D&reserved=0

I slightly edited the code, making the compiler warning out.

I am not sure if the following duplicate comment from (rte->rtekind ==
RTE_SUBQUERY && !rte->inh) branch is correct.
/*
* OK, recurse into the subquery.  Note that the original setting
* of vardata->isunique (which will surely be false) is left
* unchanged in this situation.  That's what we want, since even
* if the underlying column is unique, the subquery may have
* joined to other tables in a way that creates duplicates.
*/

Index varnoSaved = var->varno;
here varnoSaved should be int?

image attached is the coverage report
if I  understand coverage report correctly,
`
if (rel->subroot) examine_simple_variable(rel->subroot, var, vardata);
`
the above never actually executed?

!! External Email: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender.
Jian Guo <gjian@vmware.com> writes:
> I found a new approach to fix this issue, which seems better, so I would like to post another version of the patch
here.The origin patch made the assumption of the values of Vars from CTE must be unique, which could be very wrong.
Thispatch examines variables for Vars inside CTE, which avoided the bad assumption, so the results could be much more
accurate.

You have the right general idea, but there is nothing about this patch
that's right in detail.  The outer Var doesn't refer to any particular
RTE within the subquery; it refers to a targetlist entry.  You have to
drill down to that, see if it's a Var, and if so you can recurse into
the subroot with that Var.  As this stands, it might accidentally get
the right answer for "SELECT * FROM foo" subqueries, but it will get
the wrong answer or even crash for anything that's not that.

The existing RTE_SUBQUERY stanza has most of what we need for this,
so I experimented with extending that to also handle RTE_CTE.  It
seems to work, though I soon found out that it needed tweaking for
the case where the CTE is INSERT/UPDATE/DELETE RETURNING.

Interestingly, this does not change any existing regression test
results.  I'd supposed there might be at least one place with a
visible plan change, but nope.  Running a coverage test does show
that the new code paths are exercised, but I wonder if we ought
to try to devise a regression test that proves it more directly.

            regards, tom lane

PS: please, please, please do not quote the entire damn thread
when replying.  Trim it to just a minimum amount of relevant
text.  You think people want to read all that again?

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c4fcd0076e..196f50b241 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -5493,13 +5493,17 @@ examine_simple_variable(PlannerInfo *root, Var *var,
             vardata->acl_ok = true;
         }
     }
-    else if (rte->rtekind == RTE_SUBQUERY && !rte->inh)
+    else if ((rte->rtekind == RTE_SUBQUERY && !rte->inh) ||
+             (rte->rtekind == RTE_CTE && !rte->self_reference))
     {
         /*
-         * Plain subquery (not one that was converted to an appendrel).
+         * Plain subquery (not one that was converted to an appendrel) or
+         * non-recursive CTE.  In either case, we can try to find out what the
+         * Var refers to within the subquery.
          */
-        Query       *subquery = rte->subquery;
-        RelOptInfo *rel;
+        Query       *subquery;
+        PlannerInfo *subroot;
+        List       *subtlist;
         TargetEntry *ste;

         /*
@@ -5508,6 +5512,80 @@ examine_simple_variable(PlannerInfo *root, Var *var,
         if (var->varattno == InvalidAttrNumber)
             return;

+        /*
+         * Otherwise, find the subquery's query tree and planner subroot.
+         */
+        if (rte->rtekind == RTE_SUBQUERY)
+        {
+            RelOptInfo *rel;
+
+            /*
+             * Fetch RelOptInfo for subquery.  Note that we don't change the
+             * rel returned in vardata, since caller expects it to be a rel of
+             * the caller's query level.  Because we might already be
+             * recursing, we can't use that rel pointer either, but have to
+             * look up the Var's rel afresh.
+             */
+            rel = find_base_rel(root, var->varno);
+
+            subquery = rte->subquery;
+            subroot = rel->subroot;
+        }
+        else
+        {
+            /* CTE case is more difficult */
+            PlannerInfo *cteroot;
+            Index        levelsup;
+            int            ndx;
+            int            plan_id;
+            ListCell   *lc;
+
+            /*
+             * Find the referenced CTE, and locate the subroot previously made
+             * for it.
+             */
+            levelsup = rte->ctelevelsup;
+            cteroot = root;
+            while (levelsup-- > 0)
+            {
+                cteroot = cteroot->parent_root;
+                if (!cteroot)    /* shouldn't happen */
+                    elog(ERROR, "bad levelsup for CTE \"%s\"", rte->ctename);
+            }
+
+            /*
+             * Note: cte_plan_ids can be shorter than cteList, if we are still
+             * working on planning the CTEs (ie, this is a side-reference from
+             * another CTE).  So we mustn't use forboth here.
+             */
+            ndx = 0;
+            subquery = NULL;
+            foreach(lc, cteroot->parse->cteList)
+            {
+                CommonTableExpr *cte = (CommonTableExpr *) lfirst(lc);
+
+                if (strcmp(cte->ctename, rte->ctename) == 0)
+                {
+                    subquery = castNode(Query, cte->ctequery);
+                    break;
+                }
+                ndx++;
+            }
+            if (subquery == NULL)    /* shouldn't happen */
+                elog(ERROR, "could not find CTE \"%s\"", rte->ctename);
+            if (ndx >= list_length(cteroot->cte_plan_ids))
+                elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
+            plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
+            if (plan_id <= 0)
+                elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename);
+            subroot = list_nth(root->glob->subroots, plan_id - 1);
+        }
+
+        /* If the subquery hasn't been planned yet, we have to punt */
+        if (subroot == NULL)
+            return;
+        Assert(IsA(subroot, PlannerInfo));
+
         /*
          * Punt if subquery uses set operations or GROUP BY, as these will
          * mash underlying columns' stats beyond recognition.  (Set ops are
@@ -5521,20 +5599,6 @@ examine_simple_variable(PlannerInfo *root, Var *var,
             subquery->groupingSets)
             return;

-        /*
-         * OK, fetch RelOptInfo for subquery.  Note that we don't change the
-         * rel returned in vardata, since caller expects it to be a rel of the
-         * caller's query level.  Because we might already be recursing, we
-         * can't use that rel pointer either, but have to look up the Var's
-         * rel afresh.
-         */
-        rel = find_base_rel(root, var->varno);
-
-        /* If the subquery hasn't been planned yet, we have to punt */
-        if (rel->subroot == NULL)
-            return;
-        Assert(IsA(rel->subroot, PlannerInfo));
-
         /*
          * Switch our attention to the subquery as mangled by the planner. It
          * was okay to look at the pre-planning version for the tests above,
@@ -5543,11 +5607,15 @@ examine_simple_variable(PlannerInfo *root, Var *var,
          * planning, Vars in the targetlist might have gotten replaced, and we
          * need to see the replacement expressions.
          */
-        subquery = rel->subroot->parse;
+        subquery = subroot->parse;
         Assert(IsA(subquery, Query));

         /* Get the subquery output expression referenced by the upper Var */
-        ste = get_tle_by_resno(subquery->targetList, var->varattno);
+        if (subquery->returningList)
+            subtlist = subquery->returningList;
+        else
+            subtlist = subquery->targetList;
+        ste = get_tle_by_resno(subtlist, var->varattno);
         if (ste == NULL || ste->resjunk)
             elog(ERROR, "subquery %s does not have attribute %d",
                  rte->eref->aliasname, var->varattno);
@@ -5595,16 +5663,16 @@ examine_simple_variable(PlannerInfo *root, Var *var,
              * if the underlying column is unique, the subquery may have
              * joined to other tables in a way that creates duplicates.
              */
-            examine_simple_variable(rel->subroot, var, vardata);
+            examine_simple_variable(subroot, var, vardata);
         }
     }
     else
     {
         /*
-         * Otherwise, the Var comes from a FUNCTION, VALUES, or CTE RTE.  (We
-         * won't see RTE_JOIN here because join alias Vars have already been
+         * Otherwise, the Var comes from a FUNCTION or VALUES RTE.  (We won't
+         * see RTE_JOIN here because join alias Vars have already been
          * flattened.)    There's not much we can do with function outputs, but
-         * maybe someday try to be smarter about VALUES and/or CTEs.
+         * maybe someday try to be smarter about VALUES.
          */
     }
 }


On Thu, Nov 9, 2023 at 6:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The existing RTE_SUBQUERY stanza has most of what we need for this,
so I experimented with extending that to also handle RTE_CTE.  It
seems to work, though I soon found out that it needed tweaking for
the case where the CTE is INSERT/UPDATE/DELETE RETURNING.

The change looks good to me.  To nitpick, should we modify the comment
of examine_simple_variable to also mention 'CTEs'?

  * This is split out as a subroutine so that we can recurse to deal with
- * Vars referencing subqueries.
+ * Vars referencing subqueries or CTEs.
 
Interestingly, this does not change any existing regression test
results.  I'd supposed there might be at least one place with a
visible plan change, but nope.  Running a coverage test does show
that the new code paths are exercised, but I wonder if we ought
to try to devise a regression test that proves it more directly.

I think we ought to.  Here is one regression test that proves that this
change improves query plans in some cases.

Unpatched:

explain (costs off)
with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                         QUERY PLAN
------------------------------------------------------------
 Aggregate
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b
   ->  Nested Loop
         ->  HashAggregate
               Group Key: x.unique1
               ->  CTE Scan on x
         ->  Index Only Scan using tenk1_unique1 on tenk1 a
               Index Cond: (unique1 = x.unique1)
(9 rows)

Patched:

explain (costs off)
with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                         QUERY PLAN
------------------------------------------------------------
 Aggregate
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b
   ->  Hash Semi Join
         Hash Cond: (a.unique1 = x.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a
         ->  Hash
               ->  CTE Scan on x
(8 rows)

I think the second plan (patched) makes more sense.  In the first plan
(unpatched), the HashAggregate node actually does not reduce the the
number of rows because it groups by 'unique1', but planner does not know
that because it lacks statistics for Vars referencing the CTE.

Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
> I think the second plan (patched) makes more sense.  In the first plan
> (unpatched), the HashAggregate node actually does not reduce the the
> number of rows because it groups by 'unique1', but planner does not know
> that because it lacks statistics for Vars referencing the CTE.

Yeah.  It's faster in reality too:

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=692.29..692.30 rows=1 width=8) (actual time=15.186..15.188 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.028..0.754rows=10000 loops=1) 
           Heap Fetches: 0
   ->  Nested Loop  (cost=225.28..409.50 rows=5000 width=0) (actual time=3.652..14.733 rows=10000 loops=1)
         ->  HashAggregate  (cost=225.00..227.00 rows=200 width=4) (actual time=3.644..4.510 rows=10000 loops=1)
               Group Key: x.unique1
               Batches: 1  Memory Usage: 929kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.030..1.932 rows=10000 loops=1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..0.90 rows=1 width=4) (actual time=0.001..0.001
rows=1loops=10000) 
               Index Cond: (unique1 = x.unique1)
               Heap Fetches: 0
 Planning Time: 0.519 ms
 Execution Time: 15.479 ms
(14 rows)

vs

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1028.07..1028.08 rows=1 width=8) (actual time=4.578..4.579 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.011..0.751rows=10000 loops=1) 
           Heap Fetches: 0
   ->  Hash Semi Join  (cost=325.28..732.78 rows=10000 width=0) (actual time=2.706..4.305 rows=10000 loops=1)
         Hash Cond: (a.unique1 = x.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.011..0.676rows=10000 loops=1) 
               Heap Fetches: 0
         ->  Hash  (cost=200.00..200.00 rows=10000 width=4) (actual time=2.655..2.655 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.012..1.963 rows=10000 loops=1)
 Planning Time: 0.504 ms
 Execution Time: 4.821 ms
(13 rows)

Now, what you get if you remove MATERIALIZED is faster yet:

regression=# explain analyze with x as (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=715.57..715.58 rows=1 width=8) (actual time=2.681..2.682 rows=1 loops=1)
   ->  Merge Semi Join  (cost=0.57..690.57 rows=10000 width=0) (actual time=0.016..2.408 rows=10000 loops=1)
         Merge Cond: (a.unique1 = b.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.007..0.696rows=10000 loops=1) 
               Heap Fetches: 0
         ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.007..0.655rows=10000 loops=1) 
               Heap Fetches: 0
 Planning Time: 0.160 ms
 Execution Time: 2.718 ms
(9 rows)

I poked into that and found that the reason we don't get a mergejoin
with the materialized CTE is that the upper planner invocation doesn't
know that the CTE's output is sorted, so it thinks a separate sort
step would be needed.

So you could argue that there's more to do here, but I'm hesitant
to go further.  Part of the point of MATERIALIZED is to be an
optimization fence, so breaking down that fence is something to be
wary of.  Maybe we shouldn't even take this patch --- but on
balance I think it's an OK compromise.

            regards, tom lane




On Fri, Nov 17, 2023 at 2:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
So you could argue that there's more to do here, but I'm hesitant
to go further.  Part of the point of MATERIALIZED is to be an
optimization fence, so breaking down that fence is something to be
wary of.  Maybe we shouldn't even take this patch --- but on
balance I think it's an OK compromise.

Agreed.  I think the patch is still valuable on its own, although it
does not go down into MATERIALIZED case for further optimization.  Maybe
we can take another query as regression test to prove its value, in
which the CTE is not inlined without MATERIALIZED, such as

explain (costs off)
with x as (select unique1, unique2 from tenk1 b)
select count(*) from tenk1 a
where unique1 in (select unique1 from x x1) and
      unique1 in (select unique2 from x x2);
                            QUERY PLAN
------------------------------------------------------------------
 Aggregate
   CTE x
     ->  Seq Scan on tenk1 b
   ->  Hash Join
         Hash Cond: (a.unique1 = x2.unique2)
         ->  Nested Loop
               ->  HashAggregate
                     Group Key: x1.unique1
                     ->  CTE Scan on x x1
               ->  Index Only Scan using tenk1_unique1 on tenk1 a
                     Index Cond: (unique1 = x1.unique1)
         ->  Hash
               ->  HashAggregate
                     Group Key: x2.unique2
                     ->  CTE Scan on x x2
(15 rows)

vs

explain (costs off)
with x as (select unique1, unique2 from tenk1 b)
select count(*) from tenk1 a
where unique1 in (select unique1 from x x1) and
      unique1 in (select unique2 from x x2);
                            QUERY PLAN
------------------------------------------------------------------
 Aggregate
   CTE x
     ->  Seq Scan on tenk1 b
   ->  Hash Semi Join
         Hash Cond: (a.unique1 = x2.unique2)
         ->  Hash Semi Join
               Hash Cond: (a.unique1 = x1.unique1)
               ->  Index Only Scan using tenk1_unique1 on tenk1 a
               ->  Hash
                     ->  CTE Scan on x x1
         ->  Hash
               ->  CTE Scan on x x2
(12 rows)

I believe the second plan is faster in reality too.

Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Nov 17, 2023 at 2:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So you could argue that there's more to do here, but I'm hesitant
>> to go further.  Part of the point of MATERIALIZED is to be an
>> optimization fence, so breaking down that fence is something to be
>> wary of.  Maybe we shouldn't even take this patch --- but on
>> balance I think it's an OK compromise.

> Agreed.  I think the patch is still valuable on its own, although it
> does not go down into MATERIALIZED case for further optimization.

Right.  My earlier response was rather rushed, so let me explain
my thinking a bit more.

When I realized that the discrepancy between MATERIALIZED-and-not
plans was due to the upper planner not seeing the pathkeys for the
CTE scan, my first thought was to try to export those pathkeys.
And my second thought was that the CTE should return multiple
potential paths, much as we do for sub-SELECT-in-FROM subqueries,
with the upper planner eventually choosing one of those paths.
But that second idea would break down the optimization fence
almost completely, because the opinions of the upper planner would
influence which plan we choose for the CTE query.  I think we
shouldn't go there, at least not for a CTE explicitly marked
MATERIALIZED.  (Maybe if it's not marked MATERIALIZED, but we
chose not to flatten it for some other reason, we could think
about that differently?  Not sure.)

I think that when we say that MATERIALIZED is meant as an optimization
fence, what we mostly mean is that the upper query shouldn't influence
the choice of plan for the sub-query.  However, we surely allow our
statistics or guesses for the sub-query to subsequently influence what
the upper planner does.  If that weren't true, we shouldn't even
expose any non-default rowcount guess to the upper planner --- but
that would lead to really horrid results, so we allow that information
to percolate up from the sub-query.  It seems like exposing column
statistics to the upper planner, as the proposed patch does, isn't
fundamentally different from exposing rowcount estimates.

That line of argument also leads to the conclusion that it'd be
okay to expose info about the ordering of the CTE result to the
upper planner.  This patch doesn't do that, and I'm not sufficiently
excited about the issue to go write some code.  But if someone else
does, I think we shouldn't exclude doing it on the grounds of wanting
to preserve an optimization fence.  The fence is sort of one-way
in this line of thinking: information can propagate up to the outer
planner level, but not down into the CTE plan.

Thoughts?

            regards, tom lane



Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

From
"David G. Johnston"
Date:
On Thursday, November 16, 2023, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That line of argument also leads to the conclusion that it'd be
okay to expose info about the ordering of the CTE result to the
upper planner.  This patch doesn't do that, and I'm not sufficiently
excited about the issue to go write some code.  But if someone else
does, I think we shouldn't exclude doing it on the grounds of wanting
to preserve an optimization fence.  The fence is sort of one-way
in this line of thinking: information can propagate up to the outer
planner level, but not down into the CTE plan.

This is indeed my understanding of what materialized means.  Basically, the CTE is done first and in isolation; but any knowledge of its result shape can be used when referencing it.

David J.

On Thu, 2023-11-16 at 22:38 -0500, Tom Lane wrote:
> That line of argument also leads to the conclusion that it'd be
> okay to expose info about the ordering of the CTE result to the
> upper planner.  [...]  The fence is sort of one-way
> in this line of thinking: information can propagate up to the outer
> planner level, but not down into the CTE plan.
>
> Thoughts?

That agrees with my intuition about MATERIALIZED CTEs.
I think of them as "first calculate the CTE, then calculate the
rest of the query" or an ad-hoc temporary table for the duration
of a query.  I would expect the upper planner to know estimates
and other data about the result of the CTE.

Yours,
Laurenz Albe




On Fri, Nov 17, 2023 at 11:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That line of argument also leads to the conclusion that it'd be
okay to expose info about the ordering of the CTE result to the
upper planner.  This patch doesn't do that, and I'm not sufficiently
excited about the issue to go write some code.  But if someone else
does, I think we shouldn't exclude doing it on the grounds of wanting
to preserve an optimization fence.  The fence is sort of one-way
in this line of thinking: information can propagate up to the outer
planner level, but not down into the CTE plan.

Thoughts?

Exactly!  Thanks for the detailed explanation.

Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Nov 17, 2023 at 11:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That line of argument also leads to the conclusion that it'd be
>> okay to expose info about the ordering of the CTE result to the
>> upper planner.  This patch doesn't do that, and I'm not sufficiently
>> excited about the issue to go write some code.  But if someone else
>> does, I think we shouldn't exclude doing it on the grounds of wanting
>> to preserve an optimization fence.  The fence is sort of one-way
>> in this line of thinking: information can propagate up to the outer
>> planner level, but not down into the CTE plan.

> Exactly!  Thanks for the detailed explanation.

OK.  I pushed the patch after a bit more review: we can simplify
things some more by using the subroot->parse querytree for all
tests.  After the previous refactoring, it wasn't buying us anything
to do some initial tests with the raw querytree.  (The original
idea of that, I believe, was to avoid doing find_base_rel if we
could; but now that's not helpful.)

            regards, tom lane




On Fri, Nov 17, 2023 at 11:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
That line of argument also leads to the conclusion that it'd be
okay to expose info about the ordering of the CTE result to the
upper planner.  This patch doesn't do that, and I'm not sufficiently
excited about the issue to go write some code.  But if someone else
does, I think we shouldn't exclude doing it on the grounds of wanting
to preserve an optimization fence.  The fence is sort of one-way
in this line of thinking: information can propagate up to the outer
planner level, but not down into the CTE plan.

In the light of this conclusion, I had a go at propagating the pathkeys
from CTEs up to the outer planner and came up with the attached.

Comments/thoughts?

Thanks
Richard
Attachment
Richard Guo <guofenglinux@gmail.com> writes:
> On Fri, Nov 17, 2023 at 11:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That line of argument also leads to the conclusion that it'd be
>> okay to expose info about the ordering of the CTE result to the
>> upper planner.

> In the light of this conclusion, I had a go at propagating the pathkeys
> from CTEs up to the outer planner and came up with the attached.

Oh, nice!  I remembered we had code already to do this for regular
SubqueryScans, but I thought we'd need to do some refactoring to
apply it to CTEs.  I think you are right though that
convert_subquery_pathkeys can be used as-is.  Some thoughts:

* Do we really need to use make_tlist_from_pathtarget?  Why isn't
the tlist of the cteplan good enough (indeed, more so)?

* I don't love having this code assume that it knows how to find
the Path the cteplan was made from.  It'd be better to make
SS_process_ctes save that somewhere, maybe in a list paralleling
root->cte_plan_ids.

Alternatively: maybe it's time to do what the comments in
SS_process_ctes vaguely speculate about, and just save the Path
at that point, with construction of the plan left for createplan()?
That might be a lot of refactoring for not much gain, so not sure.

            regards, tom lane




On Tue, Nov 21, 2023 at 1:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* Do we really need to use make_tlist_from_pathtarget?  Why isn't
the tlist of the cteplan good enough (indeed, more so)?

I think you are right.  The cteplan->targetlist is built for the CTE's
best path by build_path_tlist(), which is almost the same as
make_tlist_from_pathtarget() except that it also replaces nestloop
params.  So cteplan->targetlist is good enough here.
 
* I don't love having this code assume that it knows how to find
the Path the cteplan was made from.  It'd be better to make
SS_process_ctes save that somewhere, maybe in a list paralleling
root->cte_plan_ids.

Fair point.

I've updated the patch to v2 for the changes.
 
Alternatively: maybe it's time to do what the comments in
SS_process_ctes vaguely speculate about, and just save the Path
at that point, with construction of the plan left for createplan()?
That might be a lot of refactoring for not much gain, so not sure.

I'm not sure if this is worth the effort.  And it seems that we have the
same situation with SubLinks where we construct the plan in subselect.c
rather than createplan.c.

Thanks
Richard
Attachment
Hello Tom and Richard,

17.11.2023 22:42, Tom Lane wrote:
> OK.  I pushed the patch after a bit more review: we can simplify
> things some more by using the subroot->parse querytree for all
> tests.  After the previous refactoring, it wasn't buying us anything
> to do some initial tests with the raw querytree.  (The original
> idea of that, I believe, was to avoid doing find_base_rel if we
> could; but now that's not helpful.)

Please look at the following query:
CREATE TABLE t(i int);
INSERT INTO t VALUES (1);
VACUUM ANALYZE t;

WITH ir AS (INSERT INTO t VALUES (2) RETURNING i)
SELECT * FROM ir WHERE i = 2;

which produces ERROR:  no relation entry for relid 1
starting from f7816aec2.

Best regards,
Alexander



Alexander Lakhin <exclusion@gmail.com> writes:
> Please look at the following query:
> CREATE TABLE t(i int);
> INSERT INTO t VALUES (1);
> VACUUM ANALYZE t;

> WITH ir AS (INSERT INTO t VALUES (2) RETURNING i)
> SELECT * FROM ir WHERE i = 2;

> which produces ERROR:  no relation entry for relid 1
> starting from f7816aec2.

Thanks for the report!  I guess we need something like the attached.
I'm surprised that this hasn't been noticed before; was the case
really unreachable before?

            regards, tom lane

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 5a9ce44532..22d01cef5b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -420,6 +420,19 @@ find_base_rel(PlannerInfo *root, int relid)
     return NULL;                /* keep compiler quiet */
 }

+/*
+ * find_base_rel_noerr
+ *      Find a base or otherrel relation entry, returning NULL if there's none
+ */
+RelOptInfo *
+find_base_rel_noerr(PlannerInfo *root, int relid)
+{
+    /* use an unsigned comparison to prevent negative array element access */
+    if ((uint32) relid < (uint32) root->simple_rel_array_size)
+        return root->simple_rel_array[relid];
+    return NULL;
+}
+
 /*
  * find_base_rel_ignore_join
  *      Find a base or otherrel relation entry, which must already exist.
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index dbcd98d985..cea777e9d4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -119,6 +119,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
 #include "parser/parse_clause.h"
+#include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
@@ -5434,17 +5435,30 @@ examine_simple_variable(PlannerInfo *root, Var *var,

         if (HeapTupleIsValid(vardata->statsTuple))
         {
-            RelOptInfo *onerel = find_base_rel(root, var->varno);
+            RelOptInfo *onerel = find_base_rel_noerr(root, var->varno);
             Oid            userid;

             /*
              * Check if user has permission to read this column.  We require
              * all rows to be accessible, so there must be no securityQuals
-             * from security barrier views or RLS policies.  Use
-             * onerel->userid if it's set, in case we're accessing the table
-             * via a view.
+             * from security barrier views or RLS policies.
+             *
+             * Normally the Var will have an associated RelOptInfo from which
+             * we can find out which userid to do the check as; but it might
+             * not if it's a RETURNING Var for an INSERT target relation.  In
+             * that case use the RTEPermissionInfo associated with the RTE.
              */
-            userid = OidIsValid(onerel->userid) ? onerel->userid : GetUserId();
+            if (onerel)
+                userid = onerel->userid;
+            else
+            {
+                RTEPermissionInfo *perminfo;
+
+                perminfo = getRTEPermissionInfo(root->parse->rteperminfos, rte);
+                userid = perminfo->checkAsUser;
+            }
+            if (!OidIsValid(userid))
+                userid = GetUserId();

             vardata->acl_ok =
                 rte->securityQuals == NIL &&
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 03a0e46e70..c43d97b48a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -307,6 +307,7 @@ extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
                                     RelOptInfo *parent);
 extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
+extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
 extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
 extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
 extern RelOptInfo *build_join_rel(PlannerInfo *root,
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 69c56ce207..3cf969d938 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -654,6 +654,24 @@ select count(*) from tenk1 a
                ->  CTE Scan on x
 (8 rows)

+explain (costs off)
+with x as materialized (insert into tenk1 default values returning unique1)
+select count(*) from tenk1 a
+  where unique1 in (select * from x);
+                         QUERY PLAN
+------------------------------------------------------------
+ Aggregate
+   CTE x
+     ->  Insert on tenk1
+           ->  Result
+   ->  Nested Loop
+         ->  HashAggregate
+               Group Key: x.unique1
+               ->  CTE Scan on x
+         ->  Index Only Scan using tenk1_unique1 on tenk1 a
+               Index Cond: (unique1 = x.unique1)
+(10 rows)
+
 -- SEARCH clause
 create temp table graph0( f int, t int, label text );
 insert into graph0 values
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 3ef9898866..ff68e21e2e 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -354,6 +354,11 @@ with x as materialized (select unique1 from tenk1 b)
 select count(*) from tenk1 a
   where unique1 in (select * from x);

+explain (costs off)
+with x as materialized (insert into tenk1 default values returning unique1)
+select count(*) from tenk1 a
+  where unique1 in (select * from x);
+
 -- SEARCH clause

 create temp table graph0( f int, t int, label text );


On Sun, Jan 7, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Lakhin <exclusion@gmail.com> writes:
> Please look at the following query:
> CREATE TABLE t(i int);
> INSERT INTO t VALUES (1);
> VACUUM ANALYZE t;

> WITH ir AS (INSERT INTO t VALUES (2) RETURNING i)
> SELECT * FROM ir WHERE i = 2;

> which produces ERROR:  no relation entry for relid 1
> starting from f7816aec2.

Nice catch.
 
Thanks for the report!  I guess we need something like the attached.

+1.
 
I'm surprised that this hasn't been noticed before; was the case
really unreachable before?

It seems that this case is only reachable with Vars of an INSERT target
relation, and it seems that there is no other way to reference such a
Var other than using CTE.

Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
> On Sun, Jan 7, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thanks for the report!  I guess we need something like the attached.

> +1.

Pushed, thanks for looking at it.

>> I'm surprised that this hasn't been noticed before; was the case
>> really unreachable before?

> It seems that this case is only reachable with Vars of an INSERT target
> relation, and it seems that there is no other way to reference such a
> Var other than using CTE.

I'm a little uncomfortable with that conclusion, but for the moment
I refrained from back-patching.  We can always add the patch to v16
later if we find it's not so unreachable.  (Before v16, there was
no find_base_rel here at all.)

            regards, tom lane



On Mon, 8 Jan 2024 at 22:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Richard Guo <guofenglinux@gmail.com> writes:
> > On Sun, Jan 7, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Thanks for the report!  I guess we need something like the attached.
>
> > +1.
>
> Pushed, thanks for looking at it.

I have changed the status of the commitfest entry to "Committed" as I
noticed the patch has already been committed.

Regards,
Vignesh




On Sat, Jan 27, 2024 at 10:08 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, 8 Jan 2024 at 22:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Richard Guo <guofenglinux@gmail.com> writes:
> > On Sun, Jan 7, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Thanks for the report!  I guess we need something like the attached.
>
> > +1.
>
> Pushed, thanks for looking at it.

I have changed the status of the commitfest entry to "Committed" as I
noticed the patch has already been committed.

Well, the situation seems a little complex here.  At first, this thread
was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE'
patch, which has already been pushed in [1].  Subsequently, I proposed
another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in
[2], which is currently under review and is what the commitfest entry
for.  Later on, within the same thread, another patch was posted as a
fix to the first patch and was subsequently pushed in [3].  I believe
this sequence of events might have led to confusion.

What is the usual practice in such situations?  I guess I'd better to
fork a new thread to discuss my proposed patch which is about the
'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'.

[1] https://www.postgresql.org/message-id/754093.1700250120%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/CAMbWs49gAHeEOn0rpdUUYXryaa60KZ8JKwk1aSERttY9caCYkA%40mail.gmail.com
[3] https://www.postgresql.org/message-id/1941515.1704732682%40sss.pgh.pa.us

Thanks
Richard
On Mon, 29 Jan 2024 at 08:01, Richard Guo <guofenglinux@gmail.com> wrote:
>
>
> On Sat, Jan 27, 2024 at 10:08 AM vignesh C <vignesh21@gmail.com> wrote:
>>
>> On Mon, 8 Jan 2024 at 22:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >
>> > Richard Guo <guofenglinux@gmail.com> writes:
>> > > On Sun, Jan 7, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > >> Thanks for the report!  I guess we need something like the attached.
>> >
>> > > +1.
>> >
>> > Pushed, thanks for looking at it.
>>
>> I have changed the status of the commitfest entry to "Committed" as I
>> noticed the patch has already been committed.
>
>
> Well, the situation seems a little complex here.  At first, this thread
> was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE'
> patch, which has already been pushed in [1].  Subsequently, I proposed
> another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in
> [2], which is currently under review and is what the commitfest entry
> for.  Later on, within the same thread, another patch was posted as a
> fix to the first patch and was subsequently pushed in [3].  I believe
> this sequence of events might have led to confusion.
>
> What is the usual practice in such situations?  I guess I'd better to
> fork a new thread to discuss my proposed patch which is about the
> 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'.

Sorry I missed to notice that there was one pending patch yet to be
committed, I feel you can continue discussing here itself just to
avoid losing any historical information about the issue and the
continuation of the discussion. You can add a new commitfest entry for
this.

Regards,
Vignesh




On Mon, Jan 29, 2024 at 11:20 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, 29 Jan 2024 at 08:01, Richard Guo <guofenglinux@gmail.com> wrote:
> On Sat, Jan 27, 2024 at 10:08 AM vignesh C <vignesh21@gmail.com> wrote:
>> I have changed the status of the commitfest entry to "Committed" as I
>> noticed the patch has already been committed.
>
> Well, the situation seems a little complex here.  At first, this thread
> was dedicated to discussing the 'Examine-simple-variable-for-Var-in-CTE'
> patch, which has already been pushed in [1].  Subsequently, I proposed
> another patch 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query' in
> [2], which is currently under review and is what the commitfest entry
> for.  Later on, within the same thread, another patch was posted as a
> fix to the first patch and was subsequently pushed in [3].  I believe
> this sequence of events might have led to confusion.
>
> What is the usual practice in such situations?  I guess I'd better to
> fork a new thread to discuss my proposed patch which is about the
> 'Propagate-pathkeys-from-CTEs-up-to-the-outer-query'.

Sorry I missed to notice that there was one pending patch yet to be
committed, I feel you can continue discussing here itself just to
avoid losing any historical information about the issue and the
continuation of the discussion. You can add a new commitfest entry for
this.

It seems to me that a fresh new thread is a better option.  I have just
started a new thread in [1], and have tried to migrate the necessary
context over there.  I have also updated the commitfest entry
accordingly.

[1] https://www.postgresql.org/message-id/flat/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew%40mail.gmail.com

Thanks
Richard