Re: Early WIP/PoC for inlining CTEs - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Early WIP/PoC for inlining CTEs
Date
Msg-id 20190313.094527.1905766529049763845.t-ishii@sraoss.co.jp
Whole thread Raw
In response to Re: Early WIP/PoC for inlining CTEs  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
> On 2018-08-08 16:55:22 +1200, Thomas Munro wrote:
>> On Fri, Jul 27, 2018 at 8:10 PM, David Fetter <david@fetter.org> wrote:
>> > On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
>> >> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter <david@fetter.org> wrote:
>> >> > Please find attached the next version, which passes 'make check'.
>> >>
>> >> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).
>> >
>> > Please find attached a patch that does.
>> >
>> > It doesn't always pass make installcheck-world, but I need to sleep
>> > rather than investigate that at the moment.
>> 
>> One observation I wanted to share: CTE scans inhibit parallelism today
>> (something we might eventually want to fix with shared tuplestores).
>> This patch therefore allows parallelism in some WITH queries, which
>> seems like a very valuable thing.
> 
> Might be interesting to see how big a difference it makes for
> TPC-DS. Currently the results are bad (as in many queries don't finish
> in a relevant time) because it uses CTEs so widely, and there's often
> predicates outside the CTE that could be pushed down.

Now that the patch was committed, I played with TPCS-DS and found at
least one of their queries gets speedup. Query 2 runs 2 times faster
than 11. In 12, it seems that CTE wscs is pushed down.

with wscs as
 (select sold_date_sk
        ,sales_price
  from (select ws_sold_date_sk sold_date_sk
              ,ws_ext_sales_price sales_price
        from web_sales 
        union all
        select cs_sold_date_sk sold_date_sk
              ,cs_ext_sales_price sales_price
        from catalog_sales) as s1),
 wswscs as 
 (select d_week_seq,
        sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales,
        sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales,
        sum(case when (d_day_name='Tuesday') then sales_price else  null end) tue_sales,
        sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales,
        sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales,
        sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales,
        sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales
 from wscs
     ,date_dim
 where d_date_sk = sold_date_sk
 group by d_week_seq)
 select d_week_seq1
       ,round(sun_sales1/sun_sales2,2)
       ,round(mon_sales1/mon_sales2,2)
       ,round(tue_sales1/tue_sales2,2)
       ,round(wed_sales1/wed_sales2,2)
       ,round(thu_sales1/thu_sales2,2)
       ,round(fri_sales1/fri_sales2,2)
       ,round(sat_sales1/sat_sales2,2)
 from
 (select wswscs.d_week_seq d_week_seq1
        ,sun_sales sun_sales1
        ,mon_sales mon_sales1
        ,tue_sales tue_sales1
        ,wed_sales wed_sales1
        ,thu_sales thu_sales1
        ,fri_sales fri_sales1
        ,sat_sales sat_sales1
  from wswscs,date_dim 
  where date_dim.d_week_seq = wswscs.d_week_seq and
        d_year = 1998) y,
 (select wswscs.d_week_seq d_week_seq2
        ,sun_sales sun_sales2
        ,mon_sales mon_sales2
        ,tue_sales tue_sales2
        ,wed_sales wed_sales2
        ,thu_sales thu_sales2
        ,fri_sales fri_sales2
        ,sat_sales sat_sales2
  from wswscs
      ,date_dim 
  where date_dim.d_week_seq = wswscs.d_week_seq and
        d_year = 1998+1) z
 where d_week_seq1=d_week_seq2-53
 order by d_week_seq1;

Here's the 12's plan:
                                                                                  QUERY PLAN
                                                     
 

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=118929.39..118929.43 rows=13 width=228) (actual time=792.588..792.710 rows=2513 loops=1)
   Sort Key: wswscs.d_week_seq
   Sort Method: quicksort  Memory: 323kB
   CTE wswscs
     ->  Finalize GroupAggregate  (cost=110164.09..113672.71 rows=10447 width=228) (actual time=766.232..768.415
rows=263loops=1)
 
           Group Key: date_dim_2.d_week_seq
           ->  Gather Merge  (cost=110164.09..112601.89 rows=20894 width=228) (actual time=766.209..767.158 rows=789
loops=1)
                 Workers Planned: 2
                 Workers Launched: 2
                 ->  Sort  (cost=109164.06..109190.18 rows=10447 width=228) (actual time=763.059..763.078 rows=263
loops=3)
                       Sort Key: date_dim_2.d_week_seq
                       Sort Method: quicksort  Memory: 160kB
                       Worker 0:  Sort Method: quicksort  Memory: 160kB
                       Worker 1:  Sort Method: quicksort  Memory: 160kB
                       ->  Partial HashAggregate  (cost=108179.39..108466.69 rows=10447 width=228) (actual
time=762.202..762.889rows=263 loops=3)
 
                             Group Key: date_dim_2.d_week_seq
                             ->  Parallel Hash Join  (cost=2371.82..74413.79 rows=900416 width=20) (actual
time=17.166..424.834rows=717854 loops=3)
 
                                   Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_2.d_date_sk)
                                   ->  Parallel Append  (cost=0.00..69678.24 rows=900416 width=10) (actual
time=0.029..248.992rows=720311 loops=3)
 
                                         ->  Parallel Seq Scan on catalog_sales  (cost=0.00..43411.73 rows=600673
width=10)(actual time=0.018..130.163 rows=480516 loops=3)
 
                                         ->  Parallel Seq Scan on web_sales  (cost=0.00..21764.43 rows=299743 width=10)
(actualtime=0.026..95.629 rows=359692 loops=2)
 
                                   ->  Parallel Hash  (cost=1834.70..1834.70 rows=42970 width=18) (actual
time=16.610..16.610rows=24350 loops=3)
 
                                         Buckets: 131072  Batches: 1  Memory Usage: 5056kB
                                         ->  Parallel Seq Scan on date_dim date_dim_2  (cost=0.00..1834.70 rows=42970
width=18)(actual time=0.020..7.617 rows=24350 loops=3)
 
   ->  Hash Join  (cost=5007.74..5256.44 rows=13 width=228) (actual time=785.300..792.123 rows=2513 loops=1)
         Hash Cond: (wswscs.d_week_seq = date_dim.d_week_seq)
         ->  CTE Scan on wswscs  (cost=0.00..208.94 rows=10447 width=228) (actual time=766.236..766.263 rows=263
loops=1)
         ->  Hash  (cost=5007.58..5007.58 rows=13 width=232) (actual time=19.033..19.033 rows=2513 loops=1)
               Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 288kB
               ->  Merge Join  (cost=5001.97..5007.58 rows=13 width=232) (actual time=17.739..18.210 rows=2513
loops=1)
                     Merge Cond: (((wswscs_1.d_week_seq - 53)) = date_dim.d_week_seq)
                     ->  Sort  (cost=2668.33..2669.24 rows=365 width=228) (actual time=9.906..9.924 rows=365 loops=1)
                           Sort Key: ((wswscs_1.d_week_seq - 53))
                           Sort Method: quicksort  Memory: 76kB
                           ->  Hash Join  (cost=2322.68..2652.79 rows=365 width=228) (actual time=7.864..9.764 rows=365
loops=1)
                                 Hash Cond: (wswscs_1.d_week_seq = date_dim_1.d_week_seq)
                                 ->  CTE Scan on wswscs wswscs_1  (cost=0.00..208.94 rows=10447 width=228) (actual
time=0.001..2.287rows=263 loops=1)
 
                                 ->  Hash  (cost=2318.11..2318.11 rows=365 width=4) (actual time=7.389..7.389 rows=365
loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 21kB
                                       ->  Seq Scan on date_dim date_dim_1  (cost=0.00..2318.11 rows=365 width=4)
(actualtime=3.876..7.348 rows=365 loops=1)
 
                                             Filter: (d_year = 1999)
                                             Rows Removed by Filter: 72684
                     ->  Sort  (cost=2333.65..2334.56 rows=365 width=4) (actual time=7.824..7.930 rows=2514 loops=1)
                           Sort Key: date_dim.d_week_seq
                           Sort Method: quicksort  Memory: 42kB
                           ->  Seq Scan on date_dim  (cost=0.00..2318.11 rows=365 width=4) (actual time=3.950..7.765
rows=365loops=1)
 
                                 Filter: (d_year = 1998)
                                 Rows Removed by Filter: 72684
 Planning Time: 0.956 ms
 Execution Time: 794.072 ms
(50 rows)

Here's 11's plan:
                                                                          QUERY PLAN
                                     
 

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=227105.32..227105.35 rows=13 width=228) (actual time=2019.071..2019.163 rows=2513 loops=1)
   Sort Key: wswscs.d_week_seq
   Sort Method: quicksort  Memory: 323kB
   CTE wscs
     ->  Append  (cost=0.00..88586.64 rows=2160976 width=10) (actual time=0.015..521.937 rows=2160932 loops=1)
           ->  Seq Scan on web_sales  (cost=0.00..25960.84 rows=719384 width=10) (actual time=0.014..137.557
rows=719384loops=1)
 
           ->  Seq Scan on catalog_sales  (cost=0.00..51820.92 rows=1441592 width=10) (actual time=0.011..269.559
rows=1441548loops=1)
 
   CTE wswscs
     ->  HashAggregate  (cost=132977.62..133264.03 rows=10415 width=228) (actual time=1996.856..1997.387 rows=263
loops=1)
           Group Key: date_dim_2.d_week_seq
           ->  Hash Join  (cost=3048.60..51941.02 rows=2160976 width=28) (actual time=36.414..1323.387 rows=2153563
loops=1)
                 Hash Cond: (wscs.sold_date_sk = date_dim_2.d_date_sk)
                 ->  CTE Scan on wscs  (cost=0.00..43219.52 rows=2160976 width=18) (actual time=0.017..942.090
rows=2160932loops=1)
 
                 ->  Hash  (cost=2135.49..2135.49 rows=73049 width=18) (actual time=35.870..35.871 rows=73049 loops=1)
                       Buckets: 131072  Batches: 1  Memory Usage: 4734kB
                       ->  Seq Scan on date_dim date_dim_2  (cost=0.00..2135.49 rows=73049 width=18) (actual
time=0.011..17.749rows=73049 loops=1)
 
   ->  Hash Join  (cost=5006.47..5254.41 rows=13 width=228) (actual time=2012.260..2018.602 rows=2513 loops=1)
         Hash Cond: (wswscs.d_week_seq = date_dim.d_week_seq)
         ->  CTE Scan on wswscs  (cost=0.00..208.30 rows=10415 width=228) (actual time=1996.858..1996.876 rows=263
loops=1)
         ->  Hash  (cost=5006.31..5006.31 rows=13 width=232) (actual time=15.380..15.380 rows=2513 loops=1)
               Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 288kB
               ->  Merge Join  (cost=5000.73..5006.31 rows=13 width=232) (actual time=14.272..14.683 rows=2513
loops=1)
                     Merge Cond: (((wswscs_1.d_week_seq - 53)) = date_dim.d_week_seq)
                     ->  Sort  (cost=2667.18..2668.09 rows=363 width=228) (actual time=7.374..7.393 rows=365 loops=1)
                           Sort Key: ((wswscs_1.d_week_seq - 53))
                           Sort Method: quicksort  Memory: 76kB
                           ->  Hash Join  (cost=2322.65..2651.75 rows=363 width=228) (actual time=6.386..7.154 rows=365
loops=1)
                                 Hash Cond: (wswscs_1.d_week_seq = date_dim_1.d_week_seq)
                                 ->  CTE Scan on wswscs wswscs_1  (cost=0.00..208.30 rows=10415 width=228) (actual
time=0.000..0.707rows=263 loops=1)
 
                                 ->  Hash  (cost=2318.11..2318.11 rows=363 width=4) (actual time=6.367..6.367 rows=365
loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 21kB
                                       ->  Seq Scan on date_dim date_dim_1  (cost=0.00..2318.11 rows=363 width=4)
(actualtime=3.000..6.330 rows=365 loops=1)
 
                                             Filter: (d_year = 1999)
                                             Rows Removed by Filter: 72684
                     ->  Sort  (cost=2333.55..2334.45 rows=363 width=4) (actual time=6.890..6.975 rows=2514 loops=1)
                           Sort Key: date_dim.d_week_seq
                           Sort Method: quicksort  Memory: 42kB
                           ->  Seq Scan on date_dim  (cost=0.00..2318.11 rows=363 width=4) (actual time=3.832..6.841
rows=365loops=1)
 
                                 Filter: (d_year = 1998)
                                 Rows Removed by Filter: 72684
 Planning Time: 0.962 ms
 Execution Time: 2027.758 ms
(42 rows)

BTW, in my small TPC-DS environment (2GB), only two queries were not
finished within 30 minutes (query 4 and 11). My guess is these seem to
be suffered from statistic errors (I got this hint from Kaigai-san's
slide: https://www.slideshare.net/kaigai/tpcdspostgresql, especially
page 20. Unfortunately the slide is in Japanese).  And it leads to
choosing a bad plan: nested loop join. If I disable nested loop join,
these two queries finished quickly.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Should we add GUCs to allow partition pruning to be disabled?
Next
From: Shawn Debnath
Date:
Subject: Re: Introduce timeout capability for ConditionVariableSleep