Performance of Query 4 on TPC-DS Benchmark - Mailing list pgsql-performance

From Ba Jinsheng
Subject Performance of Query 4 on TPC-DS Benchmark
Date
Msg-id SEZPR06MB6494F6A2837995BDD4E0BF9A8A5F2@SEZPR06MB6494.apcprd06.prod.outlook.com
Whole thread Raw
Responses Re: Performance of Query 4 on TPC-DS Benchmark
Re: Performance of Query 4 on TPC-DS Benchmark
Re: Performance of Query 4 on TPC-DS Benchmark
List pgsql-performance
Hi all,

Please see this case:


Query 4 on TPC-DS benchmark:

with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
       ,'s' sale_type
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
       ,'c' sale_type
 from customer
     ,catalog_sales
     ,date_dim
 where c_customer_sk = cs_bill_customer_sk
   and cs_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
       ,'w' sale_type
 from customer
     ,web_sales
     ,date_dim
 where c_customer_sk = ws_bill_customer_sk
   and ws_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
         )
  select  
                  t_s_secyear.customer_id
                 ,t_s_secyear.customer_first_name
                 ,t_s_secyear.customer_last_name
                 ,t_s_secyear.customer_email_address
 from year_total t_s_firstyear
     ,year_total t_s_secyear
     ,year_total t_c_firstyear
     ,year_total t_c_secyear
     ,year_total t_w_firstyear
     ,year_total t_w_secyear
 where t_s_secyear.customer_id = t_s_firstyear.customer_id
   and t_s_firstyear.customer_id = t_c_secyear.customer_id
   and t_s_firstyear.customer_id = t_c_firstyear.customer_id
   and t_s_firstyear.customer_id = t_w_firstyear.customer_id
   and t_s_firstyear.customer_id = t_w_secyear.customer_id
   and t_s_firstyear.sale_type = 's'
   and t_c_firstyear.sale_type = 'c'
   and t_w_firstyear.sale_type = 'w'
   and t_s_secyear.sale_type = 's'
   and t_c_secyear.sale_type = 'c'
   and t_w_secyear.sale_type = 'w'
   and t_s_firstyear.dyear =  2001
   and t_s_secyear.dyear = 2001+1
   and t_c_firstyear.dyear =  2001
   and t_c_secyear.dyear =  2001+1
   and t_w_firstyear.dyear = 2001
   and t_w_secyear.dyear = 2001+1
   and t_s_firstyear.year_total > 0
   and t_c_firstyear.year_total > 0
   and t_w_firstyear.year_total > 0
   and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
           > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
   and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
           > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
 order by t_s_secyear.customer_id
         ,t_s_secyear.customer_first_name
         ,t_s_secyear.customer_last_name
         ,t_s_secyear.customer_email_address
limit 100;





The execution time is more than 50 minutes:
                                                                                                                                                                               QUERY PLAN                                                                                                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1255378.56..1255378.57 rows=1 width=132) (actual time=3024403.311..3024403.342 rows=8 loops=1)
   CTE year_total
     ->  Append  (cost=197433.23..461340.62 rows=5041142 width=216) (actual time=4126.043..7897.747 rows=384208 loops=1)
           ->  HashAggregate  (cost=197433.23..233436.60 rows=2880270 width=216) (actual time=4126.042..4231.703 rows=190581 loops=1)
                 Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
                 Batches: 1  Memory Usage: 213017kB
                 ->  Hash Join  (cost=8151.60..103824.45 rows=2880270 width=174) (actual time=69.110..1686.608 rows=2685453 loops=1)
                       Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
                       ->  Hash Join  (cost=5103.00..93214.72 rows=2880270 width=174) (actual time=49.517..1162.567 rows=2750652 loops=1)
                             Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)
                             ->  Seq Scan on store_sales  (cost=0.00..80550.70 rows=2880270 width=30) (actual time=0.018..208.022 rows=2880404 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=49.271..49.271 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.011..26.448 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=19.369..19.370 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.037..11.763 rows=73049 loops=1)
           ->  HashAggregate  (cost=114410.03..132428.63 rows=1441488 width=216) (actual time=2369.202..2447.868 rows=136978 loops=1)
                 Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_year
                 Batches: 1  Memory Usage: 131097kB
                 ->  Hash Join  (cost=8151.60..67561.67 rows=1441488 width=177) (actual time=62.483..974.143 rows=1430939 loops=1)
                       Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)
                       ->  Hash Join  (cost=5103.00..60728.94 rows=1441488 width=177) (actual time=46.571..687.972 rows=1434519 loops=1)
                             Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)
                             ->  Seq Scan on catalog_sales  (cost=0.00..51841.88 rows=1441488 width=33) (actual time=0.029..128.238 rows=1441548 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=46.311..46.325 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer customer_1  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.350 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.677..15.677 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim date_dim_1  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.015..7.957 rows=73049 loops=1)
           ->  HashAggregate  (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1166.953..1198.730 rows=56649 loops=1)
                 Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_year
                 Batches: 1  Memory Usage: 57369kB
                 ->  Hash Join  (cost=8151.60..37897.40 rows=719384 width=177) (actual time=68.327..508.594 rows=719119 loops=1)
                       Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)
                       ->  Hash Join  (cost=5103.00..32960.30 rows=719384 width=177) (actual time=52.240..357.963 rows=719217 loops=1)
                             Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)
                             ->  Seq Scan on web_sales  (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..62.464 rows=719384 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=51.959..51.960 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer customer_2  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..25.350 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.831..15.834 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim date_dim_2  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.014..8.100 rows=73049 loops=1)
   ->  Sort  (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1)
         Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
         Sort Method: quicksort  Memory: 26kB
         ->  Nested Loop  (cost=0.00..794037.93 rows=1 width=132) (actual time=354851.431..3024403.218 rows=8 loops=1)
               Join Filter: ((t_s_secyear.customer_id = t_w_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))
               Rows Removed by Join Filter: 810136
               ->  Nested Loop  (cost=0.00..668006.23 rows=1 width=308) (actual time=33554.075..3021248.646 rows=72 loops=1)
                     Join Filter: ((t_s_secyear.customer_id = t_c_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END))
                     Rows Removed by Join Filter: 11876277
                     ->  Nested Loop  (cost=0.00..541974.53 rows=1 width=320) (actual time=14866.104..3001271.961 rows=437 loops=1)
                           Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)
                           Rows Removed by Join Filter: 44702488
                           ->  Nested Loop  (cost=0.00..415941.57 rows=2 width=156) (actual time=11739.944..2946020.749 rows=1171 loops=1)
                                 Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)
                                 Rows Removed by Join Filter: 112695277
                                 ->  Nested Loop  (cost=0.00..277302.08 rows=9 width=104) (actual time=8139.729..2351733.795 rows=9952 loops=1)
                                       Join Filter: (t_s_firstyear.customer_id = t_c_firstyear.customer_id)
                                       Rows Removed by Join Filter: 997895870
                                       ->  CTE Scan on year_total t_s_firstyear  (cost=0.00..138631.41 rows=42 width=52) (actual time=4126.046..4234.598 rows=37923 loops=1)
                                             Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
                                             Rows Removed by Filter: 346285
                                       ->  CTE Scan on year_total t_c_firstyear  (cost=0.00..138631.41 rows=42 width=52) (actual time=28.926..60.356 rows=26314 loops=37923)
                                             Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
                                             Rows Removed by Filter: 357894
                                 ->  CTE Scan on year_total t_w_firstyear  (cost=0.00..138631.41 rows=42 width=52) (actual time=49.572..59.057 rows=11324 loops=9952)
                                       Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
                                       Rows Removed by Filter: 372884
                           ->  CTE Scan on year_total t_s_secyear  (cost=0.00..126028.55 rows=126 width=164) (actual time=0.002..44.949 rows=38175 loops=1171)
                                 Filter: ((sale_type = 's'::text) AND (dyear = 2002))
                                 Rows Removed by Filter: 346033
                     ->  CTE Scan on year_total t_c_secyear  (cost=0.00..126028.55 rows=126 width=52) (actual time=21.023..44.097 rows=27177 loops=437)
                           Filter: ((sale_type = 'c'::text) AND (dyear = 2002))
                           Rows Removed by Filter: 357031
               ->  CTE Scan on year_total t_w_secyear  (cost=0.00..126028.55 rows=126 width=52) (actual time=36.137..43.090 rows=11252 loops=72)
                     Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
                     Rows Removed by Filter: 372956
 Planning Time: 4.529 ms
 Execution Time: 3024486.695 ms
(83 rows)




If we disable this checking:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c42742d2c7..d47d0f0e59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -555,16 +555,6 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
                                                }
                                                break;
                                        case COSTS_BETTER2:
-                                               if (keyscmp != PATHKEYS_BETTER1)
-                                               {
-                                                       outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path),
-                                                                                                                      PATH_REQ_OUTER(old_path));
-                                                       if ((outercmp == BMS_EQUAL ||
-                                                                outercmp == BMS_SUBSET2) &&
-                                                               new_path->rows >= old_path->rows &&
-                                                               new_path->parallel_safe <= old_path->parallel_safe)
-                                                               accept_new = false; /* old dominates new */
-                                               }
                                                break;
                                        case COSTS_DIFFERENT:
 

The execution time is reduced to around 8 seconds:
                                                                                                                                                                                                                                                                                         QUERY PLAN                                                                                                                                                                                                                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1255392.10..1255392.10 rows=1 width=132) (actual time=8068.870..8068.887 rows=8 loops=1)
   CTE year_total
     ->  Append  (cost=197441.91..461358.17 rows=5041411 width=216) (actual time=4004.918..7547.884 rows=384208 loops=1)
           ->  HashAggregate  (cost=197441.91..233447.56 rows=2880452 width=216) (actual time=4004.917..4112.220 rows=190581 loops=1)
                 Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
                 Batches: 1  Memory Usage: 213017kB
                 ->  Hash Join  (cost=8151.60..103827.22 rows=2880452 width=174) (actual time=73.291..1594.391 rows=2685453 loops=1)
                       Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
                       ->  Hash Join  (cost=5103.00..93217.01 rows=2880452 width=174) (actual time=52.407..1100.642 rows=2750652 loops=1)
                             Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)
                             ->  Seq Scan on store_sales  (cost=0.00..80552.52 rows=2880452 width=30) (actual time=0.017..213.629 rows=2880404 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=52.083..52.083 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.013..28.104 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=20.589..20.592 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.036..12.464 rows=73049 loops=1)
           ->  HashAggregate  (cost=114414.19..132433.87 rows=1441575 width=216) (actual time=2186.304..2263.365 rows=136978 loops=1)
                 Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_year
                 Batches: 1  Memory Usage: 131097kB
                 ->  Hash Join  (cost=8151.60..67563.00 rows=1441575 width=177) (actual time=63.765..880.282 rows=1430939 loops=1)
                       Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)
                       ->  Hash Join  (cost=5103.00..60730.04 rows=1441575 width=177) (actual time=47.735..619.585 rows=1434519 loops=1)
                             Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)
                             ->  Seq Scan on catalog_sales  (cost=0.00..51842.75 rows=1441575 width=33) (actual time=0.033..121.738 rows=1441548 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=47.433..47.433 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer customer_1  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..23.960 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.755..15.756 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim date_dim_1  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.017..8.030 rows=73049 loops=1)
           ->  HashAggregate  (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1121.896..1152.982 rows=56649 loops=1)
                 Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_year
                 Batches: 1  Memory Usage: 57369kB
                 ->  Hash Join  (cost=8151.60..37897.40 rows=719384 width=177) (actual time=63.239..470.700 rows=719119 loops=1)
                       Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)
                       ->  Hash Join  (cost=5103.00..32960.30 rows=719384 width=177) (actual time=47.365..329.499 rows=719217 loops=1)
                             Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)
                             ->  Seq Scan on web_sales  (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.036..60.068 rows=719384 loops=1)
                             ->  Hash  (cost=3853.00..3853.00 rows=100000 width=152) (actual time=47.076..47.078 rows=100000 loops=1)
                                   Buckets: 131072  Batches: 1  Memory Usage: 17161kB
                                   ->  Seq Scan on customer customer_2  (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.909 rows=100000 loops=1)
                       ->  Hash  (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.615..15.615 rows=73049 loops=1)
                             Buckets: 131072  Batches: 1  Memory Usage: 3878kB
                             ->  Seq Scan on date_dim date_dim_2  (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.019..8.060 rows=73049 loops=1)
   ->  Sort  (cost=794033.93..794033.94 rows=1 width=132) (actual time=8068.869..8068.872 rows=8 loops=1)
         Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
         Sort Method: quicksort  Memory: 26kB
         ->  Hash Join  (cost=794032.91..794033.92 rows=1 width=132) (actual time=8061.079..8068.861 rows=8 loops=1)
               Hash Cond: (t_s_secyear.customer_id = t_s_firstyear.customer_id)
               Join Filter: ((CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))
               Rows Removed by Join Filter: 11
               ->  Merge Join  (cost=390716.09..390717.01 rows=16 width=268) (actual time=7877.494..7885.971 rows=2793 loops=1)
                     Merge Cond: (t_s_secyear.customer_id = t_c_secyear.customer_id)
                     ->  Sort  (cost=126039.67..126039.99 rows=126 width=164) (actual time=7764.696..7768.885 rows=38175 loops=1)
                           Sort Key: t_s_secyear.customer_id
                           Sort Method: quicksort  Memory: 7021kB
                           ->  CTE Scan on year_total t_s_secyear  (cost=0.00..126035.28 rows=126 width=164) (actual time=4004.934..7746.174 rows=38175 loops=1)
                                 Filter: ((sale_type = 's'::text) AND (dyear = 2002))
                                 Rows Removed by Filter: 346033
                     ->  Sort  (cost=264676.42..264676.48 rows=26 width=104) (actual time=112.790..113.328 rows=7188 loops=1)
                           Sort Key: t_c_firstyear.customer_id
                           Sort Method: quicksort  Memory: 687kB
                           ->  Hash Join  (cost=138639.33..264675.81 rows=26 width=104) (actual time=83.825..109.977 rows=7188 loops=1)
                                 Hash Cond: (t_c_secyear.customer_id = t_c_firstyear.customer_id)
                                 ->  CTE Scan on year_total t_c_secyear  (cost=0.00..126035.28 rows=126 width=52) (actual time=20.337..42.859 rows=27177 loops=1)
                                       Filter: ((sale_type = 'c'::text) AND (dyear = 2002))
                                       Rows Removed by Filter: 357031
                                 ->  Hash  (cost=138638.80..138638.80 rows=42 width=52) (actual time=63.460..63.461 rows=26314 loops=1)
                                       Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1725kB
                                       ->  CTE Scan on year_total t_c_firstyear  (cost=0.00..138638.80 rows=42 width=52) (actual time=28.367..59.710 rows=26314 loops=1)
                                             Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
                                             Rows Removed by Filter: 357894
               ->  Hash  (cost=403316.74..403316.74 rows=6 width=156) (actual time=182.649..182.650 rows=479 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 59kB
                     ->  Merge Join  (cost=403316.35..403316.74 rows=6 width=156) (actual time=177.380..182.540 rows=479 loops=1)
                           Merge Cond: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)
                           ->  Sort  (cost=138639.93..138640.04 rows=42 width=52) (actual time=75.347..77.869 rows=37909 loops=1)
                                 Sort Key: t_s_firstyear.customer_id
                                 Sort Method: quicksort  Memory: 3273kB
                                 ->  CTE Scan on year_total t_s_firstyear  (cost=0.00..138638.80 rows=42 width=52) (actual time=0.002..60.630 rows=37923 loops=1)
                                       Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
                                       Rows Removed by Filter: 346285
                           ->  Sort  (cost=264676.42..264676.48 rows=26 width=104) (actual time=102.001..102.053 rows=1306 loops=1)
                                 Sort Key: t_w_firstyear.customer_id
                                 Sort Method: quicksort  Memory: 139kB
                                 ->  Hash Join  (cost=138639.33..264675.81 rows=26 width=104) (actual time=93.653..101.516 rows=1306 loops=1)
                                       Hash Cond: (t_w_secyear.customer_id = t_w_firstyear.customer_id)
                                       ->  CTE Scan on year_total t_w_secyear  (cost=0.00..126035.28 rows=126 width=52) (actual time=33.759..40.504 rows=11252 loops=1)
                                             Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
                                             Rows Removed by Filter: 372956
                                       ->  Hash  (cost=138638.80..138638.80 rows=42 width=52) (actual time=59.875..59.875 rows=11324 loops=1)
                                             Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 763kB
                                             ->  CTE Scan on year_total t_w_firstyear  (cost=0.00..138638.80 rows=42 width=52) (actual time=48.632..58.272 rows=11324 loops=1)
                                                   Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
                                                   Rows Removed by Filter: 372884
 Planning Time: 6.118 ms
 Execution Time: 8179.205 ms
(98 rows)



The execution time is reduced by 99%. Similarly to my previous reports, I wonder if we can optimize the logic somewhere to improve the performance.


Environment:

The default configurations of PostgreSQL incur the error: "ERROR:  could not resize shared memory segment "/PostgreSQL.3539600020" to 2097152 bytes: No space left on device"
Therefore, to run this query, I set up "work_mem = 1GB"

For the benchmark, I used 1 GB data, and my entire data folder can be downloaded here: https://drive.google.com/file/d/1iK5gfyKudfn2BczpoZbNRY_IAD_rITZu/view?usp=sharing

The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpcds"

tpch=# select version();
                                             version                                              
--------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)




Best regards,

Jinsheng Ba

 

Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.

pgsql-performance by date:

Previous
From: Achilleas Mantzios
Date:
Subject: Re: tds_fdw : Severe performance degradation from postgresql 10.23 to 16.4
Next
From: Thomas Munro
Date:
Subject: Re: Performance of Query 4 on TPC-DS Benchmark