Re: Performance of Query 4 on TPC-DS Benchmark - Mailing list pgsql-performance
From | Alena Rybakina |
---|---|
Subject | Re: Performance of Query 4 on TPC-DS Benchmark |
Date | |
Msg-id | f1b0fa27-dba4-4025-9113-eeb3e251b256@postgrespro.ru Whole thread Raw |
In response to | Performance of Query 4 on TPC-DS Benchmark (Ba Jinsheng <bajinsheng@u.nus.edu>) |
Responses |
Re: Performance of Query 4 on TPC-DS Benchmark
|
List | pgsql-performance |
Hi!
On 10.11.2024 22:35, Ba Jinsheng wrote:
I have noticed significant underestimation here. In addition, these nodes are executed noticeably slower in the plan, you can see it by the current time in the explain.P {margin-top:0;margin-bottom:0;} Hi all,Please see this case:Query 4 on TPC-DS benchmark:with year_total as (select c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total,'s' sale_typefrom customer,store_sales,date_dimwhere c_customer_sk = ss_customer_skand ss_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total,'c' sale_typefrom customer,catalog_sales,date_dimwhere c_customer_sk = cs_bill_customer_skand cs_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total,'w' sale_typefrom customer,web_sales,date_dimwhere c_customer_sk = ws_bill_customer_skand ws_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year)selectt_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addressfrom year_total t_s_firstyear,year_total t_s_secyear,year_total t_c_firstyear,year_total t_c_secyear,year_total t_w_firstyear,year_total t_w_secyearwhere t_s_secyear.customer_id = t_s_firstyear.customer_idand t_s_firstyear.customer_id = t_c_secyear.customer_idand t_s_firstyear.customer_id = t_c_firstyear.customer_idand t_s_firstyear.customer_id = t_w_firstyear.customer_idand t_s_firstyear.customer_id = t_w_secyear.customer_idand t_s_firstyear.sale_type = 's'and t_c_firstyear.sale_type = 'c'and t_w_firstyear.sale_type = 'w'and t_s_secyear.sale_type = 's'and t_c_secyear.sale_type = 'c'and t_w_secyear.sale_type = 'w'and t_s_firstyear.dyear = 2001and t_s_secyear.dyear = 2001+1and t_c_firstyear.dyear = 2001and t_c_secyear.dyear = 2001+1and t_w_firstyear.dyear = 2001and t_w_secyear.dyear = 2001+1and t_s_firstyear.year_total > 0and t_c_firstyear.year_total > 0and t_w_firstyear.year_total > 0and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null endand case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null endorder by t_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addresslimit 100;The execution time is more than 50 minutes:QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=1255378.56..1255378.57 rows=1 width=132) (actual time=3024403.311..3024403.342 rows=8 loops=1)CTE year_total-> Append (cost=197433.23..461340.62 rows=5041142 width=216) (actual time=4126.043..7897.747 rows=384208 loops=1)-> HashAggregate (cost=197433.23..233436.60 rows=2880270 width=216) (actual time=4126.042..4231.703 rows=190581 loops=1)Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_yearBatches: 1 Memory Usage: 213017kB-> Hash Join (cost=8151.60..103824.45 rows=2880270 width=174) (actual time=69.110..1686.608 rows=2685453 loops=1)Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)-> Hash Join (cost=5103.00..93214.72 rows=2880270 width=174) (actual time=49.517..1162.567 rows=2750652 loops=1)Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)-> Seq Scan on store_sales (cost=0.00..80550.70 rows=2880270 width=30) (actual time=0.018..208.022 rows=2880404 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=49.271..49.271 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.011..26.448 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=19.369..19.370 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.037..11.763 rows=73049 loops=1)-> HashAggregate (cost=114410.03..132428.63 rows=1441488 width=216) (actual time=2369.202..2447.868 rows=136978 loops=1)Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_yearBatches: 1 Memory Usage: 131097kB-> Hash Join (cost=8151.60..67561.67 rows=1441488 width=177) (actual time=62.483..974.143 rows=1430939 loops=1)Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)-> Hash Join (cost=5103.00..60728.94 rows=1441488 width=177) (actual time=46.571..687.972 rows=1434519 loops=1)Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)-> Seq Scan on catalog_sales (cost=0.00..51841.88 rows=1441488 width=33) (actual time=0.029..128.238 rows=1441548 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=46.311..46.325 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.677..15.677 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.015..7.957 rows=73049 loops=1)-> HashAggregate (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1166.953..1198.730 rows=56649 loops=1)Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_yearBatches: 1 Memory Usage: 57369kB-> Hash Join (cost=8151.60..37897.40 rows=719384 width=177) (actual time=68.327..508.594 rows=719119 loops=1)Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)-> Hash Join (cost=5103.00..32960.30 rows=719384 width=177) (actual time=52.240..357.963 rows=719217 loops=1)Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..62.464 rows=719384 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=51.959..51.960 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..25.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.831..15.834 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.014..8.100 rows=73049 loops=1)-> Sort (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1)Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_addressSort Method: quicksort Memory: 26kB-> Nested Loop (cost=0.00..794037.93 rows=1 width=132) (actual time=354851.431..3024403.218 rows=8 loops=1)Join Filter: ((t_s_secyear.customer_id = t_w_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 810136-> Nested Loop (cost=0.00..668006.23 rows=1 width=308) (actual time=33554.075..3021248.646 rows=72 loops=1)Join Filter: ((t_s_secyear.customer_id = t_c_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 11876277-> Nested Loop (cost=0.00..541974.53 rows=1 width=320) (actual time=14866.104..3001271.961 rows=437 loops=1)Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)Rows Removed by Join Filter: 44702488-> Nested Loop (cost=0.00..415941.57 rows=2 width=156) (actual time=11739.944..2946020.749 rows=1171 loops=1)Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)Rows Removed by Join Filter: 112695277-> Nested Loop (cost=0.00..277302.08 rows=9 width=104) (actual time=8139.729..2351733.795 rows=9952 loops=1)Join Filter: (t_s_firstyear.customer_id = t_c_firstyear.customer_id)Rows Removed by Join Filter: 997895870-> CTE Scan on year_total t_s_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=4126.046..4234.598 rows=37923 loops=1)Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))Rows Removed by Filter: 346285-> CTE Scan on year_total t_c_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=28.926..60.356 rows=26314 loops=37923)Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))Rows Removed by Filter: 357894-> CTE Scan on year_total t_w_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=49.572..59.057 rows=11324 loops=9952)Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))Rows Removed by Filter: 372884-> CTE Scan on year_total t_s_secyear (cost=0.00..126028.55 rows=126 width=164) (actual time=0.002..44.949 rows=38175 loops=1171)Filter: ((sale_type = 's'::text) AND (dyear = 2002))Rows Removed by Filter: 346033-> CTE Scan on year_total t_c_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=21.023..44.097 rows=27177 loops=437)Filter: ((sale_type = 'c'::text) AND (dyear = 2002))Rows Removed by Filter: 357031-> CTE Scan on year_total t_w_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=36.137..43.090 rows=11252 loops=72)Filter: ((sale_type = 'w'::text) AND (dyear = 2002))Rows Removed by Filter: 372956Planning Time: 4.529 msExecution Time: 3024486.695 ms(83 rows)
Have you tried any tools to improve the cardinality yet, like aqo [0]?
[0] https://github.com/postgrespro/aqo
-- Regards, Alena Rybakina Postgres Professional
pgsql-performance by date: