Thread: Performance of Query 4 on TPC-DS Benchmark
Hi all,
Please see this case:
Query 4 on TPC-DS benchmark:
with year_total as (
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
,'s' sale_type
from customer
,store_sales
,date_dim
where c_customer_sk = ss_customer_sk
and ss_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
,'c' sale_type
from customer
,catalog_sales
,date_dim
where c_customer_sk = cs_bill_customer_sk
and cs_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
union all
select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
,'w' sale_type
from customer
,web_sales
,date_dim
where c_customer_sk = ws_bill_customer_sk
and ws_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
)
select
t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
from year_total t_s_firstyear
,year_total t_s_secyear
,year_total t_c_firstyear
,year_total t_c_secyear
,year_total t_w_firstyear
,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
and t_s_firstyear.customer_id = t_c_secyear.customer_id
and t_s_firstyear.customer_id = t_c_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_firstyear.customer_id
and t_s_firstyear.customer_id = t_w_secyear.customer_id
and t_s_firstyear.sale_type = 's'
and t_c_firstyear.sale_type = 'c'
and t_w_firstyear.sale_type = 'w'
and t_s_secyear.sale_type = 's'
and t_c_secyear.sale_type = 'c'
and t_w_secyear.sale_type = 'w'
and t_s_firstyear.dyear = 2001
and t_s_secyear.dyear = 2001+1
and t_c_firstyear.dyear = 2001
and t_c_secyear.dyear = 2001+1
and t_w_firstyear.dyear = 2001
and t_w_secyear.dyear = 2001+1
and t_s_firstyear.year_total > 0
and t_c_firstyear.year_total > 0
and t_w_firstyear.year_total > 0
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
,t_s_secyear.customer_first_name
,t_s_secyear.customer_last_name
,t_s_secyear.customer_email_address
limit 100;
The execution time is more than 50 minutes:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1255378.56..1255378.57 rows=1 width=132) (actual time=3024403.311..3024403.342 rows=8 loops=1)
CTE year_total
-> Append (cost=197433.23..461340.62 rows=5041142 width=216) (actual time=4126.043..7897.747 rows=384208 loops=1)
-> HashAggregate (cost=197433.23..233436.60 rows=2880270 width=216) (actual time=4126.042..4231.703 rows=190581 loops=1)
Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
Batches: 1 Memory Usage: 213017kB
-> Hash Join (cost=8151.60..103824.45 rows=2880270 width=174) (actual time=69.110..1686.608 rows=2685453 loops=1)
Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
-> Hash Join (cost=5103.00..93214.72 rows=2880270 width=174) (actual time=49.517..1162.567 rows=2750652 loops=1)
Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)
-> Seq Scan on store_sales (cost=0.00..80550.70 rows=2880270 width=30) (actual time=0.018..208.022 rows=2880404 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=49.271..49.271 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.011..26.448 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=19.369..19.370 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.037..11.763 rows=73049 loops=1)
-> HashAggregate (cost=114410.03..132428.63 rows=1441488 width=216) (actual time=2369.202..2447.868 rows=136978 loops=1)
Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_year
Batches: 1 Memory Usage: 131097kB
-> Hash Join (cost=8151.60..67561.67 rows=1441488 width=177) (actual time=62.483..974.143 rows=1430939 loops=1)
Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)
-> Hash Join (cost=5103.00..60728.94 rows=1441488 width=177) (actual time=46.571..687.972 rows=1434519 loops=1)
Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)
-> Seq Scan on catalog_sales (cost=0.00..51841.88 rows=1441488 width=33) (actual time=0.029..128.238 rows=1441548 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=46.311..46.325 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.350 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.677..15.677 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.015..7.957 rows=73049 loops=1)
-> HashAggregate (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1166.953..1198.730 rows=56649 loops=1)
Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_year
Batches: 1 Memory Usage: 57369kB
-> Hash Join (cost=8151.60..37897.40 rows=719384 width=177) (actual time=68.327..508.594 rows=719119 loops=1)
Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)
-> Hash Join (cost=5103.00..32960.30 rows=719384 width=177) (actual time=52.240..357.963 rows=719217 loops=1)
Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)
-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..62.464 rows=719384 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=51.959..51.960 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..25.350 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.831..15.834 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.014..8.100 rows=73049 loops=1)
-> Sort (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1)
Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.00..794037.93 rows=1 width=132) (actual time=354851.431..3024403.218 rows=8 loops=1)
Join Filter: ((t_s_secyear.customer_id = t_w_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))
Rows Removed by Join Filter: 810136
-> Nested Loop (cost=0.00..668006.23 rows=1 width=308) (actual time=33554.075..3021248.646 rows=72 loops=1)
Join Filter: ((t_s_secyear.customer_id = t_c_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END))
Rows Removed by Join Filter: 11876277
-> Nested Loop (cost=0.00..541974.53 rows=1 width=320) (actual time=14866.104..3001271.961 rows=437 loops=1)
Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)
Rows Removed by Join Filter: 44702488
-> Nested Loop (cost=0.00..415941.57 rows=2 width=156) (actual time=11739.944..2946020.749 rows=1171 loops=1)
Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)
Rows Removed by Join Filter: 112695277
-> Nested Loop (cost=0.00..277302.08 rows=9 width=104) (actual time=8139.729..2351733.795 rows=9952 loops=1)
Join Filter: (t_s_firstyear.customer_id = t_c_firstyear.customer_id)
Rows Removed by Join Filter: 997895870
-> CTE Scan on year_total t_s_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=4126.046..4234.598 rows=37923 loops=1)
Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
Rows Removed by Filter: 346285
-> CTE Scan on year_total t_c_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=28.926..60.356 rows=26314 loops=37923)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
Rows Removed by Filter: 357894
-> CTE Scan on year_total t_w_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=49.572..59.057 rows=11324 loops=9952)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
Rows Removed by Filter: 372884
-> CTE Scan on year_total t_s_secyear (cost=0.00..126028.55 rows=126 width=164) (actual time=0.002..44.949 rows=38175 loops=1171)
Filter: ((sale_type = 's'::text) AND (dyear = 2002))
Rows Removed by Filter: 346033
-> CTE Scan on year_total t_c_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=21.023..44.097 rows=27177 loops=437)
Filter: ((sale_type = 'c'::text) AND (dyear = 2002))
Rows Removed by Filter: 357031
-> CTE Scan on year_total t_w_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=36.137..43.090 rows=11252 loops=72)
Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
Rows Removed by Filter: 372956
Planning Time: 4.529 ms
Execution Time: 3024486.695 ms
(83 rows)
If we disable this checking:
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c42742d2c7..d47d0f0e59 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -555,16 +555,6 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
}
break;
case COSTS_BETTER2:
- if (keyscmp != PATHKEYS_BETTER1)
- {
- outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path),
- PATH_REQ_OUTER(old_path));
- if ((outercmp == BMS_EQUAL ||
- outercmp == BMS_SUBSET2) &&
- new_path->rows >= old_path->rows &&
- new_path->parallel_safe <= old_path->parallel_safe)
- accept_new = false; /* old dominates new */
- }
break;
case COSTS_DIFFERENT:
The execution time is reduced to around 8 seconds:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1255392.10..1255392.10 rows=1 width=132) (actual time=8068.870..8068.887 rows=8 loops=1)
CTE year_total
-> Append (cost=197441.91..461358.17 rows=5041411 width=216) (actual time=4004.918..7547.884 rows=384208 loops=1)
-> HashAggregate (cost=197441.91..233447.56 rows=2880452 width=216) (actual time=4004.917..4112.220 rows=190581 loops=1)
Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
Batches: 1 Memory Usage: 213017kB
-> Hash Join (cost=8151.60..103827.22 rows=2880452 width=174) (actual time=73.291..1594.391 rows=2685453 loops=1)
Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
-> Hash Join (cost=5103.00..93217.01 rows=2880452 width=174) (actual time=52.407..1100.642 rows=2750652 loops=1)
Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)
-> Seq Scan on store_sales (cost=0.00..80552.52 rows=2880452 width=30) (actual time=0.017..213.629 rows=2880404 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=52.083..52.083 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.013..28.104 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=20.589..20.592 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.036..12.464 rows=73049 loops=1)
-> HashAggregate (cost=114414.19..132433.87 rows=1441575 width=216) (actual time=2186.304..2263.365 rows=136978 loops=1)
Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_year
Batches: 1 Memory Usage: 131097kB
-> Hash Join (cost=8151.60..67563.00 rows=1441575 width=177) (actual time=63.765..880.282 rows=1430939 loops=1)
Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)
-> Hash Join (cost=5103.00..60730.04 rows=1441575 width=177) (actual time=47.735..619.585 rows=1434519 loops=1)
Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)
-> Seq Scan on catalog_sales (cost=0.00..51842.75 rows=1441575 width=33) (actual time=0.033..121.738 rows=1441548 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=47.433..47.433 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..23.960 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.755..15.756 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.017..8.030 rows=73049 loops=1)
-> HashAggregate (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1121.896..1152.982 rows=56649 loops=1)
Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_year
Batches: 1 Memory Usage: 57369kB
-> Hash Join (cost=8151.60..37897.40 rows=719384 width=177) (actual time=63.239..470.700 rows=719119 loops=1)
Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)
-> Hash Join (cost=5103.00..32960.30 rows=719384 width=177) (actual time=47.365..329.499 rows=719217 loops=1)
Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)
-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.036..60.068 rows=719384 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=47.076..47.078 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.909 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.615..15.615 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.019..8.060 rows=73049 loops=1)
-> Sort (cost=794033.93..794033.94 rows=1 width=132) (actual time=8068.869..8068.872 rows=8 loops=1)
Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
Sort Method: quicksort Memory: 26kB
-> Hash Join (cost=794032.91..794033.92 rows=1 width=132) (actual time=8061.079..8068.861 rows=8 loops=1)
Hash Cond: (t_s_secyear.customer_id = t_s_firstyear.customer_id)
Join Filter: ((CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))
Rows Removed by Join Filter: 11
-> Merge Join (cost=390716.09..390717.01 rows=16 width=268) (actual time=7877.494..7885.971 rows=2793 loops=1)
Merge Cond: (t_s_secyear.customer_id = t_c_secyear.customer_id)
-> Sort (cost=126039.67..126039.99 rows=126 width=164) (actual time=7764.696..7768.885 rows=38175 loops=1)
Sort Key: t_s_secyear.customer_id
Sort Method: quicksort Memory: 7021kB
-> CTE Scan on year_total t_s_secyear (cost=0.00..126035.28 rows=126 width=164) (actual time=4004.934..7746.174 rows=38175 loops=1)
Filter: ((sale_type = 's'::text) AND (dyear = 2002))
Rows Removed by Filter: 346033
-> Sort (cost=264676.42..264676.48 rows=26 width=104) (actual time=112.790..113.328 rows=7188 loops=1)
Sort Key: t_c_firstyear.customer_id
Sort Method: quicksort Memory: 687kB
-> Hash Join (cost=138639.33..264675.81 rows=26 width=104) (actual time=83.825..109.977 rows=7188 loops=1)
Hash Cond: (t_c_secyear.customer_id = t_c_firstyear.customer_id)
-> CTE Scan on year_total t_c_secyear (cost=0.00..126035.28 rows=126 width=52) (actual time=20.337..42.859 rows=27177 loops=1)
Filter: ((sale_type = 'c'::text) AND (dyear = 2002))
Rows Removed by Filter: 357031
-> Hash (cost=138638.80..138638.80 rows=42 width=52) (actual time=63.460..63.461 rows=26314 loops=1)
Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1725kB
-> CTE Scan on year_total t_c_firstyear (cost=0.00..138638.80 rows=42 width=52) (actual time=28.367..59.710 rows=26314 loops=1)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
Rows Removed by Filter: 357894
-> Hash (cost=403316.74..403316.74 rows=6 width=156) (actual time=182.649..182.650 rows=479 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 59kB
-> Merge Join (cost=403316.35..403316.74 rows=6 width=156) (actual time=177.380..182.540 rows=479 loops=1)
Merge Cond: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)
-> Sort (cost=138639.93..138640.04 rows=42 width=52) (actual time=75.347..77.869 rows=37909 loops=1)
Sort Key: t_s_firstyear.customer_id
Sort Method: quicksort Memory: 3273kB
-> CTE Scan on year_total t_s_firstyear (cost=0.00..138638.80 rows=42 width=52) (actual time=0.002..60.630 rows=37923 loops=1)
Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
Rows Removed by Filter: 346285
-> Sort (cost=264676.42..264676.48 rows=26 width=104) (actual time=102.001..102.053 rows=1306 loops=1)
Sort Key: t_w_firstyear.customer_id
Sort Method: quicksort Memory: 139kB
-> Hash Join (cost=138639.33..264675.81 rows=26 width=104) (actual time=93.653..101.516 rows=1306 loops=1)
Hash Cond: (t_w_secyear.customer_id = t_w_firstyear.customer_id)
-> CTE Scan on year_total t_w_secyear (cost=0.00..126035.28 rows=126 width=52) (actual time=33.759..40.504 rows=11252 loops=1)
Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
Rows Removed by Filter: 372956
-> Hash (cost=138638.80..138638.80 rows=42 width=52) (actual time=59.875..59.875 rows=11324 loops=1)
Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory Usage: 763kB
-> CTE Scan on year_total t_w_firstyear (cost=0.00..138638.80 rows=42 width=52) (actual time=48.632..58.272 rows=11324 loops=1)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
Rows Removed by Filter: 372884
Planning Time: 6.118 ms
Execution Time: 8179.205 ms
(98 rows)
The execution time is reduced by 99%. Similarly to my previous reports, I wonder if we can optimize the logic somewhere to improve the performance.
Environment:
The default configurations of PostgreSQL incur the error: "ERROR: could not resize shared memory segment "/PostgreSQL.3539600020" to 2097152 bytes: No space left on device"
Therefore, to run this query, I set up "work_mem = 1GB"
For the benchmark, I used 1 GB data, and my entire data folder can be downloaded here: https://drive.google.com/file/d/1iK5gfyKudfn2BczpoZbNRY_IAD_rITZu/view?usp=sharing
The connection string is: postgresql://ubuntu:ubuntu@127.0.0.1:5432/tpcds"
tpch=# select version();
version
--------------------------------------------------------------------------------------------------
PostgreSQL 17.0 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
(1 row)
Best regards,
Jinsheng Ba
On Mon, Nov 11, 2024 at 8:36 AM Ba Jinsheng <bajinsheng@u.nus.edu> wrote: > The default configurations of PostgreSQL incur the error: "ERROR: could not resize shared memory segment "/PostgreSQL.3539600020"to 2097152 bytes: No space left on device" No comment on your optimiser experiments for now, but for this error: it reminds me of a low/default --shm-size limit from Docker, or other similar container stuff?
Hi!
On 10.11.2024 22:35, Ba Jinsheng wrote:
I have noticed significant underestimation here. In addition, these nodes are executed noticeably slower in the plan, you can see it by the current time in the explain.P {margin-top:0;margin-bottom:0;} Hi all,Please see this case:Query 4 on TPC-DS benchmark:with year_total as (select c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total,'s' sale_typefrom customer,store_sales,date_dimwhere c_customer_sk = ss_customer_skand ss_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total,'c' sale_typefrom customer,catalog_sales,date_dimwhere c_customer_sk = cs_bill_customer_skand cs_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total,'w' sale_typefrom customer,web_sales,date_dimwhere c_customer_sk = ws_bill_customer_skand ws_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year)selectt_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addressfrom year_total t_s_firstyear,year_total t_s_secyear,year_total t_c_firstyear,year_total t_c_secyear,year_total t_w_firstyear,year_total t_w_secyearwhere t_s_secyear.customer_id = t_s_firstyear.customer_idand t_s_firstyear.customer_id = t_c_secyear.customer_idand t_s_firstyear.customer_id = t_c_firstyear.customer_idand t_s_firstyear.customer_id = t_w_firstyear.customer_idand t_s_firstyear.customer_id = t_w_secyear.customer_idand t_s_firstyear.sale_type = 's'and t_c_firstyear.sale_type = 'c'and t_w_firstyear.sale_type = 'w'and t_s_secyear.sale_type = 's'and t_c_secyear.sale_type = 'c'and t_w_secyear.sale_type = 'w'and t_s_firstyear.dyear = 2001and t_s_secyear.dyear = 2001+1and t_c_firstyear.dyear = 2001and t_c_secyear.dyear = 2001+1and t_w_firstyear.dyear = 2001and t_w_secyear.dyear = 2001+1and t_s_firstyear.year_total > 0and t_c_firstyear.year_total > 0and t_w_firstyear.year_total > 0and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null endand case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null endorder by t_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addresslimit 100;The execution time is more than 50 minutes:QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=1255378.56..1255378.57 rows=1 width=132) (actual time=3024403.311..3024403.342 rows=8 loops=1)CTE year_total-> Append (cost=197433.23..461340.62 rows=5041142 width=216) (actual time=4126.043..7897.747 rows=384208 loops=1)-> HashAggregate (cost=197433.23..233436.60 rows=2880270 width=216) (actual time=4126.042..4231.703 rows=190581 loops=1)Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_yearBatches: 1 Memory Usage: 213017kB-> Hash Join (cost=8151.60..103824.45 rows=2880270 width=174) (actual time=69.110..1686.608 rows=2685453 loops=1)Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)-> Hash Join (cost=5103.00..93214.72 rows=2880270 width=174) (actual time=49.517..1162.567 rows=2750652 loops=1)Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)-> Seq Scan on store_sales (cost=0.00..80550.70 rows=2880270 width=30) (actual time=0.018..208.022 rows=2880404 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=49.271..49.271 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.011..26.448 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=19.369..19.370 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.037..11.763 rows=73049 loops=1)-> HashAggregate (cost=114410.03..132428.63 rows=1441488 width=216) (actual time=2369.202..2447.868 rows=136978 loops=1)Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_yearBatches: 1 Memory Usage: 131097kB-> Hash Join (cost=8151.60..67561.67 rows=1441488 width=177) (actual time=62.483..974.143 rows=1430939 loops=1)Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)-> Hash Join (cost=5103.00..60728.94 rows=1441488 width=177) (actual time=46.571..687.972 rows=1434519 loops=1)Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)-> Seq Scan on catalog_sales (cost=0.00..51841.88 rows=1441488 width=33) (actual time=0.029..128.238 rows=1441548 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=46.311..46.325 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.677..15.677 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.015..7.957 rows=73049 loops=1)-> HashAggregate (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1166.953..1198.730 rows=56649 loops=1)Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_yearBatches: 1 Memory Usage: 57369kB-> Hash Join (cost=8151.60..37897.40 rows=719384 width=177) (actual time=68.327..508.594 rows=719119 loops=1)Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)-> Hash Join (cost=5103.00..32960.30 rows=719384 width=177) (actual time=52.240..357.963 rows=719217 loops=1)Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..62.464 rows=719384 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=51.959..51.960 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..25.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.831..15.834 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.014..8.100 rows=73049 loops=1)-> Sort (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1)Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_addressSort Method: quicksort Memory: 26kB-> Nested Loop (cost=0.00..794037.93 rows=1 width=132) (actual time=354851.431..3024403.218 rows=8 loops=1)Join Filter: ((t_s_secyear.customer_id = t_w_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 810136-> Nested Loop (cost=0.00..668006.23 rows=1 width=308) (actual time=33554.075..3021248.646 rows=72 loops=1)Join Filter: ((t_s_secyear.customer_id = t_c_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 11876277-> Nested Loop (cost=0.00..541974.53 rows=1 width=320) (actual time=14866.104..3001271.961 rows=437 loops=1)Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)Rows Removed by Join Filter: 44702488-> Nested Loop (cost=0.00..415941.57 rows=2 width=156) (actual time=11739.944..2946020.749 rows=1171 loops=1)Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)Rows Removed by Join Filter: 112695277-> Nested Loop (cost=0.00..277302.08 rows=9 width=104) (actual time=8139.729..2351733.795 rows=9952 loops=1)Join Filter: (t_s_firstyear.customer_id = t_c_firstyear.customer_id)Rows Removed by Join Filter: 997895870-> CTE Scan on year_total t_s_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=4126.046..4234.598 rows=37923 loops=1)Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))Rows Removed by Filter: 346285-> CTE Scan on year_total t_c_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=28.926..60.356 rows=26314 loops=37923)Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))Rows Removed by Filter: 357894-> CTE Scan on year_total t_w_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=49.572..59.057 rows=11324 loops=9952)Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))Rows Removed by Filter: 372884-> CTE Scan on year_total t_s_secyear (cost=0.00..126028.55 rows=126 width=164) (actual time=0.002..44.949 rows=38175 loops=1171)Filter: ((sale_type = 's'::text) AND (dyear = 2002))Rows Removed by Filter: 346033-> CTE Scan on year_total t_c_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=21.023..44.097 rows=27177 loops=437)Filter: ((sale_type = 'c'::text) AND (dyear = 2002))Rows Removed by Filter: 357031-> CTE Scan on year_total t_w_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=36.137..43.090 rows=11252 loops=72)Filter: ((sale_type = 'w'::text) AND (dyear = 2002))Rows Removed by Filter: 372956Planning Time: 4.529 msExecution Time: 3024486.695 ms(83 rows)
Have you tried any tools to improve the cardinality yet, like aqo [0]?
[0] https://github.com/postgrespro/aqo
-- Regards, Alena Rybakina Postgres Professional
On 10.11.2024 23:16, Alena Rybakina wrote:
Sorry, I meant actual time, not current time.Hi!
On 10.11.2024 22:35, Ba Jinsheng wrote:I have noticed significant underestimation here. In addition, these nodes are executed noticeably slower in the plan, you can see it by the current time in the explain.P {margin-top:0;margin-bottom:0;} Hi all,Please see this case:Query 4 on TPC-DS benchmark:with year_total as (select c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total,'s' sale_typefrom customer,store_sales,date_dimwhere c_customer_sk = ss_customer_skand ss_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total,'c' sale_typefrom customer,catalog_sales,date_dimwhere c_customer_sk = cs_bill_customer_skand cs_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_yearunion allselect c_customer_id customer_id,c_first_name customer_first_name,c_last_name customer_last_name,c_preferred_cust_flag customer_preferred_cust_flag,c_birth_country customer_birth_country,c_login customer_login,c_email_address customer_email_address,d_year dyear,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total,'w' sale_typefrom customer,web_sales,date_dimwhere c_customer_sk = ws_bill_customer_skand ws_sold_date_sk = d_date_skgroup by c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year)selectt_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addressfrom year_total t_s_firstyear,year_total t_s_secyear,year_total t_c_firstyear,year_total t_c_secyear,year_total t_w_firstyear,year_total t_w_secyearwhere t_s_secyear.customer_id = t_s_firstyear.customer_idand t_s_firstyear.customer_id = t_c_secyear.customer_idand t_s_firstyear.customer_id = t_c_firstyear.customer_idand t_s_firstyear.customer_id = t_w_firstyear.customer_idand t_s_firstyear.customer_id = t_w_secyear.customer_idand t_s_firstyear.sale_type = 's'and t_c_firstyear.sale_type = 'c'and t_w_firstyear.sale_type = 'w'and t_s_secyear.sale_type = 's'and t_c_secyear.sale_type = 'c'and t_w_secyear.sale_type = 'w'and t_s_firstyear.dyear = 2001and t_s_secyear.dyear = 2001+1and t_c_firstyear.dyear = 2001and t_c_secyear.dyear = 2001+1and t_w_firstyear.dyear = 2001and t_w_secyear.dyear = 2001+1and t_s_firstyear.year_total > 0and t_c_firstyear.year_total > 0and t_w_firstyear.year_total > 0and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null endand case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null endorder by t_s_secyear.customer_id,t_s_secyear.customer_first_name,t_s_secyear.customer_last_name,t_s_secyear.customer_email_addresslimit 100;The execution time is more than 50 minutes:QUERY PLAN------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit (cost=1255378.56..1255378.57 rows=1 width=132) (actual time=3024403.311..3024403.342 rows=8 loops=1)CTE year_total-> Append (cost=197433.23..461340.62 rows=5041142 width=216) (actual time=4126.043..7897.747 rows=384208 loops=1)-> HashAggregate (cost=197433.23..233436.60 rows=2880270 width=216) (actual time=4126.042..4231.703 rows=190581 loops=1)Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_yearBatches: 1 Memory Usage: 213017kB-> Hash Join (cost=8151.60..103824.45 rows=2880270 width=174) (actual time=69.110..1686.608 rows=2685453 loops=1)Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)-> Hash Join (cost=5103.00..93214.72 rows=2880270 width=174) (actual time=49.517..1162.567 rows=2750652 loops=1)Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)-> Seq Scan on store_sales (cost=0.00..80550.70 rows=2880270 width=30) (actual time=0.018..208.022 rows=2880404 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=49.271..49.271 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.011..26.448 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=19.369..19.370 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.037..11.763 rows=73049 loops=1)-> HashAggregate (cost=114410.03..132428.63 rows=1441488 width=216) (actual time=2369.202..2447.868 rows=136978 loops=1)Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_yearBatches: 1 Memory Usage: 131097kB-> Hash Join (cost=8151.60..67561.67 rows=1441488 width=177) (actual time=62.483..974.143 rows=1430939 loops=1)Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)-> Hash Join (cost=5103.00..60728.94 rows=1441488 width=177) (actual time=46.571..687.972 rows=1434519 loops=1)Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)-> Seq Scan on catalog_sales (cost=0.00..51841.88 rows=1441488 width=33) (actual time=0.029..128.238 rows=1441548 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=46.311..46.325 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.005..23.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.677..15.677 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.015..7.957 rows=73049 loops=1)-> HashAggregate (cost=61277.38..70269.68 rows=719384 width=216) (actual time=1166.953..1198.730 rows=56649 loops=1)Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_yearBatches: 1 Memory Usage: 57369kB-> Hash Join (cost=8151.60..37897.40 rows=719384 width=177) (actual time=68.327..508.594 rows=719119 loops=1)Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)-> Hash Join (cost=5103.00..32960.30 rows=719384 width=177) (actual time=52.240..357.963 rows=719217 loops=1)Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..62.464 rows=719384 loops=1)-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=51.959..51.960 rows=100000 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 17161kB-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.004..25.350 rows=100000 loops=1)-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=15.831..15.834 rows=73049 loops=1)Buckets: 131072 Batches: 1 Memory Usage: 3878kB-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.014..8.100 rows=73049 loops=1)-> Sort (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1)Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_addressSort Method: quicksort Memory: 26kB-> Nested Loop (cost=0.00..794037.93 rows=1 width=132) (actual time=354851.431..3024403.218 rows=8 loops=1)Join Filter: ((t_s_secyear.customer_id = t_w_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 810136-> Nested Loop (cost=0.00..668006.23 rows=1 width=308) (actual time=33554.075..3021248.646 rows=72 loops=1)Join Filter: ((t_s_secyear.customer_id = t_c_secyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END))Rows Removed by Join Filter: 11876277-> Nested Loop (cost=0.00..541974.53 rows=1 width=320) (actual time=14866.104..3001271.961 rows=437 loops=1)Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)Rows Removed by Join Filter: 44702488-> Nested Loop (cost=0.00..415941.57 rows=2 width=156) (actual time=11739.944..2946020.749 rows=1171 loops=1)Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)Rows Removed by Join Filter: 112695277-> Nested Loop (cost=0.00..277302.08 rows=9 width=104) (actual time=8139.729..2351733.795 rows=9952 loops=1)Join Filter: (t_s_firstyear.customer_id = t_c_firstyear.customer_id)Rows Removed by Join Filter: 997895870-> CTE Scan on year_total t_s_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=4126.046..4234.598 rows=37923 loops=1)Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))Rows Removed by Filter: 346285-> CTE Scan on year_total t_c_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=28.926..60.356 rows=26314 loops=37923)Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))Rows Removed by Filter: 357894-> CTE Scan on year_total t_w_firstyear (cost=0.00..138631.41 rows=42 width=52) (actual time=49.572..59.057 rows=11324 loops=9952)Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))Rows Removed by Filter: 372884-> CTE Scan on year_total t_s_secyear (cost=0.00..126028.55 rows=126 width=164) (actual time=0.002..44.949 rows=38175 loops=1171)Filter: ((sale_type = 's'::text) AND (dyear = 2002))Rows Removed by Filter: 346033-> CTE Scan on year_total t_c_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=21.023..44.097 rows=27177 loops=437)Filter: ((sale_type = 'c'::text) AND (dyear = 2002))Rows Removed by Filter: 357031-> CTE Scan on year_total t_w_secyear (cost=0.00..126028.55 rows=126 width=52) (actual time=36.137..43.090 rows=11252 loops=72)Filter: ((sale_type = 'w'::text) AND (dyear = 2002))Rows Removed by Filter: 372956Planning Time: 4.529 msExecution Time: 3024486.695 ms(83 rows)
-- Regards, Alena Rybakina Postgres Professional
On 11/11/24 02:35, Ba Jinsheng wrote: > Hi all, > > Please see this case: > > > Query 4 on TPC-DS benchmark: Thank you for interesting example! Looking into explains I see two sortings: -> Sort (cost=794037.94..794037.95 rows=1 width=132) (actual time=3024403.310..3024403.313 rows=8 loops=1) -> Sort (cost=794033.93..794033.94 rows=1 width=132) (actual time=8068.869..8068.872 rows=8 loops=1) Almost the same cost and different execution time. So, I think, the core of the problem in accuracy of selectivity estimation. In this specific example I see lots of composite scan filters: - ((sale_type = 'w'::text) AND (dyear = 2002)) - ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001)) - ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001)) It is all the time a challenge for PostgreSQL to estimate such a filter because of absent information on joint column distribution. Can you research this way by building extended statistics on these clauses? It could move the plan to the more optimal direction. -- regards, Andrei Lepikhov
>It is all the time a challenge for PostgreSQL to estimate such a filter
>because of absent information on joint column distribution.
>Can you research this way by building extended statistics on these
>clauses? It could move the plan to the more optimal direction.
Thanks a lot for your effort to analyze this issue, and we really appreciate your suggestions! Currently, we focus on exposing these issues that affect performance. In the future, we may consider to look into such a direction as you suggested.
> Have you tried any tools to improve the cardinality yet, like aqo [0]?
Yes, but it takes nearly 1 hour to run this query at a time, so I only run "EXPLAIN ANALYZE" once, and the performance seems slightly improved.
>because of absent information on joint column distribution.
>Can you research this way by building extended statistics on these
>clauses? It could move the plan to the more optimal direction.
Thanks a lot for your effort to analyze this issue, and we really appreciate your suggestions! Currently, we focus on exposing these issues that affect performance. In the future, we may consider to look into such a direction as you suggested.
> Have you tried any tools to improve the cardinality yet, like aqo [0]?
Yes, but it takes nearly 1 hour to run this query at a time, so I only run "EXPLAIN ANALYZE" once, and the performance seems slightly improved.
QUERY PLAN
Limit (cost=293880.50..293880.50 rows=1 width=132) (actual time=2527921.078..2527921.233 rows=8 loops=1)
CTE year_total
-> Gather (cost=115049.92..233367.07 rows=384208 width=216) (actual time=1116.139..4005.105 rows=384208 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=114049.92..193946.27 rows=160087 width=216) (actual time=2430.791..2510.131 rows=128069 loops=3)
-> HashAggregate (cost=190763.57..193145.83 rows=190581 width=216) (actual time=3977.521..4070.200 rows=190581 loops=1)
Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
Worker 1: Batches: 1 Memory Usage: 120857kB
-> Hash Join (cost=8151.60..103486.35 rows=2685453 width=174) (actual time=64.667..1605.601 rows=2685453 loops=1)
Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
-> Hash Join (cost=5103.00..93216.88 rows=2750652 width=174) (actual time=48.111..1121.801 rows=2750652 loops=1)
Hash Cond: (store_sales.ss_customer_sk = customer.c_customer_sk)
-> Seq Scan on store_sales (cost=0.00..80552.52 rows=2880404 width=30) (actual time=0.068..230.529 rows=2880404 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=47.735..47.735 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.012..25.023 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=16.242..16.242 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.074..8.744 rows=73049 loops=1)
-> HashAggregate (cost=114049.92..115762.15 rows=136978 width=216) (actual time=2199.723..2268.851 rows=136978 loops=1)
Group Key: customer_1.c_customer_id, customer_1.c_first_name, customer_1.c_last_name, customer_1.c_preferred_cust_flag, customer_1.c_birth_country, customer_1.c_login, customer_1.c_email_address, date_dim_1.d_year
Worker 0: Batches: 1 Memory Usage: 88089kB
-> Hash Join (cost=8151.60..67544.41 rows=1430939 width=177) (actual time=81.920..911.231 rows=1430939 loops=1)
Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_1.d_date_sk)
-> Hash Join (cost=5103.00..60729.97 rows=1434519 width=177) (actual time=53.469..638.140 rows=1434519 loops=1)
Hash Cond: (catalog_sales.cs_bill_customer_sk = customer_1.c_customer_sk)
-> Seq Scan on catalog_sales (cost=0.00..51842.75 rows=1441548 width=33) (actual time=0.066..134.023 rows=1441548 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=52.937..52.937 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_1 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.019..27.549 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=27.968..27.968 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.099..14.115 rows=73049 loops=1)
-> HashAggregate (cost=61268.33..61976.44 rows=56649 width=216) (actual time=1115.125..1142.838 rows=56649 loops=1)
Group Key: customer_2.c_customer_id, customer_2.c_first_name, customer_2.c_last_name, customer_2.c_preferred_cust_flag, customer_2.c_birth_country, customer_2.c_login, customer_2.c_email_address, date_dim_2.d_year
Batches: 1 Memory Usage: 35865kB
-> Hash Join (cost=8151.60..37896.96 rows=719119 width=177) (actual time=85.606..491.698 rows=719119 loops=1)
Hash Cond: (web_sales.ws_sold_date_sk = date_dim_2.d_date_sk)
-> Hash Join (cost=5103.00..32960.30 rows=719217 width=177) (actual time=59.536..342.685 rows=719217 loops=1)
Hash Cond: (web_sales.ws_bill_customer_sk = customer_2.c_customer_sk)
-> Seq Scan on web_sales (cost=0.00..25968.84 rows=719384 width=33) (actual time=0.032..67.592 rows=719384 loops=1)
-> Hash (cost=3853.00..3853.00 rows=100000 width=152) (actual time=59.430..59.430 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 17161kB
-> Seq Scan on customer customer_2 (cost=0.00..3853.00 rows=100000 width=152) (actual time=0.006..33.826 rows=100000 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=8) (actual time=25.997..25.998 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 3878kB
-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=8) (actual time=0.016..13.499 rows=73049 loops=1)
-> Sort (cost=60513.43..60513.44 rows=1 width=132) (actual time=2527921.077..2527921.080 rows=8 loops=1)
Sort Key: t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name, t_s_secyear.customer_email_address
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=0.00..60513.42 rows=1 width=132) (actual time=388081.669..2527921.053 rows=8 loops=1)
Join Filter: ((t_s_secyear.customer_id = t_c_firstyear.customer_id) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_s_firstyear.year_total > '0'::numeric) THEN (t_s_secyear.year_total / t_s_firstyear.year_total) ELSE NULL::numeric END) AND (CASE WHEN (t_c_firstyear.year_total > '0'::numeric) THEN (t_c_secyear.year_total / t_c_firstyear.year_total) ELSE NULL::numeric END > CASE WHEN (t_w_firstyear.year_total > '0'::numeric) THEN (t_w_secyear.year_total / t_w_firstyear.year_total) ELSE NULL::numeric END))
Rows Removed by Join Filter: 1289378
-> Nested Loop (cost=0.00..49947.59 rows=1 width=372) (actual time=106080.918..2524931.374 rows=49 loops=1)
Join Filter: (t_s_secyear.customer_id = t_w_secyear.customer_id)
Rows Removed by Join Filter: 4962083
-> Nested Loop (cost=0.00..40342.26 rows=1 width=320) (actual time=10316.060..2506476.401 rows=441 loops=1)
Join Filter: (t_s_secyear.customer_id = t_c_secyear.customer_id)
Rows Removed by Join Filter: 44298069
-> Nested Loop (cost=0.00..30736.94 rows=1 width=268) (actual time=7429.522..2435241.747 rows=1630 loops=1)
Join Filter: (t_s_firstyear.customer_id = t_s_secyear.customer_id)
Rows Removed by Join Filter: 165296120
-> Nested Loop (cost=0.00..21131.61 rows=1 width=104) (actual time=3984.981..2240540.776 rows=4330 loops=1)
Join Filter: (t_s_firstyear.customer_id = t_w_firstyear.customer_id)
Rows Removed by Join Filter: 429435722
-> CTE Scan on year_total t_s_firstyear (cost=0.00..10565.72 rows=3 width=52) (actual time=3984.972..4058.789 rows=37923 loops=1)
Filter: ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))
Rows Removed by Filter: 346285
-> CTE Scan on year_total t_w_firstyear (cost=0.00..10565.72 rows=3 width=52) (actual time=0.001..58.349 rows=11324 loops=37923)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
Rows Removed by Filter: 372884
-> CTE Scan on year_total t_s_secyear (cost=0.00..9605.20 rows=10 width=164) (actual time=20.421..42.865 rows=38175 loops=4330)
Filter: ((sale_type = 's'::text) AND (dyear = 2002))
Rows Removed by Filter: 346033
-> CTE Scan on year_total t_c_secyear (cost=0.00..9605.20 rows=10 width=52) (actual time=5.974..42.155 rows=27177 loops=1630)
Filter: ((sale_type = 'c'::text) AND (dyear = 2002))
Rows Removed by Filter: 357031
-> CTE Scan on year_total t_w_secyear (cost=0.00..9605.20 rows=10 width=52) (actual time=0.002..41.219 rows=11252 loops=441)
Filter: ((sale_type = 'w'::text) AND (dyear = 2002))
Rows Removed by Filter: 372956
-> CTE Scan on year_total t_c_firstyear (cost=0.00..10565.72 rows=3 width=52) (actual time=8.525..59.572 rows=26314 loops=49)
Filter: ((year_total > '0'::numeric) AND (sale_type = 'c'::text) AND (dyear = 2001))
Rows Removed by Filter: 357894
Planning Time: 17.924 ms
Execution Time: 2527936.040 ms
(86 rows)
Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.
>> The default configurations of PostgreSQL incur the error: "ERROR: could not resize shared memory segment "/PostgreSQL.3539600020" to 2097152 bytes: No space left on device"
>No comment on your optimiser experiments for now, but for this error:
>it reminds me of a low/default --shm-size limit from Docker, or other
>similar container stuff?
Aha, you are right. The error disappears if running PostgreSQL out of docker container.
Notice: This email is generated from the account of an NUS alumnus. Contents, views, and opinions therein are solely those of the sender.