[HACKERS] TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From Rafia Sabih
Subject [HACKERS] TPC-H Q20 from 1 hour to 19 hours!
Date
Msg-id CAOGQiiMsnM-cGjCfnpWQNyrzjf=FM0s_3hn1guqficSmwWWVfw@mail.gmail.com
Whole thread Raw
Responses Re: TPC-H Q20 from 1 hour to 19 hours!  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hello all,

This is to bring to notice a peculiar instance I found recently while
running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
complete when run on a machine with 512 GB RAM and 32 cores with
following parameter settings on the commit id -
0c2070cefa0e5d097b715c9a3b9b5499470019aa

work_mem = 1 GB
shared_buffers = 8 GB
effective_cache_size = 10 GB
scale factor = 300
max_parallel_workers_per_gather = 4

The output of explain_analyse is as follows,
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=60187550.59..60187550.59 rows=1 width=52) (actual
 
time=71064602.963..71064602.964 rows=1 loops=1)  ->  Sort  (cost=60187550.59..60187550.59 rows=3 width=52) (actual
time=71064602.961..71064602.961 rows=1 loops=1)        Sort Key: supplier.s_name        Sort Method: top-N heapsort
Memory:25kB        ->  Nested Loop Semi Join  (cost=52960550.15..60187550.57
 
rows=3 width=52) (actual time=2163639.699..71063153.953 rows=52263
loops=1)              Join Filter: (supplier.s_suppkey = lineitem.l_suppkey)              Rows Removed by Join Filter:
155706242106             ->  Nested Loop  (cost=963.43..10081.54 rows=120000
 
width=56) (actual time=178.589..6282.852 rows=120358 loops=1)                    ->  Seq Scan on nation
(cost=0.00..0.41rows=1
 
width=4) (actual time=0.018..0.042 rows=1 loops=1)                          Filter: (n_name = 'EGYPT'::bpchar)
               Rows Removed by Filter: 24                    ->  Bitmap Heap Scan on supplier
 
(cost=963.43..8881.13 rows=120000 width=64) (actual
time=178.563..6143.786 rows=120358 loops=1)                          Recheck Cond: (s_nationkey = nation.n_nationkey)
                      Heap Blocks: exact=57317                          ->  Bitmap Index Scan on
 
idx_supplier_nation_key  (cost=0.00..933.43 rows=120000 width=0)
(actual time=133.218..133.218 rows=120358 loops=1)                                Index Cond: (s_nationkey =
nation.n_nationkey)             ->  Materialize  (cost=52959586.72..60024469.24 rows=85
 
width=16) (actual time=12.679..175.456 rows=1293693 loops=120358)                    ->  Merge Join
(cost=52959586.72..60024468.82
rows=85 width=16) (actual time=1525322.753..2419045.641 rows=1696742
loops=1)                          Merge Cond: ((lineitem.l_partkey =
partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))                          Join Filter:
((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))                          Rows Removed by Join
Filter:3771                          ->  GroupAggregate
 
(cost=48448912.90..53325163.98 rows=144696357 width=48) (actual
time=1342085.116..2178225.340 rows=156665300 loops=1)                                Group Key: lineitem.l_partkey,
lineitem.l_suppkey                                ->  Sort
(cost=48448912.90..49125364.33 rows=270580572 width=21) (actual
time=1342085.072..1495863.254 rows=273057944 loops=1)                                      Sort Key:
lineitem.l_partkey,
lineitem.l_suppkey                                      Sort Method: external merge
Disk: 8282600kB                                      ->  Bitmap Heap Scan on
lineitem  (cost=2847641.84..10552097.42 rows=270580572 width=21)
(actual time=241170.637..650122.225 rows=273058377 loops=1)                                            Recheck Cond:
((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01
00:00:00'::timestamp without time zone))                                            Heap Blocks: exact=33444433
                                  ->  Bitmap Index Scan on
 
idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572 width=0) (actual
time=190321.155..190321.155 rows=273058377 loops=1)                                                  Index Cond:
((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01
00:00:00'::timestamp without time zone))                          ->  Sort  (cost=4510673.81..4516734.42
rows=2424244 width=24) (actual time=183237.183..184039.914
rows=2602656 loops=1)                                Sort Key: partsupp.ps_partkey,
partsupp.ps_suppkey                                Sort Method: quicksort  Memory: 301637kB
  ->  Hash Join
 
(cost=379620.36..4253593.60 rows=2424244 width=24) (actual
time=8576.343..179703.563 rows=2602656 loops=1)                                      Hash Cond: (partsupp.ps_partkey
= part.p_partkey)                                      ->  Seq Scan on partsupp
(cost=0.00..2949730.80 rows=240000000 width=20) (actual
time=0.149..71902.279 rows=240000000 loops=1)                                      ->  Hash
(cost=372044.59..372044.59 rows=606062 width=4) (actual
time=8574.476..8574.476 rows=650664 loops=1)                                            Buckets: 1048576
Batches: 1  Memory Usage: 31067kB                                            ->  Gather
(cost=1000.00..372044.59 rows=606062 width=4) (actual
time=0.677..8090.145 rows=650664 loops=1)                                                  Workers Planned: 4
                                      Workers Launched: 4                                                  ->  Parallel
Seq
Scan on part  (cost=0.00..310438.39 rows=151516 width=4) (actual
time=0.056..8259.935 rows=130133 loops=5)                                                        Filter:
((p_name)::text ~~ 'beige%'::text)                                                        Rows Removed
by Filter: 11869867Planning time: 4.669 msExecution time: 71065478.356 ms

It is clear that selectivity estimations are really bad in this case
particularly at node, ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16)
(actual time=1525322.753..2419045.641 rows=1696742 loops=1)                          Merge Cond: ((lineitem.l_partkey
=
partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))                          Join Filter:
((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))                          Rows Removed by Join
Filter:3771
 

Still this puzzled me as during earlier runs of this benchmark I never
encountered such prolonged running times. On further investigation I
found that on reverting the commit
7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sat Dec 17 15:28:54 2016 -0500   Fix FK-based join selectivity estimation for semi/antijoins.

the query completes in around 1.5 hours, with the following query plan,Limit  (cost=69766456.38..69766456.38 rows=1
width=52)(actual
 
time=6056583.271..6056583.272 rows=1 loops=1)  ->  Sort  (cost=69766456.38..69766606.38 rows=60000 width=52)
(actual time=6056583.270..6056583.270 rows=1 loops=1)        Sort Key: supplier.s_name        Sort Method: top-N
heapsort Memory: 25kB        ->  Nested Loop  (cost=69742098.22..69766156.38 rows=60000
 
width=52) (actual time=6038178.125..6056492.537 rows=52263 loops=1)              Join Filter: (supplier.s_nationkey =
nation.n_nationkey)             Rows Removed by Join Filter: 1247849              ->  Seq Scan on nation
(cost=0.00..0.41rows=1
 
width=4) (actual time=0.011..0.049 rows=1 loops=1)                    Filter: (n_name = 'EGYPT'::bpchar)
   Rows Removed by Filter: 24              ->  Nested Loop  (cost=69742098.22..69747406.00
 
rows=1499997 width=60) (actual time=6038177.711..6056108.835
rows=1300112 loops=1)                    ->  HashAggregate  (cost=69742097.79..69742182.18
rows=8439 width=16) (actual time=6038177.626..6039095.036 rows=1300126
loops=1)                          Group Key: partsupp.ps_suppkey                          ->  Nested Loop Semi Join
(cost=48766000.84..69742076.69 rows=8439 width=16) (actual
time=2400414.925..6034282.917 rows=1696742 loops=1)                                ->  Merge Join
(cost=48766000.27..69735727.56 rows=8439 width=32) (actual
time=2400413.227..4113580.275 rows=156321526 loops=1)                                      Merge Cond:
((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey =
partsupp.ps_suppkey))                                      Join Filter:
((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))                                      Rows
Removedby Join Filter: 344021                                      ->  GroupAggregate
 
(cost=48765999.70..53642250.78 rows=144696357 width=48) (actual
time=2400413.048..3404489.071 rows=156665547 loops=1)                                            Group Key:
lineitem.l_partkey, lineitem.l_suppkey                                            ->  Sort
(cost=48765999.70..49442451.13 rows=270580572 width=21) (actual
time=2400412.958..2692446.813 rows=273058377 loops=1)                                                  Sort Key:
lineitem.l_partkey, lineitem.l_suppkey                                                  Sort Method:
external merge  Disk: 8282776kB                                                  ->  Bitmap Heap
Scan on lineitem  (cost=2847641.84..10552097.42 rows=270580572
width=21) (actual time=147080.637..1825300.462 rows=273058377 loops=1)
     Recheck Cond:
 
((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01
00:00:00'::timestamp without time zone))                                                        Rows Removed
by Index Recheck: 1346378434                                                        Heap Blocks:
exact=669636 lossy=32774797                                                        ->  Bitmap
Index Scan on idx_l_shipdate  (cost=0.00..2779996.70 rows=270580572
width=0) (actual time=146519.142..146519.142 rows=273058377 loops=1)
         Index
 
Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate <
'1995-01-01 00:00:00'::timestamp without time zone))                                      ->  Index Scan using
partsupp_pkey on partsupp  (cost=0.57..12722651.69 rows=240000000
width=20) (actual time=0.160..197858.090 rows=240000000 loops=1)                                ->  Index Scan using
part_pkeyon
 
part  (cost=0.56..0.75 rows=1 width=4) (actual time=0.012..0.012
rows=0 loops=156321526)                                      Index Cond: (p_partkey =
lineitem.l_partkey)                                      Filter: ((p_name)::text ~~
'beige%'::text)                                      Rows Removed by Filter: 1                    ->  Index Scan using
supplier_pkeyon supplier
 
(cost=0.43..0.61 rows=1 width=64) (actual time=0.011..0.012 rows=1
loops=1300126)                          Index Cond: (s_suppkey = lineitem.l_suppkey)Planning time: 2.440 msExecution
time:6057329.179 ms
 

I hope there might be some way-out for such a case which includes the
benefits of the commit without hurting other cases (like this one)
this bad.
Thoughts, comments...

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: [HACKERS] Foreign Join pushdowns not working properly for outerjoins
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Partitioned tables and relfilenode