-HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject -HEAD planner issue wrt hash_joins on dbt3 ?
Date
Msg-id 4505A2E8.7000509@kaltenbrunner.cc
Responses Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi!

I'm about to do some benchmarking on -HEAD on some hardware I have
available, and it seems I'm hitting a rather weird issue causing the osdl
dbt3 benchmark to run very slowly, eating CPU time for hours ...

it seems that the issue is caused by the following query:
(in case it gets line-wrapped:
http://www.kaltenbrunner.cc/files/dbt3_with_hashjoin.txt)

select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from
customer, orders, lineitem, supplier, nation, region where c_custkey =
o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and
c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey
= r_regionkey and r_name = 'AFRICA' and o_orderdate >= date '1993-01-01'
and o_orderdate < date '1993-01-01' + interval '1 year' group by n_name
order by revenue desc;

that results in the following plan on my box:
Sort  (cost=2543391.75..2543391.81 rows=25 width=37)
  Sort Key: sum((lineitem.l_extendedprice * (1::double precision - lineitem.l_discount)))
  ->  HashAggregate  (cost=2543390.73..2543391.17 rows=25 width=37)
        ->  Hash Join  (cost=440864.81..2543027.40 rows=72666 width=37)
              Hash Cond: ((orders.o_custkey = customer.c_custkey) AND (supplier.s_nationkey = customer.c_nationkey))
              ->  Hash Join  (cost=377714.59..2415568.01 rows=1816643 width=49)
                    Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                    ->  Nested Loop  (cost=13.65..1719683.85 rows=12000672 width=49)
                          ->  Merge Join  (cost=0.00..10248.66 rows=20000 width=41)
                                Merge Cond: (nation.n_nationkey = supplier.s_nationkey)
                                ->  Nested Loop  (cost=0.00..19.19 rows=5 width=33)
                                      ->  Index Scan using pk_nation on nation  (cost=0.00..9.38 rows=25 width=37)
                                      ->  Index Scan using pk_region on region  (cost=0.00..0.38 rows=1 width=4)
                                            Index Cond: (nation.n_regionkey = region.r_regionkey)
                                            Filter: (r_name = 'AFRICA'::bpchar)
                                ->  Index Scan using i_s_nationkey on supplier  (cost=0.00..9779.46 rows=100000 width=8)
                          ->  Bitmap Heap Scan on lineitem  (cost=13.65..77.16 rows=665 width=16)
                                Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                                ->  Bitmap Index Scan on i_l_suppkey  (cost=0.00..13.65 rows=665 width=0)
                                      Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                    ->  Hash  (cost=372023.51..372023.51 rows=2270971 width=8)
                          ->  Bitmap Heap Scan on orders  (cost=41391.94..372023.51 rows=2270971 width=8)
                                Recheck Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on i_o_orderdate  (cost=0.00..41391.94 rows=2270971 width=0)
                                      Index Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
              ->  Hash  (cost=55647.15..55647.15 rows=1500615 width=8)
                    ->  Seq Scan on customer  (cost=0.00..55647.15 rows=1500615 width=8)
(27 rows)

so it really thinks that doing hashes with gigantic amounts of data is
generally a good idea - this seems to be independent of work_mem - the
plan looks the same with 1MB vs. 126MB (which I had during the run).
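For anyone who wants to reproduce the comparison, the session setup
amounts to something like this (a sketch; the work_mem values are the
two mentioned above, and "..." stands for the full Q5 statement quoted
earlier):

```sql
-- sketch of the session setup for the two plans shown here;
-- '126MB' and '1MB' are the work_mem settings mentioned above
SET work_mem = '126MB';       -- plan comes out identical with '1MB'
EXPLAIN SELECT ... ;          -- produces the hash-join plan

SET enable_hashjoin = off;    -- steer the planner away from hash joins
EXPLAIN ANALYZE SELECT ... ;  -- produces the merge/nested-loop plan
```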

the profile of the backend eating the CPU looks similar to:

26351    27.9047  ExecScanHashBucket
8239      8.7248  hash_seq_search
6984      7.3958  hash_search_with_hash_value


setting enable_hashjoin to off results in a runtime of about 2.5 minutes:

(http://www.kaltenbrunner.cc/files/dbt3_without_hashjoin.txt)

Sort  (cost=3700257.38..3700257.45 rows=25 width=37) (actual time=286820.962..286820.968 rows=5 loops=1)
  Sort Key: sum((lineitem.l_extendedprice * (1::double precision - lineitem.l_discount)))
  ->  HashAggregate  (cost=3700256.37..3700256.80 rows=25 width=37) (actual time=286820.932..286820.941 rows=5 loops=1)
        ->  Nested Loop  (cost=730956.43..3699893.04 rows=72666 width=37) (actual time=43551.767..286488.555 rows=72441 loops=1)
              Join Filter: (customer.c_nationkey = supplier.s_nationkey)
              ->  Merge Join  (cost=730956.43..3624153.73 rows=1816643 width=49) (actual time=43281.710..257082.739 rows=1822547 loops=1)
                    Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
                    ->  Index Scan using i_l_orderkey on lineitem  (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1)
                    ->  Sort  (cost=730956.43..732091.92 rows=454194 width=41) (actual time=43248.797..45754.223 rows=1822547 loops=1)
                          Sort Key: orders.o_orderkey
                          ->  Merge Join  (cost=670885.68..688278.21 rows=454194 width=41) (actual time=34469.359..42050.059 rows=455262 loops=1)
                                Merge Cond: (customer.c_custkey = orders.o_custkey)
                                ->  Sort  (cost=59105.79..59856.10 rows=300123 width=41) (actual time=8113.826..8491.532 rows=299493 loops=1)
                                      Sort Key: customer.c_custkey
                                      ->  Nested Loop  (cost=781.13..31801.81 rows=300123 width=41) (actual time=107.537..7461.355 rows=299493 loops=1)
                                            ->  Nested Loop  (cost=1.06..11.00 rows=5 width=33) (actual time=0.030..0.296 rows=5 loops=1)
                                                  Join Filter: (nation.n_regionkey = region.r_regionkey)
                                                  ->  Index Scan using pk_nation on nation  (cost=0.00..9.38 rows=25 width=37) (actual time=0.007..0.063 rows=25 loops=1)
                                                  ->  Materialize  (cost=1.06..1.07 rows=1 width=4) (actual time=0.002..0.004 rows=1 loops=25)
                                                        ->  Seq Scan on region  (cost=0.00..1.06 rows=1 width=4) (actual time=0.009..0.018 rows=1 loops=1)
                                                              Filter: (r_name = 'AFRICA'::bpchar)
                                            ->  Bitmap Heap Scan on customer  (cost=780.07..5607.85 rows=60025 width=8) (actual time=61.150..1331.466 rows=59899 loops=5)
                                                  Recheck Cond: (nation.n_nationkey = customer.c_nationkey)
                                                  ->  Bitmap Index Scan on i_c_nationkey  (cost=0.00..780.07 rows=60025 width=0) (actual time=44.637..44.637 rows=59899 loops=5)
                                                        Index Cond: (nation.n_nationkey = customer.c_nationkey)
                                ->  Sort  (cost=611779.89..617457.31 rows=2270971 width=8) (actual time=26355.515..29471.963 rows=2276859 loops=1)
                                      Sort Key: orders.o_custkey
                                      ->  Bitmap Heap Scan on orders  (cost=41391.94..372023.51 rows=2270971 width=8) (actual time=1630.604..16266.102 rows=2276859 loops=1)
                                            Recheck Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
                                            ->  Bitmap Index Scan on i_o_orderdate  (cost=0.00..41391.94 rows=2270971 width=0) (actual time=1352.037..1352.037 rows=2276859 loops=1)
                                                  Index Cond: ((o_orderdate >= '1993-01-01'::date) AND (o_orderdate < '1994-01-01 00:00:00'::timestamp without time zone))
              ->  Index Scan using pk_supplier on supplier  (cost=0.00..0.03 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1822547)
                    Index Cond: (lineitem.l_suppkey = supplier.s_suppkey)
 Total runtime: 286984.386 ms
(34 rows)

(about 120s of that seems to be EXPLAIN ANALYZE overhead)
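(one way to gauge that overhead - assuming the comparison is against a
plain timed run, which is a sketch rather than exactly what was done
here - is psql's \timing:)

```sql
\timing on
SELECT ... ;                  -- plain run: wall-clock time only
EXPLAIN ANALYZE SELECT ... ;  -- instrumented run: per-tuple timing added
```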


fwiw the box in question is a dual 2.6GHz Opteron with 8GB of RAM - WAL
is on the onboard BBWC SmartArray (RAID 10 on 4 disks) and the data is
on a 14-disk Linux software RAID 10, running Debian Sarge/AMD64 with
kernel 2.6.17.7. the dbt3 database got initialized with a scaling
factor of 10 (running with just 1 works fine).



Stefan

