How to get explain plan to prefer Hash Join - Mailing list pgsql-performance

From atxcanadian
Subject How to get explain plan to prefer Hash Join
Date
Msg-id 1426093264141-5841450.post@n5.nabble.com
Whole thread Raw
Responses Re: How to get explain plan to prefer Hash Join
Re: How to get explain plan to prefer Hash Join
List pgsql-performance
Currently seeing massive increase in performance when optimizer chooses Hash
Join over Nested Loops. I achieve this by temporarily setting nested loops
off. I'd like to setup some database variables where the optimizer prefers
hash joins. Any suggestions?

*Query in question:*
explain analyze
select dp.market_day, dp.hour_ending, dp.repeated_hour_flag,
dp.settlement_point, sum(dp.mw) dp_mw
from dp_hist_gen_actual dp
     Inner Join api_settlement_points sp on
         sp.settlement_point = dp.settlement_point and
         sp.settlement_point_rdfid =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}' and
         sp.start_date <= '2015-01-01'::date and
         sp.end_date > '2015-01-01'::date and
         sp.rt_model = (select case when c.rt_model_loaded = 2 then true
else false end
                        from cim_calendar c where c.nodal_load <=
'2015-01-01'::date
                        order by c.cim desc limit 1)
where dp.market_day BETWEEN '2015-01-01'::date and
                            '2015-01-01'::date and
      dp.expiry_date is null
group by dp.market_day, dp.hour_ending, dp.repeated_hour_flag,
dp.settlement_point;

*Nested Loop Explain Analyze Output:*
HashAggregate  (cost=58369.29..58369.30 rows=1 width=24) (actual
time=496287.249..496287.257 rows=24 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.145..0.145
rows=1 loops=1)
          ->  Sort  (cost=8.30..8.78 rows=193 width=9) (actual
time=0.145..0.145 rows=1 loops=1)
                Sort Key: c.cim
                Sort Method: top-N heapsort  Memory: 25kB
                ->  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193
width=9) (actual time=0.007..0.075 rows=192 loops=1)
                      Filter: (nodal_load <= '2015-01-01'::date)
                      Rows Removed by Filter: 36
  -> * Nested Loop  (cost=0.99..58360.98 rows=1 width=24) (actual
time=883.718..496287.058 rows=24 loops=1)*
        Join Filter: ((dp.settlement_point)::text =
(sp.settlement_point)::text)
        Rows Removed by Join Filter: 12312
        ->  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual
dp  (cost=0.56..2.78 rows=1 width=24) (actual time=0.020..20.012 rows=12336
loops=1)
              Index Cond: ((market_day >= '2015-01-01'::date) AND
(market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
        ->  Index Scan using api_settlement_points_idx on
api_settlement_points sp  (cost=0.43..58358.05 rows=12 width=9) (actual
time=39.066..40.223 rows=1 loops=12336)
              Index Cond: ((rt_model = $0) AND (start_date <=
'2015-01-01'::date) AND (end_date > '2015-01-01'::date))
              Filter: ((settlement_point_rdfid)::text =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
              Rows Removed by Filter: 5298
*Total runtime: 496287.325 ms*

*Hash Join Explain Analyze Output:*
HashAggregate  (cost=58369.21..58369.22 rows=1 width=24) (actual
time=50.835..50.843 rows=24 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.149..0.149
rows=1 loops=1)
          ->  Sort  (cost=8.30..8.78 rows=193 width=9) (actual
time=0.148..0.148 rows=1 loops=1)
                Sort Key: c.cim
                Sort Method: top-N heapsort  Memory: 25kB
                ->  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193
width=9) (actual time=0.009..0.082 rows=192 loops=1)
                      Filter: (nodal_load <= '2015-01-01'::date)
                      Rows Removed by Filter: 36
  ->  *Hash Join  (cost=3.23..58360.90 rows=1 width=24) (actual
time=49.644..50.811 rows=24 loops=1)*
        Hash Cond: ((sp.settlement_point)::text =
(dp.settlement_point)::text)
        ->  Index Scan using api_settlement_points_idx on
api_settlement_points sp  (cost=0.43..58358.05 rows=12 width=9) (actual
time=39.662..40.822 rows=1 loops=1)
              Index Cond: ((rt_model = $0) AND (start_date <=
'2015-01-01'::date) AND (end_date > '2015-01-01'::date))
              Filter: ((settlement_point_rdfid)::text =
'#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
              Rows Removed by Filter: 5298
        ->  Hash  (cost=2.78..2.78 rows=1 width=24) (actual
time=9.962..9.962 rows=12336 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 684kB
              ->  Index Scan using dp_hist_gen_actual_idx2 on
dp_hist_gen_actual dp  (cost=0.56..2.78 rows=1 width=24) (actual
time=0.023..5.962 rows=12336 loops=1)
                    Index Cond: ((market_day >= '2015-01-01'::date) AND
(market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
*Total runtime: 50.906 ms*




--
View this message in context: http://postgresql.nabble.com/How-to-get-explain-plan-to-prefer-Hash-Join-tp5841450.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT
Next
From: Jeff Janes
Date:
Subject: Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT