Cost Issue - How do I force a Hash Join - Mailing list pgsql-performance

From Virag Saksena
Subject Cost Issue - How do I force a Hash Join
Date
Msg-id 028401c636a5$907fb4b0$3100000a@demo1
Whole thread Raw
Responses Re: Cost Issue - How do I force a Hash Join
List pgsql-performance
Hi,
    I have query where I do two inline queries (which involves grouping) and then join them with an outer join.
The individual queries run in 50-300 ms. However the optimizer is choosing a nested loop to join them rather than a Hash join
causing the complete query to take 500+ seconds. It expects that it will get 1 row out from each of the sources, but here is gets
several thousand rows.
 
Is there any way I can get a hash join used on the outer join, while preserving the nested loops.
 
 
explain analyze
select o1.objaddr, o1.fieldname, o1.objsig,
o1.totmem, o1.cnt,
o2.totmem, o2.cnt
from
( select min(ao.objaddr) as objaddr, count(*) as cnt,
         sum(ao.totmem) as totmem, ao.objsig, ar.fieldname, ao.objtype
    from jam_heapobj ao, jam_heaprel ar
   where ar.heap_id = 1  and ar.parentaddr = 0 and ar.fieldname = 'K'
     and ao.heap_id = ar.heap_id and ao.objaddr = ar.childaddr
   group by ao.objsig, ar.fieldname, ao.objtype) o1
left outer join
(select min(bo.objaddr) as objaddr, count(*) as cnt,
        sum(bo.totmem) as totmem, bo.objsig, br.fieldname, bo.objtype
   from jam_heapobj bo, jam_heaprel br
  where br.heap_id = 0  and br.parentaddr = 0 and br.fieldname = 'K'
    and bo.heap_id = br.heap_id and bo.objaddr = br.childaddr
group by bo.objsig, br.fieldname, bo.objtype) o2
on ( o2.objsig = o1.objsig and o2.objtype = o1.objtype
 and o2.fieldname = o1.fieldname)
 order by o1.totmem - coalesce(o2.totmem,0) desc;
 
 Sort (cost=16305.41..16305.42 rows=1 width=562) (actual time=565997.769..566016.255 rows=6115 loops=1)
  Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
  ->Nested Loop Left Join (cost=16305.22..16305.40 rows=1 width=562) (actual time=612.631..565896.047 rows=6115 loops=1)
    Join Filter: ((("inner".objsig)::text = ("outer".objsig)::text) AND (("inner".objtype)::text = ("outer".objtype)::text) AND (("inner".fieldname)::text = ("outer".fieldname)::text))
    ->Subquery Scan o1 (cost=12318.12..12318.15 rows=1 width=514) (actual time=309.659..413.311 rows=6115 loops=1)
      ->HashAggregate (cost=12318.12..12318.14 rows=1 width=54) (actual time=309.649..367.206 rows=6115 loops=1)
        ->Nested Loop (cost=0.00..12317.90 rows=10 width=54) (actual time=0.243..264.116 rows=6338 loops=1)
          ->Index Scan using jam_heaprel_n1 on jam_heaprel ar (cost=0.00..12275.00 rows=7 width=19) (actual time=0.176..35.780 rows=6338 loops=1)
            Index Cond: ((heap_id = 1) AND (parentaddr = 0))
            Filter: ((fieldname)::text = 'K'::text)
          ->Index Scan using jam_heapobj_u1 on jam_heapobj ao (cost=0.00..6.10 rows=2 width=51) (actual time=0.019..0.022 rows=1 loops=6338)
            Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
    ->Subquery Scan o2 (cost=3987.10..3987.13 rows=1 width=514) (actual time=0.062..75.171 rows=6038 loops=6115)
      ->HashAggregate (cost=3987.10..3987.12 rows=1 width=54) (actual time=0.056..36.469 rows=6038 loops=6115)
        ->Nested Loop (cost=0.00..3987.05 rows=2 width=54) (actual time=0.145..257.876 rows=6259 loops=1)
          ->Index Scan using jam_heaprel_n1 on jam_heaprel br (cost=0.00..3974.01 rows=3 width=19) (actual time=0.074..35.124 rows=6259 loops=1)
            Index Cond: ((heap_id = 0) AND (parentaddr = 0))
            Filter: ((fieldname)::text = 'K'::text)
          ->Index Scan using jam_heapobj_u1 on jam_heapobj bo (cost=0.00..4.33 rows=1 width=51) (actual time=0.018..0.022 rows=1 loops=6259)
            Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
 Total runtime: 566044.187 ms
(21 rows)
Regards,
 
Virag

pgsql-performance by date:

Previous
From: Chris
Date:
Subject: Re: Creating a correct and real benchmark
Next
From: Tom Lane
Date:
Subject: Re: Cost Issue - How do I force a Hash Join