Thread: Cost Issue - How do I force a Hash Join
Hi,
I have a query where I do two inline queries (which involve grouping) and then join them with an outer join.
The individual queries run in 50-300 ms. However, the optimizer is choosing a nested loop to join them rather than a hash join,
causing the complete query to take 500+ seconds. It expects that it will get 1 row out from each of the sources, but here it gets
several thousand rows.
Is there any way I can get a hash join used on the outer join, while preserving the nested loops inside each source?
explain analyze
select o1.objaddr, o1.fieldname, o1.objsig,
o1.totmem, o1.cnt,
o2.totmem, o2.cnt
from
( select min(ao.objaddr) as objaddr, count(*) as cnt,
sum(ao.totmem) as totmem, ao.objsig, ar.fieldname, ao.objtype
from jam_heapobj ao, jam_heaprel ar
where ar.heap_id = 1 and ar.parentaddr = 0 and ar.fieldname = 'K'
and ao.heap_id = ar.heap_id and ao.objaddr = ar.childaddr
group by ao.objsig, ar.fieldname, ao.objtype) o1
left outer join
(select min(bo.objaddr) as objaddr, count(*) as cnt,
sum(bo.totmem) as totmem, bo.objsig, br.fieldname, bo.objtype
from jam_heapobj bo, jam_heaprel br
where br.heap_id = 0 and br.parentaddr = 0 and br.fieldname = 'K'
and bo.heap_id = br.heap_id and bo.objaddr = br.childaddr
group by bo.objsig, br.fieldname, bo.objtype) o2
on ( o2.objsig = o1.objsig and o2.objtype = o1.objtype
and o2.fieldname = o1.fieldname)
order by o1.totmem - coalesce(o2.totmem,0) desc;
 Sort  (cost=16305.41..16305.42 rows=1 width=562) (actual time=565997.769..566016.255 rows=6115 loops=1)
   Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
   ->  Nested Loop Left Join  (cost=16305.22..16305.40 rows=1 width=562) (actual time=612.631..565896.047 rows=6115 loops=1)
         Join Filter: ((("inner".objsig)::text = ("outer".objsig)::text) AND (("inner".objtype)::text = ("outer".objtype)::text) AND (("inner".fieldname)::text = ("outer".fieldname)::text))
         ->  Subquery Scan o1  (cost=12318.12..12318.15 rows=1 width=514) (actual time=309.659..413.311 rows=6115 loops=1)
               ->  HashAggregate  (cost=12318.12..12318.14 rows=1 width=54) (actual time=309.649..367.206 rows=6115 loops=1)
                     ->  Nested Loop  (cost=0.00..12317.90 rows=10 width=54) (actual time=0.243..264.116 rows=6338 loops=1)
                           ->  Index Scan using jam_heaprel_n1 on jam_heaprel ar  (cost=0.00..12275.00 rows=7 width=19) (actual time=0.176..35.780 rows=6338 loops=1)
                                 Index Cond: ((heap_id = 1) AND (parentaddr = 0))
                                 Filter: ((fieldname)::text = 'K'::text)
                           ->  Index Scan using jam_heapobj_u1 on jam_heapobj ao  (cost=0.00..6.10 rows=2 width=51) (actual time=0.019..0.022 rows=1 loops=6338)
                                 Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
         ->  Subquery Scan o2  (cost=3987.10..3987.13 rows=1 width=514) (actual time=0.062..75.171 rows=6038 loops=6115)
               ->  HashAggregate  (cost=3987.10..3987.12 rows=1 width=54) (actual time=0.056..36.469 rows=6038 loops=6115)
                     ->  Nested Loop  (cost=0.00..3987.05 rows=2 width=54) (actual time=0.145..257.876 rows=6259 loops=1)
                           ->  Index Scan using jam_heaprel_n1 on jam_heaprel br  (cost=0.00..3974.01 rows=3 width=19) (actual time=0.074..35.124 rows=6259 loops=1)
                                 Index Cond: ((heap_id = 0) AND (parentaddr = 0))
                                 Filter: ((fieldname)::text = 'K'::text)
                           ->  Index Scan using jam_heapobj_u1 on jam_heapobj bo  (cost=0.00..4.33 rows=1 width=51) (actual time=0.018..0.022 rows=1 loops=6259)
                                 Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
 Total runtime: 566044.187 ms
(21 rows)
Regards,
Virag
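One blunt experiment (using the standard planner setting `enable_nestloop`, not something suggested in the thread itself) is to forbid nested-loop joins for a single session and compare plan and timing. Note that it also discourages the inner nested loops that are working fine, so it answers "would a hash join win here?" rather than being a production fix:

```sql
-- Diagnostic only: disable nested-loop joins for this transaction,
-- then re-run the failing query and compare against the 500+ second plan.
BEGIN;
SET LOCAL enable_nestloop = off;

EXPLAIN ANALYZE
SELECT ...;   -- the full two-inline-view outer-join query from above

ROLLBACK;     -- SET LOCAL is discarded with the transaction either way
```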
"Virag Saksena" <virag@auptyma.com> writes:
> The individual queries run in 50-300 ms. However the optimizer is
> choosing a nested loop to join them rather than a Hash join
> causing the complete query to take 500+ seconds. It expects that it will
> get 1 row out from each of the sources, but here it gets
> several thousand rows.

The best approach is to see if you can't fix that estimation error.
Are the stats up to date on these tables? If so, maybe raising the
statistics targets would help.

			regards, tom lane
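For reference, "raising the statistics targets" looks like this in practice. The column names come from the query's WHERE clause; the target value 100 is an illustrative choice (the default in this era of PostgreSQL was 10):

```sql
-- Give the planner finer-grained histograms and most-common-value lists
-- for the columns whose selectivity it is misjudging.
ALTER TABLE jam_heaprel ALTER COLUMN heap_id    SET STATISTICS 100;
ALTER TABLE jam_heaprel ALTER COLUMN parentaddr SET STATISTICS 100;
ALTER TABLE jam_heaprel ALTER COLUMN fieldname  SET STATISTICS 100;

-- The new targets take effect only after statistics are re-collected.
ANALYZE jam_heaprel;
```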
"Virag Saksena" <virag@auptyma.com> writes:
>> The individual queries run in 50-300 ms. However the optimizer is
>> choosing a nested loop to join them rather than a Hash join...

I have what appears to be the identical problem.

This is a straightforward query that should be fairly quick, but takes about 30 minutes. It's a query across three
tables, call them A, B, and C. The tables are joined on indexed columns. Here's a quick summary:

Table A -----> Table B -----> Table C
A_ID           B_ID           C_ID
               A_ID           NAME
               C_ID

Tables A and B have 6 million rows each. Table C is small: 67 names, no repeats. All columns involved in the join
are indexed.

Summary:

  1. Query B only: 2.7 seconds, 302175 rows returned
  2. Join B and C: 4.3 seconds, exact same answer
  3. Join A and B: 7.2 minutes, exact same answer
  4. Join A, B, C: 32.7 minutes, exact same answer

Looking at these:

  Query #1 is doing the real work: finding the rows of interest.

  Queries #1 and #2 ought to be virtually identical, since Table C has just one row with C_ID = 9, but the time
  almost doubles.

  Query #3 should take a bit longer than Query #1 because it has to join 300K rows, but the indexes should make
  this take just a few seconds, certainly well under a minute.

  Query #4 should be identical to Query #3, again because there's only one row in Table C. 32 minutes is pretty
  horrible for such a straightforward query.

It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at
query planning.

This is psql 8.0.3. Table definitions are at the end. (Table and column names are altered to protect the guilty,
otherwise these are straight from Postgres.) I ran "vacuum full analyze" after the last data were added. Hardware
is a Dell, 2-CPU Xeon, 4 GB memory, database is on a single SATA 7200RPM disk.
Thanks,
Craig

----------------------------

QUERY #1:
---------

explain analyze select B.A_ID from B where B.B_ID = 9;

 Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=0.158..1387.251 rows=302175 loops=1)
   Index Cond: (B_ID = 9)
 Total runtime: 2344.053 ms

QUERY #2:
---------

explain analyze select B.A_ID from B join C on (B.C_ID = C.C_ID) where C.name = 'Joe';

 Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=0.349..3392.532 rows=302175 loops=1)
   ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.232..0.336 rows=1 loops=1)
         Filter: ((name)::text = 'Joe'::text)
   ->  Index Scan using i_B_C_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=0.102..1290.002 rows=302175 loops=1)
         Index Cond: (B.C_ID = "outer".C_ID)
 Total runtime: 4373.916 ms

QUERY #3:
---------

explain analyze select A.A_ID from A join B on (A.A_ID = B.A_ID) where B.B_ID = 9;

 Nested Loop  (cost=0.00..711336.41 rows=131236 width=4) (actual time=37.118..429419.347 rows=302175 loops=1)
   ->  Index Scan using i_B_B_ID on B  (cost=0.00..154401.36 rows=131236 width=4) (actual time=27.344..8858.489 rows=302175 loops=1)
         Index Cond: (B_ID = 9)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=1.372..1.376 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 430467.686 ms

QUERY #4:
---------

explain analyze select A.A_ID from A join B on (A.A_ID = B.A_ID) join C on (B.B_ID = C.B_ID) where C.name = 'Joe';

 Nested Loop  (cost=0.00..1012793.38 rows=177741 width=4) (actual time=70.184..1960112.247 rows=302175 loops=1)
   ->  Nested Loop  (cost=0.00..258501.92 rows=177741 width=4) (actual time=52.114..17753.638 rows=302175 loops=1)
         ->  Seq Scan on C  (cost=0.00..12.90 rows=1 width=4) (actual time=0.109..0.176 rows=1 loops=1)
               Filter: ((name)::text = 'Joe'::text)
         ->  Index Scan using i_B_B_ID on B  (cost=0.00..254387.31 rows=328137 width=8) (actual time=51.985..15566.896 rows=302175 loops=1)
               Index Cond: (B.B_ID = "outer".B_ID)
   ->  Index Scan using pk_A_test on A  (cost=0.00..4.23 rows=1 width=4) (actual time=6.407..6.412 rows=1 loops=302175)
         Index Cond: (A.A_ID = "outer".A_ID)
 Total runtime: 1961200.079 ms

TABLE DEFINITIONS:
------------------

xxx => \d a
        Table "xxx.a"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 a_id              | integer                | not null
 ... more columns
Indexes:
    "pk_a_id" PRIMARY KEY, btree (a_id)
    ... more indexes on other columns

xxx => \d b
        Table "xxx.b"
          Column          |          Type          | Modifiers
--------------------------+------------------------+-----------
 b_id                     | integer                | not null
 a_id                     | integer                | not null
 c_id                     | integer                | not null
 ... more columns
Indexes:
    "b_pkey" PRIMARY KEY, btree (b_id)
    "i_b_a_id" btree (a_id)
    "i_b_c_id" btree (c_id)

xxx=> \d c
        Table "xxx.c"
    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 c_id          | integer                | not null
 name          | character varying(200) |
 ... more columns
Indexes:
    "c_pkey" PRIMARY KEY, btree (c_id)
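For a misestimate like Query #1's (rows=131236 estimated vs. 302175 actual), it can help to look at what the planner actually believes about the column. `pg_stats` is the standard PostgreSQL view for this; the table and column names below follow Craig's anonymized schema:

```sql
-- n_distinct and the most-common-values list are what drive the
-- rows= estimate for a predicate like B_ID = 9.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'b'
  AND attname   = 'b_id';
```

If 9 is a frequent value that is missing from `most_common_vals`, a higher statistics target (which lengthens that list) is the usual remedy.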
The tables are analyzed, though I would love to find a way to increase the accuracy of the statistics. I tried
raising the statistics target up to 100, but it did not help. Should I bump it even more?

However, I found that if I add depth to the group by clauses, it somehow tells the optimizer that it will get more
than 1 row, and it switches to a Hash Join. For this query, only rows with one value of depth are accessed, so we
are okay ... but I would like to see if there is some other way I can get a better approximation for the costs.

 Sort  (cost=25214.36..25214.39 rows=10 width=958) (actual time=9798.860..9815.670 rows=6115 loops=1)
   Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
   ->  Hash Left Join  (cost=25213.83..25214.19 rows=10 width=958) (actual time=8526.248..9755.721 rows=6115 loops=1)
         Hash Cond: ((("outer".objsig)::text = ("inner".objsig)::text) AND (("outer".objtype)::text = ("inner".objtype)::text) AND (("outer".fieldname)::text = ("inner".fieldname)::text))
         ->  Subquery Scan o1  (cost=18993.48..18993.66 rows=10 width=990) (actual time=6059.880..6145.223 rows=6115 loops=1)
               ->  HashAggregate  (cost=18993.48..18993.56 rows=10 width=46) (actual time=6059.871..6094.897 rows=6115 loops=1)
                     ->  Nested Loop  (cost=0.00..18993.22 rows=15 width=46) (actual time=45.510..5980.807 rows=6338 loops=1)
                           ->  Index Scan using jam_heaprel_n1 on jam_heaprel ar  (cost=0.00..18932.01 rows=10 width=19) (actual time=45.374..205.520 rows=6338 loops=1)
                                 Index Cond: ((heap_id = 1) AND (parentaddr = 0))
                                 Filter: ((fieldname)::text = 'K'::text)
                           ->  Index Scan using jam_heapobj_u1 on jam_heapobj ao  (cost=0.00..6.10 rows=2 width=43) (actual time=0.885..0.890 rows=1 loops=6338)
                                 Index Cond: ((ao.heap_id = 1) AND (ao.objaddr = "outer".childaddr))
         ->  Hash  (cost=6220.34..6220.34 rows=2 width=982) (actual time=2466.178..2466.178 rows=0 loops=1)
               ->  Subquery Scan o2  (cost=6220.30..6220.34 rows=2 width=982) (actual time=2225.242..2433.744 rows=6038 loops=1)
                     ->  HashAggregate  (cost=6220.30..6220.32 rows=2 width=46) (actual time=2225.233..2366.890 rows=6038 loops=1)
                           ->  Nested Loop  (cost=0.00..6220.27 rows=2 width=46) (actual time=0.449..2149.257 rows=6259 loops=1)
                                 ->  Index Scan using jam_heaprel_n1 on jam_heaprel br  (cost=0.00..6202.89 rows=4 width=19) (actual time=0.296..51.310 rows=6259 loops=1)
                                       Index Cond: ((heap_id = 0) AND (parentaddr = 0))
                                       Filter: ((fieldname)::text = 'K'::text)
                                 ->  Index Scan using jam_heapobj_u1 on jam_heapobj bo  (cost=0.00..4.33 rows=1 width=43) (actual time=0.294..0.300 rows=1 loops=6259)
                                       Index Cond: ((bo.heap_id = 0) AND (bo.objaddr = "outer".childaddr))
 Total runtime: 9950.192 ms

Regards,

Virag

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Virag Saksena" <virag@auptyma.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, February 20, 2006 9:35 PM
Subject: Re: [PERFORM] Cost Issue - How do I force a Hash Join

> "Virag Saksena" <virag@auptyma.com> writes:
> > The individual queries run in 50-300 ms. However the optimizer is
> > choosing a nested loop to join them rather than a Hash join
> > causing the complete query to take 500+ seconds. It expects that it will
> > get 1 row out from each of the sources, but here it gets
> > several thousand rows.
>
> The best approach is to see if you can't fix that estimation error.
> Are the stats up to date on these tables? If so, maybe raising the
> statistics targets would help.
>
> regards, tom lane
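On "should I bump it even more": the per-column target in this era of PostgreSQL caps at 1000. Note, though, that the planner estimates `parentaddr = 0 AND fieldname = 'K'` by multiplying the two selectivities as if the columns were independent, so strongly correlated columns can stay underestimated no matter how high the target goes, which would explain why 100 did not help. A sketch of pushing the target to its ceiling and re-checking the estimate that was off (rows=7 estimated vs. 6338 actual in the plan above):

```sql
-- Maximum per-column target; ANALYZE and planning get slower, but the
-- MCV list and histogram are as detailed as the server allows.
ALTER TABLE jam_heaprel ALTER COLUMN parentaddr SET STATISTICS 1000;
ALTER TABLE jam_heaprel ALTER COLUMN fieldname  SET STATISTICS 1000;
ANALYZE jam_heaprel;

-- Re-check the row estimate for the misestimated scan:
EXPLAIN SELECT count(*)
FROM jam_heaprel
WHERE heap_id = 1 AND parentaddr = 0 AND fieldname = 'K';
```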