Re: slow joining very large table to smaller ones - Mailing list pgsql-performance
From | Tom Lane
---|---
Subject | Re: slow joining very large table to smaller ones
Date |
Msg-id | 5545.1121382523@sss.pgh.pa.us
In response to | Re: slow joining very large table to smaller ones (Dan Harris <fbsd@drivefaster.net>)
List | pgsql-performance
Dan Harris <fbsd@drivefaster.net> writes:
> Here's the explain select for that one, since that's the best I can get.

> explain select recordtext from eventactivity,k_r,k_b where
> eventactivity.incidentid = k_r.incidentid and
> eventactivity.incidentid = k_b.incidentid and k_r.id = 94 and k_b.id = 107;
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------------
>  Merge Join  (cost=9624.61..4679590.52 rows=151009549 width=35)
>    Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>    ->  Merge Join  (cost=4766.92..4547684.26 rows=16072733 width=117)
>          Merge Cond: (("outer".incidentid)::text = "inner"."?column2?")
>          ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..4186753.16 rows=46029271 width=49)
>          ->  Sort  (cost=4766.92..4771.47 rows=1821 width=68)
>                Sort Key: (k_b.incidentid)::text
>                ->  Index Scan using k_b_idx on k_b  (cost=0.00..4668.31 rows=1821 width=68)
>                      Index Cond: (id = 107)
>    ->  Sort  (cost=4857.69..4862.39 rows=1879 width=68)
>          Sort Key: (k_r.incidentid)::text
>          ->  Index Scan using k_r_idx on k_r  (cost=0.00..4755.52 rows=1879 width=68)
>                Index Cond: (id = 94)
> (13 rows)

There's something awfully fishy here. The 8.0 planner is definitely capable of figuring out that it ought to join the smaller relations first. As an example, using 8.0.3+ (CVS branch tip) I did

    regression=# create table eventactivity(incidentid varchar, recordtext text);
    CREATE TABLE
    regression=# create table k_r(incidentid varchar);
    CREATE TABLE
    regression=# create table k_b(incidentid varchar);
    CREATE TABLE
    regression=# explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);

(Being too impatient to actually fill the eventactivity table with 36M rows of data, I just did some debugger magic to make the planner think that that was the table size...)

The default plan looks like

     Merge Join  (cost=16137814.70..36563453.23 rows=1361700000 width=32)
       Merge Cond: (("outer".incidentid)::text = "inner"."?column3?")
       ->  Merge Join  (cost=170.85..290.48 rows=7565 width=64)
             Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
             ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
                   Sort Key: (k_r.incidentid)::text
                   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
             ->  Sort  (cost=85.43..88.50 rows=1230 width=32)
                   Sort Key: (k_b.incidentid)::text
                   ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)
       ->  Sort  (cost=16137643.84..16227643.84 rows=36000000 width=64)
             Sort Key: (eventactivity.incidentid)::text
             ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)

and if I "set enable_mergejoin TO 0;" I get

     Hash Join  (cost=612.54..83761451.54 rows=1361700000 width=32)
       Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
       ->  Seq Scan on eventactivity  (cost=0.00..1080000.00 rows=36000000 width=64)
       ->  Hash  (cost=504.62..504.62 rows=7565 width=64)
             ->  Hash Join  (cost=25.38..504.62 rows=7565 width=64)
                   Hash Cond: (("outer".incidentid)::text = ("inner".incidentid)::text)
                   ->  Seq Scan on k_r  (cost=0.00..22.30 rows=1230 width=32)
                   ->  Hash  (cost=22.30..22.30 rows=1230 width=32)
                         ->  Seq Scan on k_b  (cost=0.00..22.30 rows=1230 width=32)

which is the plan I would judge Most Likely To Succeed based on what we know about Dan's problem.
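For anyone who wants to reproduce the comparison above without debugger access, here is a rough sketch. The pg_class update is a stand-in for Tom's "debugger magic", not what he actually ran; the row and page counts are illustrative, it requires superuser privileges, and the next VACUUM or ANALYZE of the table will overwrite them:

    -- same empty tables as in Tom's session
    create table eventactivity(incidentid varchar, recordtext text);
    create table k_r(incidentid varchar);
    create table k_b(incidentid varchar);

    -- fake the planner's idea of eventactivity's size
    -- (illustrative numbers; superuser only)
    update pg_class set reltuples = 36000000, relpages = 720000
     where relname = 'eventactivity';

    -- default plan (a merge join on 8.0 with these fake stats)
    explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);

    -- steer the planner away from merge joins to see the alternative
    set enable_mergejoin to 0;
    explain select recordtext from eventactivity
      inner join (select incidentid from k_r) a using (incidentid)
      inner join (select incidentid from k_b) b using (incidentid);
    reset enable_mergejoin;

The enable_mergejoin setting is session-local, so the RESET (or simply reconnecting) restores normal planner behavior.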
The planner estimates the hash join as roughly twice as expensive as the merge join only because, having no statistics about the join keys, it deliberately prices hash joins pessimistically: they can be pretty awful in the presence of many equal keys.

So the planner is certainly capable of finding the desired plan, even without any tweaking of the query text. That means what we have here is mainly a statistics problem. Have you ANALYZEd these tables recently? If so, may we see the pg_stats rows for incidentid in all three tables?

			regards, tom lane
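For anyone running the same check on their own tables, a minimal sketch of the diagnostics Tom asks for (the column list here is illustrative; posting the complete pg_stats rows gives the list the most to work with):

    analyze eventactivity;
    analyze k_r;
    analyze k_b;

    -- pg_stats exposes the per-column statistics the planner works from;
    -- n_distinct and the most-common-value data drive join-size estimates.
    select tablename, attname, null_frac, n_distinct,
           most_common_vals, most_common_freqs
      from pg_stats
     where attname = 'incidentid'
       and tablename in ('eventactivity', 'k_r', 'k_b');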