Hi!
(The question is why this simple select takes me 20 minutes to run...)
I have two tables with address data and result data from two different runs
of two different geocoding engines. I want the count of result data
differences when the output address data matches.
In essence, I want:
Engine 1 engine 2
Id = id
Addr = Addr
City = City
Zip = Zip
And then
Result != Result
When I do g.id=m.id, performance is fine. When I add
g.outaddress=m.out_address, performance is fine. But when I add
g.city=m.out_city, the query goes from ~10s to ~20 MINUTES for some reason.
Any ideas why? If I do a subselect rather than just g.outcity=m.out_city,
it's fine, but if I change it back to the former, it goes back to 20
minutes. The queries and explains are below.
resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*)
from geocoderoutput g,mmoutput m where g.id=m.id and
g.outresultcode<>m.resultcode and g.outaddress=m.out_address and
g.outcity=m.out_city group by g.outresultcode,m.resultcode order by count(*)
desc;
QUERY
PLAN
----------------------------------------------------------------------------
--------------------------------------------------------
----------------
Sort (cost=64772.08..64772.09 rows=1 width=22) (actual
time=1194603.363..1194604.099 rows=515 loops=1)
Sort Key: (count(*))
Sort Method: quicksort Memory: 53kB
-> HashAggregate (cost=64772.06..64772.07 rows=1 width=22) (actual
time=1194601.374..1194602.316 rows=515 loops=1)
-> Hash Join (cost=24865.31..64772.05 rows=1 width=22) (actual
time=373146.994..1194475.482 rows=52179 loops=1)
Hash Cond: ((m.id = g.id) AND ((m.out_address)::text =
(g.outaddress)::text) AND ((m.out_city)::text = (g.outcity)::text))
Join Filter: (g.outresultcode <> m.resultcode)
-> Seq Scan on mmoutput m (cost=0.00..15331.84 rows=502884
width=42) (actual time=0.010..1324.974 rows=502884 loops=1)
-> Hash (cost=11644.84..11644.84 rows=502884 width=41)
(actual time=370411.043..370411.043 rows=502704 loops=1)
-> Seq Scan on geocoderoutput g (cost=0.00..11644.84
rows=502884 width=41) (actual time=0.010..1166.141 rows=502884 loops=1)
Total runtime: 1194605.011 ms
(11 rows)
resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*)
from geocoderoutput g,mmoutput m where g.id=m.id and
g.outresultcode<>m.resultcode and g.outaddress=m.out_address and
g.outcity=(select out_city from mmoutput where id=g.id and
out_city=g.outcity) group by g.outresultcode,m.resultcode order by count(*)
desc;
QUERY
PLAN
----------------------------------------------------------------------------
--------------------------------------------------------
--------------------
Sort (cost=4218017.33..4218017.34 rows=1 width=22) (actual
time=23095.890..23096.632 rows=515 loops=1)
Sort Key: (count(*))
Sort Method: quicksort Memory: 53kB
-> HashAggregate (cost=4218017.31..4218017.32 rows=1 width=22) (actual
time=23093.901..23094.839 rows=515 loops=1)
-> Nested Loop (cost=0.00..4218017.30 rows=1 width=22) (actual
time=102.356..22930.141 rows=52179 loops=1)
Join Filter: ((g.outresultcode <> m.resultcode) AND
((g.outaddress)::text = (m.out_address)::text))
-> Seq Scan on geocoderoutput g (cost=0.00..4202690.15
rows=2514 width=41) (actual time=98.045..15040.142 rows=468172 loops=1)
Filter: ((outcity)::text = ((subplan))::text)
SubPlan
-> Index Scan using mmoutput_pkey on mmoutput
(cost=0.00..8.33 rows=1 width=10) (actual time=0.018..0.021 rows=1
loops=502884)
Index Cond: (id = $0)
Filter: ((out_city)::text = ($1)::text)
-> Index Scan using mmoutput_pkey on mmoutput m
(cost=0.00..6.08 rows=1 width=32) (actual time=0.006..0.008 rows=1
loops=468172)
Index Cond: (m.id = g.id)
Total runtime: 23097.548 ms
(15 rows)