Mystefied at poor performance of a standard query - Mailing list pgsql-performance
From | David logan |
---|---|
Subject | Mystefied at poor performance of a standard query |
Date | |
Msg-id | 004001c923be$d8b1f730$8a15e590$@net Whole thread Raw |
Responses |
Re: Mystefied at poor performance of a standard query
|
List | pgsql-performance |
Hi! (The question is why this simple select takes me 20 minutes to run...) I have two tables with address data and result data from two different runs of two different geocoding engines. I want the count of result data differences when the output address data matches. In essence, I want: Engine 1 engine 2 Id = id Addr = Addr City = City Zip = Zip And then Result != Result When I do g.id=m.id, performance is fine. When I add g.outaddress=m.out_address, performance is fine. But when I add g.city=m.out_city, the query goes from ~10s to ~20 MINUTES for some reason. Any ideas why? If I do a subselect rather than just g.outcity=m.out_city, it's fine, but if I change it back to the former, it goes back to 20 minutes. The queries and explains are below. resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*) from geocoderoutput g,mmoutput m where g.id=m.id and g.outresultcode<>m.resultcode and g.outaddress=m.out_address and g.outcity=m.out_city group by g.outresultcode,m.resultcode order by count(*) desc; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------------------------------- ---------------- Sort (cost=64772.08..64772.09 rows=1 width=22) (actual time=1194603.363..1194604.099 rows=515 loops=1) Sort Key: (count(*)) Sort Method: quicksort Memory: 53kB -> HashAggregate (cost=64772.06..64772.07 rows=1 width=22) (actual time=1194601.374..1194602.316 rows=515 loops=1) -> Hash Join (cost=24865.31..64772.05 rows=1 width=22) (actual time=373146.994..1194475.482 rows=52179 loops=1) Hash Cond: ((m.id = g.id) AND ((m.out_address)::text = (g.outaddress)::text) AND ((m.out_city)::text = (g.outcity)::text)) Join Filter: (g.outresultcode <> m.resultcode) -> Seq Scan on mmoutput m (cost=0.00..15331.84 rows=502884 width=42) (actual time=0.010..1324.974 rows=502884 loops=1) -> Hash (cost=11644.84..11644.84 rows=502884 width=41) (actual time=370411.043..370411.043 rows=502704 loops=1) -> Seq Scan on geocoderoutput g (cost=0.00..11644.84 rows=502884 width=41) (actual time=0.010..1166.141 rows=502884 loops=1) Total runtime: 1194605.011 ms (11 rows) resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*) from geocoderoutput g,mmoutput m where g.id=m.id and g.outresultcode<>m.resultcode and g.outaddress=m.out_address and g.outcity=(select out_city from mmoutput where id=g.id and out_city=g.outcity) group by g.outresultcode,m.resultcode order by count(*) desc; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------------------------------- -------------------- Sort (cost=4218017.33..4218017.34 rows=1 width=22) (actual time=23095.890..23096.632 rows=515 loops=1) Sort Key: (count(*)) Sort Method: quicksort Memory: 53kB -> HashAggregate (cost=4218017.31..4218017.32 rows=1 width=22) (actual time=23093.901..23094.839 rows=515 loops=1) -> Nested Loop (cost=0.00..4218017.30 rows=1 width=22) (actual time=102.356..22930.141 rows=52179 loops=1) Join Filter: ((g.outresultcode <> m.resultcode) AND ((g.outaddress)::text = (m.out_address)::text)) -> Seq Scan on geocoderoutput g (cost=0.00..4202690.15 rows=2514 width=41) (actual time=98.045..15040.142 rows=468172 loops=1) Filter: ((outcity)::text = ((subplan))::text) SubPlan -> Index Scan using mmoutput_pkey on mmoutput (cost=0.00..8.33 rows=1 width=10) (actual time=0.018..0.021 rows=1 loops=502884) Index Cond: (id = $0) Filter: ((out_city)::text = ($1)::text) -> Index Scan using mmoutput_pkey on mmoutput m (cost=0.00..6.08 rows=1 width=32) (actual time=0.006..0.008 rows=1 loops=468172) Index Cond: (m.id = g.id) Total runtime: 23097.548 ms (15 rows)
pgsql-performance by date: