Mystefied at poor performance of a standard query - Mailing list pgsql-performance

From David logan
Subject Mystefied at poor performance of a standard query
Date
Msg-id 004001c923be$d8b1f730$8a15e590$@net
Whole thread Raw
Responses Re: Mystefied at poor performance of a standard query
List pgsql-performance
Hi!

(The question is why this simple select takes me 20 minutes to run...)

I have two tables with address data and result data from two different runs
of two different geocoding engines. I want the count of result data
differences when the output address data matches.

In essence, I want:
Engine 1     engine 2
Id        =  id
Addr      =  Addr
City      =  City
Zip       =  Zip

And then
Result    != Result

When I do g.id=m.id, performance is fine. When I add
g.outaddress=m.out_address, performance is fine. But when I add
g.city=m.out_city, the query goes from ~10s to ~20 MINUTES for some reason.
Any ideas why? If I do a subselect rather than just g.outcity=m.out_city,
it's fine, but if I change it back to the former, it goes back to 20
minutes. The queries and explains are below.

resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*)
from geocoderoutput g,mmoutput m where g.id=m.id and
g.outresultcode<>m.resultcode and g.outaddress=m.out_address and
g.outcity=m.out_city group by g.outresultcode,m.resultcode order by count(*)
desc;
                                                                     QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------
----------------
 Sort  (cost=64772.08..64772.09 rows=1 width=22) (actual
time=1194603.363..1194604.099 rows=515 loops=1)
   Sort Key: (count(*))
   Sort Method:  quicksort  Memory: 53kB
   ->  HashAggregate  (cost=64772.06..64772.07 rows=1 width=22) (actual
time=1194601.374..1194602.316 rows=515 loops=1)
         ->  Hash Join  (cost=24865.31..64772.05 rows=1 width=22) (actual
time=373146.994..1194475.482 rows=52179 loops=1)
               Hash Cond: ((m.id = g.id) AND ((m.out_address)::text =
(g.outaddress)::text) AND ((m.out_city)::text = (g.outcity)::text))
               Join Filter: (g.outresultcode <> m.resultcode)
               ->  Seq Scan on mmoutput m  (cost=0.00..15331.84 rows=502884
width=42) (actual time=0.010..1324.974 rows=502884 loops=1)
               ->  Hash  (cost=11644.84..11644.84 rows=502884 width=41)
(actual time=370411.043..370411.043 rows=502704 loops=1)
                     ->  Seq Scan on geocoderoutput g  (cost=0.00..11644.84
rows=502884 width=41) (actual time=0.010..1166.141 rows=502884 loops=1)
 Total runtime: 1194605.011 ms
(11 rows)

resultcodes=# explain analyze select g.outresultcode,m.resultcode,count(*)
from geocoderoutput g,mmoutput m where g.id=m.id and
g.outresultcode<>m.resultcode and g.outaddress=m.out_address and
g.outcity=(select out_city from mmoutput where id=g.id and
out_city=g.outcity) group by g.outresultcode,m.resultcode order by count(*)
desc;
                                                                       QUERY
PLAN

----------------------------------------------------------------------------
--------------------------------------------------------
--------------------
 Sort  (cost=4218017.33..4218017.34 rows=1 width=22) (actual
time=23095.890..23096.632 rows=515 loops=1)
   Sort Key: (count(*))
   Sort Method:  quicksort  Memory: 53kB
   ->  HashAggregate  (cost=4218017.31..4218017.32 rows=1 width=22) (actual
time=23093.901..23094.839 rows=515 loops=1)
         ->  Nested Loop  (cost=0.00..4218017.30 rows=1 width=22) (actual
time=102.356..22930.141 rows=52179 loops=1)
               Join Filter: ((g.outresultcode <> m.resultcode) AND
((g.outaddress)::text = (m.out_address)::text))
               ->  Seq Scan on geocoderoutput g  (cost=0.00..4202690.15
rows=2514 width=41) (actual time=98.045..15040.142 rows=468172 loops=1)
                     Filter: ((outcity)::text = ((subplan))::text)
                     SubPlan
                       ->  Index Scan using mmoutput_pkey on mmoutput
(cost=0.00..8.33 rows=1 width=10) (actual time=0.018..0.021 rows=1
loops=502884)
                             Index Cond: (id = $0)
                             Filter: ((out_city)::text = ($1)::text)
               ->  Index Scan using mmoutput_pkey on mmoutput m
(cost=0.00..6.08 rows=1 width=32) (actual time=0.006..0.008 rows=1
loops=468172)
                     Index Cond: (m.id = g.id)
 Total runtime: 23097.548 ms
(15 rows)



pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: dedicated server & postgresql 8.1 conf tunning
Next
From: Tom Lane
Date:
Subject: Re: Mystefied at poor performance of a standard query