Slow query - Mailing list pgsql-performance

From Jeremy Haile
Subject Slow query
Date
Msg-id 1140635770.1612.255019877@webmail.messagingengine.com
Whole thread Raw
Responses Re: Slow query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I am running a query that joins against several large tables (~5 million
rows each).  The query takes an exteremely long time to run, and the
explain output is a bit beyond my level of understanding.  It is an
auto-generated query, so the aliases are fairly ugly.  I can clean them
up (rename them) if it would help.  Also, let me know if I can send any
more information that would help (e.g. table schema)

Also, is there any resources where I can get a better understanding of
what PostgreSQL means when it says "Sort" "Sort Key" "Bitmap Index Scan"
"Hash Cond" etc. etc. - and how to recognize problems by looking at the
output.  I can understand the output for simple queries (e.g. is the
planner using an index or performing a seq. scan), but when you get to
more complex queries like the one below I lose my way =)

I would really appreciate it if someone from this list could tell me if
there is anything that is obviously wrong with the query or schema and
what I could do to improve the performance.

PostgreSQL 8.1
RedHat Enterprise Linux 4

--QUERY
select distinct city4_.region_id as region1_29_, city4_1_.name as
name29_, city4_.state_id as state2_30_
from registered_voters registered0_
         inner join registered_voter_addresses addresses1_ on
         registered0_.registered_voter_id=addresses1_.registered_voter_id
         inner join registered_voter_addresses_regions regions2_ on
         addresses1_.address_id=regions2_.registered_voter_addresses_address_id
         inner join regions region3_ on
         regions2_.regions_region_id=region3_.region_id
         inner join cities city4_ on
         addresses1_.city_id=city4_.region_id
         inner join regions city4_1_ on
         city4_.region_id=city4_1_.region_id
where region3_.region_id='093c44e8-f3b2-4c60-8be3-2b4d148f9f5a'
order by city4_1_.name


--EXPLAIN/ANALYZE OUTPUT
"Unique  (cost=3572907.42..3623589.94 rows=4076438 width=93) (actual
time=2980825.714..3052333.753 rows=1124 loops=1)"
"  ->  Sort  (cost=3572907.42..3585578.05 rows=5068252 width=93) (actual
time=2980825.710..2987407.888 rows=4918204 loops=1)"
"        Sort Key: city4_1_.name, city4_.region_id, city4_.state_id"
"        ->  Hash Join  (cost=717783.40..1430640.10 rows=5068252
width=93) (actual time=1400141.559..2016131.467 rows=4918204 loops=1)"
"              Hash Cond:
(("outer".registered_voter_addresses_address_id)::text =
("inner".address_id)::text)"
"              ->  Bitmap Heap Scan on
registered_voter_addresses_regions regions2_  (cost=54794.95..575616.49
rows=5116843 width=80) (actual time=45814.469..155044.478 rows=4918205
loops=1)"
"                    Recheck Cond:
('093c44e8-f3b2-4c60-8be3-2b4d148f9f5a'::text =
(regions_region_id)::text)"
"                    ->  Bitmap Index Scan on
reg_voter_address_region_region_idx  (cost=0.00..54794.95 rows=5116843
width=0) (actual time=45807.157..45807.157 rows=4918205 loops=1)"
"                          Index Cond:
('093c44e8-f3b2-4c60-8be3-2b4d148f9f5a'::text =
(regions_region_id)::text)"
"              ->  Hash  (cost=642308.89..642308.89 rows=741420
width=173) (actual time=1354217.934..1354217.934 rows=4918204 loops=1)"
"                    ->  Hash Join  (cost=328502.66..642308.89
rows=741420 width=173) (actual time=204565.031..1268303.832 rows=4918204
loops=1)"
"                          Hash Cond:
(("outer".registered_voter_id)::text =
("inner".registered_voter_id)::text)"
"                          ->  Seq Scan on registered_voters
registered0_  (cost=0.00..173703.02 rows=4873202 width=40) (actual
time=0.005..39364.261 rows=4873167 loops=1)"
"                          ->  Hash  (cost=303970.34..303970.34
rows=748528 width=213) (actual time=204523.861..204523.861 rows=4918204
loops=1)"
"                                ->  Hash Join  (cost=263.22..303970.34
rows=748528 width=213) (actual time=101.628..140936.062 rows=4918204
loops=1)"
"                                      Hash Cond:
(("outer".city_id)::text = ("inner".region_id)::text)"
"                                      ->  Seq Scan on
registered_voter_addresses addresses1_  (cost=0.00..271622.23
rows=4919923 width=120) (actual time=0.025..98416.667 rows=4918205
loops=1)"
"                                      ->  Hash  (cost=260.35..260.35
rows=1147 width=173) (actual time=101.582..101.582 rows=1147 loops=1)"
"                                            ->  Hash Join
(cost=48.80..260.35 rows=1147 width=173) (actual time=88.608..98.984
rows=1147 loops=1)"
"                                                  Hash Cond:
(("outer".region_id)::text = ("inner".region_id)::text)"
"                                                  ->  Seq Scan on
regions city4_1_  (cost=0.00..162.39 rows=7539 width=53) (actual
time=0.048..35.204 rows=7539 loops=1)"
"                                                  ->  Hash
(cost=45.93..45.93 rows=1147 width=120) (actual time=48.896..48.896
rows=1147 loops=1)"
"                                                        ->  Nested Loop
 (cost=0.00..45.93 rows=1147 width=120) (actual time=35.791..47.012
rows=1147 loops=1)"
"                                                              ->  Index
Scan using regions_pkey on regions region3_  (cost=0.00..5.99 rows=1
width=40) (actual time=35.761..35.763 rows=1 loops=1)"
"
Index Cond: ((region_id)::text =
'093c44e8-f3b2-4c60-8be3-2b4d148f9f5a'::text)"
"                                                              ->  Seq
Scan on cities city4_  (cost=0.00..28.47 rows=1147 width=80) (actual
time=0.022..9.476 rows=1147 loops=1)"
"Total runtime: 3052707.269 ms"


pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Joins and full index scans...mysql vs postgres?
Next
From: Orion
Date:
Subject: Re: Large Database Design Help