Nested loops overpriced - Mailing list pgsql-performance
From | Peter Eisentraut |
---|---|
Subject | Nested loops overpriced |
Date | |
Msg-id | 200705081722.16731.peter_e@gmx.net Whole thread Raw |
Responses |
Re: Nested loops overpriced
|
List | pgsql-performance |
This query does some sort of analysis on an email archive: SELECT eh_subj.header_body AS subject, count(distinct eh_from.header_body) FROM email JOIN mime_part USING (email_id) JOIN email_header eh_subj USING (email_id, mime_part_id) JOIN email_header eh_from USING (email_id, mime_part_id) WHERE eh_subj.header_name = 'subject' AND eh_from.header_name = 'from' AND mime_part_id = 0 AND (time >= timestamp '2007-05-05 17:01:59' AND time < timestamp '2007-05-05 17:01:59' + interval '60 min') GROUP BY eh_subj.header_body; The planner chooses this plan: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=87142.18..87366.58 rows=11220 width=184) (actual time=7883.541..8120.647 rows=35000 loops=1) -> Sort (cost=87142.18..87170.23 rows=11220 width=184) (actual time=7883.471..7926.031 rows=35000 loops=1) Sort Key: eh_subj.header_body -> Hash Join (cost=46283.30..86387.42 rows=11220 width=184) (actual time=5140.182..7635.615 rows=35000 loops=1) Hash Cond: (eh_subj.email_id = email.email_id) -> Bitmap Heap Scan on email_header eh_subj (cost=11853.68..50142.87 rows=272434 width=104) (actual time=367.956..1719.736rows=280989 loops=1) Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text)) -> BitmapAnd (cost=11853.68..11853.68 rows=27607 width=0) (actual time=326.507..326.507 rows=0 loops=1) -> Bitmap Index Scan on idx__email_header__header_body_subject (cost=0.00..5836.24 rows=272434width=0) (actual time=178.041..178.041 rows=280989 loops=1) -> Bitmap Index Scan on idx__email_header__header_name (cost=0.00..5880.97 rows=281247 width=0)(actual time=114.574..114.574 rows=280989 loops=1) Index Cond: (header_name = 'subject'::text) -> Hash (cost=34291.87..34291.87 rows=11020 width=120) (actual time=4772.148..4772.148 rows=35000 loops=1) -> Hash Join (cost=24164.59..34291.87 rows=11020 width=120) (actual time=3131.067..4706.997 rows=35000loops=1) Hash Cond: (mime_part.email_id = email.email_id) -> Seq Scan on mime_part (cost=0.00..8355.81 rows=265804 width=12) (actual time=0.038..514.291rows=267890 loops=1) Filter: (mime_part_id = 0) -> Hash (cost=24025.94..24025.94 rows=11092 width=112) (actual time=3130.982..3130.982 rows=35000loops=1) -> Hash Join (cost=22244.54..24025.94 rows=11092 width=112) (actual time=996.556..3069.280rows=35000 loops=1) Hash Cond: (eh_from.email_id = email.email_id) -> Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156width=104) (actual time=569.762..1932.017 rows=280990 loops=1) Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text)) -> BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=532.217..532.217rows=0 loops=1) -> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156width=0) (actual time=116.386..116.386 rows=280990 loops=1) -> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24rows=107156 width=0) (actual time=174.883..174.883 rows=280990 loops=1) -> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156width=0) (actual time=173.575..173.575 rows=280990 loops=1) -> Hash (cost=6321.79..6321.79 rows=27694 width=8) (actual time=426.739..426.739rows=35000 loops=1) -> Index Scan using idx__email__time on email (cost=0.00..6321.79 rows=27694width=8) (actual time=50.000..375.021 rows=35000 loops=1) Index Cond: (("time" >= '2007-05-05 17:01:59'::timestamp without timezone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone)) Total runtime: 8160.442 ms The estimates all look pretty good and reasonable. A faster plan, however, is this: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=1920309.81..1920534.21 rows=11220 width=184) (actual time=5349.493..5587.536 rows=35000 loops=1) -> Sort (cost=1920309.81..1920337.86 rows=11220 width=184) (actual time=5349.427..5392.110 rows=35000 loops=1) Sort Key: eh_subj.header_body -> Nested Loop (cost=15576.58..1919555.05 rows=11220 width=184) (actual time=537.938..5094.377 rows=35000 loops=1) -> Nested Loop (cost=15576.58..475387.23 rows=11020 width=120) (actual time=537.858..4404.330 rows=35000loops=1) -> Nested Loop (cost=15576.58..430265.44 rows=11092 width=112) (actual time=537.768..4024.184 rows=35000loops=1) -> Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156 width=104)(actual time=537.621..1801.032 rows=280990 loops=1) Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text)) -> BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=500.006..500.006rows=0 loops=1) -> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actualtime=85.025..85.025 rows=280990 loops=1) -> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156width=0) (actual time=173.006..173.006 rows=280990 loops=1) -> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actualtime=174.463..174.463 rows=280990 loops=1) -> Index Scan using email_pkey on email (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005rows=0 loops=280990) Index Cond: (email.email_id = eh_from.email_id) Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-0518:01:59'::timestamp without time zone)) -> Index Scan using mime_part_pkey on mime_part (cost=0.00..4.06 rows=1 width=12) (actual time=0.005..0.006rows=1 loops=35000) Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0)) -> Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj (cost=0.00..130.89rows=13 width=104) (actual time=0.009..0.015 rows=1 loops=35000) Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id)) Filter: (header_name = 'subject'::text) Total runtime: 5625.024 ms Note how spectacularly overpriced this plan is. The costs for the nested loops are calculated approximately as number of outer tuples times cost of the inner scan. So slight overestimations of the inner scans such as Index Scan using email_pkey on email (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990) kill this calculation. Most likely, all of these database is cached, so I tried reducing seq_page_cost and random_page_cost, but I needed to turn them all the way down to 0.02 or 0.03, which is almost like cpu_tuple_cost. Is that reasonable? Or what is wrong here? PostgreSQL 8.2.1 on x86_64-unknown-linux-gnu work_mem = 256MB effective_cache_size = 384MB The machine has 1GB of RAM. -- Peter Eisentraut http://developer.postgresql.org/~petere/
pgsql-performance by date: