query faster using LEFT OUTER join? - Mailing list pgsql-sql
From | Drew |
---|---|
Subject | query faster using LEFT OUTER join? |
Date | |
Msg-id | B8825A6D-229D-4130-9091-7386F2B383DD@fastmail.fm Whole thread Raw |
Responses |
Re: query faster using LEFT OUTER join?
|
List | pgsql-sql |
I have a planner question about a multi-join query, where if I rewrite the query excluding the table that is left joined to the other tables my query speed increases by 75%. I've tested this query on both 8.0.9 and 8.2b3, w/ similar results (except 8.2b3 is 50% faster!) I'm wondering why the first query is planned a faster way and if there is anything I could do to my 2nd query faster (which should be faster because it doesn't have to scan the first table). Here's my initial query that runs in 22.961ms: SELECT train_family_id, object_type, COUNT(*) AS frequency FROM translation_pair tp LEFT OUTER JOIN instance i USING(translation_pair_id) JOIN context c USING(context_id) JOIN loc_submission ls USING(loc_submission_id) JOIN loc_submission_train_info lsti USING(loc_submission_id) WHERE translation_pair_id IN (640352, 6144, 1023028, 18155, 240244, 50157) GROUP BY train_family_id, object_type; If I just remove the translation_pair table and join directly against the instance table, the query speed drops 50% to 36.6ms SELECT train_family_id, object_type, COUNT(*) AS frequency FROM instance i JOIN context c USING(context_id) JOIN loc_submission ls USING(loc_submission_id) JOIN loc_submission_train_infolsti USING(loc_submission_id) WHERE translation_pair_id IN (640352, 6144, 1023028, 18155, 240244, 50157) GROUP BY train_family_id, object_type; Looking at the query plans for the 2 of them, it looks like the inner "Bitmap Index Scan on instance_translation_pair_id" is much slower when using "IN (x,y,z)" instead of comparing against table (tp.translation_pair_id = i.translation_pair_id. Is there anything else I should notice? Can I force the planner to use the translation_pair table first? Here's the plan for the first query (using 8.2b3) ------------------------------------------------------------------------ ------------------------------------------------------------------------ ----------------------------------- HashAggregate (cost=3223.17..3223.41 rows=19 width=18) (actual time=34.913..34.978 rows=122 loops=1) -> Nested Loop Left Join (cost=29.53..3223.03 rows=19 width=18) (actual time=0.606..33.994 rows=770 loops=1) -> Nested Loop Left Join (cost=29.53..3108.80 rows=19 width=22) (actual time=0.552..26.918 rows=770 loops=1) -> Nested Loop Left Join (cost=29.53..2995.26 rows=19 width=18) (actual time=0.497..19.924 rows=768 loops=1) -> Nested Loop Left Join (cost=29.53..2879.59 rows=19 width=8) (actual time=0.431..6.395 rows=768 loops=1) -> Bitmap Heap Scan on translation_pair tp (cost=24.27..48.10 rows=6 width=4) (actual time=0.304..0.376 rows=6 loops=1) Recheck Cond: (translation_pair_id = ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[])) -> Bitmap Index Scan on translation_pair_pkey (cost=0.00..24.27 rows=6 width=0) (actual time=0.254..0.254 rows=6 loops=1) Index Cond: (translation_pair_id = ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[])) -> Bitmap Heap Scan on instance i (cost=5.26..470.38 rows=123 width=12) (actual time=0.119..0.922 rows=128 loops=6) Recheck Cond: (tp.translation_pair_id = i.translation_pair_id) -> Bitmap Index Scan on instance_translation_pair_id (cost=0.00..5.26 rows=123 width=0) (actual time=0.098..0.098 rows=128 loops=6) Index Cond: (tp.translation_pair_id = i.translation_pair_id) -> Index Scan using context_pkey on context c (cost=0.00..6.07 rows=1 width=18) (actual time=0.015..0.016 rows=1 loops=768) Index Cond: (i.context_id = c.context_id) -> Index Scan using loc_submission_train_info_pkey on loc_submission_train_info lsti (cost=0.00..5.96 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=768) Index Cond: (i.loc_submission_id = lsti.loc_submission_id) -> Index Scan using loc_submission_pkey on loc_submission ls (cost=0.00..6.00 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=770) Index Cond: (i.loc_submission_id = ls.loc_submission_id) And here's the plan for the second query without the first LEFT OUTER join (using 8.2b3) ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------------ HashAggregate (cost=8927.87..8936.89 rows=721 width=18) (actual time=39.484..39.551 rows=122 loops=1) -> Nested Loop Left Join (cost=3706.71..8922.47 rows=721 width=18) (actual time=25.246..38.654 rows=770 loops=1) -> Hash Left Join (cost=3706.71..4000.93 rows=721 width=8) (actual time=25.215..31.169 rows=770 loops=1) Hash Cond: (i.loc_submission_id = ls.loc_submission_id) -> Merge Left Join (cost=3420.14..3481.22 rows=721 width=12) (actual time=15.088..20.018 rows=770 loops=1) Merge Cond: (i.loc_submission_id = lsti.loc_submission_id) -> Sort (cost=2576.82..2578.62 rows=720 width=8) (actual time=2.287..2.438 rows=768 loops=1) Sort Key: i.loc_submission_id -> Bitmap Heap Scan on instance i (cost=25.24..2542.65 rows=720 width=8) (actual time=0.381..1.623 rows=768 loops=1) Recheck Cond: (translation_pair_id = ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[])) -> Bitmap Index Scan on instance_translation_pair_id (cost=0.00..25.24 rows=720 width=0) (actual time=0.293..0.293 rows=768 loops=1) Index Cond: (translation_pair_id = ANY ('{640352,6144,1023028,18155,240244,50157}'::oid[])) -> Sort (cost=843.32..868.47 rows=10059 width=8) (actual time=12.782..14.312 rows=10530 loops=1) Sort Key: lsti.loc_submission_id -> Seq Scan on loc_submission_train_info lsti (cost=0.00..174.59 rows=10059 width=8) (actual time=0.012..3.708 rows=10059 loops=1) -> Hash (cost=261.46..261.46 rows=10046 width=4) (actual time=10.061..10.061 rows=10046 loops=1) -> Seq Scan on loc_submission ls (cost=0.00..261.46 rows=10046 width=4) (actual time=0.013..5.235 rows=10046 loops=1) -> Index Scan using context_pkey on context c (cost=0.00..6.81 rows=1 width=18) (actual time=0.008..0.009 rows=1 loops=770) Index Cond: (i.context_id = c.context_id) Thanks for your help, Drew