Nested Loop trouble : Execution time increases more 1000 time (long) - Mailing list pgsql-performance
From | Antoine Bajolet |
---|---|
Subject | Nested Loop trouble : Execution time increases more 1000 time (long) |
Date | |
Msg-id | 432C3A86.5010000@free.fr Whole thread Raw |
Responses |
Re: Nested Loop trouble : Execution time increases more
Re: Nested Loop trouble : Execution time increases more 1000 time (long) |
List | pgsql-performance |
Hello, We are using postgresql in a search engine on an intranet handling throusand of documents. But we ave a big problem when users use more than two search key. There are more tables around, but the heart of the search engine is made of three tables : fiches (f_id int4, f_title varchar) 52445 rows engine (f_id int4, k_id int4, weight ) 11761700 rows keywords(k_id, keyword) 1072600 rows A "fiche" is linked to any kind of document. The engine table counts how many times a keyword appears in a document. A query to search on one or two keywords is quick to execute (the front-end creates thoses queries): --------------------------------------------------------------------- select count (distinct f.f_id) as results FROM fiches f INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id) INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id) WHERE TRUE AND k1.keyword like 'maintenance%' AND k2.keyword like 'exploitation%' ; QUERY PLAN Aggregate (cost=3953.00..3953.00 rows=1 width=4) (actual time=525.243..525.243 rows=1 loops=1) -> Nested Loop (cost=1974.79..3952.99 rows=1 width=4) (actual time=211.570..513.758 rows=6879 loops=1) -> Hash Join (cost=1974.79..3949.62 rows=1 width=8) (actual time=211.483..389.340 rows=6879 loops=1) Hash Cond: ("outer".f_id = "inner".f_id) -> Nested Loop (cost=0.00..1974.76 rows=11 width=4) (actual time=0.132..155.499 rows=9520 loops=1) -> Index Scan using keyword_pattern_key on keywords k2 (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1) Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying)) Filter: ((keyword)::text ~~ 'exploitation%'::text) -> Index Scan using k_id_key on engine e2 (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.049..1.842 rows=127 loops=75) Index Cond: (e2.k_id = "outer".k_id) -> Hash (cost=1974.76..1974.76 rows=11 width=4) (actual time=211.203..211.203 rows=0 loops=1) -> Nested Loop (cost=0.00..1974.76 rows=11 width=4) (actual time=0.296..197.590 rows=11183 loops=1) -> Index Scan using keyword_pattern_key on keywords k1 (cost=0.00..3.51 rows=1 width=4) (actual time=0.189..1.351 rows=73 loops=1) Index Cond: (((keyword)::text ~>=~ 'maintenance'::character varying) AND ((keyword)::text ~<~ 'maintenancf'::character varying)) Filter: ((keyword)::text ~~ 'maintenance%'::text) -> Index Scan using k_id_key on engine e1 (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.406 rows=153 loops=73) Index Cond: (e1.k_id = "outer".k_id) -> Index Scan using fiches_pkey on fiches f (cost=0.00..3.36 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=6879) Index Cond: (f.f_id = "outer".f_id) Total runtime: 525.511 ms -------------------------------------------------------------------------- But when there are three keywords or more, the planner chooses to perform a very costly nested loop : -------------------------------------------------------------------------- select count (distinct f.f_id) as results FROM fiches f INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id) INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id) INNER JOIN engine e3 INNER JOIN keywords k3 USING (k_id) USING (f_id) WHERE TRUE AND k1.keyword like 'maintenance%' AND k2.keyword like 'exploitation%' AND k3.keyword like 'numerique%' ; QUERY PLAN Aggregate (cost=5927.90..5927.90 rows=1 width=4) (actual time=673048.168..673048.169 rows=1 loops=1) -> Nested Loop (cost=1974.79..5927.90 rows=1 width=4) (actual time=1853.789..673038.065 rows=2929 loops=1) -> Nested Loop (cost=1974.79..5924.52 rows=1 width=12) (actual time=1853.719..672881.725 rows=2929 loops=1) Join Filter: ("inner".f_id = "outer".f_id) -> Hash Join (cost=1974.79..3949.62 rows=1 width=8) (actual time=198.845..441.947 rows=6879 loops=1) Hash Cond: ("outer".f_id = "inner".f_id) -> Nested Loop (cost=0.00..1974.76 rows=11 width=4) (actual time=0.129..199.895 rows=9520 loops=1) -> Index Scan using keyword_pattern_key on keywords k2 (cost=0.00..3.51 rows=1 width=4) (actual time=0.077..1.918 rows=75 loops=1) Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying)) Filter: ((keyword)::text ~~ 'exploitation%'::text) -> Index Scan using k_id_key on engine e2 (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.035..2.342 rows=127 loops=75) Index Cond: (e2.k_id = "outer".k_id) -> Hash (cost=1974.76..1974.76 rows=11 width=4) (actual time=198.650..198.650 rows=0 loops=1) -> Nested Loop (cost=0.00..1974.76 rows=11 width=4) (actual time=0.174..187.216 rows=11183 loops=1) -> Index Scan using keyword_pattern_key on keywords k1 (cost=0.00..3.51 rows=1 width=4) (actual time=0.113..1.222 rows=73 loops=1) Index Cond: (((keyword)::text ~>=~ 'maintenance'::character varying) AND ((keyword)::text ~<~ 'maintenancf'::character varying)) Filter: ((keyword)::text ~~ 'maintenance%'::text) -> Index Scan using k_id_key on engine e1 (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.311 rows=153 loops=73) Index Cond: (e1.k_id = "outer".k_id) -> Nested Loop (cost=0.00..1974.76 rows=11 width=4) (actual time=0.087..90.165 rows=9553 loops=6879) -> Index Scan using keyword_pattern_key on keywords k3 (cost=0.00..3.51 rows=1 width=4) (actual time=0.049..0.628 rows=49 loops=6879) Index Cond: (((keyword)::text ~>=~ 'numerique'::character varying) AND ((keyword)::text ~<~ 'numeriquf'::character varying)) Filter: ((keyword)::text ~~ 'numerique%'::text) -> Index Scan using k_id_key on engine e3 (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.023..1.544 rows=195 loops=337071) Index Cond: (e3.k_id = "outer".k_id) -> Index Scan using fiches_pkey on fiches f (cost=0.00..3.36 rows=1 width=4) (actual time=0.041..0.043 rows=1 loops=2929) Index Cond: (f.f_id = "outer".f_id) Total runtime: 673048.405 ms ---------------------------------------------------------------------- More than 10 minutes ! Is there a specific reason the planner chooses this way ? Can whe do something on the postgresql configuration to avoid this ? Can whe force the planner to use a hash join as it does for the first joins ? Regards, Antoine Bajolet
pgsql-performance by date: