left join query does not perform well - Mailing list pgsql-sql
From | Manuel Sugawara |
---|---|
Subject | left join query does not perform well |
Date | |
Msg-id | m3hehja80c.fsf@dep4.fciencias.unam.mx Whole thread Raw |
Responses |
Re: left join query does not perform well
|
List | pgsql-sql |
Hi all, I have three tables that are related in the following way: '---------------------------------------- | epr_ord_grupo < 1 - n > epr_ord_profesor < 0 - n > epr_ord_horario `---------------------------------------- In average there is 1.0407716514 rows in epr_ord_horario for each row in epr_ord_profesor, but since there may be 0 I use the query explained below: '---------------------------------------- | test=# explain analyze SELECT grupo.grupo_id FROM (epr_vord_grupo AS grupo INNER JOIN (epr_vord_profesor AS profesor LEFTJOIN epr_vord_horario AS horario ON (profesor.profesor_id = horario.profesor_id)) ON (grupo.grupo_id = profesor.grupo_id))WHERE (grupo.año_semestre = cast('20022' as integer)) AND (grupo.grupo = cast('4021' as int)) ORDER BYgrupo.grupo_id, profesor.cargo, profesor.profesor_id, horario.horario_id; | NOTICE: QUERY PLAN: | | Sort (cost=1165.25..1165.25 rows=2 width=24) (actual time=3113.30..3113.31 rows=3 loops=1) | -> Hash Join (cost=670.71..1165.24 rows=2 width=24) (actual time=3016.64..3113.08 rows=3 loops=1) | -> Merge Join (cost=665.46..1110.66 rows=9859 width=20) (actual time=2774.61..3058.77 rows=10157 loops=1) | -> Index Scan using epr_ord_profesor_pkey on epr_ord_profesor (cost=0.00..306.45 rows=9859 width=12) (actualtime=124.07..275.04 rows=9859 loops=1) | -> Sort (cost=665.46..665.46 rows=7607 width=52) (actual time=2650.49..2660.55 rows=7607 loops=1) | -> Subquery Scan horario (cost=0.00..175.07 rows=7607 width=52) (actual time=20.62..2593.15 rows=7607loops=1) | -> Seq Scan on epr_ord_horario h (cost=0.00..175.07 rows=7607 width=52) (actual time=20.61..2538.44rows=7607 loops=1) | -> Hash (cost=5.25..5.25 rows=1 width=4) (actual time=26.26..26.26 rows=0 loops=1) | -> Index Scan using llave primaria-asemg on epr_ord_grupo g (cost=0.00..5.25 rows=1 width=4) (actual time=26.24..26.25rows=1 loops=1) | Total runtime: 3117.48 msec | | EXPLAIN `---------------------------------------- However the full join (which is correct almost always) of the three tables looks like: '---------------------------------------- | test=# explain analyze SELECT grupo.grupo_id FROM epr_vord_grupo grupo, epr_vord_profesor AS profesor, epr_vord_horarioAS horario WHERE (profesor.profesor_id = horario.profesor_id) AND (grupo.grupo_id = profesor.grupo_id) AND(grupo.año_semestre = cast('20022' as integer)) AND (grupo.grupo = cast('4021' as int)) ORDER BY grupo.grupo_id, profesor.cargo,profesor.profesor_id, horario.horario_id; | NOTICE: QUERY PLAN: | | Sort (cost=14.91..14.91 rows=2 width=24) (actual time=0.53..0.53 rows=3 loops=1) | -> Nested Loop (cost=0.00..14.90 rows=2 width=24) (actual time=0.19..0.30 rows=3 loops=1) | -> Nested Loop (cost=0.00..8.48 rows=2 width=16) (actual time=0.14..0.17 rows=2 loops=1) | -> Index Scan using llave primaria-asemg on epr_ord_grupo g (cost=0.00..5.25 rows=1 width=4) (actual time=0.08..0.08rows=1 loops=1) | -> Index Scan using epr_ord_profe_gpo on epr_ord_profesor (cost=0.00..3.20 rows=2 width=12) (actual time=0.04..0.06rows=2 loops=1) | -> Index Scan using epr_ord_horario_prf on epr_ord_horario h (cost=0.00..3.04 rows=1 width=8) (actual time=0.03..0.05rows=2 loops=2) | Total runtime: 1.15 msec | | EXPLAIN `---------------------------------------- Ouch, 3117.48 msec vs. 1.15 msec is a huge difference. I need something else? or may be postgres optimizer can't cope with left/right joins? I was thinking to add a dummy row in epr_ord_horario to ensure that the full join is always correct but may be a better solution arises from the expertise of the list :-) Regards, Manuel.