Re: Two slightly different queries produce same results, - Mailing list pgsql-general
From | Andrew Schmidt |
---|---|
Subject | Re: Two slightly different queries produce same results, |
Date | |
Msg-id | 4374B54A.4090407@lifescale.com Whole thread Raw |
In response to | Re: Two slightly different queries produce same results, (Richard Huxton <dev@archonet.com>) |
List | pgsql-general |
Hi Richard, While in the process of responding to your email I found out what was doing it. It was the grouping of the team_players team_id instead of the team team_id. Here's some table numbers: team => 31720 rows, team_players => 464896 rows, player => 948 rows player_updates => 5414 rows The query: SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals FROM team t JOIN team_players tp ON tp.team_id = t.team_id JOIN player_updates pu ON pu.player_id = tp.player_id JOIN player pl ON pl.player_id = pu.player_id WHERE t.pool_id = 21699 GROUP BY tp.team_id; It's grouping by the team_players tp.team_id, change the group to t.team_id and bam! 5x faster. thanks for the help, -- Andrew ******* This was my response to you Richard before I found out what it was. ******* I removed position from the query, but still got the same query times. I summed up the pl.player_id just to make sure it was reading the player table. EXPLAIN ANALYSE SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals FROM team t JOIN team_players tp ON tp.team_id = t.team_id JOIN player_updates pu ON pu.player_id = tp.player_id JOIN player pl ON pl.player_id = pu.player_id WHERE t.pool_id = 21699 GROUP BY tp.team_id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=8584.45..9364.83 rows=22390 width=12) (actual time=894.364..1140.884 rows=313 loops=1) -> Sort (cost=8584.45..8695.59 rows=44453 width=12) (actual time=893.373..1013.989 rows=40928 loops=1) Sort Key: tp.team_id -> Hash Join (cost=174.52..4701.14 rows=44453 width=12) (actual time=50.433..328.788 rows=40928 loops=1) Hash Cond: ("outer".player_id = "inner".player_id) -> Hash Join (cost=30.85..3840.72 rows=5731 width=12) (actual time=7.279..110.578 rows=5005 loops=1) Hash Cond: ("outer".player_id = "inner".player_id) -> Nested Loop (cost=0.00..3723.90 rows=5731 width=8) (actual time=0.106..69.528 rows=5005 loops=1) -> Index Scan using team_pool_id on team t (cost=0.00..397.90 rows=391 width=4) (actual time=0.058..2.313 rows=313 loops=1) Index Cond: (pool_id = 21699) -> Index Scan using team_id_asdas on team_players tp (cost=0.00..8.24 rows=21 width=8) (actual time=0.027..0.122 rows=16 loops=313) Index Cond: (tp.team_id = "outer".team_id) -> Hash (cost=28.48..28.48 rows=948 width=4) (actual time=7.121..7.121 rows=948 loops=1) -> Seq Scan on player pl (cost=0.00..28.48 rows=948 width=4) (actual time=0.042..3.930 rows=948 loops=1) -> Hash (cost=130.14..130.14 rows=5414 width=8) (actual time=43.070..43.070 rows=5414 loops=1) -> Seq Scan on player_updates pu (cost=0.00..130.14 rows=5414 width=8) (actual time=0.013..22.410 rows=5414 loops=1) Total runtime: 1144.059 ms (17 rows) --- VS fast query EXPLAIN ANALYSE SELECT tp.team_id,SUM(pl.player_id),SUM(goals) as total_goals FROM team t JOIN team_players tp ON tp.team_id = t.team_id AND t.pool_id = 21699 JOIN player_updates pu ON pu.player_id = tp.player_id JOIN player pl ON pl.player_id = pu.player_id WHERE t.pool_id = 21699 GROUP BY tp.team_id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=270.31..278.80 rows=566 width=12) (actual time=409.428..410.551 rows=313 loops=1) -> Hash Join (cost=89.66..266.06 rows=566 width=12) (actual time=120.341..278.310 rows=40928 loops=1) Hash Cond: ("outer".player_id = "inner".player_id) -> Seq Scan on player_updates pu (cost=0.00..130.14 rows=5414 width=8) (actual time=0.019..16.833 rows=5414 loops=1) -> Hash (cost=89.48..89.48 rows=73 width=12) (actual time=120.209..120.209 rows=5005 loops=1) -> Hash Join (cost=53.16..89.48 rows=73 width=12) (actual time=78.672..102.434 rows=5005 loops=1) Hash Cond: ("outer".player_id = "inner".player_id) -> Seq Scan on player pl (cost=0.00..28.48 rows=948 width=4) (actual time=0.040..3.092 rows=948 loops=1) -> Hash (cost=52.97..52.97 rows=73 width=8) (actual time=78.577..78.577 rows=5005 loops=1) -> Nested Loop (cost=0.00..52.97 rows=73 width=8) (actual time=0.100..57.729 rows=5005 loops=1) -> Index Scan using team_pool_id on team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.050..1.591 rows=313 loops=1) Index Cond: ((pool_id = 21699) AND (pool_id = 21699)) -> Index Scan using team_id_asdas on team_players tp (cost=0.00..8.24 rows=21 width=8) (actual time=0.021..0.099 rows=16 loops=313) Index Cond: (tp.team_id = "outer".team_id) Total runtime: 411.629 ms (15 rows) Time: 417.726 ms I ran ANALYZE on each of the tables, just to be sure, but there was no changes. Also, here's some table info: team => 31720 rows, team_players => 464896 rows, player => 948 rows player_updates => 5414 rows thanks, -- Andrew >> The explain analyze: >> -> Index Scan using team_pool_id on >> team t (cost=0.00..10.44 rows=5 width=4) (actual time=0.049..1.556 >> rows=313 loops=1) >> Index Cond: ((pool_id = 21699) >> AND (pool_id = 21699)) > > > Here it sees two conditions, and PG is expecting that will restrict > things further. It's clearly not expecting the same condition twice, > but is presumably treating it something like "a>-4 AND a<4". This > actually makes the row estimate wrong. > > So - if PG is getting this bit wrong why is query 2 faster? > Looking at the row estimates for case 1 we can see that the final > cases are badly wrong. > > > GroupAggregate (cost=8742.52..9853.85 rows=44453 width=13) (actual > > time=1186.973..1432.548 rows=626 loops=1) > > Here it's got the number of rows badly wrong - if you actually ended > up with 44,000 rows then maybe this would be the best plan. This is > the last step though, so what happens before? > > > -> Sort (cost=8742.52..8853.66 rows=44453 width=13) (actual > > time=1186.237..1309.562 rows=40928 loops=1) > > Sort Key: tp.team_id, (pl."position" = 'G'::bpchar) > > We're about right here. The problem must be in the last step, and > looking at the row estimates it seems to think that no summarising is > actually going to take place. > > I think the "position = 'G'" bit is confusing it. Well, we can test that: > > 1. Just group by "pl.position" - remove the test. Do the estimated and > actual rows match now? > > 2. Rewrite the position='G' as a CASE structure > > 3. Do the position='G' part in a sub-query and join to that. > > The results of those should show whether this is actually the problem. > -- > Richard Huxton > Archonet Ltd > > . > -- _____________________________________________ G l o b a l D i a g n o s t i c s I n c. Andrew Schmidt t.416-304-0049 x206 aschmidt@lifescale.com f.866-697-8726 _____________________________________________
pgsql-general by date: