why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? - Mailing list pgsql-performance
From | Frank van Vugt |
---|---|
Subject | why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? |
Date | |
Msg-id | 200406150849.29470.ftm.van.vugt@foxi.nl Whole thread Raw |
Responses |
Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??
|
List | pgsql-performance |
Hi all, I noticed the following. Given two tables, just simply articles and their packages: article(id int) package( id int, article_id int, amount) When taking the minimum package for articles given some constraint on the article table select article.id, p_min from article, (select article_id, min(amount) as p_min from package group by article_id ) as foo where article.id = foo.article_id and article.id < 50; The query plan shows QUERY PLAN ---------------------------------------------------------------------------------------- Hash Join (cost=730.11..808.02 rows=1 width=8) Hash Cond: ("outer".article_id = "inner".id) -> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8) -> HashAggregate (cost=726.10..737.23 rows=4451 width=8) -> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8) -> Hash (cost=4.01..4.01 rows=1 width=4) -> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1 width=4) Index Cond: (id < 50) So, a seq scan on the complete package table is done, although it seems the planner could have deduced that only the part that has 'article_id < 50' would be relevant, since 'article.id < 50' and 'article.id = foo.article_id' If I ditch the group by, then this contraint does get pushed into the subselect : select article.id, p_min from article, (select article_id, amount as p_min from package ) as foo where article.id = foo.article_id and article.id < 50; QUERY PLAN ----------------------------------------------------------------------------------- Nested Loop (cost=0.00..14.22 rows=5 width=8) -> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1 width=4) Index Cond: (id < 50) -> Index Scan using package_idx3 on package (cost=0.00..10.16 rows=4 width=8) Index Cond: ("outer".id = package.article_id) So it seems the math-knowledge is there ;) I'm wondering about what I'm overlooking here. When I take the first query and add the article constraint 'manually', I get: select article.id, p_min from article, (select article_id, min(amount) as p_min from package where article_id < 50 group by article_id ) as foo where article.id = foo.article_id and article.id < 50; QUERY PLAN ---------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..9.65 rows=1 width=8) Join Filter: ("outer".id = "inner".article_id) -> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1 width=4) Index Cond: (id < 50) -> Subquery Scan foo (cost=0.00..5.62 rows=1 width=8) -> GroupAggregate (cost=0.00..5.61 rows=1 width=8) -> Index Scan using package_idx3 on package (cost=0.00..5.60 rows=2 width=8) Index Cond: (article_id < 50) which would have been a nice plan for the first query ;) Could someone shine a light on this, please? -- Best, Frank.
pgsql-performance by date: