Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? - Mailing list pgsql-performance
From | Frank van Vugt |
---|---|
Subject | Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? |
Date | |
Msg-id | 200406151730.40464.ftm.van.vugt@foxi.nl Whole thread Raw |
In response to | Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??
|
List | pgsql-performance |
> Obviously this is on toy tables The query is simplified, yes. But the data in the tables is real, albeit they're not that large. > You're misinterpreting it. I might very well be ;) But I also get the feeling I didn't explain to you well enough what I meant... > Without the group by, the plan is a candidate for nestloop-with-inner-index-scan Yes, I understand that. I only ditched the group by to check whether the contraint on the article table was indeed recognized as a constraint on the package table based on 'article.id = foo.article_id'. And it is/was. > with the group by, there's another step in the way. Yep, but on my system, package gets seq-scanned *without* any additional constraint, resulting in a loooooong processing time. > Pushing down into subselects does get done, for instance in CVS tip > I can change the last part of your query to "foo.article_id < 50" > and get ... This is why I think I wasn't clear enough. In the real thing, the constraint on the article table is built by some external source and I cannot easily make assumptions to translate these to a constraint on the package table, especially since I expect the planner to be far better in that than me ;) So, my base query is this: select article.id, p_min from article, (select article_id, min(amount) as p_min from package group by article_id ) as foo where article.id = foo.article_id and <some constraint on article table>; Now, when <constraint> = true, this obviously results in seqscans: Hash Join (cost=1106.79..1251.46 rows=4452 width=8) Hash Cond: ("outer".article_id = "inner".id) -> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8) -> HashAggregate (cost=726.10..737.23 rows=4451 width=8) -> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8) -> Hash (cost=369.35..369.35 rows=4535 width=4) -> Seq Scan on article (cost=0.00..369.35 rows=4535 width=4) But when <constraint> = 'article.id < 50', only article is indexscanned: Hash Join (cost=730.11..808.02 rows=1 width=8) Hash Cond: ("outer".article_id = "inner".id) -> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8) -> HashAggregate (cost=726.10..737.23 rows=4451 width=8) -> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8) -> Hash (cost=4.01..4.01 rows=1 width=4) -> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1 width=4) Index Cond: (id < 50) Which still results in poor performance due to the seqscan on package. Putting the constraint on package is boosting performance indeed, but I cannot make that assumption. So, what I was asking was: When the 'article.id < 50' constraint is added, it follows that 'foo.article_id < 50' is a constraint as well. Why is this constraint not used to avoid the seqscan on package? > Obviously this is on toy tables, but the point is that the constraint > does get pushed down through the GROUP BY when appropriate. I've seen it being pushed down when it already was defined as a constraint on the group by, like in your example. If necessary, I'll throw together a few commands that build some example tables to show what I mean. -- Best, Frank.
pgsql-performance by date: