Thread: Any way to Convince postgres to push join clause inside subquery aggregate?
I find I often want to be able to do joins against views where the view are aggregates on a column that has an index. Ie, something like SELECT a.*, v.n FROM a JOIN (select a_id,count(*) as n group by a_id) as v USING (a_id) where there's an index on b.a_id. assume there are other tables being joined so I can't just move the aggregate to the outermost layer. These queries often come about because I often have a fully normalized structure with many-to-many relationships but sometimes only want to display an aggregated view of the data. For example, a list of products with the number in stock, number on order, number sold recently, etc. It would be really great if postgres could notice that the index is still useful and push the index lookup inside the aggregate. Working around this is really awkward and makes it impossible to use real views to hide the details from the queries. Usually working around it actually means bulding denormalized tables to act as "materialized views" and building indexes on those tables. The frustrating part is that it seems like postgres is almost capable of doing it. In a simple query it is fully capable of pushing a where clause inside the aggregate. But in a join query it doesn't. (I put the "limit 1" on the second table to ensure it was the best table to use to drive the join.) Is there some key detail I'm missing that would allow it to push the join clause inside the aggregate and use the index? If so I can simplify a lot of design by creating views for these aggregate views instead of creating denormalized tables and hacking complex queries everywhere. slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x where foo_id = 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Subquery Scan x (cost=0.00..25.24 rows=1 width=12) -> GroupAggregate (cost=0.00..25.23 rows=1 width=4) -> Index Scan using idx_foo_bar_foo on foo_bar (cost=0.00..25.08 rows=30 width=4) Index Cond: (foo_id = 1) (4 rows) Time: 226.15 ms slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x join (select * from foo limit 1) as foo using (foo_id) ; QUERY PLAN --------------------------------------------------------------------------------------- Hash Join (cost=108.46..111.28 rows=1 width=227) Hash Cond: ("outer".foo_id = "inner".foo_id) -> Subquery Scan x (cost=108.35..110.36 rows=161 width=12) -> HashAggregate (cost=108.35..108.75 rows=161 width=4) -> Seq Scan on foo_bar (cost=0.00..81.23 rows=5423 width=4) -> Hash (cost=0.11..0.11 rows=1 width=219) -> Subquery Scan foo (cost=0.00..0.11 rows=1 width=219) -> Limit (cost=0.00..0.10 rows=1 width=625) -> Seq Scan on foo (cost=0.00..17.82 rows=182 width=625) (9 rows) -- greg
Greg Stark <gsstark@mit.edu> writes: > slo=> explain > select * > from (select foo_id,count(*) as n from foo_bar group by foo_id) as x > join (select * from foo limit 1) as foo using (foo_id) > ; Why not put the subselect in the output list, if that's the kind of plan you want? regression=# explain select foo.*, (select count(*) from foo_bar where foo_id = foo.foo_id) as n from foo; QUERY PLAN ------------------------------------------------------------------------------------ Seq Scan on foo (cost=0.00..17102.50 rows=1000 width=8) SubPlan -> Aggregate (cost=17.08..17.08 rows=1 width=0) -> Index Scan using foobi on foo_bar (cost=0.00..17.07 rows=5 width=0) Index Cond: (foo_id = $0) (5 rows) regards, tom lane
Re: Any way to Convince postgres to push join clause inside subquery aggregate?
From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > slo=> explain > > select * > > from (select foo_id,count(*) as n from foo_bar group by foo_id) as x > > join (select * from foo limit 1) as foo using (foo_id) > > ; > > Why not put the subselect in the output list, if that's the kind of plan > you want? Actually that's the way the query is right now. The problem arises because I want to add a second column without duplicating the whole thing. -- greg