Any way to Convince postgres to push join clause inside subquery aggregate? - Mailing list pgsql-general

From Greg Stark
Subject Any way to Convince postgres to push join clause inside subquery aggregate?
Date
Msg-id 877k4rf8ui.fsf@stark.dyndns.tv
Whole thread Raw
Responses Re: Any way to Convince postgres to push join clause inside subquery aggregate?
List pgsql-general
I find I often want to be able to do joins against views where the view are
aggregates on a column that has an index. Ie, something like

SELECT a.*, v.n
  FROM a
  JOIN (select a_id,count(*) as n group by a_id) as v USING (a_id)

where there's an index on b.a_id. assume there are other tables being joined
so I can't just move the aggregate to the outermost layer.

These queries often come about because I often have a fully normalized
structure with many-to-many relationships but sometimes only want to display
an aggregated view of the data. For example, a list of products with the
number in stock, number on order, number sold recently, etc.

It would be really great if postgres could notice that the index is still
useful and push the index lookup inside the aggregate. Working around this is
really awkward and makes it impossible to use real views to hide the details
from the queries. Usually working around it actually means bulding
denormalized tables to act as "materialized views" and building indexes on
those tables.

The frustrating part is that it seems like postgres is almost capable of doing
it. In a simple query it is fully capable of pushing a where clause inside the
aggregate.

But in a join query it doesn't. (I put the "limit 1" on the second table to
ensure it was the best table to use to drive the join.)

Is there some key detail I'm missing that would allow it to push the join
clause inside the aggregate and use the index? If so I can simplify a lot of
design by creating views for these aggregate views instead of creating
denormalized tables and hacking complex queries everywhere.



slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x where foo_id = 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..25.24 rows=1 width=12)
   ->  GroupAggregate  (cost=0.00..25.23 rows=1 width=4)
         ->  Index Scan using idx_foo_bar_foo on foo_bar  (cost=0.00..25.08 rows=30 width=4)
               Index Cond: (foo_id = 1)
(4 rows)

Time: 226.15 ms





slo=> explain
select *
 from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
 join (select * from foo limit 1) as foo using (foo_id)
;

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Hash Join  (cost=108.46..111.28 rows=1 width=227)
   Hash Cond: ("outer".foo_id = "inner".foo_id)
   ->  Subquery Scan x  (cost=108.35..110.36 rows=161 width=12)
         ->  HashAggregate  (cost=108.35..108.75 rows=161 width=4)
               ->  Seq Scan on foo_bar  (cost=0.00..81.23 rows=5423 width=4)
   ->  Hash  (cost=0.11..0.11 rows=1 width=219)
         ->  Subquery Scan foo  (cost=0.00..0.11 rows=1 width=219)
               ->  Limit  (cost=0.00..0.10 rows=1 width=625)
                     ->  Seq Scan on foo  (cost=0.00..17.82 rows=182 width=625)
(9 rows)


--
greg

pgsql-general by date:

Previous
From: Ron Johnson
Date:
Subject: Re: About GPL and proprietary software
Next
From: Chris Bowlby
Date:
Subject: Table name lengths...