Thread: Any way to Convince postgres to push join clause inside subquery aggregate?

Any way to Convince postgres to push join clause inside subquery aggregate?

From

Greg Stark

Date:

02 September 2003, 13:54:03

I find I often want to be able to do joins against views where the view are
aggregates on a column that has an index. Ie, something like

SELECT a.*, v.n
  FROM a
  JOIN (select a_id,count(*) as n group by a_id) as v USING (a_id)

where there's an index on b.a_id. assume there are other tables being joined
so I can't just move the aggregate to the outermost layer.

These queries often come about because I often have a fully normalized
structure with many-to-many relationships but sometimes only want to display
an aggregated view of the data. For example, a list of products with the
number in stock, number on order, number sold recently, etc.

It would be really great if postgres could notice that the index is still
useful and push the index lookup inside the aggregate. Working around this is
really awkward and makes it impossible to use real views to hide the details
from the queries. Usually working around it actually means bulding
denormalized tables to act as "materialized views" and building indexes on
those tables.

The frustrating part is that it seems like postgres is almost capable of doing
it. In a simple query it is fully capable of pushing a where clause inside the
aggregate.

But in a join query it doesn't. (I put the "limit 1" on the second table to
ensure it was the best table to use to drive the join.)

Is there some key detail I'm missing that would allow it to push the join
clause inside the aggregate and use the index? If so I can simplify a lot of
design by creating views for these aggregate views instead of creating
denormalized tables and hacking complex queries everywhere.



slo=> explain select * from (select foo_id,count(*) as n from foo_bar group by foo_id) as x where foo_id = 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..25.24 rows=1 width=12)
   ->  GroupAggregate  (cost=0.00..25.23 rows=1 width=4)
         ->  Index Scan using idx_foo_bar_foo on foo_bar  (cost=0.00..25.08 rows=30 width=4)
               Index Cond: (foo_id = 1)
(4 rows)

Time: 226.15 ms





slo=> explain
select *
 from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
 join (select * from foo limit 1) as foo using (foo_id)
;

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Hash Join  (cost=108.46..111.28 rows=1 width=227)
   Hash Cond: ("outer".foo_id = "inner".foo_id)
   ->  Subquery Scan x  (cost=108.35..110.36 rows=161 width=12)
         ->  HashAggregate  (cost=108.35..108.75 rows=161 width=4)
               ->  Seq Scan on foo_bar  (cost=0.00..81.23 rows=5423 width=4)
   ->  Hash  (cost=0.11..0.11 rows=1 width=219)
         ->  Subquery Scan foo  (cost=0.00..0.11 rows=1 width=219)
               ->  Limit  (cost=0.00..0.10 rows=1 width=625)
                     ->  Seq Scan on foo  (cost=0.00..17.82 rows=182 width=625)
(9 rows)


--
greg

Re: Any way to Convince postgres to push join clause inside subquery aggregate?

From

Tom Lane

Date:

02 September 2003, 14:31:34

Greg Stark <gsstark@mit.edu> writes:
> slo=> explain
> select *
>  from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
>  join (select * from foo limit 1) as foo using (foo_id)
> ;

Why not put the subselect in the output list, if that's the kind of plan
you want?

regression=# explain select foo.*, (select count(*) from foo_bar where foo_id = foo.foo_id) as n from foo;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..17102.50 rows=1000 width=8)
   SubPlan
     ->  Aggregate  (cost=17.08..17.08 rows=1 width=0)
           ->  Index Scan using foobi on foo_bar  (cost=0.00..17.07 rows=5 width=0)
                 Index Cond: (foo_id = $0)
(5 rows)


            regards, tom lane

Re: Any way to Convince postgres to push join clause inside subquery aggregate?

From

Greg Stark

Date:

02 September 2003, 14:46:30

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > slo=> explain
> > select *
> >  from (select foo_id,count(*) as n from foo_bar group by foo_id) as x
> >  join (select * from foo limit 1) as foo using (foo_id)
> > ;
>
> Why not put the subselect in the output list, if that's the kind of plan
> you want?

Actually that's the way the query is right now. The problem arises because I
want to add a second column without duplicating the whole thing.

--
greg