On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > one user asked about using a partitioning for faster aggregates queries. > > I found so there is not any optimization. > > create table x1(a int, d date); > create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1); > create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1); > create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1); > > When I have this schema, then optimizer try to do > > postgres=# explain verbose select max(a) from x1 group by d order by d; > QUERY PLAN > -------------------------------------------------------------------------------- > GroupAggregate (cost=684.79..750.99 rows=200 width=8) > Output: max(x1.a), x1.d > Group Key: x1.d > -> Sort (cost=684.79..706.19 rows=8561 width=8) > Output: x1.d, x1.a > Sort Key: x1.d > -> Append (cost=0.00..125.60 rows=8561 width=8) > -> Seq Scan on public.x1 (cost=0.00..0.00 rows=1 width=8) > Output: x1.d, x1.a > -> Seq Scan on public.x_1 (cost=0.00..31.40 rows=2140 > width=8) > Output: x_1.d, x_1.a > -> Seq Scan on public.x_2 (cost=0.00..31.40 rows=2140 > width=8) > Output: x_2.d, x_2.a > -> Seq Scan on public.x_3 (cost=0.00..31.40 rows=2140 > width=8) > Output: x_3.d, x_3.a > -> Seq Scan on public.x_4 (cost=0.00..31.40 rows=2140 > width=8) > Output: x_4.d, x_4.a > Planning time: 0.333 ms > > It can be reduced to: > > sort by d > Append > Aggegate (a), d > seq scan from x_1 > Aggregate (a), d > seq scan from x_2 > > Are there some plans to use partitioning for aggregation?
Besides min/max, what other aggregates (mean/stddev come to mind) would you optimize and how would you determine which ones could be? Where is that decision made?
I am thinking so all aggregates are possible
when you have a partitions by column X -- then you have a natural sets by X,
so you can directly calculate any aggregates on any column when GROUP BY clause is a "GROUP BY X"
isn't it?
probably some similar optimizations are possible when you have "GROUP BY X,Y" -- minimally you have more sets, and you can do aggregations on smaller sets.
Pavel
For example, could user defined aggregates be pushed down if you had a reaggregation routine broken out from the main one? merlin