Home > mailing lists

Re: possible optimization: push down aggregates - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: possible optimization: push down aggregates
Date	August 27, 2014 22:47:48
Msg-id	CAFj8pRAkXjWrp1uHaYm1mvOFfbE5p563bVuYsppZ=oqOXv8eqw@mail.gmail.com Whole thread Raw
In response to	Re: possible optimization: push down aggregates (Merlin Moncure <mmoncure@gmail.com>)
List	pgsql-hackers

Tree view

2014-08-27 21:41 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:

On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> one user asked about using a partitioning for faster aggregates queries.
>
> I found so there is not any optimization.
>
> create table x1(a int, d date);
> create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1);
> create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1);
> create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1);
>
> When I have this schema, then optimizer try to do
>
> postgres=# explain verbose select max(a) from x1 group by d order by d;
> QUERY PLAN
> --------------------------------------------------------------------------------
> GroupAggregate (cost=684.79..750.99 rows=200 width=8)
> Output: max(x1.a), x1.d
> Group Key: x1.d
> -> Sort (cost=684.79..706.19 rows=8561 width=8)
> Output: x1.d, x1.a
> Sort Key: x1.d
> -> Append (cost=0.00..125.60 rows=8561 width=8)
> -> Seq Scan on public.x1 (cost=0.00..0.00 rows=1 width=8)
> Output: x1.d, x1.a
> -> Seq Scan on public.x_1 (cost=0.00..31.40 rows=2140
> width=8)
> Output: x_1.d, x_1.a
> -> Seq Scan on public.x_2 (cost=0.00..31.40 rows=2140
> width=8)
> Output: x_2.d, x_2.a
> -> Seq Scan on public.x_3 (cost=0.00..31.40 rows=2140
> width=8)
> Output: x_3.d, x_3.a
> -> Seq Scan on public.x_4 (cost=0.00..31.40 rows=2140
> width=8)
> Output: x_4.d, x_4.a
> Planning time: 0.333 ms
>
> It can be reduced to:
>
> sort by d
> Append
> Aggegate (a), d
> seq scan from x_1
> Aggregate (a), d
> seq scan from x_2
>
> Are there some plans to use partitioning for aggregation?

Besides min/max, what other aggregates (mean/stddev come to mind)
would you optimize and how would you determine which ones could be?
Where is that decision made?

I am thinking so all aggregates are possible

when you have a partitions by column X -- then you have a natural sets by X,

so you can directly calculate any aggregates on any column when GROUP BY clause is a "GROUP BY X"

isn't it?

probably some similar optimizations are possible when you have "GROUP BY X,Y" -- minimally you have more sets, and you can do aggregations on smaller sets.

Pavel

For example, could user defined aggregates be pushed down if you had a
reaggregation routine broken out from the main one?

merlin

pgsql-hackers by date:

From: Claudio Freire
Date: 27 August 2014, 22:46:30
Subject: Re: possible optimization: push down aggregates

From: Pavel Stehule
Date: 27 August 2014, 22:49:22
Subject: Re: possible optimization: push down aggregates

Re: possible optimization: push down aggregates - Mailing list pgsql-hackers

Previous

Next