Thread: possible optimization: push down aggregates
Hi,

one user asked about using partitioning for faster aggregate queries.

I found that there is no such optimization.

create table x1(a int, d date);
create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1);
create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1);
create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1);

With this schema, the optimizer produces:

postgres=# explain verbose select max(a) from x1 group by d order by d;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=684.79..750.99 rows=200 width=8)
   Output: max(x1.a), x1.d
   Group Key: x1.d
   ->  Sort  (cost=684.79..706.19 rows=8561 width=8)
         Output: x1.d, x1.a
         Sort Key: x1.d
         ->  Append  (cost=0.00..125.60 rows=8561 width=8)
               ->  Seq Scan on public.x1  (cost=0.00..0.00 rows=1 width=8)
                     Output: x1.d, x1.a
               ->  Seq Scan on public.x_1  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_1.d, x_1.a
               ->  Seq Scan on public.x_2  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_2.d, x_2.a
               ->  Seq Scan on public.x_3  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_3.d, x_3.a
               ->  Seq Scan on public.x_4  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_4.d, x_4.a
 Planning time: 0.333 ms

It could be reduced to:

Sort by d
  Append
    Aggregate (a), d
      Seq Scan on x_1
    Aggregate (a), d
      Seq Scan on x_2

Are there some plans to use partitioning for aggregation?

Pavel
On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> one user asked about using partitioning for faster aggregate queries.
>
> I found that there is no such optimization.
> [schema and EXPLAIN output trimmed]
>
> It could be reduced to:
>
> Sort by d
>   Append
>     Aggregate (a), d
>       Seq Scan on x_1
>     Aggregate (a), d
>       Seq Scan on x_2
>
> Are there some plans to use partitioning for aggregation?

Besides min/max, what other aggregates (mean/stddev come to mind) would
you optimize, and how would you determine which ones could be?  Where is
that decision made?

For example, could user-defined aggregates be pushed down if you had a
reaggregation routine broken out from the main one?

merlin
On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?

You can't with mean and stddev, only with associative aggregates.

That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.
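[To make the distinction concrete, here is a minimal sketch outside the planner, in Python rather than PostgreSQL internals. It shows that the aggregates Claudio lists combine cleanly across partitions, while a naive combination of per-partition means does not. The data values are made up for illustration.]

```python
# Three "partitions" of column "a", as in Pavel's x_1..x_3 example.
partitions = [
    [5, 1, 9],   # rows for d = 2014-01-01
    [2, 7],      # rows for d = 2014-01-02
    [4, 4, 8],   # rows for d = 2014-01-03
]

# Associative aggregates: per-partition results can themselves be
# aggregated with the same (or a trivial companion) function.
per_part_max = [max(p) for p in partitions]
per_part_sum = [sum(p) for p in partitions]
per_part_cnt = [len(p) for p in partitions]

all_rows = [x for p in partitions for x in p]
assert max(per_part_max) == max(all_rows)   # max of maxes
assert sum(per_part_sum) == sum(all_rows)   # sum of sums
assert sum(per_part_cnt) == len(all_rows)   # counts combine via sum

# Mean does NOT combine this way: averaging the per-partition means
# ignores the partition sizes, so the naive combination is wrong.
naive_mean = sum(sum(p) / len(p) for p in partitions) / len(partitions)
true_mean = sum(all_rows) / len(all_rows)
assert naive_mean != true_mean
```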
2014-08-27 21:41 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?

I think all aggregates are possible.

When you have partitions by a column X, then you have natural sets by X,
so you can directly calculate any aggregate on any column when the GROUP
BY clause is exactly "GROUP BY X", isn't it?

Probably some similar optimizations are possible when you have "GROUP BY
X, Y" -- at minimum you have more sets, and you can do the aggregation
on smaller sets.

Pavel
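[Pavel's claim can be illustrated with a small sketch, in Python rather than the planner itself. When the GROUP BY key is the partitioning key, each group lives entirely inside one partition, so even an aggregate that is not decomposable at all, such as median, can be computed per partition and the per-partition results simply appended. The data values are made up for illustration.]

```python
import statistics
from collections import defaultdict

# Rows partitioned by d, one date per partition,
# mirroring the x_1..x_3 CHECK constraints.
partitions = {
    "2014-01-01": [5, 1, 9],
    "2014-01-02": [2, 7],
    "2014-01-03": [4, 4, 8],
}

# Global plan: gather all rows first, then GROUP BY d.
groups = defaultdict(list)
for d, rows in partitions.items():
    for a in rows:
        groups[d].append(a)
global_result = {d: statistics.median(v) for d, v in groups.items()}

# Pushed-down plan: each group is wholly contained in one partition,
# so the aggregate runs per partition and the results are appended --
# no reaggregation step is needed, even for median.
pushed_result = {d: statistics.median(rows) for d, rows in partitions.items()}

assert pushed_result == global_result
```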
2014-08-27 21:46 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

I don't think so. I have partitions by X, and my query has the group by
clause "GROUP BY X" -- so I can calculate any aggregate.

Pavel
On Wed, Aug 27, 2014 at 2:46 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> You can't with mean and stddev, only with associative aggregates.

The associative bit just makes it easier (which is important of
course!).  Mean, for example, can be pushed down if the pushed-down
aggregates return the count to the "reaggregator" so that you can weight
the final average.  That's a lot more complicated though.

merlin
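[Merlin's weighting idea can be sketched in a few lines of Python (not PostgreSQL internals): each partition returns a partial state of (sum, count) instead of a finished mean, and a reaggregation step combines the states, weighting by count. The data values are made up for illustration.]

```python
# Same example partitions as before.
partitions = [
    [5, 1, 9],
    [2, 7],
    [4, 4, 8],
]

# Pushed-down part: each partition emits a partial state, not a mean.
partial_states = [(sum(p), len(p)) for p in partitions]

# Reaggregation: combine the partial states; the count weights the sum.
total_sum = sum(s for s, _ in partial_states)
total_cnt = sum(c for _, c in partial_states)
pushed_mean = total_sum / total_cnt

all_rows = [x for p in partitions for x in p]
assert pushed_mean == sum(all_rows) / len(all_rows)
```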
Merlin Moncure <mmoncure@gmail.com> writes:
> associative bit just makes it easier (which is important of course!).
> mean for example can be pushed down if the pushed-down aggregates
> return the count to the "reaggregator" so that you can weight the
> final average.  that's a lot more complicated though.

The real question is what you're expecting to get out of such an
"optimization".  If the aggregate has to visit all rows then it's not
apparent to me that any win emerges from the extra complication.

We do already have optimization of min/max across inheritance trees,
and that's certainly a win because you don't have to visit all rows.

regression=# create table pp(f1 int unique);
CREATE TABLE
regression=# create table cc(unique(f1)) inherits(pp);
CREATE TABLE
regression=# create table cc2(unique(f1)) inherits(pp);
CREATE TABLE
regression=# explain select max(f1) from pp;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Result  (cost=0.51..0.52 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.46..0.51 rows=1 width=4)
           ->  Merge Append  (cost=0.46..267.71 rows=4777 width=4)
                 Sort Key: pp.f1
                 ->  Index Only Scan Backward using pp_f1_key on pp  (cost=0.12..8.14 rows=1 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc_f1_key on cc  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc2_f1_key on cc2  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
 Planning time: 0.392 ms
(12 rows)

That doesn't currently extend to the GROUP BY case unfortunately.

			regards, tom lane
2014-08-27 22:27 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
> The real question is what you're expecting to get out of such an
> "optimization".  If the aggregate has to visit all rows then it's not
> apparent to me that any win emerges from the extra complication.

I expect to remove the hashing or sorting part of the aggregation. It
can reduce the aggregation to a seq scan only.

> We do already have optimization of min/max across inheritance trees,
> and that's certainly a win because you don't have to visit all rows.
> [example trimmed]
>
> That doesn't currently extend to the GROUP BY case unfortunately.

Pavel
On 27 Srpen 2014, 21:41, Merlin Moncure wrote:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?
>
> For example, could user-defined aggregates be pushed down if you had a
> reaggregation routine broken out from the main one?

I think what Pavel suggests is that when you are aggregating by GROUP BY
x, and 'x' happens to be the column used for partitioning (making it
impossible for groups from different partitions to overlap), then it's
perfectly fine to perform the aggregation per partition and just append
the results. If you need sorted output, you can sort the appended
results (assuming the cardinality of the output is much lower than that
of the actual data).

This "append first, then aggregate" may be the reason for the switch to
sort (because of the fear that the number of groups will exceed
work_mem), while we could just as well process each partition with a
hash aggregate separately.

Tomas
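[Tomas's "hash aggregate per partition, then append, then sort only the results" shape can be sketched in Python (assumed example data, not planner code): each partition gets its own small hash table, and only the tiny aggregated result set is sorted, never the full input.]

```python
# Rows of (d, a), one date per partition, as in the thread's example.
partitions = [
    [("2014-01-01", 5), ("2014-01-01", 1), ("2014-01-01", 9)],
    [("2014-01-02", 2), ("2014-01-02", 7)],
    [("2014-01-03", 4), ("2014-01-03", 8)],
]

appended = []
for part in partitions:
    ht = {}                       # per-partition hash aggregate: max(a) by d
    for d, a in part:
        ht[d] = max(ht.get(d, a), a)
    appended.extend(ht.items())   # the Append node

result = sorted(appended)         # sort only the aggregated rows
# -> [('2014-01-01', 9), ('2014-01-02', 7), ('2014-01-03', 8)]
```

Each per-partition hash table here holds one entry, so the memory-pressure argument against a global hash aggregate (groups exceeding work_mem) does not apply to the per-partition version.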
On Wed, Aug 27, 2014 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real question is what you're expecting to get out of such an
> "optimization".  If the aggregate has to visit all rows then it's not
> apparent to me that any win emerges from the extra complication.
>
> That doesn't currently extend to the GROUP BY case unfortunately.

Yeah: I was overthinking it.  My mind was on parallel processing of the
aggregate (which is not what Pavel was proposing) because that just
happens to be what I'm working on currently -- using dblink to decompose
various aggregates and distribute the calculation across servers.
"Wouldn't it be nice to have the server do that itself", I impulsively
thought.

merlin
On Wed, Aug 27, 2014 at 6:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Yeah: I was overthinking it.  My mind was on parallel processing of the
> aggregate (which is not what Pavel was proposing) because that just
> happens to be what I'm working on currently -- using dblink to
> decompose various aggregates and distribute the calculation across
> servers.

But you'd have part of it too. Because then you'd have semantically
independent parallel nodes in the plan that do some meaningful data
wrangling and spit out little output, whereas the previous plan did not
do much with the data and spit out loads of rows as a result.

This is a big preliminary step for parallel execution, really.
On 08/28/2014 03:46 AM, Claudio Freire wrote:
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

You could with a new helper function to merge the temporary states for
each scan, though. In the case of mean, for example, it'd just mean
adding the counts and sums.

However, I'm not sure how interesting that is without the ability to
execute the subplans in parallel.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
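[Craig's hypothetical "merge the temporary states" helper can be sketched in Python. A state of (count, sum, sum of squares) supports both mean and stddev, and merging two states is just element-wise addition. The function names and state layout are illustrative assumptions, not an actual PostgreSQL API; PostgreSQL 9.6 later added something similar as an aggregate "combine function". The data values are made up.]

```python
import math
import statistics

def trans(rows):
    """Per-scan temporary state for avg/stddev: (count, sum, sum of squares)."""
    return (len(rows), sum(rows), sum(x * x for x in rows))

def merge(s1, s2):
    """The hypothetical merge helper: element-wise addition of states."""
    return tuple(a + b for a, b in zip(s1, s2))

def final_stddev(state):
    """Population stddev from a merged state."""
    n, s, ss = state
    return math.sqrt(ss / n - (s / n) ** 2)

partitions = [[5, 1, 9], [2, 7], [4, 4, 8]]
merged = (0, 0, 0)
for p in partitions:
    merged = merge(merged, trans(p))

all_rows = [x for p in partitions for x in p]
assert math.isclose(final_stddev(merged), statistics.pstdev(all_rows))
assert merged[1] / merged[0] == sum(all_rows) / len(all_rows)  # mean too
```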