Thread: possible optimization: push down aggregates
Hi,

one user asked about using partitioning for faster aggregate queries.

I found that there is no such optimization.

create table x1(a int, d date);
create table x_1 ( check(d = '2014-01-01'::date)) inherits(x1);
create table x_2 ( check(d = '2014-01-02'::date)) inherits(x1);
create table x_3 ( check(d = '2014-01-03'::date)) inherits(x1);

With this schema, the optimizer produces:

postgres=# explain verbose select max(a) from x1 group by d order by d;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=684.79..750.99 rows=200 width=8)
   Output: max(x1.a), x1.d
   Group Key: x1.d
   ->  Sort  (cost=684.79..706.19 rows=8561 width=8)
         Output: x1.d, x1.a
         Sort Key: x1.d
         ->  Append  (cost=0.00..125.60 rows=8561 width=8)
               ->  Seq Scan on public.x1  (cost=0.00..0.00 rows=1 width=8)
                     Output: x1.d, x1.a
               ->  Seq Scan on public.x_1  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_1.d, x_1.a
               ->  Seq Scan on public.x_2  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_2.d, x_2.a
               ->  Seq Scan on public.x_3  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_3.d, x_3.a
               ->  Seq Scan on public.x_4  (cost=0.00..31.40 rows=2140 width=8)
                     Output: x_4.d, x_4.a
 Planning time: 0.333 ms

It could be reduced to:

Sort by d
  Append
    Aggregate (a), d
      Seq Scan on x_1
    Aggregate (a), d
      Seq Scan on x_2

Are there some plans to use partitioning for aggregation?

Pavel
On Wed, Aug 27, 2014 at 2:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> one user asked about using partitioning for faster aggregate queries.
>
> I found that there is no such optimization.
> [schema and EXPLAIN output trimmed]
>
> It could be reduced to:
>
> Sort by d
>   Append
>     Aggregate (a), d
>       Seq Scan on x_1
>     Aggregate (a), d
>       Seq Scan on x_2
>
> Are there some plans to use partitioning for aggregation?

Besides min/max, what other aggregates (mean/stddev come to mind) would
you optimize, and how would you determine which ones could be?  Where is
that decision made?

For example, could user-defined aggregates be pushed down if you had a
reaggregation routine broken out from the main one?

merlin
On Wed, Aug 27, 2014 at 4:41 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?

You can't with mean and stddev, only with associative aggregates.

That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.
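[To make the distinction concrete, here is a minimal sketch outside the planner, in Python rather than PostgreSQL internals. It shows that the aggregates Claudio lists combine cleanly across partitions, while a naive combination of per-partition means does not. The data values are made up for illustration.]

```python
# Three "partitions" of column "a", as in Pavel's x_1..x_3 example.
partitions = [
    [5, 1, 9],   # rows for d = 2014-01-01
    [2, 7],      # rows for d = 2014-01-02
    [4, 4, 8],   # rows for d = 2014-01-03
]

# Associative aggregates: per-partition results can themselves be
# aggregated with the same (or a trivial companion) function.
per_part_max = [max(p) for p in partitions]
per_part_sum = [sum(p) for p in partitions]
per_part_cnt = [len(p) for p in partitions]

all_rows = [x for p in partitions for x in p]
assert max(per_part_max) == max(all_rows)   # max of maxes
assert sum(per_part_sum) == sum(all_rows)   # sum of sums
assert sum(per_part_cnt) == len(all_rows)   # counts combine via sum

# Mean does NOT combine this way: averaging the per-partition means
# ignores the partition sizes, so the naive combination is wrong.
naive_mean = sum(sum(p) / len(p) for p in partitions) / len(partitions)
true_mean = sum(all_rows) / len(all_rows)
assert naive_mean != true_mean
```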
2014-08-27 21:41 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?

I think all aggregates are possible.

When you have partitions by a column X, then you have natural sets by X,
so you can directly calculate any aggregate on any column when the GROUP
BY clause is exactly "GROUP BY X", isn't it?

Probably some similar optimizations are possible when you have "GROUP BY
X, Y" -- at minimum you have more sets, and you can do the aggregation
on smaller sets.

Pavel
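[Pavel's claim can be illustrated with a small sketch, in Python rather than the planner itself. When the GROUP BY key is the partitioning key, each group lives entirely inside one partition, so even an aggregate that is not decomposable at all, such as median, can be computed per partition and the per-partition results simply appended. The data values are made up for illustration.]

```python
import statistics
from collections import defaultdict

# Rows partitioned by d, one date per partition,
# mirroring the x_1..x_3 CHECK constraints.
partitions = {
    "2014-01-01": [5, 1, 9],
    "2014-01-02": [2, 7],
    "2014-01-03": [4, 4, 8],
}

# Global plan: gather all rows first, then GROUP BY d.
groups = defaultdict(list)
for d, rows in partitions.items():
    for a in rows:
        groups[d].append(a)
global_result = {d: statistics.median(v) for d, v in groups.items()}

# Pushed-down plan: each group is wholly contained in one partition,
# so the aggregate runs per partition and the results are appended --
# no reaggregation step is needed, even for median.
pushed_result = {d: statistics.median(rows) for d, rows in partitions.items()}

assert pushed_result == global_result
```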
2014-08-27 21:46 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

I don't think so. I have partitions by X, and my query has the group by
clause "GROUP BY X" -- so I can calculate any aggregate.

Pavel
On Wed, Aug 27, 2014 at 2:46 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> You can't with mean and stddev, only with associative aggregates.

The associative bit just makes it easier (which is important of
course!).  Mean, for example, can be pushed down if the pushed-down
aggregates return the count to the "reaggregator" so that you can weight
the final average.  That's a lot more complicated though.

merlin
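[Merlin's weighting idea can be sketched in a few lines of Python (not PostgreSQL internals): each partition returns a partial state of (sum, count) instead of a finished mean, and a reaggregation step combines the states, weighting by count. The data values are made up for illustration.]

```python
# Same example partitions as before.
partitions = [
    [5, 1, 9],
    [2, 7],
    [4, 4, 8],
]

# Pushed-down part: each partition emits a partial state, not a mean.
partial_states = [(sum(p), len(p)) for p in partitions]

# Reaggregation: combine the partial states; the count weights the sum.
total_sum = sum(s for s, _ in partial_states)
total_cnt = sum(c for _, c in partial_states)
pushed_mean = total_sum / total_cnt

all_rows = [x for p in partitions for x in p]
assert pushed_mean == sum(all_rows) / len(all_rows)
```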
Merlin Moncure <mmoncure@gmail.com> writes:
> associative bit just makes it easier (which is important of course!).
> mean for example can be pushed down if the pushed-down aggregates
> return the count to the "reaggregator" so that you can weight the
> final average.  that's a lot more complicated though.

The real question is what you're expecting to get out of such an
"optimization".  If the aggregate has to visit all rows then it's not
apparent to me that any win emerges from the extra complication.

We do already have optimization of min/max across inheritance trees,
and that's certainly a win because you don't have to visit all rows.

regression=# create table pp(f1 int unique);
CREATE TABLE
regression=# create table cc(unique(f1)) inherits(pp);
CREATE TABLE
regression=# create table cc2(unique(f1)) inherits(pp);
CREATE TABLE
regression=# explain select max(f1) from pp;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Result  (cost=0.51..0.52 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.46..0.51 rows=1 width=4)
           ->  Merge Append  (cost=0.46..267.71 rows=4777 width=4)
                 Sort Key: pp.f1
                 ->  Index Only Scan Backward using pp_f1_key on pp  (cost=0.12..8.14 rows=1 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc_f1_key on cc  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
                 ->  Index Only Scan Backward using cc2_f1_key on cc2  (cost=0.15..85.94 rows=2388 width=4)
                       Index Cond: (f1 IS NOT NULL)
 Planning time: 0.392 ms
(12 rows)

That doesn't currently extend to the GROUP BY case unfortunately.

			regards, tom lane
2014-08-27 22:27 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:
> The real question is what you're expecting to get out of such an
> "optimization".  If the aggregate has to visit all rows then it's not
> apparent to me that any win emerges from the extra complication.

I expect to remove the hashing or sorting part of the aggregation. It
can reduce the aggregation to a seq scan only.

> We do already have optimization of min/max across inheritance trees,
> and that's certainly a win because you don't have to visit all rows.
> [example trimmed]
>
> That doesn't currently extend to the GROUP BY case unfortunately.

Pavel
On 27 Srpen 2014, 21:41, Merlin Moncure wrote:
> Besides min/max, what other aggregates (mean/stddev come to mind) would
> you optimize, and how would you determine which ones could be?  Where
> is that decision made?
>
> For example, could user-defined aggregates be pushed down if you had a
> reaggregation routine broken out from the main one?

I think what Pavel suggests is that when you are aggregating by GROUP BY
x, and 'x' happens to be the column used for partitioning (making it
impossible for groups from different partitions to overlap), then it's
perfectly fine to perform the aggregation per partition and just append
the results. If you need sorted output, you can sort the appended
results (assuming the cardinality of the output is much lower than that
of the actual data).

This "append first, then aggregate" may be the reason for the switch to
sort (because of the fear that the number of groups will exceed
work_mem), while we could just as well process each partition with a
hash aggregate separately.

Tomas
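[Tomas's "hash aggregate per partition, then append, then sort only the results" shape can be sketched in Python (assumed example data, not planner code): each partition gets its own small hash table, and only the tiny aggregated result set is sorted, never the full input.]

```python
# Rows of (d, a), one date per partition, as in the thread's example.
partitions = [
    [("2014-01-01", 5), ("2014-01-01", 1), ("2014-01-01", 9)],
    [("2014-01-02", 2), ("2014-01-02", 7)],
    [("2014-01-03", 4), ("2014-01-03", 8)],
]

appended = []
for part in partitions:
    ht = {}                       # per-partition hash aggregate: max(a) by d
    for d, a in part:
        ht[d] = max(ht.get(d, a), a)
    appended.extend(ht.items())   # the Append node

result = sorted(appended)         # sort only the aggregated rows
# -> [('2014-01-01', 9), ('2014-01-02', 7), ('2014-01-03', 8)]
```

Each per-partition hash table here holds one entry, so the memory-pressure argument against a global hash aggregate (groups exceeding work_mem) does not apply to the per-partition version.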
On Wed, Aug 27, 2014 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The real question is what you're expecting to get out of such an
> "optimization".  If the aggregate has to visit all rows then it's not
> apparent to me that any win emerges from the extra complication.
>
> That doesn't currently extend to the GROUP BY case unfortunately.

Yeah: I was overthinking it.  My mind was on parallel processing of the
aggregate (which is not what Pavel was proposing) because that just
happens to be what I'm working on currently -- using dblink to decompose
various aggregates and distribute the calculation across servers.
"Wouldn't it be nice to have the server do that itself", I impulsively
thought.

merlin
On Wed, Aug 27, 2014 at 6:46 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Yeah: I was overthinking it.  My mind was on parallel processing of the
> aggregate (which is not what Pavel was proposing) because that just
> happens to be what I'm working on currently -- using dblink to
> decompose various aggregates and distribute the calculation across
> servers.

But you'd have part of it too. Because then you'd have semantically
independent parallel nodes in the plan that do some meaningful data
wrangling and spit out little output, whereas the previous plan did not
do much with the data and spit out loads of rows as a result.

This is a big preliminary step for parallel execution, really.
On 08/28/2014 03:46 AM, Claudio Freire wrote:
> You can't with mean and stddev, only with associative aggregates.
>
> That's min, max, sum, bit_and, bit_or, bool_and, bool_or, count.

You could with a new helper function to merge the temporary states for
each scan, though. In the case of mean, for example, it'd just mean
adding the counts and sums.

However, I'm not sure how interesting that is without the ability to
execute the subplans in parallel.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
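[Craig's hypothetical "merge the temporary states" helper can be sketched in Python. A state of (count, sum, sum of squares) supports both mean and stddev, and merging two states is just element-wise addition. The function names and state layout are illustrative assumptions, not an actual PostgreSQL API; PostgreSQL 9.6 later added something similar as an aggregate "combine function". The data values are made up.]

```python
import math
import statistics

def trans(rows):
    """Per-scan temporary state for avg/stddev: (count, sum, sum of squares)."""
    return (len(rows), sum(rows), sum(x * x for x in rows))

def merge(s1, s2):
    """The hypothetical merge helper: element-wise addition of states."""
    return tuple(a + b for a, b in zip(s1, s2))

def final_stddev(state):
    """Population stddev from a merged state."""
    n, s, ss = state
    return math.sqrt(ss / n - (s / n) ** 2)

partitions = [[5, 1, 9], [2, 7], [4, 4, 8]]
merged = (0, 0, 0)
for p in partitions:
    merged = merge(merged, trans(p))

all_rows = [x for p in partitions for x in p]
assert math.isclose(final_stddev(merged), statistics.pstdev(all_rows))
assert merged[1] / merged[0] == sum(all_rows) / len(all_rows)  # mean too
```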