Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date
Msg-id e08cc0400908131005o65f51611pd8c899a813c9316a@mail.gmail.com
In response to Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Олег Царев <zabivator@gmail.com>)
List pgsql-hackers
2009/8/14 Олег Царев <zabivator@gmail.com>:
> All right, except
>> Because GROUP BY we have today is a subset of GROUPING SETS by
>> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
>> to take multiple group definitions. And we must support both of
>> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
>> earlier patch does. For GroupAgg, it is a bit complicated since we
>> sort by different key sets.
> because GROUP BY is an optimized version of GROUPING SETS.
> Of course, we can extend the current definition of GROUP BY, but we
> would regress its performance.
> Some questions for you:
>
> How do you calculate the aggregation for ROLLUP in a single pass?

I'd imagine something like:

select a, b, count(*) from x group by rollup(a, b);

PerGroup all = init_agg(), a = init_agg(), ab = init_agg();
while (row = fetch()) {
    if (group_is_changed(ab, row)) {
        result_ab = finalize_agg(ab);
        ab = init_agg();
    }
    if (group_is_changed(a, row)) {
        result_a = finalize_agg(a);
        a = init_agg();
    }
    advance_agg(all, row);
    advance_agg(a, row);
    advance_agg(ab, row);
}
result_all = finalize_agg(all);

Of course, you need to take care of how best to return a result row
and then continue aggregating, and the number of grouping keys varies
from one to many, but it is quite possible. A normal GROUP BY is just
the case where the only key is a, so there won't be a performance
regression.
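
To make the single-pass idea concrete, here is a minimal self-contained
sketch in C for count(*) over ROLLUP(a, b). It is only an illustration
under stated assumptions: the input is already sorted by (a, b), and the
Row and RollupResult types and the rollup_count function are
hypothetical names invented for this example, not anything in nodeAgg.c.
Unlike the pseudocode, it also emits the last open (a, b) and (a) groups
when the input ends.

```c
#include <stddef.h>

/* Hypothetical input row: the two grouping columns. */
typedef struct { int a; int b; } Row;

/* Hypothetical output row. level: 2 = (a, b), 1 = (a), 0 = grand total. */
typedef struct {
    int  level;
    int  a;      /* meaningful when level >= 1 */
    int  b;      /* meaningful when level == 2 */
    long count;
} RollupResult;

/*
 * count(*) ... GROUP BY ROLLUP(a, b) in one pass.
 * Assumes rows are sorted by (a, b). out must have room for
 * 2 * nrows + 1 entries. Returns the number of results written.
 */
size_t rollup_count(const Row *rows, size_t nrows, RollupResult *out)
{
    size_t n = 0;
    long cnt_all = 0, cnt_a = 0, cnt_ab = 0;

    for (size_t i = 0; i < nrows; i++) {
        if (i > 0) {
            const Row *prev = &rows[i - 1];
            int a_changed  = rows[i].a != prev->a;
            int ab_changed = a_changed || rows[i].b != prev->b;

            if (ab_changed) {           /* close the finished (a, b) group */
                out[n++] = (RollupResult){2, prev->a, prev->b, cnt_ab};
                cnt_ab = 0;
            }
            if (a_changed) {            /* close the finished (a) group */
                out[n++] = (RollupResult){1, prev->a, 0, cnt_a};
                cnt_a = 0;
            }
        }
        /* Advance every level's running state with the current row. */
        cnt_all++;
        cnt_a++;
        cnt_ab++;
    }

    if (nrows > 0) {                    /* flush the groups still open */
        const Row *last = &rows[nrows - 1];
        out[n++] = (RollupResult){2, last->a, last->b, cnt_ab};
        out[n++] = (RollupResult){1, last->a, 0, cnt_a};
    }
    out[n++] = (RollupResult){0, 0, 0, cnt_all};   /* grand total */
    return n;
}
```

With only one grouping level the loop degenerates to the usual sorted
GROUP BY, which is why I don't expect a regression for the plain case.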

> A better way: add a "merge aggregations" operation, and calculate one
> buffer for every group; when the group has changed, merge this "main
> buffer" into the others and return an intermediate result.

"Merge aggregates" sounds fascinating to me, not only for this feature
but also for aggregates over partitioned tables. But adding another
function (a "merge function"?) to the current aggregate system is
still a long way off.
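
Just to illustrate what such a merge step might look like, here is a
small C sketch for avg(). Everything in it is an assumption made for
illustration: AvgState, avg_init, avg_advance, avg_merge, avg_final and
close_group are hypothetical names, not part of the current aggregate
system. Only the finest group is advanced per row; when that group
ends, its state is folded into the parent level.

```c
/* Hypothetical transition state for avg(): running sum and row count. */
typedef struct { double sum; long count; } AvgState;

void avg_init(AvgState *s)
{
    s->sum = 0.0;
    s->count = 0;
}

/* Per-row transition step, applied only to the finest grouping level. */
void avg_advance(AvgState *s, double v)
{
    s->sum += v;
    s->count++;
}

/* The proposed "merge" step: fold one partial state into another. */
void avg_merge(AvgState *dst, const AvgState *src)
{
    dst->sum += src->sum;
    dst->count += src->count;
}

/* Final step. SQL avg() over no rows is NULL; 0.0 is a placeholder here. */
double avg_final(const AvgState *s)
{
    return s->count ? s->sum / (double) s->count : 0.0;
}

/* When a child group ends, merge it into its parent and reset the child. */
void close_group(AvgState *child, AvgState *parent)
{
    avg_merge(parent, child);
    avg_init(child);
}
```

The per-row cost is then one advance call regardless of the number of
rollup levels, at the price of requiring a merge function for every
aggregate that wants to take part.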

>
> I think supporting this kind of grouping operation isn't simple, and a
> different implementation of ROLLUP is better.

Surely not simple. Adding another node is one of the choices, but from
a code maintenance point of view I feel it is better to integrate it
into nodeAgg. nodeWindowAgg and nodeAgg have similar aggregate
processing but don't share it, so a bug fix in nodeAgg isn't complete
in itself; we must re-check nodeWindowAgg as well. Adding another
agg-like node *may* be kind of a nightmare.


Regards,

--
Hitoshi Harada

