Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date
Msg-id 162867790908131005j54312e26h5fb12525f64b2c50@mail.gmail.com
Whole thread Raw
In response to Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Олег Царев <zabivator@gmail.com>)
List pgsql-hackers
2009/8/13 Олег Царев <zabivator@gmail.com>:
> 2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>:
>> 2009/8/8 Alvaro Herrera <alvherre@commandprompt.com>:
>>> Олег Царев escribió:
>>>> Hello all!
>>>> If no one objecte (all agree, in other say) i continue work on patch -
>>>> particulary, i want support second strategy (tuple store instead of
>>>> hash-table) for save order of source (more cheap solution in case with
>>>> grouping sets + order by), investigate and brainstorm another
>>>> optimisation, writing regression tests and technical documentation.
>>>> But I need some time for complete my investigation internals of
>>>> PostgreSQL, particulary CTE.
>>>
>>> Where are we on this patch?  Is it moving forward?
>>>
>>
>> It seems to me that the patch goes backward.
>>
>> I looked trough the gsets-0.6.diff for about an hour, and found it is
>> now only a syntax sugar that builds multiple GROUP BY queries based on
>> CTE functionality. There's no executor modification.
>>
>> If I remember correctly, the original patch touched executor parts.
>> I'd buy if the GROUPING SETS touches executor but I don't if this is
>> only syntax sugar, because you can write it as the same by yourself
>> without GROUPING SETS syntax. The motivation we push this forward is
>> performance that cannot be made by rewriting query, I guess.
>>
>> Because GROUP BY we have today is a subset of GROUPING SETS by
>> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
>> to take multiple group definitions. And we must support both of
>> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
>> earlier patch does. For GroupAgg, it is a bit complicated since we
>> sort by different key sets.
>>
>> When we want GROUPING SET(a, b), at first we sort by a and aggregate
>> then sort by b and aggregate. This is the same as:
>>
>> select a, null, count(*) from x group by a
>> union all
>> select null, b, count(*) from x group by b
>>
>> so nothing better than query rewriting unless we invent something new.
>>
>> But in case of sub total and grand total like ROLLUP query, GroupAgg
>> can do it by one-time scan by having multiple life cycle PerGroup
>> state.
>>
>> Anyway, before going ahead we need to find rough sketch of how to
>> implement this feature. Only syntax sugar is acceptable? Or internal
>> executor support is necessary?
>>
>>
>> Regards,
>>
>>
>> --
>> Hitoshi Harada
>>
>
> All rights, exclude
>> Because GROUP BY we have today is a subset of GROUPING SETS by
>> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
>> to take multiple group definitions. And we must support both of
>> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
>> earlier patch does. For GroupAgg, it is a bit complicated since we
>> sort by different key sets.
> because group by it's optimized version of grouping sets.
> Of course, we can extend the current definition of group by, but we
> regress perfomance of it.
> Some questions for you:
>
> How calcualte aggregation on ROLLUP on single pass?
> Stupid way - store different buffer of aggregations for every group,
> and accumulate every record on group for every calculator. When a
> group has changed, return key of this group to output set with  NULL
> for fields not contains in this group, and restart current buffer of
> aggregation.
> Better way - add operation "merge aggregations", and calculate one
> buffer on every group, when group has cnahged - merge this "main
> buffer" to other, and return some intermediate result.
>

I don't thing, so this is possible for all operations. Don't forgot.
People can to implement own aggregates. example: weighted average

regards
Pavel Stehule

> I think, support this of grouping operation isn't simple, and
> different implementation of ROLLUP it's better.
>
> Regards, Tsarev Oleg
>


pgsql-hackers by date:

Previous
From: Hitoshi Harada
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Next
From: Josh Berkus
Date:
Subject: Re: Hot standby and synchronous replication status