Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers

From Олег Царев
Subject Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date
Msg-id 54f48e4f0908130922h4c1647b2x945017820126f82a@mail.gmail.com
Whole thread Raw
In response to Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Hitoshi Harada <umi.tanuki@gmail.com>)
Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/8/8 Alvaro Herrera <alvherre@commandprompt.com>:
>> Олег Царев escribió:
>>> Hello all!
>>> If no one objecte (all agree, in other say) i continue work on patch -
>>> particulary, i want support second strategy (tuple store instead of
>>> hash-table) for save order of source (more cheap solution in case with
>>> grouping sets + order by), investigate and brainstorm another
>>> optimisation, writing regression tests and technical documentation.
>>> But I need some time for complete my investigation internals of
>>> PostgreSQL, particulary CTE.
>>
>> Where are we on this patch?  Is it moving forward?
>>
>
> It seems to me that the patch goes backward.
>
> I looked trough the gsets-0.6.diff for about an hour, and found it is
> now only a syntax sugar that builds multiple GROUP BY queries based on
> CTE functionality. There's no executor modification.
>
> If I remember correctly, the original patch touched executor parts.
> I'd buy if the GROUPING SETS touches executor but I don't if this is
> only syntax sugar, because you can write it as the same by yourself
> without GROUPING SETS syntax. The motivation we push this forward is
> performance that cannot be made by rewriting query, I guess.
>
> Because GROUP BY we have today is a subset of GROUPING SETS by
> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
> to take multiple group definitions. And we must support both of
> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
> earlier patch does. For GroupAgg, it is a bit complicated since we
> sort by different key sets.
>
> When we want GROUPING SET(a, b), at first we sort by a and aggregate
> then sort by b and aggregate. This is the same as:
>
> select a, null, count(*) from x group by a
> union all
> select null, b, count(*) from x group by b
>
> so nothing better than query rewriting unless we invent something new.
>
> But in case of sub total and grand total like ROLLUP query, GroupAgg
> can do it by one-time scan by having multiple life cycle PerGroup
> state.
>
> Anyway, before going ahead we need to find rough sketch of how to
> implement this feature. Only syntax sugar is acceptable? Or internal
> executor support is necessary?
>
>
> Regards,
>
>
> --
> Hitoshi Harada
>

All rights, exclude
> Because GROUP BY we have today is a subset of GROUPING SETS by
> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
> to take multiple group definitions. And we must support both of
> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
> earlier patch does. For GroupAgg, it is a bit complicated since we
> sort by different key sets.
because group by it's optimized version of grouping sets.
Of course, we can extend the current definition of group by, but we
regress perfomance of it.
Some questions for you:

How calcualte aggregation on ROLLUP on single pass?
Stupid way - store different buffer of aggregations for every group,
and accumulate every record on group for every calculator. When a
group has changed, return key of this group to output set with  NULL
for fields not contains in this group, and restart current buffer of
aggregation.
Better way - add operation "merge aggregations", and calculate one
buffer on every group, when group has cnahged - merge this "main
buffer" to other, and return some intermediate result.

I think, support this of grouping operation isn't simple, and
different implementation of ROLLUP it's better.

Regards, Tsarev Oleg


pgsql-hackers by date:

Previous
From: Hitoshi Harada
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Next
From: Pavel Stehule
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)