Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date
Msg-id 162867790908130959n9e4da70ubb47b7003f936926@mail.gmail.com
Whole thread Raw
In response to Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Hitoshi Harada <umi.tanuki@gmail.com>)
Responses Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Hitoshi Harada <umi.tanuki@gmail.com>)
List pgsql-hackers
2009/8/13 Hitoshi Harada <umi.tanuki@gmail.com>:
> 2009/8/8 Alvaro Herrera <alvherre@commandprompt.com>:
>> Олег Царев escribió:
>>> Hello all!
>>> If no one objecte (all agree, in other say) i continue work on patch -
>>> particulary, i want support second strategy (tuple store instead of
>>> hash-table) for save order of source (more cheap solution in case with
>>> grouping sets + order by), investigate and brainstorm another
>>> optimisation, writing regression tests and technical documentation.
>>> But I need some time for complete my investigation internals of
>>> PostgreSQL, particulary CTE.
>>
>> Where are we on this patch?  Is it moving forward?
>>
>
> It seems to me that the patch goes backward.

little bit.
>
> I looked trough the gsets-0.6.diff for about an hour, and found it is
> now only a syntax sugar that builds multiple GROUP BY queries based on
> CTE functionality. There's no executor modification.
>

I wrote older version in time when CTE wasn't implemented. The old
patch had own executor node, and was based on creating hash table per
group. This patch was maybe little bit faster than 0.6, but had lot of
bugs. Who knows planner code for grouping, then have to agree with me.
This code isn't readable and I wouldn't to it more complicated and
less readable. So I had idea, to join grouping sets with CTE. Grouping
sets is subset of CTE, so it is possible. Grouping sets is non
recursive CTE generally, and (I believe) this should be optimized
together.

I prefered using CTE, because this way was the most short to small
bugs less prototype - with full functionality.

> If I remember correctly, the original patch touched executor parts.
> I'd buy if the GROUPING SETS touches executor but I don't if this is
> only syntax sugar, because you can write it as the same by yourself
> without GROUPING SETS syntax. The motivation we push this forward is
> performance that cannot be made by rewriting query, I guess.
>

I don't thing, so you can do simply transformation from grouping sets
syntax to CTE. And what's more. Why you have optimized grouping sets
and not optimized non recursive CTE?

> Because GROUP BY we have today is a subset of GROUPING SETS by
> definition, I suppose we'll refactor nodeAgg.c so that it is allowed
> to take multiple group definitions. And we must support both of
> HashAgg and GroupAgg. For HashAgg, it is easier in any case as the
> earlier patch does. For GroupAgg, it is a bit complicated since we
> sort by different key sets.
>

This way is possible too. But needs absolutely grouping planner and
executor reworking. Maybe is time do it. It is work for somebody other
to me. My place is stored procedures.

> When we want GROUPING SET(a, b), at first we sort by a and aggregate
> then sort by b and aggregate. This is the same as:
>
> select a, null, count(*) from x group by a
> union all
> select null, b, count(*) from x group by b
>
> so nothing better than query rewriting unless we invent something new.
>

the problem is when x is subquery. Then is better using CTE, because
we don't need repeat x evaluation twice. The most typical use case is,
so x isn't table.

> But in case of sub total and grand total like ROLLUP query, GroupAgg
> can do it by one-time scan by having multiple life cycle PerGroup
> state.

>
> Anyway, before going ahead we need to find rough sketch of how to
> implement this feature. Only syntax sugar is acceptable? Or internal
> executor support is necessary?

I thing, so both ways are possible. Probably the most clean way is
total refactoring of grouping executor and grouping planner. I am not
sure if we need new nodes. There are all. But these nodes cannot work
paralel now.


>
>
> Regards,
>
>
> --
> Hitoshi Harada
>


pgsql-hackers by date:

Previous
From: Олег Царев
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Next
From: Hitoshi Harada
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)