Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Date
Msg-id 603c8f070905120627g5446fe09wfd1300c49f5d8fef@mail.gmail.com
Whole thread Raw
In response to Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, May 12, 2009 at 2:21 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Moreover, I guess you don't even need to buffer tuples to aggregate by
>> different keys. What you have to do is only to prepare more than one
>> hash tables (, or set up sort order if the plan detects hash table is
>> too large to fit in the memory), and one time seq scan will do. The
>> trans values are only to be stored in the memory, not the outer plan's
>> results. It will win greately in performance.
>
> it was my first solution. But I would to prepare one non hash method.
> But now I thinking about some special executor node, that fill all
> necessary hash parallel. It's special variant of hash agreggate.

I think HashAggregate will often be the fastest method of executing
this kind of operation, but it would be nice to have an alternative
(such as repeatedly sorting a tuplestore) to handle non-hashable
datatypes or cases where the HashAggregate would eat too much memory.

But that leads me to a question - does the existing HashAggregate code
make any attempt to obey work_mem?  I know that the infrastructure is
present for HashJoin/Hash, but on a quick pass I didn't notice
anything similar in HashAggregate.

And on a slightly off-topic note for this thread, is there any
compelling reason why we have at least three different hash
implementations in the executor?  HashJoin/Hash uses one for regular
batches and one for the skew batch, and I believe that HashAggregate
does something else entirely.  It seems like it might improve code
maintainability, if nothing else, to unify these to the extent
possible.

...Robert


pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Implementation of GROUPING SETS (T431: Extended grouping capabilities)
Next
From: Alex Hunsaker
Date:
Subject: Re: DROP TABLE vs inheritance