On 14 August 2014, 18:02, Atri Sharma wrote:
> On Thursday, August 14, 2014, Jeff Davis <pgsql@j-davis.com> wrote:
>
>> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
>> > If you're following the HashJoin model, then what you do is the same
>> thing
>> > it does: you write the input tuple back out to the pending batch file
>> for
>> > the hash partition that now contains key 1001, whence it will be
>> processed
>> > when you get to that partition. I don't see that there's any special
>> case
>> > here.
>>
>> HashJoin only deals with tuples. With HashAgg, you have to deal with a
>> mix of tuples and partially-computed aggregate state values. Not
>> impossible, but it is a little more awkward than HashJoin.
>>
>>
> +1
>
> Not to mention future cases if we start maintaining multiple state
> values, e.g. with regard to grouping sets.
So what would you do for aggregates where the state is growing quickly?
Say, things like median() or array_agg()?
I think that "we can't do that for all aggregates" does not imply "we must
not do that at all."
There will always be aggregates that don't implement dumping of state, for
various reasons, and in those cases the proposed approach is certainly a great
improvement. I like it, and I hope it will get committed.
But maybe for aggregates supporting serialize/deserialize of the state
(including all aggregates using known types, not just fixed-size types) a
hashjoin-like batching would be better? I can name a few custom aggregates
that'd benefit tremendously from this.
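Just to illustrate what I mean by serialize/deserialize of the state, here's a
minimal sketch for a toy array_agg-like aggregate whose state is a growable
array of int64 values. The struct and both functions are made up for
illustration only; they are not part of the patch or of any existing API.

```c
/*
 * Hypothetical sketch only: serialize/deserialize for an aggregate whose
 * transition state is a simple growable array of int64 values (a toy
 * array_agg).  Nothing here is part of the proposed patch.
 */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct ToyAggState
{
	int64_t    *values;		/* accumulated values */
	size_t		nvalues;	/* number of values accumulated so far */
} ToyAggState;

/* Flatten the state into a contiguous buffer that could go to a batch file. */
static size_t
toy_agg_serialize(const ToyAggState *state, char **out)
{
	size_t		len = sizeof(size_t) + state->nvalues * sizeof(int64_t);
	char	   *buf = malloc(len);

	memcpy(buf, &state->nvalues, sizeof(size_t));
	memcpy(buf + sizeof(size_t), state->values,
		   state->nvalues * sizeof(int64_t));

	*out = buf;
	return len;
}

/* Rebuild the in-memory state from a buffer read back from a batch file. */
static ToyAggState *
toy_agg_deserialize(const char *buf)
{
	ToyAggState *state = malloc(sizeof(ToyAggState));

	memcpy(&state->nvalues, buf, sizeof(size_t));
	state->values = malloc(state->nvalues * sizeof(int64_t));
	memcpy(state->values, buf + sizeof(size_t),
		   state->nvalues * sizeof(int64_t));

	return state;
}
```

With something like this, a partially-computed state could be written to the
batch file for its partition along with the plain tuples, and rebuilt when
that partition is processed.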
Just to be clear - this is certainly non-trivial to implement, and I'm not
trying to force anyone (e.g. Jeff) to implement the ideas I proposed. I'm
ready to spend time on reviewing the current patch, implementing the approach
I proposed, and comparing the behaviour.
Kudos to Jeff for working on this.
Tomas