On 14 August 2014, 18:02, Atri Sharma wrote:
> On Thursday, August 14, 2014, Jeff Davis <pgsql@j-davis.com> wrote:
>
>> On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
>> > If you're following the HashJoin model, then what you do is the same
>> thing
>> > it does: you write the input tuple back out to the pending batch file
>> for
>> > the hash partition that now contains key 1001, whence it will be
>> processed
>> > when you get to that partition. I don't see that there's any special
>> case
>> > here.
>>
>> HashJoin only deals with tuples. With HashAgg, you have to deal with a
>> mix of tuples and partially-computed aggregate state values. Not
>> impossible, but it is a little more awkward than HashJoin.
>>
>>
> +1
>
> Not to mention future cases if we start maintaining multiple state
> values, e.g. with regard to grouping sets.
So what would you do for aggregates where the state is growing quickly?
Say, things like median() or array_agg()?
I think that "we can't do that for all aggregates" does not imply "we must
not do that at all."
There will always be aggregates that don't implement dumping of state, for
various reasons, and in those cases the proposed approach is certainly a great
improvement. I like it, and I hope it will get committed.
But maybe for aggregates supporting serialize/deserialize of the state
(including all aggregates using known types, not just fixed-size types) a
hashjoin-like batching would be better? I can name a few custom aggregates
that'd benefit tremendously from this.
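Just to illustrate what I mean by serialize/deserialize of the state, here's a
minimal sketch for a toy array_agg-like aggregate whose state is a growable
array of int64 values. The struct and both functions are made up for
illustration only; they are not part of the patch or of any existing API.

```c
/*
 * Hypothetical sketch only: serialize/deserialize for an aggregate whose
 * transition state is a simple growable array of int64 values (a toy
 * array_agg).  Nothing here is part of the proposed patch.
 */
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct ToyAggState
{
	int64_t    *values;		/* accumulated values */
	size_t		nvalues;	/* number of values accumulated so far */
} ToyAggState;

/* Flatten the state into a contiguous buffer that could go to a batch file. */
static size_t
toy_agg_serialize(const ToyAggState *state, char **out)
{
	size_t		len = sizeof(size_t) + state->nvalues * sizeof(int64_t);
	char	   *buf = malloc(len);

	memcpy(buf, &state->nvalues, sizeof(size_t));
	memcpy(buf + sizeof(size_t), state->values,
		   state->nvalues * sizeof(int64_t));

	*out = buf;
	return len;
}

/* Rebuild the in-memory state from a buffer read back from a batch file. */
static ToyAggState *
toy_agg_deserialize(const char *buf)
{
	ToyAggState *state = malloc(sizeof(ToyAggState));

	memcpy(&state->nvalues, buf, sizeof(size_t));
	state->values = malloc(state->nvalues * sizeof(int64_t));
	memcpy(state->values, buf + sizeof(size_t),
		   state->nvalues * sizeof(int64_t));

	return state;
}
```

With something like this, a partially-computed state could be written to the
batch file for its partition along with the plain tuples, and rebuilt when
that partition is processed.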
Just to be clear - this is certainly non-trivial to implement, and I'm not
trying to force anyone (e.g. Jeff) to implement the ideas I proposed. I'm
ready to spend time on reviewing the current patch, implementing the approach
I proposed, and comparing the behaviour.
Kudos to Jeff for working on this.
Tomas