On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here. I'm not clear whether that is what you are thinking or
> not.
No. I am thinking of the following path: Gather -> PartialAgg -> some partial path
I want the Gather node to merge the results coming from all workers; otherwise it may be difficult to merge them at the parent of the Gather node. When the partial group aggregate is under the Gather node and any two workers return data for the same group key, we need to compare those rows and combine them into a single group. At the Gather node, we can wait until we have received slots from all workers. Once all workers have returned their slots, we can compare and merge the necessary slots and return the result. Am I missing something?
My assumption is the same as Robert's here.
Unless I've misunderstood, it sounds like you're proposing to add logic into the Gather node to handle final aggregation? That sounds like a modularity violation of the whole node concept.
The handling of the final aggregate stage is not all that different from the initial aggregate stage. The primary difference is just that you're calling the combine function instead of the transition function, and the values being aggregated are aggregate states rather than values of the type which was initially aggregated. The handling of GROUP BY is all the same, except that you only apply the HAVING clause during final aggregation. This is why I ended up implementing this in nodeAgg.c instead of inventing some new node type that's mostly a copy and paste of nodeAgg.c [1]
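To illustrate the point, here is a rough Python sketch of the two stages for a hypothetical avg() aggregate whose state is a (sum, count) pair. None of the names below come from nodeAgg.c or the patch; they are made up for the example, and the "Gather" here does nothing but concatenate worker output.

```python
# Hypothetical avg() aggregate; the state is a (sum, count) pair.
INITIAL_STATE = (0, 0)

def avg_transition(state, value):
    """Transition function: fold one raw input value into a state."""
    total, count = state
    return (total + value, count + 1)

def avg_combine(left, right):
    """Combine function: merge two partial states into one."""
    return (left[0] + right[0], left[1] + right[1])

def avg_final(state):
    """Final function: turn a state into the aggregate result."""
    total, count = state
    return total / count

def partial_agg(rows):
    """Runs in each worker: GROUP BY key, transition function on raw values."""
    groups = {}
    for key, value in rows:
        groups[key] = avg_transition(groups.get(key, INITIAL_STATE), value)
    return list(groups.items())

def gather(worker_outputs):
    """Only concatenates the worker streams; no aggregation logic here."""
    return [row for output in worker_outputs for row in output]

def finalize_agg(partial_rows, having=lambda result: True):
    """Same GROUP BY handling, but combine function on states, then HAVING."""
    groups = {}
    for key, state in partial_rows:
        groups[key] = avg_combine(groups.get(key, INITIAL_STATE), state)
    results = {key: avg_final(state) for key, state in groups.items()}
    return {key: value for key, value in results.items() if having(value)}

# Made-up input: (group key, value) rows as seen by two workers.
worker_1 = [("a", 10), ("b", 4)]
worker_2 = [("a", 20), ("b", 6), ("c", 1)]

result = finalize_agg(
    gather([partial_agg(worker_1), partial_agg(worker_2)]),
    having=lambda avg: avg > 2,
)
print(result)
```

The grouping loop in partial_agg and finalize_agg is the same apart from which function is applied to each group, which is the reason for keeping both stages in one node type.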
If you're performing a hash aggregate you need to wait until all the partially aggregated groups are received anyway. If you're doing a sort/agg then you'll need to sort again after the Gather node.
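A small Python sketch of the sort/agg case, assuming each worker emits partial sums already sorted by group key and that the rows simply arrive at the Gather one worker after the other (the arrival order and all names are assumptions for the example). A sorted-input aggregate only merges adjacent rows with equal keys, so without another sort the same group comes out more than once.

```python
from itertools import groupby
from operator import itemgetter

def group_agg_sum(rows):
    """Sorted-input aggregation: merges only adjacent rows with the same key."""
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]

# Each worker's partial output is sorted by key on its own.
worker_1 = [("a", 3), ("c", 5)]
worker_2 = [("a", 4), ("b", 1)]

# Assumed arrival order at the Gather: worker 1's rows, then worker 2's.
gathered = worker_1 + worker_2

# Without a second sort, group "a" is emitted twice.
without_sort = group_agg_sum(gathered)

# Sorting again above the Gather makes equal keys adjacent.
with_sort = group_agg_sum(sorted(gathered, key=itemgetter(0)))

print(without_sort)
print(with_sort)
```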