Attached is the latest performance report. Parallel aggregate shows some overhead in the low-selectivity case. This can be avoided by comparing the costs of the normal and parallel aggregate plans.
Hi, Thanks for posting an updated patch.
Would you be able to supply a bit more detail on your benchmark? I'm surprised by the slowdown reported with the high selectivity version. It gives me the impression that the benchmark might be producing lots of groups, which all need to be pushed through the tuple queue to the main process. I think it would be more interesting to see benchmarks with a varying number of groups, rather than varying scan selectivity. Selectivity was important for parallel seqscan, but less so here, as it's aggregated groups we're sending to the main process, not individual tuples.
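
To illustrate what I mean, here is a rough back-of-the-envelope model in Python. It is not taken from the patch: the function name, the parameters, and the assumption that matching rows are split evenly across workers are all mine. It only counts how many rows would have to cross the tuple queue in each case.

```python
def rows_sent_to_leader(total_rows, selectivity, num_groups, workers,
                        parallel_aggregate):
    """Toy upper bound on rows crossing the tuple queue (hypothetical model).

    Assumes matching rows are split evenly across workers and that every
    worker may see every group. Not derived from the patch itself.
    """
    matching = int(total_rows * selectivity)
    if not parallel_aggregate:
        # Parallel seqscan: every matching tuple goes to the main process.
        return matching
    # Parallel aggregate: each worker sends at most one partial result per
    # group, and never more rows than it scanned.
    per_worker_rows = matching // workers
    return workers * min(num_groups, per_worker_rows)


if __name__ == "__main__":
    for groups in (10, 1_000, 1_000_000):
        sent = rows_sent_to_leader(1_000_000, 1.0, groups, 4, True)
        print(f"groups={groups:>9}  rows through queue={sent}")
```

Under these assumptions the queue traffic for parallel aggregate scales with the number of groups and barely moves with selectivity until the group count approaches the row count, which is why a benchmark that varies the number of groups should be more telling.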