Can we force parallel workers, by setting a low parallel_setup_cost or otherwise, so that the plan makes use of scatter-gather and Partial HashAggregate(s)?
I am just assuming that more workers doing things in parallel would mean less disk spill per hash aggregate (or partial hash aggregate?), with the scatter-gather at the end.
I did some runs in my demo environment, not with the same query but with some GROUP BY aggregates over around 25M rows, and it showed reasonable results, not too far off.
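
For illustration, here is a minimal sketch of the kind of session-level settings that could nudge the planner toward a parallel plan. The table `demo_table` and column `grp` are hypothetical names, and the values are illustrative assumptions, not the settings used in the runs above.

```sql
-- Session-level only; values are illustrative assumptions, not recommendations.
SET parallel_setup_cost = 0;           -- make starting workers look free to the planner
SET parallel_tuple_cost = 0;           -- make passing tuples through Gather look free
SET min_parallel_table_scan_size = 0;  -- consider parallel scans even on small tables
SET max_parallel_workers_per_gather = 4;  -- still capped by max_parallel_workers
                                          -- and max_worker_processes

-- demo_table and grp are hypothetical names.
EXPLAIN (ANALYZE, BUFFERS)
SELECT grp, count(*)
FROM demo_table
GROUP BY grp;
```

If the planner goes parallel, the plan should show a Gather node above a Partial HashAggregate, with a Finalize aggregate on top. On PostgreSQL 13 and later, a HashAggregate node that spills reports its batches and disk usage in the EXPLAIN ANALYZE output, so the per-node spill can be compared against a non-parallel run. Note that these settings only lower the estimated cost of a parallel plan; they do not guarantee the planner will choose one.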