In cost_size.c, there is this comment block:
+ * Note: in this cost model, AGG_SORTED and AGG_HASHED have exactly the
+ * same total CPU cost, but AGG_SORTED has lower startup cost. If the
+ * input path is already sorted appropriately, AGG_SORTED should be
+ * preferred (since it has no risk of memory overflow). This will happen
+ * as long as the computed total costs are indeed exactly equal --- but
+ * if there's roundoff error we might do the wrong thing. So be sure
+ * that the computations below form the same intermediate values in the
+ * same order.
But, why should they have the same cost in the first place? A sorted aggregation just has to do an equality comparison on each adjacent pair, which is costed as (cpu_operator_cost * numGroupCols) * input_tuples. A hash aggregation has to do a hashcode computation for each input, which apparently is also costed at (cpu_operator_cost * numGroupCols) * input_tuples. But it also has to do the equality comparison between the input tuple and any aggregation it is about to be aggregated into, to make sure the hashcode is not a false collision. That should be another (cpu_operator_cost * numGroupCols) * (input_tuples - numGroups), shouldn't it? Currently, that is apparently assumed to be free.
Is there something here I am overlooking?
Cheers,
Jeff