There is a simple case that isn't being handled. A straight DISTINCT is
exactly equivalent to a GROUP BY on all the columns listed. Right now it's
still doing a Sort+Unique. The HashAgg seems to be about 2-3 times as fast for
me.
Actually a DISTINCT ON should also be possible to do as a hashagg as long as
there's no ORDER BY, though I guess that would be fairly uncommon since it's
not that useful.
slo=> explain select distinct id from tab;
QUERY PLAN
---------------------------------------------------------------------------
Unique (cost=3731.53..3932.49 rows=146 width=4)
-> Sort (cost=3731.53..3832.01 rows=40192 width=4)
Sort Key: id
-> Seq Scan on tab (cost=0.00..657.92 rows=40192 width=4)
slo=> explain select id from tab group by id;
QUERY PLAN
---------------------------------------------------------------------
HashAggregate (cost=758.40..758.40 rows=146 width=4)
-> Seq Scan on tab (cost=0.00..657.92 rows=40192 width=4)
--
greg