On Mon, Mar 26, 2012 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm. This illustrates that it's not too prudent to rely on a default
> numdistinct estimate to decide that a hash aggregation is safe :-(.
> We had probably better tweak the cost estimation rules to not trust
> that. Maybe, if we have only a default estimate, we should take the
> worst-case assumption that the column might be unique? That could
> still burn us if the rowcount estimate was horribly wrong, but
> rowcount estimates are not nearly as shaky as numdistinct estimates ...
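
Presumably that just means clamping the group estimate to the input
row count whenever the ndistinct figure is the hard-wired default,
something like this (made-up names, not actual planner code):

#include <stdbool.h>

/*
 * Sketch only: if the ndistinct figure is just the hard-wired default,
 * don't trust it for the "does the hash table fit in work_mem?" test;
 * assume the worst case, i.e. every input row is a distinct group.
 */
static double
safe_numgroups_for_hashagg(double input_rows, double numdistinct,
                           bool ndistinct_is_default)
{
    if (ndistinct_is_default)
        return input_rows;      /* worst case: the column could be unique */
    return numdistinct < input_rows ? numdistinct : input_rows;
}

/*
 * The hash-vs-sort decision would then be along the lines of
 *
 *    if (safe_numgroups_for_hashagg(rows, dNumGroups, isdefault)
 *            * hashentrysize > work_mem * 1024L)
 *        ... don't consider hashed aggregation ...
 */
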
Perhaps we should have two work_mem settings: one as the target to
aim for, and one as a hard(er) limit that we ensure even the worst
case stays under?
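
In the planner that could look something like the following
(work_mem_hard_limit is a made-up GUC, purely to illustrate the idea):

#include <stdbool.h>

/*
 * Sketch only: plan against the expected number of groups as usual,
 * but only pick a hash aggregate if even the worst case (one group
 * per input row) stays under the harder limit.
 */
static bool
hashagg_is_safe(double expected_groups, double input_rows,
                double entry_size_bytes,
                long work_mem_kb, long work_mem_hard_limit_kb)
{
    if (expected_groups * entry_size_bytes > work_mem_kb * 1024L)
        return false;           /* misses even the target */
    if (input_rows * entry_size_bytes > work_mem_hard_limit_kb * 1024L)
        return false;           /* worst case blows past the hard limit */
    return true;
}
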
I have a sketch in my head for how to spill hash aggregates to disk.
I'm not sure whether it's worth the complexity it would require, but
I'll poke around a bit and see how it works out.
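
One generic shape for it is the same batching trick hash joins use:
when the hash table fills up, route the keys that don't fit into
partition files by hash and recurse on each partition. A toy
illustration (nothing like what real executor code would look like):

/*
 * Toy sketch of spilling a hash aggregate: count(*) per int64 key.
 * MAX_GROUPS stands in for work_mem.  The lookup is a dumb linear scan
 * just to keep this short; the interesting parts are the partition
 * files and the recursion.  A real version would also switch hash
 * functions per recursion level so re-spilled keys actually spread out.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define MAX_GROUPS 4096         /* pretend this is the work_mem budget */
#define NUM_PARTITIONS 8        /* fan-out for keys that don't fit */

struct group { int64_t key; int64_t count; };

static void
aggregate_file(FILE *in)
{
    struct group *groups = calloc(MAX_GROUPS, sizeof(*groups));
    size_t      ngroups = 0;
    FILE       *spill[NUM_PARTITIONS] = {0};
    int64_t     key;

    while (fread(&key, sizeof(key), 1, in) == 1)
    {
        size_t  i;

        for (i = 0; i < ngroups; i++)
            if (groups[i].key == key)
                break;
        if (i < ngroups)
            groups[i].count++;              /* existing group: just bump it */
        else if (ngroups < MAX_GROUPS)
        {
            groups[ngroups].key = key;      /* still room for a new group */
            groups[ngroups].count = 1;
            ngroups++;
        }
        else
        {
            /* table is full: push the row to a partition file for later */
            int p = (int) ((uint64_t) key % NUM_PARTITIONS);

            if (spill[p] == NULL)
                spill[p] = tmpfile();
            fwrite(&key, sizeof(key), 1, spill[p]);
        }
    }

    for (size_t i = 0; i < ngroups; i++)
        printf("%lld: %lld\n", (long long) groups[i].key,
               (long long) groups[i].count);
    free(groups);

    /* each partition is a smaller instance of the same problem: recurse */
    for (int p = 0; p < NUM_PARTITIONS; p++)
    {
        if (spill[p] != NULL)
        {
            rewind(spill[p]);
            aggregate_file(spill[p]);
            fclose(spill[p]);
        }
    }
}

int
main(void)
{
    FILE   *input = tmpfile();

    for (int64_t i = 0; i < 100000; i++)
    {
        int64_t key = i % 10000;            /* 10000 groups > MAX_GROUPS */

        fwrite(&key, sizeof(key), 1, input);
    }
    rewind(input);
    aggregate_file(input);
    fclose(input);
    return 0;
}
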
--
greg