On Mon, 2004-12-06 at 23:55 -0500, Tom Lane wrote:
> Bear in mind that the price of honoring sort_mem carefully is
> considerably far from zero.
I'll do some thinking about disk-based spilling for hashed aggregation
for 8.1
> The issue with the hash code is that it sets size parameters on the
> basis of the estimated input row count; the memory usage error factor
> is basically inversely proportional to the error in the planner's row
> estimate.
Right. But I don't think it's acceptable to consume an arbitrary amount
of memory to process a query, even if we only do that when the planner
makes a mistake (regrettably, planner mistakes occur with some
regularity).
As a quick hack, what about throwing away the constructed hash table and
switching to hashing for sorting if we exceed sort_mem by a significant
factor? (say, 200%) We might also want to print a warning message to the
logs.
This assumes that aggregation-by-sorting can be used in a superset of
the cases where aggregation-by-hashing can be used, and that the
semantics of both nodes are the same; I believe both conditions hold.
And of course, performance will likely suck -- but (a) since the planner
has guessed wrong performance is probably going to suck anyway (b) it is
better than running the machine OOM.
-Neil