Stefan,
> Some testing showed that the planner was seriously underestimating the
> number of distinct rows in the table (with the default statistics target
> it estimated ~150k rows while there are about 19M distinct values) and
> choosing a hashagg for the aggregate.
> Upping the statistics target to 1000 improves the estimate to about 5M
> rows, which unfortunately is still not enough to cause the planner to
> switch to a groupagg with work_mem set to 256000.
Well, it's pretty well-known that we need to fix n-distinct estimation.
But we also need to handle it gracefully if the estimate is still wrong
and we start using too much memory. Is there any way we can check how
much memory the hashagg actually *is* using and spill to disk if it goes
over work_mem?
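Here is a minimal sketch of the spill-to-disk idea. It is in Python for brevity; the executor is C, and this is not how PostgreSQL implements it. Everything here is a hypothetical assumption: the aggregate is SUM(value) GROUP BY key, and `max_groups` is a group count standing in for work_mem, whereas a real implementation would track bytes. The names `hash_agg_sum` and `num_partitions` are illustrative. Once the in-memory hash table is full, rows whose key is not already in it are written to temp-file partitions chosen by hashing the key. Each partition is then aggregated by the same routine.

```python
import pickle
import tempfile
from typing import Dict, Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, int]


def _read_spill(f) -> Iterator[Row]:
    """Yield the (key, value) pairs previously pickled into a spill file."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def hash_agg_sum(rows: Iterable[Row], max_groups: int,
                 num_partitions: int = 4, depth: int = 0) -> Iterator[Row]:
    """SUM(value) GROUP BY key, holding at most max_groups groups in memory.

    Hypothetical sketch: max_groups stands in for work_mem; a real
    executor would measure memory in bytes, not count groups.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    table: Dict[Hashable, int] = {}
    # TemporaryFile defaults to binary read/write mode, which pickle needs.
    spills = [tempfile.TemporaryFile() for _ in range(num_partitions)]
    try:
        for key, value in rows:
            if key in table:
                table[key] += value
            elif len(table) < max_groups:
                table[key] = value
            else:
                # Over budget: defer the row to a disk partition picked by
                # hashing the key, so all rows of one key land together.
                part = hash((depth, key)) % num_partitions
                pickle.dump((key, value), spills[part])
        # Groups admitted to the table saw all their rows: emit them.
        yield from table.items()
        table.clear()
        # A spill file only holds keys never admitted to the (full) table,
        # so it has strictly fewer distinct keys than this input and the
        # recursion terminates.
        for f in spills:
            if f.tell() > 0:
                yield from hash_agg_sum(_read_spill(f), max_groups,
                                        num_partitions, depth + 1)
    finally:
        for f in spills:
            f.close()
```

For example, `dict(hash_agg_sum([(s, 1) for s in "mississippi"], max_groups=1))` gives the same counts as a plain in-memory dict: m=1, i=4, s=4, p=2. The sketch spills rows for new keys instead of evicting groups that are already in memory. That way every group is finished in exactly one place, and no partial aggregates have to be merged afterwards.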
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco