On Tue, Jul 14, 2020 at 6:49 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Maybe I missed your point here. The problem is not so much that we'll
> get HashAggs that spill -- there is nothing intrinsically wrong with
> that. While it's true that the I/O pattern is not as sequential as a
> similar group agg + sort, that doesn't seem like the really important
> factor here. The really important factor is that in-memory HashAggs
> can be blazingly fast relative to *any* alternative strategy -- be it
> a HashAgg that spills, or a group aggregate + sort that doesn't spill,
> whatever. We're more concerned about keeping the one available fast
> strategy than we are about getting a new, generally slow strategy.

I don't know; it depends. Like, if the less-sequential I/O pattern
that is caused by a HashAgg is not really any slower than a
Sort+GroupAgg, then whatever. The planner might as well try a HashAgg
- because it will be fast if it stays in memory - and if it doesn't
work out, we've lost little by trying. But if a Sort+GroupAgg is
noticeably faster than a HashAgg that ends up spilling, then there is
a potential regression. I thought we had evidence that this was a real
problem, but if that's not the case, then I think we're fine as far as
v13 goes.
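
To make the two cases above concrete, here is a minimal sketch of the trade-off as a toy model. It is an illustration only: the function names and cost numbers are hypothetical, the units are arbitrary, and none of this reflects PostgreSQL's actual planner cost formulas.

```python
# Toy model of the HashAgg vs. Sort+GroupAgg trade-off.
# All names and numbers are hypothetical; this is NOT the planner's costing.

def expected_hashagg_cost(p_fits, cost_in_memory, cost_spilled):
    """Expected cost of picking HashAgg when it stays in memory
    with probability p_fits and spills otherwise."""
    return p_fits * cost_in_memory + (1.0 - p_fits) * cost_spilled


def regression_if_spilled(cost_spilled, cost_sort_groupagg):
    """How much slower a spilled HashAgg is than Sort+GroupAgg
    (zero if it is not slower)."""
    return max(0.0, cost_spilled - cost_sort_groupagg)


# Assumed cost of the Sort+GroupAgg plan, in arbitrary units.
sort_groupagg = 100.0

# Case 1: a spilled HashAgg costs about the same as Sort+GroupAgg,
# so trying HashAgg loses little when it does not work out.
print(regression_if_spilled(100.0, sort_groupagg))

# Case 2: a spilled HashAgg is noticeably slower than Sort+GroupAgg,
# so a wrong guess about memory shows up as a regression.
print(regression_if_spilled(150.0, sort_groupagg))

# Expected cost of gambling on HashAgg with a 50% chance of fitting.
print(expected_hashagg_cost(0.5, 10.0, 150.0))
```

Whether the second case actually occurs is the empirical question; the sketch only shows why the answer matters.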
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company