Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From:           Peter Geoghegan
Subject:        Re: Default setting for enable_hashagg_disk
Date:
Msg-id:         CAH2-WzkqNc=qJ5OUhL3m=OETBx24gHWw6g1Uz52wJzdAzCTzvA@mail.gmail.com
In response to: Re: Default setting for enable_hashagg_disk (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Default setting for enable_hashagg_disk
                Re: Default setting for enable_hashagg_disk
List:           pgsql-hackers
On Tue, Jul 14, 2020 at 12:46 PM Robert Haas <robertmhaas@gmail.com> wrote:
> - I thought the problem we were trying to solve here was that, in v12,
> if the planner thinks that your hashagg will fit in memory when really
> it doesn't, you will get good performance because we'll cheat; in v13,
> you'll get VERY bad performance because we won't.

That is the problem we started out with. I propose to solve a broader problem that I believe mostly encompasses the original problem (it's an "inventor's paradox" situation). Although the exact degree to which it truly addresses the original problem will vary across installations, I believe that it will go a very long way towards cutting down on problems for users upgrading to Postgres 13 generally.

> - So, if hash_mem_multiplier affects both planning and execution, it
> doesn't really solve the problem. Neither does adjusting the existing
> work_mem setting. Imagine that you have two queries. The planner
> thinks Q1 will use 1GB of memory for a HashAgg but it will actually
> need 2GB. It thinks Q2 will use 1.5GB for a HashAgg but it will
> actually need 3GB. If you plan using a 1GB memory limit, Q1 will pick
> a HashAgg and perform terribly when it spills. Q2 will pick a
> GroupAggregate which will be OK but not great. If you plan with a 2GB
> memory limit, Q1 will pick a HashAgg and will not spill so now it will
> be in great shape. But Q2 will pick a HashAgg and then spill so it
> will stink. Oops.

Maybe I missed your point here. The problem is not so much that we'll get HashAggs that spill -- there is nothing intrinsically wrong with that. While it's true that the I/O pattern is not as sequential as a similar group agg + sort, that doesn't seem like the really important factor here. The really important factor is that in-memory HashAggs can be blazingly fast relative to *any* alternative strategy -- be it a HashAgg that spills, or a group aggregate + sort that doesn't spill, whatever.

We're more concerned about keeping the one available fast strategy than we are about getting a new, generally slow strategy. There will be no problems at all unless and until we're short on memory, because you can just increase work_mem and everything works out, regardless of the details. Obviously the general problems we anticipate only crop up when increasing work_mem stops being a viable DBA strategy.

By teaching the system to have at least a crude appreciation of the value of memory when hashing versus when sorting, it can often give much more memory to hash aggs (and hash joins). Increasing hash_mem_multiplier (maybe while also decreasing work_mem) will be beneficial when we take memory away from things that don't really need so much, like sorts (or even CTE tuplestores) -- we reduce the memory pressure without paying a commensurate price in system throughput (maybe only a very small hit). As a bonus, everything going faster may actually *reduce* memory usage for the system as a whole, even as individual queries use more memory.

Under this scheme, it may well not matter that you cannot cheat (Postgres 12 style) anymore, because you'll be able to use the memory that is available sensibly -- regardless of whether or not the group estimates are very good (we have to have more than zero faith in the estimates -- they can be bad without being terrible).
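To make the kind of tuning I have in mind concrete, here is a rough sketch, assuming hash_mem_multiplier lands more or less as proposed in this thread (hash-based nodes get roughly work_mem * hash_mem_multiplier to play with, while sorts and tuplestores stay capped at plain work_mem). The numbers are made up, and are only meant to show the shape of it:

    -- Rough sketch only; assumes the hash_mem_multiplier GUC proposed here.
    -- Today: one budget shared by everything.
    SET work_mem = '256MB';          -- sorts, tuplestores, hash aggs, hash joins

    -- Instead: take some memory back from the sort-ish things, and let the
    -- hash-based nodes use more, where the extra memory actually pays off.
    SET work_mem = '64MB';           -- sorts and tuplestores get less
    SET hash_mem_multiplier = 8.0;   -- hash aggs/joins get roughly 512MB each

Whether 8.0 (or any particular value) is sensible obviously depends on the workload; the point is only that the two kinds of memory consumers no longer have to share a single knob.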
Maybe no amount of tuning can ever restore the desirable Postgres 12 performance characteristics you came to rely on, but the remaining "regressions" are probably cases where the user was flying pretty close to the sun OOM-wise all along. They may have been happy with Postgres 12, but at a certain point that really is something you have to view as a fool's paradise, even if, like me, you happen to be a memory Keynesian. Really big outliers tend to be rare, and are therefore something that the user can afford to have go slower. It's the constant, steady stream of medium-sized hash aggs that we mostly need to worry about. To the extent that that's true, hash_mem_multiplier addresses the problem on the table.

> - An escape hatch that prevents spilling at execution time *does*
> solve this problem, but now suppose we add a Q3 which the planner
> thinks will use 512MB of memory but at execution time it will actually
> consume 512GB due to the row count estimate being 1024x off. So if you
> enable the escape hatch to get back to a situation where Q1 and Q2
> both perform acceptably, then Q3 makes your system OOM.

Right. Nothing stops these two things from being true at the same time.

> - If you were to instead introduce a GUC like what I proposed before,
> which allows the execution-time memory usage to exceed what was
> planned, but only by a certain margin, then you can set
> hash_mem_execution_overrun_multiplier_thingy=2.5 and call it a day.
> Now, no matter how you set work_mem, you're fine. Depending on the
> value you choose for work_mem, you may get group aggregates for some
> of the queries. If you set it large enough that you get hash
> aggregates, then Q1 and Q2 will avoid spilling (which works but is
> slow) because the overrun is less than 2x. Q3 will spill, so you won't
> OOM. Wahoo!

But we'll have to live with that kludge for a long time, and we won't necessarily have avoided any risk compared to the hash_mem_multiplier alternative. I think that having a shadow memory limit for the executor is pretty ugly. I'm trying to come up with a setting that can sensibly be tuned at the system level -- not an escape hatch, which seems worth avoiding. Admittedly, this is not without its downsides.

> - If you hold my feet to the fire and ask me to choose between a
> Boolean escape hatch (rather than a multiplier-based one) and
> hash_mem_multiplier, gosh, I don't know. I guess the Boolean escape
> hatch? I mean it's a pretty bad solution, but at least if I have that
> I can get both Q1 and Q2 to perform well at the same time, and I guess
> I'm no worse off than I was in v12.

Fortunately, you don't have to choose. Doing both together might make sense, to cover any remaining user apps that still experience problems after tuning hash_mem_multiplier. We can take a wait-and-see approach to this, as Tom suggested recently.
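FWIW, it should at least be easy to tell whether any particular app still has a problem once hash_mem_multiplier has been tuned: in v13, EXPLAIN ANALYZE reports per-node memory and disk figures for HashAggregate. A rough sketch of what I'd look at (the table and settings here are made up, and the multiplier GUC is still only the proposal from this thread):

    -- Made-up example; the point is just to check the HashAggregate node's
    -- instrumentation after tuning.
    SET work_mem = '64MB';
    SET hash_mem_multiplier = 8.0;   -- the GUC proposed in this thread

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT customer_id, sum(total)
    FROM orders
    GROUP BY customer_id;

    -- On the HashAggregate node: a single batch with no disk usage reported
    -- means the whole thing stayed in memory; multiple batches and/or disk
    -- usage means it spilled, and the settings (or the query) need another look.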
--
Peter Geoghegan