Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators - Mailing list pgsql-hackers
From: James Hunter
Subject: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Date:
Msg-id: CAJVSvF6DG-KDdRwMZrK4pg1ehRAhgAGYZS2_=FhUYso+cLGsDw@mail.gmail.com
List: pgsql-hackers
On Wed, Jan 22, 2025 at 1:13 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> On 1/10/25 19:00, James Hunter wrote:
> > ...
> > I wouldn't change the existing planning logic (at least not in the
> > initial implementation). So the existing planning logic would choose
> > between different SQL operators, still on the assumption that every
> > operator that needs working memory will get work_mem [*
> > hash_mem_multiplier].
>
> All this seems generally feasible, but it hinges on whether the
> to-be-described algorithm can distribute the memory in a sensible way
> that doesn't affect the costing too much. If we plan a hash join with
> nbatch=1, and then come back and make it use nbatch=1024, maybe we
> wouldn't have used hash join at all. Not sure.

I see two problems: (a) for a given query plan, minimizing spilling without exceeding memory limits (and thus crashing); (b) for a given amount of available memory, choosing the optimal plan.

Both problems exist today, and PostgreSQL offers the same tools to address both: work_mem and hash_mem_multiplier. I argue that these tools are inadequate for (a), but I think they work reasonably well for (b). I propose to solve problem (a), but not (b).

In your example, the reason PostgreSQL plans a Hash Join, with nbatch = 1, is that the planner's "nbytes" estimate for working memory is < (hash_mem_multiplier * work_mem). This implicitly assumes that the PostgreSQL *instance* has at least that much memory available. If it turns out that the instance doesn't have that much memory available, then the Hash Join will crash. That's the current behavior.

It would be better if, instead, we used nbatch=1024 for the Hash Join, so we *didn't* crash. (Note that your example implies that work_mem is set to 1,024x available memory!) This is problem (a). But then, as you point out, it might be *even better* if we gave up on Hash Join altogether and just went with Nested Loop. This is problem (b).

Today, "work_mem" can be set too high or too low. I argue that there's no way to avoid one or the other, and:

1. If "work_mem" is set too high -- PostgreSQL currently crashes. With my proposal, it (a) would not crash, but (b) would possibly execute a sub-optimal plan.

2. If "work_mem" is set too low -- PostgreSQL currently (a) spills unnecessarily, or (b) chooses a sub-optimal plan. With my proposal, it would (a) not spill unnecessarily, but would (b) still execute a sub-optimal plan (if chosen).

I am not proposing to solve (b), the generation of optimal plans when memory constraints are a priori unknown. I see that as a separate, lower-priority problem -- one that would require multi-pass compilation, which I would like to avoid.

> The fundamental issue seems to be not having any information about how
> much memory might be available to the operator. And in principle we
> can't have that during the bottom-up part of the planning, until after
> we construct the whole plan. Only at that point we know how many
> operators will need work_mem.

It's not just bottom-up vs. top-down: it's multi-pass. I think you would need something beyond Oracle's Cost-Based Query Transformation [1], where PostgreSQL generates all possible query path trees, up front, to get an estimate of the total working memory requested by each tree. Then we could distribute "query_work_mem" to each tree, and then compute the costs once we knew how much working memory each path node on the tree would actually get.
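(As an aside, to make problem (a) concrete: the kind of post-planning distribution I keep referring to looks roughly like the toy, standalone sketch below. None of this is actual PostgreSQL code; the names and the proportional-split policy are just placeholders, not the policy I'm proposing. The point is only that the budget is divided up after the plan is already chosen, so the Hash Join recomputes nbatch instead of crashing, and the planner's costing is untouched.)

/*
 * Toy, standalone sketch -- not PostgreSQL code.  All names and the
 * proportional-split policy are placeholders.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct OpMemEstimate
{
    const char *name;       /* operator label, for the demo only */
    uint64_t    requested;  /* planner's working-memory estimate, in bytes */
    uint64_t    granted;    /* filled in by distribute_query_work_mem() */
} OpMemEstimate;

/*
 * After planning: give each memory-hungry operator a share of
 * query_work_mem proportional to what the planner assumed it would get.
 */
static void
distribute_query_work_mem(OpMemEstimate *ops, int nops,
                          uint64_t query_work_mem)
{
    uint64_t    total = 0;

    for (int i = 0; i < nops; i++)
        total += ops[i].requested;

    for (int i = 0; i < nops; i++)
    {
        if (total <= query_work_mem)
            ops[i].granted = ops[i].requested;  /* everything fits */
        else
            ops[i].granted = (uint64_t)
                ((double) ops[i].requested / (double) total *
                 (double) query_work_mem);
    }
}

/*
 * Smallest power-of-two batch count that lets the hash join's build side
 * fit within its granted allowance (instead of crashing).
 */
static int
nbatch_for_allowance(uint64_t inner_bytes, uint64_t allowance)
{
    int         nbatch = 1;

    while ((uint64_t) nbatch * allowance < inner_bytes)
        nbatch <<= 1;
    return nbatch;
}

int
main(void)
{
    /* The planner assumed each of these would get up to 64 MB ... */
    OpMemEstimate ops[] = {
        {"HashJoin", UINT64_C(64) << 20, 0},
        {"Sort",     UINT64_C(16) << 20, 0},
        {"HashAgg",  UINT64_C(32) << 20, 0},
    };
    /* ... but the whole query only gets 32 MB. */
    uint64_t    query_work_mem = UINT64_C(32) << 20;

    distribute_query_work_mem(ops, 3, query_work_mem);

    for (int i = 0; i < 3; i++)
        printf("%-8s requested %3llu MB, granted %3llu MB\n",
               ops[i].name,
               (unsigned long long) (ops[i].requested >> 20),
               (unsigned long long) (ops[i].granted >> 20));

    /* The hash join spills to more batches, but stays within its limit. */
    printf("HashJoin nbatch: %d\n",
           nbatch_for_allowance(UINT64_C(64) << 20, ops[0].granted));
    return 0;
}

With a 32 MB query budget and operators that each assumed 64/16/32 MB, the Hash Join's grant shrinks to about 18 MB and its nbatch rises from 1 to 4: it spills more, but it doesn't crash.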
The multi-pass algorithm described above is certainly *possible*, but it's not remotely *feasible* in general, because it requires generating far too many states to cost. For one thing, instead of costing path nodes individually, it has to cost entire trees, and there are far more trees than there are tree nodes. For another: today, PostgreSQL's optimizer goes out of its way not to generate obviously-bad path nodes, but in the above algorithm there's no way to know whether a path is bad until after you've generated the entire tree.

Anyway, to come up with something feasible, we'd have to apply heuristics to prune trees, etc. And it's not clear to me that, after all of that code complexity and run-time CPU cost, the result would be any better than just leaving the optimizer as it is. "Better the devil you know..." My proposal doesn't try to solve (b), instead relying on the customer to provide a reasonable "work_mem" estimate, for use by the optimizer.

> Could we get at least some initial estimate how many such operators the
> query *might* end up using? Maybe that'd be just enough a priori
> information to set the effective work_mem for the planning part to make
> this practical.

You could use the number of base relations as a proxy for the number of joins (see the toy sketch in the PS at the end of this email), but I am not convinced this would improve the optimizer's decisions.

> > This assumption might understate or overstate
> > the actual working memory we'll give the operator, at runtime. If it
> > understates, the planner will be biased in favor of operators that
> > don't use much working memory. If it overstates, the planner will be
> > biased in favor of operators that use too much working memory.
> >
>
> I'm not quite sure I understand this part. Could you elaborate?

I was just trying to express what you restated more clearly: if work_mem is too high, relative to the actual memory available on the instance, then the optimizer will choose Hash Join, even though the optimal choice might be Nested Loop.

> > (We could add a feedback loop to the planner, or even something simple
> > like generating multiple paths, at different "work_mem" limits, but
> > everything I can think of here adds complexity without much potential
> > benefit. So I would defer any changes to the planner behavior until
> > later, if ever.)
> >
>
> What would be the feedback? I can imagine improving the estimate of how
> much memory a given operator needs during the bottom-up phase, but it
> doesn't quite help with knowing what will happen above the current node.

Something like the algorithm I sketched above, in this email, might work. Of course, it would have to be modified with heuristics, etc., to reduce the state space to something manageable... But my point is just that any feedback loop, or running the optimizer at different "work_mem" limits, is a bad idea. Leaving the optimizer as it is seems the least bad choice.

Thanks for your comments,
James

[1] https://dl.acm.org/doi/10.5555/1182635.1164215
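PS: Just to make the "number of base relations as a proxy" idea concrete, here is another toy, standalone sketch (again not PostgreSQL code; the one-operator-per-base-relation guess is made up). It only illustrates the shape of the heuristic; as I said above, I'm not convinced it would actually help the optimizer.

/*
 * Toy sketch only -- not a proposal, and not PostgreSQL code.  The
 * "one memory-hungry operator per base relation" guess is made up.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t
effective_planning_work_mem(uint64_t query_work_mem, int num_base_rels)
{
    /* Guess that roughly one operator per base relation (join, sort,
     * aggregate) will ask for working memory; never divide by zero. */
    int         guess = (num_base_rels > 0) ? num_base_rels : 1;

    return query_work_mem / guess;
}

int
main(void)
{
    uint64_t    query_work_mem = UINT64_C(256) << 20;  /* 256 MB total */

    for (int rels = 1; rels <= 8; rels *= 2)
        printf("%d base rel(s) -> plan with work_mem = %llu MB\n",
               rels,
               (unsigned long long)
               (effective_planning_work_mem(query_work_mem, rels) >> 20));
    return 0;
}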