Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters) - Mailing list pgsql-hackers
| From | Noah Misch |
|---|---|
| Subject | Re: More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters) |
| Date | |
| Msg-id | 20160209144555.GA187927@tornado.leadboat.com |
| In response to | More thorough planning for OLAP queries (was: [PATCH] Equivalence Class Filters) (David Rowley <david.rowley@2ndquadrant.com>) |
| List | pgsql-hackers |
On Wed, Dec 30, 2015 at 08:16:28PM +1300, David Rowley wrote:
> On [1] I suggested an idea to make improvements to the planner around the
> Equivalence Class code. Later in [2] Tom raised concerns with this adding
> too many planning cycles for a perhaps not common enough situation. I
> don't want to discuss that particular patch here; I want to discuss more
> generally the dilemma of adding more smarts to the planner to allow it to
> generate a more optimal plan in order to save on execution time.

It's an important topic.

> A number of ideas were suggested on the other thread about how we might go
> about solving this problem. In [3] Simon talked about perhaps enabling
> extra optimisations when the planner sees that the plan will cost more than
> some given threshold. That's perhaps an option, but may not work well for
> optimisations which must take place very early in planning, for example [4].

Yeah. A slew of steps precede us assembling a notion of total cost. I bet
most controversial proposed steps will happen in that early period. We'd
need a rough cost earlier than we get it today, and I don't know what
achieving that would look like. Simon, did you have specifics in view?

> Another idea which came up was from Evgeniy [5], which was more of a
> request not to do it this way, but nevertheless, the idea was basically
> to add lots of GUCs to enable/disable each extra planner feature.

This and subsequent ideas each envision some kind of optimization step
blacklist. Suppose it's a bitmap with one bit per optional optimizer step.
Each idea chooses blacklist membership differently, but the planner acts on
the blacklist about the same way. I paraphrase the ideas in those terms
below, and I offer a couple more. For this idea, the blacklist is simple:

1. User defines the blacklist fully.

It's essentially handing the hard part back to the user. While I sympathize
with allergic reactions to this, I like it far better than rejecting
optimizations based on thinking the average case wants them disabled.
work_mem likewise isn't the ideal knob for any user, but it has a simple
implementation and beats "1MB per hash table is okay for everyone."

> Another option which I've thought about previously was a planner_strength
> GUC, at which various additional optimisations are enabled at various
> predefined strength levels, so that databases which tend to spend a great
> deal more execution time compared to planning time can turn this up a bit
> to see if that helps change that ratio a bit. This idea is far from

2. User selects from predefined blacklists like "maximum", "default", etc.

> perfect though, as who's to say that planner feature X should "kick in"
> before planner feature Y? I've also often thought that it might be nice to

Yep. It will be essentially impossible to build an argument to move a
planner feature from one strength to another. If the feature's committer
has personally experienced the problem in the field, the optimization will
end up active at default planner strength.

> have it so the planner does not modify the Parse object, so that the
> planner has the option to throw away what it's done so far and start
> planning all over again with the "planner_strength" knob turned up to the
> maximum, if the cost happened to indicate that the query was going to take
> a long time to execute.

3. System first plans with a specific predefined blacklist that omits
speculative (low probability, high payoff) steps. If that yields a
high-cost plan, it repeats planning with an empty blacklist.
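To make the bitmap idea and the two-pass flow of (3) concrete, here is a
minimal standalone sketch; every name in it is hypothetical, not existing
PostgreSQL code, and the costs are stubs chosen only to show the control flow:

```c
#include <stdint.h>

typedef uint32_t StepBlacklist;            /* one bit per optional optimizer step */

#define STEP_ECLASS_FILTERS    (1U << 0)   /* e.g. the Equivalence Class filter idea */
#define STEP_OTHER_SPECULATIVE (1U << 1)   /* placeholder for further optional steps */

/* Steps skipped on the first, cheap planning pass under idea (3). */
#define SPECULATIVE_STEPS (STEP_ECLASS_FILTERS | STEP_OTHER_SPECULATIVE)

typedef struct Plan { double total_cost; } Plan;

/*
 * Stand-in for the real planner: a step runs only if its bit is absent from
 * the blacklist.  For idea (3) to work, planning must not scribble on the
 * Query tree, or there is nothing valid left to replan from.
 */
static Plan
plan_query(const char *query, StepBlacklist blacklist)
{
    (void) query;
    /* pretend the speculative steps would have cut the cost substantially */
    Plan result = { (blacklist & SPECULATIVE_STEPS) ? 250000.0 : 40000.0 };
    return result;
}

/* Idea (3): plan cheaply first; replan with every step enabled if it looks costly. */
static Plan
plan_with_retry(const char *query)
{
    Plan plan = plan_query(query, SPECULATIVE_STEPS);  /* speculative steps blacklisted */

    if (plan.total_cost > 100000.0)                    /* arbitrary "high cost" threshold */
        plan = plan_query(query, 0);                   /* empty blacklist: all steps on */
    return plan;
}
```

The point of the sketch is only that ideas (1) through (5) all reduce to
choosing the blacklist value passed into planning.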
I agree that the planner's modification of the Query tree is a substantial
roadblock. The design needs to work for one-shot plans, and we're unlikely
to recoup the cost of copying every one-shot Query tree.

> So here I'd very much like to kick off discussion on an acceptable way to
> solve this problem, in a realistic way which we're all happy with.

It's bad to add 10us per plan times 3000/s/client, but the same waste once
per minute per client is no cause for concern. I advise accepting
speculative planner optimizations, enabled by default when similar in cost
to comparable existing steps. At the same time, I encourage developing
automation to cap the waste when a client elicits millions of planner runs
that don't benefit from a certain optimization. Specific lines of inquiry:

4. Derive a conservative blacklist for each rewritten Query when first
planning it, and use that blacklist for subsequent plans.

Some prepared statements use a custom plan every time. (Constraint
exclusion is one driver of such cases.) Many facets of a Query's planning
problem don't depend on parameter values, so a particular optimization will
apply to all the custom plans or to none of them. Let the first plan build
a blacklist of provably-irrelevant optimizations, which the plan cache
stores and furnishes to later runs. The first plan after an invalidation
recomputes the blacklist.

5. Add a facility that profiles a workload to generate blacklist data.

Think along the lines of gcc -fprofile-generate/-fprofile-use. Start a
profile; run your application load test; the server collects data
sufficient to match each query with its optimal blacklist. Issue "ALTER
USER appuser SET optimizer_profile = 'profname'" to adopt profile-driven
blacklisting. This is the automated equivalent of (1).

I recommend starting with (1), full user control over the blacklist via
GUC. Specifically, I recommend one GUC expressing the list rather than one
Boolean GUC per optional step (a possible shape is sketched below). The
core work of enumerating optional steps and making the planner obey a
blacklist applies toward all five ideas. A GUC to set that blacklist is
cheap, and it would be worth having as an escape hatch even if we add
sophistication later. (2) is a dead end for the reason you gave. (3) has
potential, but each of (3), (4) and (5) is a big project with great
uncertainty. They shouldn't block introducing optimizer steps.

Thanks,
nm
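For concreteness, a standalone sketch of how a single list-valued GUC could
populate the step bitmap from the earlier sketch; the GUC name, step names,
and parsing here are invented for illustration and are not part of
PostgreSQL:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t StepBlacklist;

static const struct { const char *name; StepBlacklist bit; } optional_steps[] = {
    { "eclass_filters",    1U << 0 },   /* hypothetical step names */
    { "other_speculative", 1U << 1 },
};

/* Turn a comma-separated GUC value into the bitmap the planner would consult. */
static StepBlacklist
parse_step_blacklist(const char *value)
{
    StepBlacklist mask = 0;
    char buf[256];
    char *tok;

    snprintf(buf, sizeof(buf), "%s", value);
    for (tok = strtok(buf, ", "); tok != NULL; tok = strtok(NULL, ", "))
    {
        for (size_t i = 0; i < sizeof(optional_steps) / sizeof(optional_steps[0]); i++)
        {
            if (strcmp(tok, optional_steps[i].name) == 0)
                mask |= optional_steps[i].bit;
        }
    }
    return mask;
}

int
main(void)
{
    /* e.g. what a hypothetical "SET optimizer_step_blacklist = 'eclass_filters'" might yield */
    printf("mask = 0x%x\n", parse_step_blacklist("eclass_filters"));
    return 0;
}
```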