Thread: feature request: explain "with details" option
My apologies if this was already requested before... I think it would be fantastic if postgres had an "explain the explain" option: Today's explain tells us what loops and scans were used, and relative costs, etc. It doesn't seem to tell *why* the planner elected to use what it did. For instance, in the case of a corrupted index, it doesn't say why it's not using that index, it just doesn't use it, causing some confusion to end users. At least causing confusion to me. Or in the case of where it iterates over an entire table (seq. scan) instead of using an index because the index range specified "is most of the table" (thus not helpful to use the index)...The choice is appropriate. The reasoning why is not explicitly mentioned. Again causing possibility for some delay as you try to "decipher the mind" of the planner. Sometimes tables (ex: tables after having been first propagated) need an "analyze" run on them, but it's not clear from an "explain" output that the analyze statistics are faulty. Not even a hint. So this is a feature request for an "EXPLAIN DETAILS" option or something, basically like today's explain but with more "rationale" included. This could be immensely useful to many Postgres users. I'd even be willing to chip in a couple hundred bucks if it would help grease the wheels for somebody taking up the challenge if that helps at all :) Thank you for your consideration in this regard. -roger-
On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote: > My apologies if this was already requested before... > > I think it would be fantastic if postgres had an "explain the explain" option: > Today's explain tells us what loops and scans were used, and relative > costs, etc. It doesn't seem to tell *why* the planner elected to use > what it did. One thing that's been discussed here is to have a way to see which potential plans are rejected and compare their costs. This isn't simple because there are often *lots* of variants. You don't just want to see the "top 10" candidate plans, because they're probably a bunch of small variants on the same plan; the ones you'll be interested in will probably be very different plans with very bad relative estimates. Say you've got a query over 20 tables through five views. You're only interested in a particular part that runs much slower than the estimates say it should. You don't care about any of the other parts of the plan. How can Pg help with this? My thinking is that maybe the planner can expose an interface tools can use to offer plan alternative drill-down. You'd start with the original plan and the tool would ask "show me alternatives for this sub-path". You'd explore the plan to see what alternatives were considered. A way to see how cost estimates are calculated and based on what stats would be needed. That's hardly a simple explain (altplans) though. If you have good ideas about how to choose a small subset of alternate plans to show the user that'd be informative and not risk being even more misleading, that'd be good. But in a cost-based planner that explores many paths this might not be as simple as you expect. > For instance, in the case of a corrupted index, it doesn't say why > it's not using that index, it just doesn't use it, causing some > confusion to end users. At least causing confusion to me. It doesn't have a "corrupted index" flag. What do you mean by this? > Or in the case of where it iterates over an entire table (seq. scan) > instead of using an index because the index range specified "is most > of the table" (thus not helpful to use the index)...The choice is > appropriate. The reasoning why is not explicitly mentioned. Again > causing possibility for some delay as you try to "decipher the mind" > of the planner. Sometimes tables (ex: tables after having been first > propagated) need an "analyze" run on them, but it's not clear from an > "explain" output that the analyze statistics are faulty. Not even a > hint. That one's not simple. If Pg knew the stats were wrong it'd say so, but it has no idea. It'd have to consult its stats to figure out ... oh, damn. We could probably do a better job of identifying tables that have been flagged as needing analyze but autovacuum hasn't got around to it yet, though. > So this is a feature request for an "EXPLAIN DETAILS" option or > something, basically like today's explain but with more "rationale" > included. This could be immensely useful to many Postgres users. > > I'd even be willing to chip in a couple hundred bucks if it would help > grease the wheels for somebody taking up the challenge if that helps > at all :) I think you missed a zero or two, unfortunately. I don't think this is a small project to do well and right. Doing it badly might just add more confusing/misleading information. Then again, I'm not exactly a planner expert. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes: > On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote: >> Today's explain tells us what loops and scans were used, and relative >> costs, etc. It doesn't seem to tell *why* the planner elected to use >> what it did. > One thing that's been discussed here is to have a way to see which > potential plans are rejected and compare their costs. > This isn't simple because there are often *lots* of variants. You > don't just want to see the "top 10" candidate plans, because they're > probably a bunch of small variants on the same plan; the ones you'll > be interested in will probably be very different plans with very bad > relative estimates. The other big problem here is that the planner tries *very* hard to reject losing paths early; so it does not even form an explainable plan for a large fraction of the search space. (And if it did, you'd be dead before you got your EXPLAIN result back.) People have experimented with making the planner log every candidate path before the path enters the comparison tournament (and, typically, doesn't survive the first round). But I've never seen any version of that that I thought would be intelligible to non-experts. It's exceedingly verbose and it certainly doesn't look anything like what we know as EXPLAIN output. regards, tom lane
On Thu, Sep 8, 2016 at 10:40 AM, Roger Pack <rogerdpack2@gmail.com> wrote:
My apologies if this was already requested before...
I think it would be fantastic if postgres had an "explain the explain" option:
Today's explain tells us what loops and scans were used, and relative
costs, etc. It doesn't seem to tell *why* the planner elected to use
what it did.
For instance, in the case of a corrupted index, it doesn't say why
it's not using that index, it just doesn't use it, causing some
confusion to end users. At least causing confusion to me.
I've never seen such a thing. If an index is corrupt, it still gets used like normal. You just get wrong results, or crashes, depending on the nature of the corruption.
Or in the case of where it iterates over an entire table (seq. scan)
instead of using an index because the index range specified "is most
of the table" (thus not helpful to use the index)
The planner just comes up with plans that use a seq scan, and plans that use an index; and then compares the cost of them and finds that one cost is lower than the other. It never explicitly develops a specific notion of "I won't use the index because I'm retrieving too much of the table". So it wouldn't be just a matter of reporting something that isn't currently reported, it would first have to infer that thing in the first place, and that would probably be very hard.
...The choice is
appropriate. The reasoning why is not explicitly mentioned. Again
causing possibility for some delay as you try to "decipher the mind"
of the planner. Sometimes tables (ex: tables after having been first
propagated) need an "analyze" run on them, but it's not clear from an
"explain" output that the analyze statistics are faulty. Not even a
hint.
You can get a hint, sometimes, by comparing the predicted rows and the actual rows of an "explain (analyze)". Making this more user friendly to do would probably be best done by making an expert-system tool to look at the currently-reported plans (https://explain.depesz.com kind of does this already, particularly the row_x column) rather than trying to build something into core. It could be improved by detecting when the node being misestimated is simple scan of a single table or index with a single filter/condition, rather than a join or a complex filter/condition.
So this is a feature request for an "EXPLAIN DETAILS" option or
something, basically like today's explain but with more "rationale"
included. This could be immensely useful to many Postgres users.
Unfortunately it would also be immensely hard to implement.
What I would find useful is simply to have it report details about how the cost of each node of the plan was arrived at, i.e. how many of multiples of each of the *_cost factors were summed to arrive at the total cost. that would help me a lot in interpreting plans, and might also help someone like Depesz a lot in improving his expert-system.
It would be far easier than what you are proposing, but would be still be a lot of work. It would probably also be challenged because the accounting overhead, while small, would be incurred on every single planner run.
Cheers,
Jeff
On 9/8/16 11:35 PM, Tom Lane wrote: >> This isn't simple because there are often *lots* of variants. You >> > don't just want to see the "top 10" candidate plans, because they're >> > probably a bunch of small variants on the same plan; the ones you'll >> > be interested in will probably be very different plans with very bad >> > relative estimates. > The other big problem here is that the planner tries *very* hard to reject > losing paths early; so it does not even form an explainable plan for a > large fraction of the search space. (And if it did, you'd be dead before > you got your EXPLAIN result back.) What I've wished for is the ability to see plans that were close in cost to the best case scenario, since that indicates that a slight change in statistics would push the planner in another direction (sometimes with disastrous results). Maybe allowing some number of plans to bubble up if they were within X percent of the winner wouldn't be that horrible. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Fri, Sep 9, 2016 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Craig Ringer <craig@2ndquadrant.com> writes: >> On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote: >>> Today's explain tells us what loops and scans were used, and relative >>> costs, etc. It doesn't seem to tell *why* the planner elected to use >>> what it did. > >> One thing that's been discussed here is to have a way to see which >> potential plans are rejected and compare their costs. > >> This isn't simple because there are often *lots* of variants. You >> don't just want to see the "top 10" candidate plans, because they're >> probably a bunch of small variants on the same plan; the ones you'll >> be interested in will probably be very different plans with very bad >> relative estimates. > > The other big problem here is that the planner tries *very* hard to reject > losing paths early; so it does not even form an explainable plan for a > large fraction of the search space. (And if it did, you'd be dead before > you got your EXPLAIN result back.) > > People have experimented with making the planner log every candidate path > before the path enters the comparison tournament (and, typically, doesn't > survive the first round). But I've never seen any version of that that > I thought would be intelligible to non-experts. It's exceedingly verbose > and it certainly doesn't look anything like what we know as EXPLAIN output. What I've observed when troubleshooting plan selection is that you often want to change the planner's choice in one particular part of the plan and see what happens - e.g. force a {sequential scan, index scan using index X, bitmap index scan using index X, index-only scan using index X} on a particular base relation, or force a {hash, merge, nested loop} join between X and Y, possibly constraining which is the inner side and which is the outer side. Or, also fairly commonly, I just want to know what other paths were generated at a given level of the plan tree and what their estimate costs were. Of course, at the risk of starting a flame war, query hints would be rather useful here. You'd be able to force the plan you want not because forcing a plan choice is a great way to run a production system (though somebody might want to do that, of course) but to debug why you're not getting that plan. Right now, the only tools we have for this sort of thing are the enable_* GUCs and twiddling the cost values, and that's OK for simple plans but for complex plans involving multiple tables it's a pretty blunt instrument and it's often tedious to understand exactly what made the planner do what it did. I for one would welcome a better way to elicit EXPLAIN (THE_PLAN_I_WANTED). Another thing that would be useful is, for each base relation, save all of the access paths and the costs of each; and for each join relation, save the cheapest cost for each join method. So if you have a join between A, B, and C, you can see all of the possible access methods and their costs for A, B, and C; plus, for the join order actually chosen (but not any alternative join order), you can see whether other join methods were judged feasible and if so what their cost would have been given the actually-surviving paths for the underlying relations. So for a two-way join you might get something like: Hash Join Considered: Merge Join (cost=...) Nested Loop (cost=...) -> Seq Scan Considered: Index Scan on... (cost=...) -> Hash -> Seq Scan Considered: Index Scan on .... (cost=...) Of course, this wouldn't tell you everything you could possibly want to know, but it would let you answer a lot of common questions like "how much slower would it have been to use the index" (or "the other index") and "were other join methods considered too expensive or were they not even considered because the planner thinks they're not usable here?" and "how much more expensive does the planner think that a hash join would have been than the nested loop actually chosen?". I think the only part of this that would be really brutal to try to represent is alternative join orders. I see no reasonable way for EXPLAIN to output useful information about what other join orders were considered and why they were not chosen; the only thing that seems like it would help in that case would be an easy way to force the exact join order you want and then see what the plan looks like. Even that's not totally perfect because sometimes there are a bunch of join orders that are essentially interchangeable and what you really want to know is whether the planner considered a join order that's materially different, but the planner makes no such distinction internally. At any rate, I don't think the fact that it's difficult or impossible to provide information about join orders should deter us from having a way to display the stuff we can get our hands around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I think the only part of this that would be really brutal to try to > represent is alternative join orders. I see no reasonable way for > EXPLAIN to output useful information about what other join orders were > considered and why they were not chosen; the only thing that seems > like it would help in that case would be an easy way to force the > exact join order you want and then see what the plan looks like. That exists today: write your query as a nest of explicit JOIN syntax and set join_collapse_limit = 1. Not sure that it's really all that useful, though. regards, tom lane