Thread: feature request: explain "with details" option

feature request: explain "with details" option

From

Roger Pack

Date:

08 September 2016, 17:40:51

My apologies if this was already requested before...

I think it would be fantastic if postgres had an "explain the explain" option:
Today's explain tells us what loops and scans were used, and relative
costs, etc.  It doesn't seem to tell *why* the planner elected to use
what it did.

For instance, in the case of a corrupted index, it doesn't say why
it's not using that index, it just doesn't use it, causing some
confusion to end users.  At least causing confusion to me.

Or in the case of where it iterates over an entire table (seq. scan)
instead of using an index because the index range specified "is most
of the table" (thus not helpful to use the index)...The choice is
appropriate.  The reasoning why is not explicitly mentioned.  Again
causing possibility for some delay as you try to "decipher the mind"
of the planner.  Sometimes tables (ex: tables after having been first
propagated) need an "analyze" run on them, but it's not clear from an
"explain" output that the analyze statistics are faulty.  Not even a
hint.

So this is a feature request for an "EXPLAIN DETAILS" option or
something, basically like today's explain but with more "rationale"
included.  This could be immensely useful to many Postgres users.

I'd even be willing to chip in a couple hundred bucks if it would help
grease the wheels for somebody taking up the challenge if that helps
at all :)

Thank you for your consideration in this regard.
-roger-

Re: feature request: explain "with details" option

From

Craig Ringer

Date:

09 September 2016, 01:45:39

On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote:
> My apologies if this was already requested before...
>
> I think it would be fantastic if postgres had an "explain the explain" option:
> Today's explain tells us what loops and scans were used, and relative
> costs, etc.  It doesn't seem to tell *why* the planner elected to use
> what it did.

One thing that's been discussed here is to have a way to see which
potential plans are rejected and compare their costs.

This isn't simple because there are often *lots* of variants. You
don't just want to see the "top 10" candidate plans, because they're
probably a bunch of small variants on the same plan; the ones you'll
be interested in will probably be very different plans with very bad
relative estimates.

Say you've got a query over 20 tables through five views. You're only
interested in a particular part that runs much slower than the
estimates say it should. You don't care about any of the other parts
of the plan. How can Pg help with this?

My thinking is that maybe the planner can expose an interface tools
can use to offer plan alternative drill-down. You'd start with the
original plan and the tool would ask "show me alternatives for this
sub-path". You'd explore the plan to see what alternatives were
considered. A way to see how cost estimates are calculated and based
on what stats would be needed.

That's hardly a simple   explain (altplans)  though.

If you have good ideas about how to choose a small subset of alternate
plans to show the user that'd be informative and not risk being even
more misleading, that'd be good. But in a cost-based planner that
explores many paths this might not be as simple as you expect.

> For instance, in the case of a corrupted index, it doesn't say why
> it's not using that index, it just doesn't use it, causing some
> confusion to end users.  At least causing confusion to me.

It doesn't have a "corrupted index" flag. What do you mean by this?

> Or in the case of where it iterates over an entire table (seq. scan)
> instead of using an index because the index range specified "is most
> of the table" (thus not helpful to use the index)...The choice is
> appropriate.  The reasoning why is not explicitly mentioned.  Again
> causing possibility for some delay as you try to "decipher the mind"
> of the planner.  Sometimes tables (ex: tables after having been first
> propagated) need an "analyze" run on them, but it's not clear from an
> "explain" output that the analyze statistics are faulty.  Not even a
> hint.

That one's not simple. If Pg knew the stats were wrong it'd say so,
but it has no idea. It'd have to consult its stats to figure out ...
oh, damn.

We could probably do a better job of identifying tables that have been
flagged as needing analyze but autovacuum hasn't got around to it yet,
though.

> So this is a feature request for an "EXPLAIN DETAILS" option or
> something, basically like today's explain but with more "rationale"
> included.  This could be immensely useful to many Postgres users.
>
> I'd even be willing to chip in a couple hundred bucks if it would help
> grease the wheels for somebody taking up the challenge if that helps
> at all :)

I think you missed a zero or two, unfortunately. I don't think this is
a small project to do well and right. Doing it badly might just add
more confusing/misleading information. Then again, I'm not exactly a
planner expert.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: feature request: explain "with details" option

From

Tom Lane

Date:

09 September 2016, 04:35:50

Craig Ringer <craig@2ndquadrant.com> writes:
> On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote:
>> Today's explain tells us what loops and scans were used, and relative
>> costs, etc.  It doesn't seem to tell *why* the planner elected to use
>> what it did.

> One thing that's been discussed here is to have a way to see which
> potential plans are rejected and compare their costs.

> This isn't simple because there are often *lots* of variants. You
> don't just want to see the "top 10" candidate plans, because they're
> probably a bunch of small variants on the same plan; the ones you'll
> be interested in will probably be very different plans with very bad
> relative estimates.

The other big problem here is that the planner tries *very* hard to reject
losing paths early; so it does not even form an explainable plan for a
large fraction of the search space.  (And if it did, you'd be dead before
you got your EXPLAIN result back.)

People have experimented with making the planner log every candidate path
before the path enters the comparison tournament (and, typically, doesn't
survive the first round).  But I've never seen any version of that that
I thought would be intelligible to non-experts.  It's exceedingly verbose
and it certainly doesn't look anything like what we know as EXPLAIN output.
        regards, tom lane

Re: feature request: explain "with details" option

From

Jeff Janes

Date:

10 September 2016, 19:27:28

On Thu, Sep 8, 2016 at 10:40 AM, Roger Pack <rogerdpack2@gmail.com> wrote:

My apologies if this was already requested before...

I think it would be fantastic if postgres had an "explain the explain" option:
Today's explain tells us what loops and scans were used, and relative
costs, etc. It doesn't seem to tell *why* the planner elected to use
what it did.

For instance, in the case of a corrupted index, it doesn't say why
it's not using that index, it just doesn't use it, causing some
confusion to end users. At least causing confusion to me.

I've never seen such a thing. If an index is corrupt, it still gets used like normal. You just get wrong results, or crashes, depending on the nature of the corruption.

Or in the case of where it iterates over an entire table (seq. scan)
instead of using an index because the index range specified "is most
of the table" (thus not helpful to use the index)

The planner just comes up with plans that use a seq scan, and plans that use an index; and then compares the cost of them and finds that one cost is lower than the other. It never explicitly develops a specific notion of "I won't use the index because I'm retrieving too much of the table". So it wouldn't be just a matter of reporting something that isn't currently reported, it would first have to infer that thing in the first place, and that would probably be very hard.

...The choice is
appropriate. The reasoning why is not explicitly mentioned. Again
causing possibility for some delay as you try to "decipher the mind"
of the planner. Sometimes tables (ex: tables after having been first
propagated) need an "analyze" run on them, but it's not clear from an
"explain" output that the analyze statistics are faulty. Not even a
hint.

You can get a hint, sometimes, by comparing the predicted rows and the actual rows of an "explain (analyze)". Making this more user friendly to do would probably be best done by making an expert-system tool to look at the currently-reported plans (https://explain.depesz.com kind of does this already, particularly the row_x column) rather than trying to build something into core. It could be improved by detecting when the node being misestimated is simple scan of a single table or index with a single filter/condition, rather than a join or a complex filter/condition.

So this is a feature request for an "EXPLAIN DETAILS" option or
something, basically like today's explain but with more "rationale"
included. This could be immensely useful to many Postgres users.

Unfortunately it would also be immensely hard to implement.

What I would find useful is simply to have it report details about how the cost of each node of the plan was arrived at, i.e. how many of multiples of each of the *_cost factors were summed to arrive at the total cost. that would help me a lot in interpreting plans, and might also help someone like Depesz a lot in improving his expert-system.

It would be far easier than what you are proposing, but would be still be a lot of work. It would probably also be challenged because the accounting overhead, while small, would be incurred on every single planner run.

Cheers,

Jeff

Re: feature request: explain "with details" option

From

Jim Nasby

Date:

11 September 2016, 02:50:37

On 9/8/16 11:35 PM, Tom Lane wrote:
>> This isn't simple because there are often *lots* of variants. You
>> > don't just want to see the "top 10" candidate plans, because they're
>> > probably a bunch of small variants on the same plan; the ones you'll
>> > be interested in will probably be very different plans with very bad
>> > relative estimates.
> The other big problem here is that the planner tries *very* hard to reject
> losing paths early; so it does not even form an explainable plan for a
> large fraction of the search space.  (And if it did, you'd be dead before
> you got your EXPLAIN result back.)

What I've wished for is the ability to see plans that were close in cost 
to the best case scenario, since that indicates that a slight change in 
statistics would push the planner in another direction (sometimes with 
disastrous results). Maybe allowing some number of plans to bubble up if 
they were within X percent of the winner wouldn't be that horrible.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: feature request: explain "with details" option

From

Robert Haas

Date:

12 September 2016, 20:07:07

On Fri, Sep 9, 2016 at 12:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig Ringer <craig@2ndquadrant.com> writes:
>> On 9 September 2016 at 01:40, Roger Pack <rogerdpack2@gmail.com> wrote:
>>> Today's explain tells us what loops and scans were used, and relative
>>> costs, etc.  It doesn't seem to tell *why* the planner elected to use
>>> what it did.
>
>> One thing that's been discussed here is to have a way to see which
>> potential plans are rejected and compare their costs.
>
>> This isn't simple because there are often *lots* of variants. You
>> don't just want to see the "top 10" candidate plans, because they're
>> probably a bunch of small variants on the same plan; the ones you'll
>> be interested in will probably be very different plans with very bad
>> relative estimates.
>
> The other big problem here is that the planner tries *very* hard to reject
> losing paths early; so it does not even form an explainable plan for a
> large fraction of the search space.  (And if it did, you'd be dead before
> you got your EXPLAIN result back.)
>
> People have experimented with making the planner log every candidate path
> before the path enters the comparison tournament (and, typically, doesn't
> survive the first round).  But I've never seen any version of that that
> I thought would be intelligible to non-experts.  It's exceedingly verbose
> and it certainly doesn't look anything like what we know as EXPLAIN output.

What I've observed when troubleshooting plan selection is that you
often want to change the planner's choice in one particular part of
the plan and see what happens - e.g. force a {sequential scan, index
scan using index X, bitmap index scan using index X, index-only scan
using index X} on a particular base relation, or force a {hash, merge,
nested loop} join between X and Y, possibly constraining which is the
inner side and which is the outer side.  Or, also fairly commonly, I
just want to know what other paths were generated at a given level of
the plan tree and what their estimate costs were.

Of course, at the risk of starting a flame war, query hints would be
rather useful here.  You'd be able to force the plan you want not
because forcing a plan choice is a great way to run a production
system (though somebody might want to do that, of course) but to debug
why you're not getting that plan.  Right now, the only tools we have
for this sort of thing are the enable_* GUCs and twiddling the cost
values, and that's OK for simple plans but for complex plans involving
multiple tables it's a pretty blunt instrument and it's often tedious
to understand exactly what made the planner do what it did.  I for one
would welcome a better way to elicit EXPLAIN (THE_PLAN_I_WANTED).

Another thing that would be useful is, for each base relation, save
all of the access paths and the costs of each; and for each join
relation, save the cheapest cost for each join method.  So if you have
a join between A, B, and C, you can see all of the possible access
methods and their costs for A, B, and C; plus, for the join order
actually chosen (but not any alternative join order), you can see
whether other join methods were judged feasible and if so what their
cost would have been given the actually-surviving paths for the
underlying relations.  So for a two-way join you might get something
like:

Hash Join Considered:   Merge Join (cost=...)   Nested Loop (cost=...) -> Seq Scan     Considered:         Index Scan
on... (cost=...) -> Hash   -> Seq Scan      Considered:         Index Scan on .... (cost=...)

Of course, this wouldn't tell you everything you could possibly want
to know, but it would let you answer a lot of common questions like
"how much slower would it have been to use the index" (or "the other
index") and "were other join methods considered too expensive or were
they not even considered because the planner thinks they're not usable
here?" and "how much more expensive does the planner think that a hash
join would have been than the nested loop actually chosen?".

I think the only part of this that would be really brutal to try to
represent is alternative join orders.  I see no reasonable way for
EXPLAIN to output useful information about what other join orders were
considered and why they were not chosen; the only thing that seems
like it would help in that case would be an easy way to force the
exact join order you want and then see what the plan looks like.  Even
that's not totally perfect because sometimes there are a bunch of join
orders that are essentially interchangeable and what you really want
to know is whether the planner considered a join order that's
materially different, but the planner makes no such distinction
internally.  At any rate, I don't think the fact that it's difficult
or impossible to provide information about join orders should deter us
from having a way to display the stuff we can get our hands around.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: feature request: explain "with details" option

From

Tom Lane

Date:

12 September 2016, 20:25:26

Robert Haas <robertmhaas@gmail.com> writes:
> I think the only part of this that would be really brutal to try to
> represent is alternative join orders.  I see no reasonable way for
> EXPLAIN to output useful information about what other join orders were
> considered and why they were not chosen; the only thing that seems
> like it would help in that case would be an easy way to force the
> exact join order you want and then see what the plan looks like.

That exists today: write your query as a nest of explicit JOIN syntax
and set join_collapse_limit = 1.  Not sure that it's really all that
useful, though.
        regards, tom lane