Thread: EXPLAIN's handling of output-a-field-or-not decisions

EXPLAIN's handling of output-a-field-or-not decisions

From

Tom Lane

Date:

26 January 2020, 20:13:49

I believe that the design intention for EXPLAIN's non-text output
formats is that a given field should appear, or not, depending solely
on the plan shape, EXPLAIN options, and possibly GUC settings.
It's not okay to suppress a field just because it's empty or zero or
otherwise uninteresting, because that makes life harder for automated
tools that now have to cope with expected fields maybe not being there.
See for instance what I wrote in commit 8ebb69f85:

    [Partial Mode] did not appear at all for a non-parallelized Agg plan node,
    which is contrary to expectation in non-text formats.  We're notionally
    producing objects that conform to a schema, so the set of fields for a
    given node type and EXPLAIN mode should be well-defined.  I set it up to
    fill in "Simple" in such cases.

    Other fields that were added for parallel query, namely "Parallel Aware"
    and Gather's "Single Copy", had not gotten the word on that point either.
    Make them appear always in non-text output.

(This is intentionally different from the policy for TEXT-format output,
which is meant to be human-readable so suppressing boring data is
sensible.)

But I noticed while poking at the EXPLAIN code yesterday that recent
patches haven't adhered to this policy too well.

For one, EXPLAIN (SETTINGS) suppresses the "Settings" subnode if
there's nothing to report.  This is just wrong, but I think all we
have to do is delete the over-eager early exit:

    /* also bail out of there are no options */
    if (!num)
        return;

The other offender is the JIT stuff: it prints if COSTS is on and
there's some JIT activity to report, and otherwise you get nothing.
This is OK for text mode but it's bogus for the other formats.
Since we just rearranged EXPLAIN's JIT output anyway, now seems like
a good time to fix it.

I think we might as well go a little further and invent an explicit
JIT option for EXPLAIN, filling in the feature that Andres didn't
bother with originally.  What's not entirely clear to me is whether
to try to preserve the current behavior by making it track COSTS
if not explicitly specified. I'd rather decouple that and say
"you must write EXPLAIN (JIT [ON]) if you want JIT info"; but maybe
people will argue that it's already too late to change this?

Another debatable question is whether to print anything in non-JIT
builds.  We could, with a little bit of pain, print a lot of zeroes
and "falses".  If we stick with the current behavior of omitting
the JIT fields entirely, then that's extending the existing policy
to say that configuration options are also allowed to affect the
set of fields that are printed.  Given that we allow GUCs to affect
that set (cf track_io_timing), maybe this is okay; but it does seem
like it's weakening the promise of a well-defined data schema for
EXPLAIN output.

Any thoughts?  I'm happy to go make this happen if there's not a
lot of argument over what it should look like.

            regards, tom lane

Re: EXPLAIN's handling of output-a-field-or-not decisions

From

Andres Freund

Date:

26 January 2020, 21:24:46

Hi,

On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
> The other offender is the JIT stuff: it prints if COSTS is on and
> there's some JIT activity to report, and otherwise you get nothing.
> This is OK for text mode but it's bogus for the other formats.
> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
> a good time to fix it.

No objection. I think the current choice really is just about hiding JIT
information in the cases where we display explain output in the
tests. That output can't change depending on the build environment and
settings (it's e.g. hugely useful to force all queries to be JITed for
coverage).

We did discuss adding a JIT option in 11, but it wasn't clear it'd be
useful at that time (hence the "Might want to separate that out from
COSTS at a later stage." comment ExplainOnePlan).

I'm not really bothered by entire sections that a re optional in the
other formats, tbh. Especially when they're configurable, more recent
and/or dependent on build options - tools are going to have to deal with
being absent anyway, and that's usually just about no code.

> I think we might as well go a little further and invent an explicit
> JIT option for EXPLAIN, filling in the feature that Andres didn't
> bother with originally.

Yea, I've introduced that in slightly different form in the thread I
referenced yesterday too. See 0004 in
https://www.postgresql.org/message-id/20191029000229.fkjmuld3g7f2jq7i%40alap3.anarazel.de
although it was about adding additional details, rather than showing the
"basic" information.

I'd probably want to make JIT a tristate (off, on, details), instead of
a boolean, but that's details. And can be changed later.

> What's not entirely clear to me is whether to try to preserve the
> current behavior by making it track COSTS if not explicitly
> specified.

> I'd rather decouple that and say "you must write EXPLAIN (JIT [ON]) if
> you want JIT info"; but maybe people will argue that it's already too
> late to change this?

I don't think "too late" is a concern, I don't think there's much of a
"reliance" interest here. Especially not after whacking things around
already.

One concern I do have is that I think we need the overall time for JIT
to be displayed regardless of the JIT option. Otherwise it's going to be
much harder to diagnose cases where some person reports an issue with
executor startup being slow, because of JIT overhead. I think we
shouldn't require users to guess that that's where they should
look. That need, combined with wanting the regression test output to
keep stable, is really what lead to tying the jit information to COSTS.

I guess we could make it so that JIT is inferred based on COSTS, unless
explicitly present. A bit automagical, but ...

> Another debatable question is whether to print anything in non-JIT
> builds.  We could, with a little bit of pain, print a lot of zeroes
> and "falses".  If we stick with the current behavior of omitting
> the JIT fields entirely, then that's extending the existing policy
> to say that configuration options are also allowed to affect the
> set of fields that are printed.  Given that we allow GUCs to affect
> that set (cf track_io_timing), maybe this is okay; but it does seem
> like it's weakening the promise of a well-defined data schema for
> EXPLAIN output.

Hm. I don't think I have an opinion on this. I can see an argument
either way.

Greetings,

Andres Freund

Re: EXPLAIN's handling of output-a-field-or-not decisions

From

Tom Lane

Date:

26 January 2020, 21:54:58

Andres Freund <andres@anarazel.de> writes:
> On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
>> The other offender is the JIT stuff: it prints if COSTS is on and
>> there's some JIT activity to report, and otherwise you get nothing.
>> This is OK for text mode but it's bogus for the other formats.
>> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
>> a good time to fix it.

> No objection. I think the current choice really is just about hiding JIT
> information in the cases where we display explain output in the
> tests. That output can't change depending on the build environment and
> settings (it's e.g. hugely useful to force all queries to be JITed for
> coverage).

Right, but then ...

> One concern I do have is that I think we need the overall time for JIT
> to be displayed regardless of the JIT option.

... how are you going to square that desire with not breaking the
regression tests?

>> Another debatable question is whether to print anything in non-JIT
>> builds.

> Hm. I don't think I have an opinion on this. I can see an argument
> either way.

After a bit more thought I'm leaning to "print nothing", since as you say
tools would have to cope with that anyway if they want to work with old
releases.  Also, while it's not that hard to print dummy data for JIT
even in a non-JIT build, I can imagine some future feature where it
*would* be hard.  So setting a precedent that we must provide dummy
output for unimplemented features could come back to bite us.

            regards, tom lane

Re: EXPLAIN's handling of output-a-field-or-not decisions

From

Andres Freund

Date:

26 January 2020, 22:42:54

Hi,

On 2020-01-26 16:54:58 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2020-01-26 15:13:49 -0500, Tom Lane wrote:
> >> The other offender is the JIT stuff: it prints if COSTS is on and
> >> there's some JIT activity to report, and otherwise you get nothing.
> >> This is OK for text mode but it's bogus for the other formats.
> >> Since we just rearranged EXPLAIN's JIT output anyway, now seems like
> >> a good time to fix it.
> 
> > No objection. I think the current choice really is just about hiding JIT
> > information in the cases where we display explain output in the
> > tests. That output can't change depending on the build environment and
> > settings (it's e.g. hugely useful to force all queries to be JITed for
> > coverage).
> 
> Right, but then ...
> 
> > One concern I do have is that I think we need the overall time for JIT
> > to be displayed regardless of the JIT option.
> 
> ... how are you going to square that desire with not breaking the
> regression tests?

Well, that's how we arrived at turning off JIT information when COSTS
OFF, because that's already something all the EXPLAINs in the regression
tests have to do. I do not want to regress from the current state, with
regard to both regression tests, and seeing at least a top-line time in
the normal EXPLAIN ANALYZE cases.

I've previously wondered about adding a REGRESS option to EXPLAIN would
not actually be a good one, so we can move the magic into that, rather
than options that are also otherwise relevant.

Greetings,

Andres Freund

Re: EXPLAIN's handling of output-a-field-or-not decisions

From

Tom Lane

Date:

26 January 2020, 22:53:09

Andres Freund <andres@anarazel.de> writes:
> On 2020-01-26 16:54:58 -0500, Tom Lane wrote:
>> ... how are you going to square that desire with not breaking the
>> regression tests?

> Well, that's how we arrived at turning off JIT information when COSTS
> OFF, because that's already something all the EXPLAINs in the regression
> tests have to do. I do not want to regress from the current state, with
> regard to both regression tests, and seeing at least a top-line time in
> the normal EXPLAIN ANALYZE cases.

Right, but that's still just a hack.

> I've previously wondered about adding a REGRESS option to EXPLAIN would
> not actually be a good one, so we can move the magic into that, rather
> than options that are also otherwise relevant.

I'd be inclined to think about a GUC actually.
force_parallel_mode = regress is sort of precedent for that,
and we do already have the infrastructure needed to force a
database-level GUC setting for regression databases.

I can see some advantages to making it an explicit EXPLAIN option
instead --- but unless we wanted to back-patch it, it'd be a real
pain in the rear for back-patching regression test cases.

            regards, tom lane

Re: EXPLAIN's handling of output-a-field-or-not decisions

From

Andres Freund

Date:

27 January 2020, 01:28:35

Hi,

On 2020-01-26 17:53:09 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I've previously wondered about adding a REGRESS option to EXPLAIN would
> > not actually be a good one, so we can move the magic into that, rather
> > than options that are also otherwise relevant.
> 
> I'd be inclined to think about a GUC actually.
> force_parallel_mode = regress is sort of precedent for that,
> and we do already have the infrastructure needed to force a
> database-level GUC setting for regression databases.

Yea, a GUC could work too. What would it do exactly? Hide COSTS, TIMING,
JIT, unless explicitly turned on in the EXPLAIN? And perhaps also
"redact" a few things that we currently manually have to filter out?
And then we'd leave the implicit JIT to on, to allow users to see where
time is spent?

Or were you thinking of something different entirely?

> I can see some advantages to making it an explicit EXPLAIN option
> instead --- but unless we wanted to back-patch it, it'd be a real
> pain in the rear for back-patching regression test cases.

Hm. Would it really be harder? I'd expect that we'd end up writing tests
in master that need additional options to be usable in the back
branches. Seems we'd definitely need to backpatch the GUC?

Greetings,

Andres Freund