EXPLAIN's handling of output-a-field-or-not decisions - Mailing list pgsql-hackers

From Tom Lane
Subject EXPLAIN's handling of output-a-field-or-not decisions
Msg-id 19416.1580069629@sss.pgh.pa.us
Responses Re: EXPLAIN's handling of output-a-field-or-not decisions  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

I believe that the design intention for EXPLAIN's non-text output
formats is that a given field should appear, or not, depending solely
on the plan shape, EXPLAIN options, and possibly GUC settings.
It's not okay to suppress a field just because it's empty or zero or
otherwise uninteresting, because that makes life harder for automated
tools that now have to cope with expected fields maybe not being there.
See for instance what I wrote in commit 8ebb69f85:

    [Partial Mode] did not appear at all for a non-parallelized Agg plan node,
    which is contrary to expectation in non-text formats.  We're notionally
    producing objects that conform to a schema, so the set of fields for a
    given node type and EXPLAIN mode should be well-defined.  I set it up to
    fill in "Simple" in such cases.

    Other fields that were added for parallel query, namely "Parallel Aware"
    and Gather's "Single Copy", had not gotten the word on that point either.
    Make them appear always in non-text output.

(This is intentionally different from the policy for TEXT-format output,
which is meant to be human-readable so suppressing boring data is
sensible.)
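
To make the distinction concrete, here is a minimal standalone sketch of the two policies.  It does not use the real EXPLAIN code: DemoFormat and demo_emit_count() are hypothetical names invented for illustration, and the output syntax is only JSON-like.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for EXPLAIN's format setting; not the real enum. */
typedef enum { DEMO_FMT_TEXT, DEMO_FMT_JSON } DemoFormat;

/*
 * Write one integer property into buf and return the number of characters
 * written (0 if the field was suppressed).
 */
static int
demo_emit_count(char *buf, size_t buflen, DemoFormat fmt,
                const char *label, long value)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (fmt == DEMO_FMT_TEXT)
    {
        /* Human-readable output: a zero count is just noise, so skip it. */
        if (value == 0)
            return 0;
        return snprintf(buf, buflen, "%s: %ld", label, value);
    }

    /* Structured output: the key is always present, even when zero. */
    return snprintf(buf, buflen, "\"%s\": %ld", label, value);
}
```

With this shape, whether a key exists in structured output depends only on which code path runs, never on the value being printed, so a consumer can index the field unconditionally.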

But I noticed while poking at the EXPLAIN code yesterday that recent
patches haven't adhered to this policy too well.

For one, EXPLAIN (SETTINGS) suppresses the "Settings" subnode if
there's nothing to report.  This is just wrong, but I think all we
have to do is delete the over-eager early exit:

    /* also bail out of there are no options */
    if (!num)
        return;
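
As a rough illustration of the intended result (not the actual explain.c code), the sketch below keeps the empty-list shortcut for text output only and always emits the group otherwise.  demo_emit_settings() and its parameters are hypothetical, and the rendering of each entry is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Render a list of changed settings into buf; returns characters written.
 * text_format != 0 selects the human-readable style.
 */
static int
demo_emit_settings(char *buf, size_t buflen, int text_format,
                   const char **names, const char **values, int num)
{
    size_t      used = 0;
    int         i;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    /* Only text output may drop the group when there is nothing to report. */
    if (text_format && num == 0)
        return 0;

    if (text_format)
        used += snprintf(buf + used, buflen - used, "Settings: ");
    else
        used += snprintf(buf + used, buflen - used, "\"Settings\": {");

    for (i = 0; i < num && used < buflen; i++)
    {
        const char *sep = (i > 0) ? ", " : "";

        if (text_format)
            used += snprintf(buf + used, buflen - used,
                             "%s%s = '%s'", sep, names[i], values[i]);
        else
            used += snprintf(buf + used, buflen - used,
                             "%s\"%s\": \"%s\"", sep, names[i], values[i]);
    }

    /* Close the group; an empty list still yields "Settings": {}. */
    if (!text_format && used < buflen)
        used += snprintf(buf + used, buflen - used, "}");

    return (int) used;
}
```

The point is only that the structured branch has no exit that depends on num, so the "Settings" key is present whenever the SETTINGS option was requested.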

The other offender is the JIT stuff: it prints if COSTS is on and
there's some JIT activity to report, and otherwise you get nothing.
This is OK for text mode but it's bogus for the other formats.
Since we just rearranged EXPLAIN's JIT output anyway, now seems like
a good time to fix it.

I think we might as well go a little further and invent an explicit
JIT option for EXPLAIN, filling in the feature that Andres didn't
bother with originally.  What's not entirely clear to me is whether
to try to preserve the current behavior by making it track COSTS
if not explicitly specified. I'd rather decouple that and say
"you must write EXPLAIN (JIT [ON]) if you want JIT info"; but maybe
people will argue that it's already too late to change this?
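
The two alternatives can be written down as a pair of small functions.  This is only a sketch of the decision logic under discussion: the tri-state type and both function names are hypothetical, and nothing here reflects how EXPLAIN options are actually parsed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for an option the user may not have written. */
typedef enum { DEMO_UNSET = -1, DEMO_OFF = 0, DEMO_ON = 1 } DemoTriState;

/*
 * Alternative 1: preserve current behavior.  If JIT was not specified,
 * it follows COSTS; an explicit JIT setting always wins.
 */
static bool
demo_jit_tracks_costs(DemoTriState jit, bool costs)
{
    if (jit == DEMO_UNSET)
        return costs;
    return jit == DEMO_ON;
}

/*
 * Alternative 2: decoupled.  JIT info is printed only when the user
 * explicitly asks for it, regardless of COSTS.
 */
static bool
demo_jit_decoupled(DemoTriState jit)
{
    return jit == DEMO_ON;
}
```

The two differ only in the unset case: with COSTS on and no JIT option given, the first prints JIT info and the second does not, which is exactly the compatibility question.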

Another debatable question is whether to print anything in non-JIT
builds.  We could, with a little bit of pain, print a lot of zeroes
and "falses".  If we stick with the current behavior of omitting
the JIT fields entirely, then that's extending the existing policy
to say that configuration options are also allowed to affect the
set of fields that are printed.  Given that we allow GUCs to affect
that set (cf track_io_timing), maybe this is okay; but it does seem
like it's weakening the promise of a well-defined data schema for
EXPLAIN output.

Any thoughts?  I'm happy to go make this happen if there's not a
lot of argument over what it should look like.

            regards, tom lane


