Thread: EXPLAIN's handling of output-a-field-or-not decisions
I believe that the design intention for EXPLAIN's non-text output formats is that a given field should appear, or not, depending solely on the plan shape, EXPLAIN options, and possibly GUC settings. It's not okay to suppress a field just because it's empty or zero or otherwise uninteresting, because that makes life harder for automated tools that now have to cope with expected fields maybe not being there. See for instance what I wrote in commit 8ebb69f85: [Partial Mode] did not appear at all for a non-parallelized Agg plan node, which is contrary to expectation in non-text formats. We're notionally producing objects that conform to a schema, so the set of fields for a given node type and EXPLAIN mode should be well-defined. I set it up to fill in "Simple" in such cases. Other fields that were added for parallel query, namely "Parallel Aware" and Gather's "Single Copy", had not gotten the word on that point either. Make them appear always in non-text output. (This is intentionally different from the policy for TEXT-format output, which is meant to be human-readable so suppressing boring data is sensible.) But I noticed while poking at the EXPLAIN code yesterday that recent patches haven't adhered to this policy too well. For one, EXPLAIN (SETTINGS) suppresses the "Settings" subnode if there's nothing to report. This is just wrong, but I think all we have to do is delete the over-eager early exit: /* also bail out of there are no options */ if (!num) return; The other offender is the JIT stuff: it prints if COSTS is on and there's some JIT activity to report, and otherwise you get nothing. This is OK for text mode but it's bogus for the other formats. Since we just rearranged EXPLAIN's JIT output anyway, now seems like a good time to fix it. I think we might as well go a little further and invent an explicit JIT option for EXPLAIN, filling in the feature that Andres didn't bother with originally. What's not entirely clear to me is whether to try to preserve the current behavior by making it track COSTS if not explicitly specified. I'd rather decouple that and say "you must write EXPLAIN (JIT [ON]) if you want JIT info"; but maybe people will argue that it's already too late to change this? Another debatable question is whether to print anything in non-JIT builds. We could, with a little bit of pain, print a lot of zeroes and "falses". If we stick with the current behavior of omitting the JIT fields entirely, then that's extending the existing policy to say that configuration options are also allowed to affect the set of fields that are printed. Given that we allow GUCs to affect that set (cf track_io_timing), maybe this is okay; but it does seem like it's weakening the promise of a well-defined data schema for EXPLAIN output. Any thoughts? I'm happy to go make this happen if there's not a lot of argument over what it should look like. regards, tom lane
Hi, On 2020-01-26 15:13:49 -0500, Tom Lane wrote: > The other offender is the JIT stuff: it prints if COSTS is on and > there's some JIT activity to report, and otherwise you get nothing. > This is OK for text mode but it's bogus for the other formats. > Since we just rearranged EXPLAIN's JIT output anyway, now seems like > a good time to fix it. No objection. I think the current choice really is just about hiding JIT information in the cases where we display explain output in the tests. That output can't change depending on the build environment and settings (it's e.g. hugely useful to force all queries to be JITed for coverage). We did discuss adding a JIT option in 11, but it wasn't clear it'd be useful at that time (hence the "Might want to separate that out from COSTS at a later stage." comment ExplainOnePlan). I'm not really bothered by entire sections that a re optional in the other formats, tbh. Especially when they're configurable, more recent and/or dependent on build options - tools are going to have to deal with being absent anyway, and that's usually just about no code. > I think we might as well go a little further and invent an explicit > JIT option for EXPLAIN, filling in the feature that Andres didn't > bother with originally. Yea, I've introduced that in slightly different form in the thread I referenced yesterday too. See 0004 in https://www.postgresql.org/message-id/20191029000229.fkjmuld3g7f2jq7i%40alap3.anarazel.de although it was about adding additional details, rather than showing the "basic" information. I'd probably want to make JIT a tristate (off, on, details), instead of a boolean, but that's details. And can be changed later. > What's not entirely clear to me is whether to try to preserve the > current behavior by making it track COSTS if not explicitly > specified. > I'd rather decouple that and say "you must write EXPLAIN (JIT [ON]) if > you want JIT info"; but maybe people will argue that it's already too > late to change this? I don't think "too late" is a concern, I don't think there's much of a "reliance" interest here. Especially not after whacking things around already. One concern I do have is that I think we need the overall time for JIT to be displayed regardless of the JIT option. Otherwise it's going to be much harder to diagnose cases where some person reports an issue with executor startup being slow, because of JIT overhead. I think we shouldn't require users to guess that that's where they should look. That need, combined with wanting the regression test output to keep stable, is really what lead to tying the jit information to COSTS. I guess we could make it so that JIT is inferred based on COSTS, unless explicitly present. A bit automagical, but ... > Another debatable question is whether to print anything in non-JIT > builds. We could, with a little bit of pain, print a lot of zeroes > and "falses". If we stick with the current behavior of omitting > the JIT fields entirely, then that's extending the existing policy > to say that configuration options are also allowed to affect the > set of fields that are printed. Given that we allow GUCs to affect > that set (cf track_io_timing), maybe this is okay; but it does seem > like it's weakening the promise of a well-defined data schema for > EXPLAIN output. Hm. I don't think I have an opinion on this. I can see an argument either way. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-01-26 15:13:49 -0500, Tom Lane wrote: >> The other offender is the JIT stuff: it prints if COSTS is on and >> there's some JIT activity to report, and otherwise you get nothing. >> This is OK for text mode but it's bogus for the other formats. >> Since we just rearranged EXPLAIN's JIT output anyway, now seems like >> a good time to fix it. > No objection. I think the current choice really is just about hiding JIT > information in the cases where we display explain output in the > tests. That output can't change depending on the build environment and > settings (it's e.g. hugely useful to force all queries to be JITed for > coverage). Right, but then ... > One concern I do have is that I think we need the overall time for JIT > to be displayed regardless of the JIT option. ... how are you going to square that desire with not breaking the regression tests? >> Another debatable question is whether to print anything in non-JIT >> builds. > Hm. I don't think I have an opinion on this. I can see an argument > either way. After a bit more thought I'm leaning to "print nothing", since as you say tools would have to cope with that anyway if they want to work with old releases. Also, while it's not that hard to print dummy data for JIT even in a non-JIT build, I can imagine some future feature where it *would* be hard. So setting a precedent that we must provide dummy output for unimplemented features could come back to bite us. regards, tom lane
Hi, On 2020-01-26 16:54:58 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2020-01-26 15:13:49 -0500, Tom Lane wrote: > >> The other offender is the JIT stuff: it prints if COSTS is on and > >> there's some JIT activity to report, and otherwise you get nothing. > >> This is OK for text mode but it's bogus for the other formats. > >> Since we just rearranged EXPLAIN's JIT output anyway, now seems like > >> a good time to fix it. > > > No objection. I think the current choice really is just about hiding JIT > > information in the cases where we display explain output in the > > tests. That output can't change depending on the build environment and > > settings (it's e.g. hugely useful to force all queries to be JITed for > > coverage). > > Right, but then ... > > > One concern I do have is that I think we need the overall time for JIT > > to be displayed regardless of the JIT option. > > ... how are you going to square that desire with not breaking the > regression tests? Well, that's how we arrived at turning off JIT information when COSTS OFF, because that's already something all the EXPLAINs in the regression tests have to do. I do not want to regress from the current state, with regard to both regression tests, and seeing at least a top-line time in the normal EXPLAIN ANALYZE cases. I've previously wondered about adding a REGRESS option to EXPLAIN would not actually be a good one, so we can move the magic into that, rather than options that are also otherwise relevant. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-01-26 16:54:58 -0500, Tom Lane wrote: >> ... how are you going to square that desire with not breaking the >> regression tests? > Well, that's how we arrived at turning off JIT information when COSTS > OFF, because that's already something all the EXPLAINs in the regression > tests have to do. I do not want to regress from the current state, with > regard to both regression tests, and seeing at least a top-line time in > the normal EXPLAIN ANALYZE cases. Right, but that's still just a hack. > I've previously wondered about adding a REGRESS option to EXPLAIN would > not actually be a good one, so we can move the magic into that, rather > than options that are also otherwise relevant. I'd be inclined to think about a GUC actually. force_parallel_mode = regress is sort of precedent for that, and we do already have the infrastructure needed to force a database-level GUC setting for regression databases. I can see some advantages to making it an explicit EXPLAIN option instead --- but unless we wanted to back-patch it, it'd be a real pain in the rear for back-patching regression test cases. regards, tom lane
Hi, On 2020-01-26 17:53:09 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I've previously wondered about adding a REGRESS option to EXPLAIN would > > not actually be a good one, so we can move the magic into that, rather > > than options that are also otherwise relevant. > > I'd be inclined to think about a GUC actually. > force_parallel_mode = regress is sort of precedent for that, > and we do already have the infrastructure needed to force a > database-level GUC setting for regression databases. Yea, a GUC could work too. What would it do exactly? Hide COSTS, TIMING, JIT, unless explicitly turned on in the EXPLAIN? And perhaps also "redact" a few things that we currently manually have to filter out? And then we'd leave the implicit JIT to on, to allow users to see where time is spent? Or were you thinking of something different entirely? > I can see some advantages to making it an explicit EXPLAIN option > instead --- but unless we wanted to back-patch it, it'd be a real > pain in the rear for back-patching regression test cases. Hm. Would it really be harder? I'd expect that we'd end up writing tests in master that need additional options to be usable in the back branches. Seems we'd definitely need to backpatch the GUC? Greetings, Andres Freund