Re: JIT performance bug/regression & JIT EXPLAIN - Mailing list pgsql-hackers

From Tom Lane
Subject Re: JIT performance bug/regression & JIT EXPLAIN
Msg-id 18465.1580145353@sss.pgh.pa.us
In response to Re: JIT performance bug/regression & JIT EXPLAIN  (Maciek Sakrejda <m.sakrejda@gmail.com>)
Responses Re: JIT performance bug/regression & JIT EXPLAIN  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Maciek Sakrejda <m.sakrejda@gmail.com> writes:
> On Fri, Nov 15, 2019 at 5:49 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> Personally, I don't care very much about backward-compatibility, or
>> about how hard it is for tools to parse. I want it to be possible, but
>> if it takes a little extra effort, so be it.

> I think these are two separate issues. I agree on
> backward-compatibility (especially if we can embed a server version in
> structured EXPLAIN output to make it easier for tools to track format
> differences), but not caring how hard it is for tools to parse? What's
> the point of structured formats, then?

I'd not been paying any attention to this thread, but Andres just
referenced it in another discussion, so I went back and read it.
Here's my two cents:

* I agree with Robert that conditionally changing "Output" to "Project" is
an absolutely horrid idea.  That will break every tool that looks at this
stuff, and it just flies in the face of the design principle that the
output schema should be stable, and it'll be a long-term pain-in-the-rear
for regression test back-patching, and it will confuse users much more than
it will help them.  The other idea of suppressing "Output" in cases where
no projection is happening might be all right, but only in text format
where we don't worry about schema stability.  Another idea perhaps is
to emit "Output: all columns" (in text format; less sure what to do in
structured formats).

* In the structured formats, I think it should be okay to convert
expression-ish fields from being raw strings to being {Expression}
sub-nodes with the raw string as one field.  Aside from making it easy
to inject JIT info, that would also open the door to someday showing
expressions in some more parseable format than a string, since other
representations could also be added as new fields.  (I have a vague
recollection of wanting a list of all the Vars used in an expression,
for example.)

* Unfortunately that does nothing for the problem of how to show
per-expression JIT info in text format.  Maybe we just shouldn't.
I do not think that the readability-vs-usefulness tradeoff is going
to be all that good there, anyway.  Certainly for testing purposes
it's going to be more useful to examine portions of a structured output.

* I'm not on board with the idea of adding a version number to the
structured output formats.  In the first place, it's too late, since
we didn't leave room for one to begin with.  In the second, an overall
version number just isn't very helpful for this sort of problem.  If a
tool sees a version number higher than the latest thing it knows, what's
it supposed to do, just fail?  In practice it could still extract an awful
lot of info, so that really isn't a desirable answer.  It's better if the
data structure is such that a tool can understand that some sub-part of
the data is something it can't interpret, and just ignore that part.
(This is more or less the same design principle that the PNG image format
was built on, FWIW.)  Adding on fields to an existing node type easily
meets that requirement, as does inventing new sub-node types, and that's
all that we've done so far.  But I think that replacing a scalar field
value with a sub-node probably works too (at least for well-written
tools), so the expression change suggested above should be OK.
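
To make the last point concrete, here is a minimal Python sketch of how a
tolerant tool might read a "Filter" field whether it is a raw string (as
today) or an expression sub-node.  The sub-node layout is an assumption made
purely for illustration: the key names "Text" and "JIT", the "Compiled" flag,
and "Some Future Field" are hypothetical and not part of any existing EXPLAIN
output.  "Node Type", "Filter" and "Plans" are keys that the JSON format
already emits.

```python
def expression_text(value):
    """Return the raw expression string from either representation."""
    if isinstance(value, str):
        # Current behaviour: the expression is a plain string.
        return value
    if isinstance(value, dict):
        # Hypothetical sub-node form; "Text" is an assumed key name.
        # Any other keys in the sub-node (e.g. JIT details) are ignored.
        return value.get("Text")
    # Unknown representation: skip it rather than fail.
    return None


def collect_filters(node):
    """Collect (node type, filter text) pairs from a plan node and its children."""
    found = []
    if "Filter" in node:
        found.append((node.get("Node Type"), expression_text(node["Filter"])))
    # Keys the tool does not know about are never touched, so added
    # fields cannot break it.
    for child in node.get("Plans", []):
        found.extend(collect_filters(child))
    return found


# A plan node shaped like today's JSON output.
old_plan = {"Node Type": "Seq Scan", "Relation Name": "t", "Filter": "(a > 1)"}

# The same node with the hypothetical sub-node and an unknown extra field.
new_plan = {
    "Node Type": "Seq Scan",
    "Relation Name": "t",
    "Filter": {"Text": "(a > 1)", "JIT": {"Compiled": True}},
    "Some Future Field": 42,
}

print(collect_filters(old_plan))
print(collect_filters(new_plan))
```

Both calls return the same list, which is the point: a tool written this way
needs no version number to cope with the changed field.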

            regards, tom lane


