Re: generic options for explain - Mailing list pgsql-hackers

From Robert Haas
Subject Re: generic options for explain
Date
Msg-id 603c8f070905241827g74c8bf9cie9d98e38037a8356@mail.gmail.com
Whole thread Raw
In response to Re: generic options for explain  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, May 24, 2009 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Sun, 24 May 2009, Pavel Stehule wrote:
>>> we should have a secondary function explain_query(query_string,
>>> option) that returns setof some.
>
>> +1.  The incremental approach here should first be adding functions that
>> actually do the work required.  Then, if there's a set of those that look
>> to be extremely useful, maybe at that point it's worth talking about how
>> to integrate them into the parser.  Starting with the parser changes
>> rather than the parts that actually do the work is backwards.  If you do
>> it the other way around, at all times you have a patch that actually
>> provides immediate useful value were it to be committed.
>
>> Something that returns a setof can also be easily used to implement the
>> "dump EXPLAIN to a table" feature Josh Tolley brought up (which is another
>> common request in this area).
>
> A serious problem with EXPLAIN via a function returning set, or with
> putting the result into a table, is that set results are logically
> unordered, just as table contents are.  So from a strict point of view
> this only makes sense when the output format is designed to not depend
> on row ordering to convey information.  We could certainly invent such
> a format, but I think it's a mistake to go in this direction for
> EXPLAIN output that is similar to the current output.

The current output format basically prints out the node type and then
a collection of properties that are associated with that node, where
applicable: join type, scan direction, strategy, relation, alias,
startup cost, total cost, rows, width, loops, filter, join filter,
various types of condition (index/recheck/tid/merge/hash), sort key.
However, we tend to omit certain fields (particularly scan direction
and alias) when they contain information that isn't sufficiently
interesting.  That's probably not a good idea for anything that's
intended for machine-parsing; I think for XML or JSON or
output-to-a-table we'd want to include those fields whenever possible.

With that minor complaint (and the difficulty of figuring out how to
avoid duplicating large portions of explain.c), XML or JSON output
doesn't seem that difficult.  In JSON there aren't too many ways of
doing this that make sense.  I would guess we'd want something like
this:

{ "node" : "Hash Join", "jointype" : "Inner", "startup_cost" :
"11.49", "total_cost" : "92.59", "rows" : "1877", "width" : "325",  'outer' : { "node" : "Seq Scan", <more stuff> }
'inner': { "node" : "Hash", <more stuff> } 
}

XML, being XML, has 10 ways of doing something when 1 is more than
sufficient, so there are a couple of ways we could go.  I think this
is my favorite.

<HashJoin jointype="inner" startup_cost="11.49" total_cost="92.59"
rows="1877" width="325">  <SeqScan ... />  <Hash ...>     <SeqScan ... />  </Hash>
</HashJoin>

or you could just use one node type:

<node type="HashJoin" jointype="inner" startup_cost="11.49"
total_cost="92.59" rows="1877" width="325">  <node type="SeqScan" ... />  <node type="Hash" ...>     <node
type="SeqScan"... />  </node> 
</node>

...and then there's this style:

<HashJoin> <jointype>inner</jointype> <startup_cost>11.49</startup_cost> ... <outer/>    ... </outer> <inner> ...
</inner>
</HashJoin>

...which is incredibly long and redundant, but some people who use XML
like such things.  I'm sure there are other possibilities as well.

With respect to table output things are a little bit less
straightforward.  There are two issues.  With XML or JSON, any
properties that aren't relevant to a particular node can be omitted
altogether, whereas for a table the column list has to be consistent
throughout.  We can just leave the unused columns as NULL, but it will
be a fairly wide table.  Also, with apologies to Josh Berkus, we'll
need to introduce some sort of non-natural primary key to allow
children to be related to parents, because unlike XML and JSON,
there's no built-in way to make one object the child of another.

All that having been said, making EXPLAIN into a query (so that you
can do INSERT INTO foo (EXPLAIN SELECT query...) might be useful to
some people even without changing the output format at all.  I think
you could throw a windowing function in there to at least tag each
line with its original position in the output, and some people might
want just that.

...Robert


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: generic options for explain
Next
From: Robert Haas
Date:
Subject: Re: generic options for explain