Re: machine-readable explain output - Mailing list pgsql-hackers

From Robert Haas
Subject Re: machine-readable explain output
Msg-id 603c8f070906160622i384839d8t1ebe8c7011f86c35@mail.gmail.com
In response to Re: machine-readable explain output  (Andres Freund <andres@anarazel.de>)
Responses Re: machine-readable explain output  (Andres Freund <andres@anarazel.de>)
Re: machine-readable explain output  (Andrew Dunstan <andrew@dunslane.net>)
Re: machine-readable explain output  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund<andres@anarazel.de> wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>>
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de>
>>  wrote:
>>>
>>> <Startup-Cost>1710.98</Startup-Cost>
>>> <Total-Cost>1710.98</Total-Cost>
>>> <Plan-Rows>72398</Plan-Rows>
>>> <Plan-Width>4</Plan-Width>
>>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>>> <Actual-Total-Time>136.595</Actual-Total-Time>
>>> <Actual-Rows>72398</Actual-Rows>
>>> <Actual-Loops>1</Actual-Loops>
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>
>> I would have expected something like:
>>
>> <join
>>     <scan type=sequential source="foo.bar">
>>         <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>>         <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>>     </scan>
>>     <scan type=function source="foo.bar($1)">
>>         <parameters>
>>              <parameter name="$1" expression="...."></>
>>          </parameters>
>>      </scan>
>> </join>
>>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses <Plan> and
<Plans>, you can do this just as well with the current implementation.
Each plan node gets a <Plan>.  If there are any plans "under" it, it
gets a <Plans> child which contains those.  Whether you put the
additional details into attributes or other tags is irrelevant.  As to
why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do
EVERYTHING with attributes.  If I did that, then I'd either get really
long lines that were not easily readable, or I'd have to write some
kind of complicated line wrapping code (which didn't seem to make a
lot of sense for a machine-readable format).  The current format isn't
the most beautiful thing I've ever seen, but you don't need a parser
to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that
seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g., table_to_xml:

http://www.postgresql.org/docs/current/interactive/functions-xml.html
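
To make the <Plan>/<Plans> nesting described earlier concrete, here is a
minimal sketch in Python of how a client could walk such a document. It
is not part of the patch. The sample XML is hand-written for
illustration: only the <Plan>/<Plans> nesting and the Total-Cost tag
name come from this thread, while <Node-Type>, <Some-New-Field>, and all
values are assumptions. The point is that a tool can recurse through
<Plans> and carry along any scalar tag it does not recognize.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real EXPLAIN output. <Node-Type> and
# <Some-New-Field> are hypothetical tags used only for illustration.
SAMPLE = """
<Plan>
  <Node-Type>Hash Join</Node-Type>
  <Total-Cost>3000.50</Total-Cost>
  <Plans>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Total-Cost>1200.00</Total-Cost>
      <Some-New-Field>42</Some-New-Field>
    </Plan>
    <Plan>
      <Node-Type>Hash</Node-Type>
      <Total-Cost>1710.98</Total-Cost>
    </Plan>
  </Plans>
</Plan>
"""


def plan_to_dict(plan):
    """Convert one <Plan> element into a dict, recursing through <Plans>."""
    node = {}
    for child in plan:
        if child.tag == "Plans":
            # Child plan nodes live inside the single <Plans> wrapper.
            node["Plans"] = [plan_to_dict(p) for p in child.findall("Plan")]
        else:
            # Every other tag is kept as an opaque scalar, known or not.
            node[child.tag] = (child.text or "").strip()
    return node


def outline(node, depth=0):
    """Return indented lines, one per plan node, listing its scalar fields."""
    fields = ", ".join(f"{k}={v}" for k, v in node.items() if k != "Plans")
    lines = ["  " * depth + fields]
    for sub in node.get("Plans", []):
        lines.extend(outline(sub, depth + 1))
    return lines


tree = plan_to_dict(ET.fromstring(SAMPLE.strip()))
print("\n".join(outline(tree)))
```

The resulting dict of scalars plus a "Plans" list is also roughly the
shape a JSON rendering would take, which is one way to read reason 2
above.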

> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a fastly contrived example you could have io
> statistics over time like:
> <iostat>
>   <stat time="10" name=pagefault>...</stat>
>   <stat time="20" name=pagefault>...</stat>
>   <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
> <Plan-Estimates>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Plan-Estimates>
> <Execution-Cost>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Execution-Cost>
>
> Enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs.   The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0.  When we measure actual
times, they are in milliseconds.  There is no point that I can see in
making it appear that those are the same thing.  Observe the current
output:

explain analyze select 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)
 Total runtime: 0.243 ms
(2 rows)
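
As a rough illustration of why the two groups should stay separate, the
following Python sketch parses the Result line from the output above and
keeps the estimates (in planner cost units) apart from the measurements
(in milliseconds). The regular expression, the function name, and the
dict layout are my own assumptions for this example, not anything in
the patch, and the pattern only covers lines shaped like the one shown.

```python
import re

# The plan line from the EXPLAIN ANALYZE output quoted in this message.
LINE = ("Result  (cost=0.00..0.01 rows=1 width=0) "
        "(actual time=0.005..0.007 rows=1 loops=1)")

NUM = r"\d+(?:\.\d+)?"
PATTERN = re.compile(
    rf"\(cost=(?P<c0>{NUM})\.\.(?P<c1>{NUM}) rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
    rf"\s*\(actual time=(?P<t0>{NUM})\.\.(?P<t1>{NUM}) rows=(?P<arows>\d+) loops=(?P<loops>\d+)\)"
)


def parse_plan_line(line):
    """Split one plan line into (estimate, actual) dicts with explicit units."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not look like an EXPLAIN ANALYZE plan line")
    estimate = {
        # Arbitrary planner scale: one sequential page fetch costs 1.0.
        "unit": "cost",
        "startup": float(m["c0"]),
        "total": float(m["c1"]),
        "rows": int(m["rows"]),
        "width": int(m["width"]),
    }
    actual = {
        # Measured wall-clock time in milliseconds.
        "unit": "ms",
        "startup": float(m["t0"]),
        "total": float(m["t1"]),
        "rows": int(m["arows"]),
        "loops": int(m["loops"]),
    }
    return estimate, actual


estimate, actual = parse_plan_line(LINE)
print(estimate)
print(actual)
```

Since the two dicts carry different units, comparing estimate["total"]
with actual["total"] directly would be meaningless, which is the
objection to labelling both groups as costs.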

...Robert

