Thread: machine-readable explain output

machine-readable explain output

From
Robert Haas
Date:
Here we go, XML and JSON output.

You will need to apply explain_refactor-v4.patch and
explain_options-v2.patch first, then apply the two patches attached to
this message.

http://archives.postgresql.org/pgsql-hackers/2009-06/msg00865.php
http://archives.postgresql.org/pgsql-hackers/2009-06/msg00866.php

The infrastructure patch applies first and is separated only for ease
of reviewing.  If the infrastructure patch applied by itself changes
any user-visible behavior, it's a bug.  The main patch does all the
heavy lifting.  The syntax is:

explain (format xml) ...
explain (format json, analyze) ...
explain (format text) ...   -- same as just plain old explain

If you don't like the syntax, please argue about that on the "generic
explain options v2" thread.  Let's try to use this thread to discuss
the output format, about which I spent a good deal of time agonizing.
I felt that it was important to keep the XML and JSON output as
similar to each other as possible.  This has the fairly obvious
advantage of reducing code complexity and the somewhat less obvious
advantage of avoiding expressing information in ways that are overly
tied to the syntax of XML.  I think the latter is actually a pretty
important point; it's hard to know that you've done something that's
actually generic unless you actually go through the exercise of making
it apply to two different cases.  This code is obviously not
completely generic; I did consider the idea that Greg Stark proposed
of having some kind of ephemeral internal format with multiple output
converters, but I couldn't figure out a way to make it work.  Still,
I've made a fairly determined effort to minimize the number of places
where we switch on the output format.  It's still larger than I'd
like, but I don't have any good ideas for trimming it down further.

There is an argument to be made that machine-readable output formats
shouldn't be afraid to print information that isn't displayed in the
regular output, but I haven't gone very far down that road in this
patch.  I lean toward the view that any additional information that
someone wants to have in the machine-readable format should also be an
available option for the text format, because I think the question of
WHAT you want to display and HOW you want to display it are largely
orthogonal (hence options are a separate patch, and this patch just
uses that infrastructure to implement an option for format).  But
there may be some exceptions.  At any rate, if it's possible, I would
like to get at least some of this work committed before I go too much
further with it, since this patch stack is already four layers deep
and my head may explode if it gets too much deeper.

If it's helpful to have any of these patches further decomposed for
reviewing purposes, see here, where they are broken out into
individual commits:

http://git.postgresql.org/gitweb?p=postgresql-rhaas.git;a=shortlog;h=refs/heads/explain_format

(It's probably a bad idea to clone this repository as I am updating
the patch set by rebasing, but it's useful for browsing.)

Comments appreciated...

...Robert

Attachment

Re: machine-readable explain output

From
Josh Berkus
Date:
On 6/11/09 10:15 PM, Robert Haas wrote:
> Here we go, XML and JSON output.
>
> You will need to apply explain_refactor-v4.patch and
> explain_options-v2.patch first, then apply the two patches attached to
> this message.

Wow, cool.  Can this work with auto_explain?  That's where I see 
machine-readable being most useful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: machine-readable explain output

From
Robert Haas
Date:
On Fri, Jun 12, 2009 at 11:47 AM, Josh Berkus<josh@agliodbs.com> wrote:
> On 6/11/09 10:15 PM, Robert Haas wrote:
>>
>> Here we go, XML and JSON output.
>>
>> You will need to apply explain_refactor-v4.patch and
>> explain_options-v2.patch first, then apply the two patches attached to
>> this message.
>
> Wow, cool.  Can this work with auto_explain?  That's where I see
> machine-readable being most useful.

The patch does touch contrib/auto_explain, but just enough to make it
keep working the same way it does now.  I don't think it would be too
hard to improve on that, though; I might work on it if I get bored,
but I'm hoping someone else will be motivated enough to do that part.
:-)

How would you go about extracting the XML/JSON bits from the rest of
what is in the log file?  (apologies if this is a question I should
already know the answer to)

...Robert


Re: machine-readable explain output

From
Josh Berkus
Date:
> How would you go about extracting the XML/JSON bits from the rest of
> what is in the log file?  (apologies if this is a question I should
> already know the answer to)

If you do CSV output, it's in a field.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Josh Berkus wrote:
>
>> How would you go about extracting the XML/JSON bits from the rest of
>> what is in the log file?  (apologies if this is a question I should
>> already know the answer to)
>
> If you do CSV output, it's in a field.
>

And even if it's not, a well formed XML document would be fairly easy to 
extract, especially if the root element is well known (e.g. <pg:explain> 
or some such). And if the patch doesn't produce a well-formed XML doc 
then it needs work ;-). It might be nice if Robert were to post some 
samples of the output.

Quick question: does the patch use formal methods using, say, the DOM 
API to build up the XML, or informal methods  (like 
foo.append('<element>'); foo.append(content); foo.append('</element>'); )

As you can tell, I haven't looked over it yet. But I intend to ;-)

cheers

andrew


Re: machine-readable explain output

From
Robert Haas
Date:
On Fri, Jun 12, 2009 at 5:13 PM, Andrew Dunstan<andrew@dunslane.net> wrote:
> Josh Berkus wrote:
>>
>>> How would you go about extracting the XML/JSON bits from the rest of
>>> what is in the log file?  (apologies if this is a question I should
>>> already know the answer to)
>>
>> If you do CSV output, it's in a field.
>
> And even if it's not, a well formed XML document would be fairly easy to
> extract, especially if the root element is well known (e.g. <pg:explain> or
> some such). And if the patch doesn't produce a well-formed XML doc then it
> needs work ;-). It might be nice if Robert were to post some samples of the
> output.

<pgexplain>, as it happens...  I could post some samples of the
output, but it seems like it might be just as well to let those who
are curious try it for themselves.  I'd rather get opinions from
people who care enough to download & test than from those who are just
bikeshedding.  :-)

> Quick question: does the patch use formal methods using, say, the DOM API to
> build up the XML, or informal methods  (like foo.append('<element>');
> foo.append(content); foo.append('</element>'); )
>
> As you can tell, I haven't looked over it yet. But I intend to ;-)

Use the Source, Luke.  :-)

But, it's informal methods.  I don't see a lot of value in doing it
the other way, though perhaps I could be convinced otherwise.  One
thing that's nice about the way it works now is that the only support
function it requires is a basic XML-escaping function, which it turns
out we already have in the PG sources anyway, though not in a quite
usable form (the infrastructure patch deals with the necessary
adjustments).  So you can explain (format xml) even if you compile
without --with-libxml.

If you want to see how the actual XML/JSON stuff works, you might want
to start with the last patch in the series (explain_format).  If you
want to commit it, a course of action to which I can give my unbiased
endorsement, then you'll want to start with explain_refactor.

...Robert


Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Friday 12 June 2009 08:15:17 Robert Haas wrote:
> Here we go, XML and JSON output.

Could you post some examples of how some plans would look in either format? 
That would help us judge the particulars.



Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Saturday 13 June 2009 01:10:06 Robert Haas wrote:
> <pgexplain>, as it happens...  I could post some samples of the
> output, but it seems like it might be just as well to let those who
> are curious try it for themselves.  I'd rather get opinions from
> people who care enough to download & test than from those who are just
> bikeshedding.  :-)

I recommend, however, that you think about writing a regression test for this, 
so the interfaces are explicit, and those tweaking them in the future know 
what they are dealing with.

A couple of comments on the specifics of the output:

For the JSON format:

* Numbers should not be quoted.

For the XML format:

* Instead of <pgexplain>, use <explain> with an XML namespace declaration.

The schema name is missing in either output format.  I think that was supposed 
to be one of the features of this that the objects are unambiguously 
qualified.

I'm not sure I like element names such as <Node-Type>, instead of say 
<nodetype>, which is more like HTML and DocBook.  (Your way might be more like 
SOAP, I guess.)

Also, the result type of an EXPLAIN (format xml) should be type xml, not text.

In general, I like this direction very much.  There will probably be more 
tweaks on the output format over time.  It's not like the plain EXPLAIN hasn't 
been tweaked countless times.



Re: machine-readable explain output

From
Robert Haas
Date:
On Sat, Jun 13, 2009 at 9:08 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Saturday 13 June 2009 01:10:06 Robert Haas wrote:
>> <pgexplain>, as it happens...  I could post some samples of the
>> output, but it seems like it might be just as well to let those who
>> are curious try it for themselves.  I'd rather get opinions from
>> people who care enough to download & test than from those who are just
>> bikeshedding.  :-)
>
> I recommend, however, that you think about writing a regression test for this,
> so the interfaces are explicit, and those tweaking them in the future know
> what they are dealing with.

I would like to have something in this area, but Tom didn't think it
was workable.

http://archives.postgresql.org/message-id/603c8f070904151623ne07d744k615edd4aa669a64a@mail.gmail.com

Currently, we don't even have something trivial like "EXPLAIN SELECT
1" in the regression tests, so even if you completely break EXPLAIN so
that it core dumps (voice of experience speaking here) make check
still passes with flying colors.

One feature I'd like to add is an EXPLAIN-option for "COSTS", so that
you can say explain (costs off) ....  Then we could at least try a
couple of simple examples against the build-farm to see whether the
issues that Tom is worried about are problems in practice and to what
degree.  But I'm a little reluctant to develop that until at least
some of my existing work is committed, because at present I have no
guarantee either that this patch will be accepted or that it won't be
extensively modified by the committer, thus creating merge conflicts
for me to resolve.  However, assuming the infrastructure in the
explain_options patch is accepted in something similar to  its current
form, it should be a very easy patch to write when the time comes.

> A couple of comments on the specifics of the output:
>
> For the JSON format:
>
> * Numbers should not be quoted.

OK, will fix.

> For the XML format:
>
> * Instead of <pgexplain>, use <explain> with an XML namespace declaration.

Could you specify this a bit further, like write out exactly what you
want it to look like?  My XML-fu is not very strong.

> The schema name is missing in either output format.  I think that was supposed
> to be one of the features of this that the objects are unambiguously
> qualified.

Well, as I said, I'm not sure that this decision should be made based
on the selected output format.  I think it should be controlled by a
separate option that can be used for text, XML, or JSON.  Of course,
we also don't want to end up with a zillion options.  I think maybe
the existing VERBOSE option could be pressed into service here.  Right
now, all it does is print out the output lists for each node, but
maybe it could also have the effect of forcing the schema name to be
emitted, and any other similarly minor verbosities we run across.

There's other weirdness in this area too: when emitting a qual, we
table-qualify column names according to a complex heuristic (for scan
quals, when the outer plan is non-NULL or it's a subquery scan; for
upper quals, when the length of the range-table list is more than 1).
Not sure whether anyone cares about this or not.  In a similar vein,
in report_triggers(), we omit the constraint name if there is a
trigger name.  All of these seem like fairly good candidates for
things that you might want to behave differently if you ask for
"VERBOSE".

> I'm not sure I like element names such as <Node-Type>, instead of say
> <nodetype>, which is more like HTML and DocBook.  (Your way might be more like
> SOAP, I guess.)

I'm not sure I like them either.  I mostly did it that way because I
wanted to maintain consistency with the text output, which uses labels
like "Hash Cond" and "Filter".  So I just made the JSON format use
those same labels, and for the XML format, since tag names can't
contain spaces, I just replaced spaces with dashes.  Once I made that
decision it seemed like everything else should be consistent, so
that's what I did.  But we could certainly subject them all to some
additional regular transformation if we're so inclined.  I'm not sure
it's really worth the additional code complexity, but I don't care
very much.

> Also, the result type of an EXPLAIN (format xml) should be type xml, not text.

Seems reasonable.  I'll see if I can figure out how to do that.

> In general, I like this direction very much.  There will probably be more
> tweaks on the output format over time.  It's not like the plain EXPLAIN hasn't
> been tweaked countless times.

Cool, thanks for the review.  I have no illusions it won't get changed
further.  In all honesty, I'm most interested in the options syntax.
The multiple output formats portion is just a demonstration that you
can use the options syntax to enable interesting functionality, but I
personally have little use for it.  I'm hoping, however, that once we
have a standard way to add options, people will propose more options
that do interesting things; I have a few ideas myself.

...Robert


Re: machine-readable explain output

From
Bernd Helmle
Date:

--On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:

>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>> text.
>
> Seems reasonable.  I'll see if I can figure out how to do that.

I suppose it's okay then, that the format is not available when the server 
isn't build with --with-libxml ?


-- 
Thanks
Bernd


Re: machine-readable explain output

From
Tom Lane
Date:
Bernd Helmle <mailings@oopsware.de> writes:
> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>> text.
>> 
>> Seems reasonable.  I'll see if I can figure out how to do that.

> I suppose it's okay then, that the format is not available when the server 
> isn't build with --with-libxml ?

I believe we have things set up so that you can still print "xml" data
without libxml configured in.  We'd need to be sure casting to text
works too, but other than that I don't see an issue here.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Bernd Helmle <mailings@oopsware.de> writes:
>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>>> text.
>>>
>>> Seems reasonable.  I'll see if I can figure out how to do that.
>
>> I suppose it's okay then, that the format is not available when the server
>> isn't build with --with-libxml ?
>
> I believe we have things set up so that you can still print "xml" data
> without libxml configured in.  We'd need to be sure casting to text
> works too, but other than that I don't see an issue here.

Hmm, I just tried to do this by modifying ExplainResultDesc to use
XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML,
and sure enough, explain (format xml) ... fails when --with-libxml is
not specified.  But maybe that's not the right way to do it - now that
I think about it, using that in combination with
do_text_output_multiline() seems totally wrong even if we end up
deciding not to worry about the output type, since while there are
multiple rows when the output is considered as text, there is surely
only one row when you look at the whole thing as an XML document.  I'm
not too sure how to do this though.  Help?

In any event, considering that EXPLAIN is a utility statement and
can't be embedded within a query, I'm not sure what benefit we get out
of returning the data as XML rather than text.  This doesn't seem
likely to change either, based on Tom's comments here.

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> In any event, considering that EXPLAIN is a utility statement and
> can't be embedded within a query, I'm not sure what benefit we get out
> of returning the data as XML rather than text.  This doesn't seem
> likely to change either, based on Tom's comments here.

> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php

I think you misinterpreted the point of that example, which is that
there already is a way to get the output of EXPLAIN into the system
for further processing.  Were this not so, we wouldn't be worrying
at all what data type it claims to have.  But since there is a way,
it's important what data type it produces.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Sat, Jun 13, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In any event, considering that EXPLAIN is a utility statement and
>> can't be embedded within a query, I'm not sure what benefit we get out
>> of returning the data as XML rather than text.  This doesn't seem
>> likely to change either, based on Tom's comments here.
>
>> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php
>
> I think you misinterpreted the point of that example, which is that
> there already is a way to get the output of EXPLAIN into the system
> for further processing.  Were this not so, we wouldn't be worrying
> at all what data type it claims to have.  But since there is a way,
> it's important what data type it produces.

Well, if you get the EXPLAIN output into the system by defining a
wrapper function, said wrapper function will return the type that it's
defined to return, regardless of what EXPLAIN itself returns, no?

I don't have a problem making it return XML; I'm just not exactly sure
how to do it.  Is it possible to get that working without depending on
libxml?  How?

...Robert


Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Saturday 13 June 2009 22:01:43 Robert Haas wrote:
> > * Instead of <pgexplain>, use <explain> with an XML namespace
> > declaration.
>
> Could you specify this a bit further, like write out exactly what you
> want it to look like?  My XML-fu is not very strong.

Just replace your <pgexplain> by

<explain xmlns="http://www.postgresql.org/2009/explain">

The actual URI doesn't matter, as long as it is distinguishing.  The value I 
chose here follows conventions used by W3C.



Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Saturday 13 June 2009 22:01:43 Robert Haas wrote:
> > I recommend, however, that you think about writing a regression test for
> > this, so the interfaces are explicit, and those tweaking them in the
> > future know what they are dealing with.
>
> I would like to have something in this area, but Tom didn't think it
> was workable.
>
> http://archives.postgresql.org/message-id/603c8f070904151623ne07d744k615edd
>4aa669a64a@mail.gmail.com
>
> Currently, we don't even have something trivial like "EXPLAIN SELECT
> 1" in the regression tests, so even if you completely break EXPLAIN so
> that it core dumps (voice of experience speaking here) make check
> still passes with flying colors.

That post described a scenario where you check whether given a data set and 
ANALYZE, the optimizer produces a certain plan.  I agree that that might be 
tricky.

A regression test for EXPLAIN, however, should primarily check whether the 
output format is stable.  We are planning to offer this as a public interface, 
after all.  You could use faked up statistics and all but one or two plan 
types turned off, and then the results should be pretty stable.  Unless the 
fundamental cost model changes, but it doesn't do that very often for the 
simpler plan types anyway.  Things to check for would be checking whether all 
the fields are there, quoted and escaped correctly, and what happens if 
statistics are missing or corrupted, etc.  Or whether you get any output at 
all, as you say.



Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Sunday 14 June 2009 07:27:19 Robert Haas wrote:
> On Sat, Jun 13, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> In any event, considering that EXPLAIN is a utility statement and
> >> can't be embedded within a query, I'm not sure what benefit we get out
> >> of returning the data as XML rather than text.  This doesn't seem
> >> likely to change either, based on Tom's comments here.
> >>
> >> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php
> >
> > I think you misinterpreted the point of that example, which is that
> > there already is a way to get the output of EXPLAIN into the system
> > for further processing.  Were this not so, we wouldn't be worrying
> > at all what data type it claims to have.  But since there is a way,
> > it's important what data type it produces.
>
> Well, if you get the EXPLAIN output into the system by defining a
> wrapper function, said wrapper function will return the type that it's
> defined to return, regardless of what EXPLAIN itself returns, no?
>
> I don't have a problem making it return XML; I'm just not exactly sure
> how to do it.  Is it possible to get that working without depending on
> libxml?  How?

Even if this doesn't end up being feasible, I feel it's important that the XML 
and JSON formats return one datum, not one per line.  Otherwise a client that 
wants to do some processing on the result will have to do about three extra 
steps to get the result usable.


Re: machine-readable explain output

From
Dave Page
Date:
On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
>
>
> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>
>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>> text.
>>
>> Seems reasonable.  I'll see if I can figure out how to do that.
>
> I suppose it's okay then, that the format is not available when the server
> isn't build with --with-libxml ?

I hope not, otherwise the usefulness of the format is significantly
reduced (to practically zero) if tools cannot rely on it being
available and have to fall back to something else if it's not
available.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: machine-readable explain output

From
Pavel Stehule
Date:
2009/6/14 Dave Page <dpage@pgadmin.org>:
> On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
>>
>>
>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>>> text.
>>>
>>> Seems reasonable.  I'll see if I can figure out how to do that.
>>
>> I suppose it's okay then, that the format is not available when the server
>> isn't build with --with-libxml ?
>
> I hope not, otherwise the usefulness of the format is significantly
> reduced (to practically zero) if tools cannot rely on it being
> available and have to fall back to something else if it's not
> available.
>

I thing so using --with-libxml is good idea. Is nonsense repeat some
necessary xml code like xml escaping and similar. And almost all
distributed PostgreSQL binaries are compiled with xml support, so this
cannot do some problems. When somebody compile pg without xml support,
then he knows what he do.

regards
Pavel Stehule

> --
> Dave Page
> EnterpriseDB UK:   http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: machine-readable explain output

From
Dave Page
Date:
On 6/14/09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/6/14 Dave Page <dpage@pgadmin.org>:
>> On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
>>>
>>>
>>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>
>>>>> Also, the result type of an EXPLAIN (format xml) should be type xml,
>>>>> not
>>>>> text.
>>>>
>>>> Seems reasonable.  I'll see if I can figure out how to do that.
>>>
>>> I suppose it's okay then, that the format is not available when the
>>> server
>>> isn't build with --with-libxml ?
>>
>> I hope not, otherwise the usefulness of the format is significantly
>> reduced (to practically zero) if tools cannot rely on it being
>> available and have to fall back to something else if it's not
>> available.
>>
>
> I thing so using --with-libxml is good idea. Is nonsense repeat some
> necessary xml code like xml escaping and similar. And almost all
> distributed PostgreSQL binaries are compiled with xml support, so this
> cannot do some problems. When somebody compile pg without xml support,
> then he knows what he do.

That will mean we never get to use XML explain in pgAdmin. We're not
in the business of writing basic features that might work, if the
postgres packager enabled an option. We need to be able to rely on
such features always being available.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Dave Page wrote:
>> I thing so using --with-libxml is good idea. Is nonsense repeat some
>> necessary xml code like xml escaping and similar. And almost all
>> distributed PostgreSQL binaries are compiled with xml support, so this
>> cannot do some problems. When somebody compile pg without xml support,
>> then he knows what he do.
>>     
>
> That will mean we never get to use XML explain in pgAdmin. We're not
> in the business of writing basic features that might work, if the
> postgres packager enabled an option. We need to be able to rely on
> such features always being available.
>
>   

As a matter of curiosity, do we have any idea what platforms don't 
support libxml2?

cheers

andrew




Re: machine-readable explain output

From
Tom Lane
Date:
Dave Page <dpage@pgadmin.org> writes:
> On 6/14/09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I thing so using --with-libxml is good idea.

> That will mean we never get to use XML explain in pgAdmin.

Exactly.  We are *not* going to make libxml a required piece of
infrastructure, and that means that XML-format explain output will
be useless to most client tools if it doesn't work without libxml.
        regards, tom lane


Re: machine-readable explain output

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> As a matter of curiosity, do we have any idea what platforms don't 
> support libxml2?

It's only partially about whether libxml2 is portable enough.  A person
building Postgres might also have legitimate concerns about how bug-free
and/or secure it is.  We've already spent nontrivial amounts of time
working around libxml bugs; and as for security, google shows at least
four CVEs against libxml2 in the past two years, so it's not a
negligible risk.  I can entirely see people choosing to build without it.
        regards, tom lane


Re: machine-readable explain output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> A regression test for EXPLAIN, however, should primarily check whether the 
> output format is stable.  We are planning to offer this as a public interface, 
> after all.  You could use faked up statistics and all but one or two plan 
> types turned off, and then the results should be pretty stable.

You'd be surprised :-(.  We've found in the past that queries in the
regression tests get different plans across different platforms just
because of alignment-rule differences (leading to different numbers
of rows per page, etc etc).  I think that test cases could be chosen
to be relatively stable points in the plan space, but it's hopeless
to imagine that the low-order digits of cost estimates will be the
same across all platforms.
        regards, tom lane


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I believe we have things set up so that you can still print "xml" data
>> without libxml configured in. �We'd need to be sure casting to text
>> works too, but other than that I don't see an issue here.

> Hmm, I just tried to do this by modifying ExplainResultDesc to use
> XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML,
> and sure enough, explain (format xml) ... fails when --with-libxml is
> not specified.

That's because the code goes through BuildTupleFromCStrings, which
invokes xml_in in this scenario, and xml_in (as opposed to xml_out)
does depend on libxml.

However, using BuildTupleFromCStrings is wasteful/stupid for *both*
text and xml output, so it seems like getting rid of it is the thing
to do here.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> I believe we have things set up so that you can still print "xml" data
>>> without libxml configured in.  We'd need to be sure casting to text
>>> works too, but other than that I don't see an issue here.
>
>> Hmm, I just tried to do this by modifying ExplainResultDesc to use
>> XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML,
>> and sure enough, explain (format xml) ... fails when --with-libxml is
>> not specified.
>
> That's because the code goes through BuildTupleFromCStrings, which
> invokes xml_in in this scenario, and xml_in (as opposed to xml_out)
> does depend on libxml.
>
> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
> text and xml output, so it seems like getting rid of it is the thing
> to do here.

Makes sense.  However, if we just make that change in do_tup_output(),
then we'll break the ability to use that function for non-text
datatypes.   Currently that doesn't look like a problem, because the
only clients are ShowGUCConfigOption(), do_text_output_oneline(), and
do_text_output_multiline(),


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>> text and xml output, so it seems like getting rid of it is the thing
>> to do here.

> Makes sense.  However, if we just make that change in do_tup_output(),
> then we'll break the ability to use that function for non-text
> datatypes.

I'd envision it taking Datums, so it doesn't really matter.  However,
as you say, specializing it to text only wouldn't be much of a loss.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>> text and xml output, so it seems like getting rid of it is the thing
>>> to do here.
>
>> Makes sense.  However, if we just make that change in do_tup_output(),
>> then we'll break the ability to use that function for non-text
>> datatypes.
>
> I'd envision it taking Datums, so it doesn't really matter.  However,
> as you say, specializing it to text only wouldn't be much of a loss.

I like the Datum option, so I'll work up a patch for that, unless you
want to just do it and spare me the trouble.  :-)

...Robert


Re: machine-readable explain output

From
Robert Haas
Date:
On Sun, Jun 14, 2009 at 1:04 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>>> text and xml output, so it seems like getting rid of it is the thing
>>>> to do here.
>>
>>> Makes sense.  However, if we just make that change in do_tup_output(),
>>> then we'll break the ability to use that function for non-text
>>> datatypes.
>>
>> I'd envision it taking Datums, so it doesn't really matter.  However,
>> as you say, specializing it to text only wouldn't be much of a loss.
>
> I like the Datum option, so I'll work up a patch for that, unless you
> want to just do it and spare me the trouble.  :-)

Here's an attempt.  Is this anything like what you had in mind?

...Robert

Attachment

Re: machine-readable explain output

From
Robert Haas
Date:
I wrote:
> On Sun, Jun 14, 2009 at 1:04 PM, Robert Haas<robertmhaas@gmail.com> wrote:
>> On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>>>> text and xml output, so it seems like getting rid of it is the thing
>>>>> to do here.
>>>
>>>> Makes sense.  However, if we just make that change in do_tup_output(),
>>>> then we'll break the ability to use that function for non-text
>>>> datatypes.
>>>
>>> I'd envision it taking Datums, so it doesn't really matter.  However,
>>> as you say, specializing it to text only wouldn't be much of a loss.
>>
>> I like the Datum option, so I'll work up a patch for that, unless you
>> want to just do it and spare me the trouble.  :-)
>
> Here's an attempt.  Is this anything like what you had in mind?

Hmm... on further review, I'm thinking this is still a bit wastful,
because we don't really need (I think) to call
TupleDescGetAttInMetadata from begin_tup_output_tupdesc.  But I'm not
sure what the best way is to avoid that.  Any thoughts?

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm... on further review, I'm thinking this is still a bit wastful,
> because we don't really need (I think) to call
> TupleDescGetAttInMetadata from begin_tup_output_tupdesc.  But I'm not
> sure what the best way is to avoid that.  Any thoughts?

Er, just don't do it?  We shouldn't need it if the function is doing
heap_form_tuple directly.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Mon, Jun 15, 2009 at 9:51 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm... on further review, I'm thinking this is still a bit wastful,
>> because we don't really need (I think) to call
>> TupleDescGetAttInMetadata from begin_tup_output_tupdesc.  But I'm not
>> sure what the best way is to avoid that.  Any thoughts?
>
> Er, just don't do it?  We shouldn't need it if the function is doing
> heap_form_tuple directly.

Oh, I guess that works.  I had thought there might be people calling
begin_tup_output_tupdesc() who wanted to go on to call
BuildTupleFromCStrings(), but it seems that's not the case.  In fact,
it looks like I can probably rip that member out of TupOutputState
altogether.

Will update patch.  Does this look like what you were thinking otherwise?

Thanks,

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> it looks like I can probably rip that member out of TupOutputState
> altogether.

> Will update patch.  Does this look like what you were thinking otherwise?

Yeah, that's exactly what I was thinking.
        regards, tom lane


Re: machine-readable explain output

From
Andres Freund
Date:
Hi,

On 06/12/2009 07:15 AM, Robert Haas wrote:
> If you don't like the syntax, please argue about that on the "generic
> explain options v2" thread.  Let's try to use this thread to discuss
> the output format, about which I spent a good deal of time agonizing.
I spent some time playing around with the explain output with various 
queries. Beside the already raised mild dislike (from Peter Eisentraut I 
think) of Upper-Case "-" seperated tag-names I found mainly one gripe:

<Startup-Cost>1710.98</Startup-Cost>
<Total-Cost>1710.98</Total-Cost>
<Plan-Rows>72398</Plan-Rows>
<Plan-Width>4</Plan-Width>
<Actual-Startup-Time>136.595</Actual-Startup-Time>
<Actual-Total-Time>136.595</Actual-Total-Time>
<Actual-Rows>72398</Actual-Rows>
<Actual-Loops>1</Actual-Loops>

This is a bit inconsistent. i.e. for the row estimate you use 
<Plan-Rows/> and for <Startup-Cost/> you dont use the "Plan-" Prefix.
While for the 'analyze' generated variables you use the 'Actual-' prefix 
consistently.

One approach would be to have two nodes like:
<Plan-Estimates>    <Startup-Cost>...</Startup-Cost>    ...
</Plan-Estimates>
<Execution-Cost>    <Startup-Cost>...</Startup-Cost>    ...
</Execution-Cost>

This would probably make it easier to write a future proof parser and it 
also seems semantically sensible.


As an aside issue it would perhaps be nice (thinking of an 
index-suggestion tool) to make it possible for having seperate estimates 
on <Index-Cond> an <Filter> - In order not to change the format later 
that perhaps has to be considered here.
Perhaps the current structure + some additional tags is also the best 
here - I just noticed it being a potential issue.

Andres


Re: machine-readable explain output

From
Greg Stark
Date:
On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de> wrote:
> <Startup-Cost>1710.98</Startup-Cost>
> <Total-Cost>1710.98</Total-Cost>
> <Plan-Rows>72398</Plan-Rows>
> <Plan-Width>4</Plan-Width>
> <Actual-Startup-Time>136.595</Actual-Startup-Time>
> <Actual-Total-Time>136.595</Actual-Total-Time>
> <Actual-Rows>72398</Actual-Rows>
> <Actual-Loops>1</Actual-Loops>

XML's not really my thing currently but it sure seems strange to me to
have *everything* be a separate tag like this. Doesn't XML do
attributes too? I would have thought to use child tags like this only
for things that have some further structure.

I would have expected something like:

<join   <scan type=sequential source="foo.bar">       <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
     <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>   </scan>   <scan type=function
source="foo.bar($1)">      <parameters>            <parameter name="$1" expression="...."></>        </parameters>
</scan>
</join>


This would allow something like a graphical explain plan to still make
sense of a plan even if it finds a node it doesn't recognize. It would
still know generally what to do with a "scan" node or a "join" node
even if it is a new type of scan or join.

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: machine-readable explain output

From
Andres Freund
Date:
On 06/16/2009 02:14 PM, Greg Stark wrote:
> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de>  wrote:
>> <Startup-Cost>1710.98</Startup-Cost>
>> <Total-Cost>1710.98</Total-Cost>
>> <Plan-Rows>72398</Plan-Rows>
>> <Plan-Width>4</Plan-Width>
>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>> <Actual-Total-Time>136.595</Actual-Total-Time>
>> <Actual-Rows>72398</Actual-Rows>
>> <Actual-Loops>1</Actual-Loops>
>
> XML's not really my thing currently but it sure seems strange to me to
> have *everything* be a separate tag like this. Doesn't XML do
> attributes too? I would have thought to use child tags like this only
> for things that have some further structure.

> I would have expected something like:
>
> <join
>      <scan type=sequential source="foo.bar">
>          <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>          <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>      </scan>
>      <scan type=function source="foo.bar($1)">
>          <parameters>
>               <parameter name="$1" expression="...."></>
>           </parameters>
>       </scan>
> </join>
>
>
> This would allow something like a graphical explain plan to still make
> sense of a plan even if it finds a node it doesn't recognize. It would
> still know generally what to do with a "scan" node or a "join" node
> even if it is a new type of scan or join.
While that also looks sensible the more structured variant makes it 
easier to integrate additional stats which may not easily be pressed in 
the 'attribute' format. As a fastly contrived example you could have io 
statistics over time like:
<iostat>   <stat time="10" name=pagefault>...</stat>   <stat time="20" name=pagefault>...</stat>   <stat time="30"
name=pagefault>...</stat>
</iostat>

Something like that would be harder with your variant.

Structuring it in tags like suggested above:
<Plan-Estimates>    <Startup-Cost>...</Startup-Cost>    ...
</Plan-Estimates>
<Execution-Cost>    <Startup-Cost>...</Startup-Cost>    ...
</Execution-Cost>

Enables displaying unknown 'scalar' values just like your variant and 
also allows more structured values.

It would be interesting to get somebody having used the old explain in 
an automated fashion into this discussion...

Andres


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund<andres@anarazel.de> wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>>
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de>
>>  wrote:
>>>
>>> <Startup-Cost>1710.98</Startup-Cost>
>>> <Total-Cost>1710.98</Total-Cost>
>>> <Plan-Rows>72398</Plan-Rows>
>>> <Plan-Width>4</Plan-Width>
>>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>>> <Actual-Total-Time>136.595</Actual-Total-Time>
>>> <Actual-Rows>72398</Actual-Rows>
>>> <Actual-Loops>1</Actual-Loops>
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>
>> I would have expected something like:
>>
>> <join
>>     <scan type=sequential source="foo.bar">
>>         <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>>         <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>>     </scan>
>>     <scan type=function source="foo.bar($1)">
>>         <parameters>
>>              <parameter name="$1" expression="...."></>
>>          </parameters>
>>      </scan>
>> </join>
>>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses <Plan> and
<Plans>, you can do this just as well with the current implementation.Each plan node gets a <Plan>.  If there are any
plans"under" it, it 
gets a <Plans> child which contains those.  Whether you put the
additional details into attributes or other tags is irrelevant.  As to
why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do
EVERYTHING with attributes.  If I did that, then I'd either get really
long lines that were not easily readable, or I'd have to write some
kind of complicated line wrapping code (which didn't seem to make a
lot of sense for a machine-readable format).  The current format isn't
the most beautiful thing I've ever seen, but you don't need a parser
to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that
seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g. table_to_xml

http://www.postgresql.org/docs/current/interactive/functions-xml.html

> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a fastly contrived example you could have io
> statistics over time like:
> <iostat>
>   <stat time="10" name=pagefault>...</stat>
>   <stat time="20" name=pagefault>...</stat>
>   <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
> <Plan-Estimates>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Plan-Estimates>
> <Execution-Cost>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Execution-Cost>
>
> Enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs.   The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0.  When we measure actual
times, they are in milliseconds.  There is no point that I can see in
making it appear that those are the same thing.  Observe the current
output:

explain analyze select 1;                                    QUERY PLAN
------------------------------------------------------------------------------------Result  (cost=0.00..0.01 rows=1
width=0)(actual time=0.005..0.007 
rows=1 loops=1)Total runtime: 0.243 ms
(2 rows)

...Robert


Re: machine-readable explain output

From
Andres Freund
Date:
On 06/16/2009 03:22 PM, Robert Haas wrote:
> Well, one problem with this is that the actual values are not costs,
> but times, and the estimated values are not times, but costs.   The
> planner estimates the cost of operations on an arbitrary scale where
> the cost of a sequential page fetch is 1.0.  When we measure actual
> times, they are in milliseconds.  There is no point that I can see in
> making it appear that those are the same thing.  Observe the current
> output:
Well - the aim was not to make it possible to use the same name for
"<plan-startup-cost>" and "<actual-startup-cost>" but to group them in 
some way - so you can decide in some way (prefix or below a distinct 
node)  if they are related to planning or execution (And thus making it 
easier to handle unknown tags).
That <actual-startup-time/> morphed into <startup-cost/> instead of 
<startup-time> was just a typo.
Another solution would be to rename <Startup-Cost> into 
<Plan-Startup-Cost> for consistency. But grouping them by some node 
seems to be a bit more future-proof.


Andres



Re: machine-readable explain output

From
Andrew Dunstan
Date:

Robert Haas wrote:
> 3. We have existing precedent for this design pattern in, e.g. table_to_xml
>
> http://www.postgresql.org/docs/current/interactive/functions-xml.html
>
>   

Tables are flat, explain output is not.

If there is a relationship between the items then that needs to be 
expressed in the XML structure, either by use of child nodes or 
attributes. Relying on the sequence of nodes, if that's what you're 
doing, is not a good idea, and will make postprocessing the XML using 
XSLT, for example, quite a bit harder. (Processing a foo that comes 
after a bar is possible but not as natural as processing a foo that is a 
child or attribute of a bar)

Anyway, I think what this discussion points out is that we actually need 
a formal XML Schema for this output.

cheers

andrew


Re: machine-readable explain output

From
Andres Freund
Date:
On 06/16/2009 03:45 PM, Andrew Dunstan wrote:>> 3. We have existing precedent for this design pattern in, e.g.>>
table_to_xml>>http://www.postgresql.org/docs/current/interactive/functions-xml.html> Tables are flat, explain output is
not.
Comparing Greg's approach with Robert's it seems to me that Robert's 
approach isn't flatter than Greg's - it just relies more on nodes.

> If there is a relationship between the items then that needs to be
> expressed in the XML structure, either by use of child nodes or
> attributes. Relying on the sequence of nodes, if that's what you're
> doing, is not a good idea, and will make postprocessing the XML using
> XSLT, for example, quite a bit harder. (Processing a foo that comes
> after a bar is possible but not as natural as processing a foo that is a
> child or attribute of a bar)
How would you model something like:
<plans>  <plan> ... </plan>  <plan> ... </plan>  ...
</plans>
otherwise?

There are potentially unlimited number of child nodes - AppendNode for 
example can have any number of them. Sure, you can give each <plan> node 
a 'offset=' id, but that doesn't buy much.
I don't see how that could be much improved by using child-nodes (or 
even worse attributes).

That is as far as I have seen the only place where the format relies on 
the sequence of nodes.


> Anyway, I think what this discussion points out is that we actually need
> a formal XML Schema for this output.
Agreed.

If helpful I can create a schema for the current format.

Andres


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 9:45 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
> Robert Haas wrote:
>>
>> 3. We have existing precedent for this design pattern in, e.g.
>> table_to_xml
>>
>> http://www.postgresql.org/docs/current/interactive/functions-xml.html
>
> Tables are flat, explain output is not.
>
> If there is a relationship between the items then that needs to be expressed
> in the XML structure, either by use of child nodes or attributes. Relying on
> the sequence of nodes, if that's what you're doing, is not a good idea, and

I'm not doing that.  Period, full stop.  The discussion was only about
attributes vs. child nodes.

> Anyway, I think what this discussion points out is that we actually need a
> formal XML Schema for this output.

Well, I don't know how to write one, and am not terribly interested in
learning.  Perhaps someone else would be interested?

...Robert


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund<andres@anarazel.de> wrote:
> How would you model something like:
> <plans>
>  <plan> ... </plan>
>  <plan> ... </plan>
>  ...
> </plans>
> otherwise?
>
> There are potentially unlimited number of child nodes - AppendNode for
> example can have any number of them. Sure, you can give each <plan> node a
> 'offset=' id, but that doesn't buy much.
> I don't see how that could be much improved by using child-nodes (or even
> worse attributes).

Note that even in this case we DON'T rely on the ordering of the
nodes.  The inner <plan> nodes have child nodes which contain their
relationship to the parent.

...Robert


Re: machine-readable explain output

From
Andres Freund
Date:
Hi,

On 06/16/2009 04:32 PM, Robert Haas wrote:
> On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund<andres@anarazel.de>  wrote:
>> How would you model something like:
>> <plans>
>>   <plan>  ...</plan>
>>   <plan>  ...</plan>
>>   ...
>> </plans>
>> otherwise?
>>
>> There are potentially unlimited number of child nodes - AppendNode for
>> example can have any number of them. Sure, you can give each<plan>  node a
>> 'offset=' id, but that doesn't buy much.
>> I don't see how that could be much improved by using child-nodes (or even
>> worse attributes).
> Note that even in this case we DON'T rely on the ordering of the
> nodes.  The inner<plan>  nodes have child nodes which contain their
> relationship to the parent.
Not in the case of Append nodes, but I fail to see a problem there, so...

Andres


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Robert Haas wrote:
>>
>> If there is a relationship between the items then that needs to be expressed
>> in the XML structure, either by use of child nodes or attributes. Relying on
>> the sequence of nodes, if that's what you're doing, is not a good idea, and
>>     
>
> I'm not doing that.  Period, full stop.  The discussion was only about
> attributes vs. child nodes.
>
>   
>   
OK, I misread something you wrote, which prompted me to say that. 
Rereading it I realise my error. My apologies.

cheers

andrew


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 10:59 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
>
>
> Robert Haas wrote:
>>>
>>> If there is a relationship between the items then that needs to be
>>> expressed
>>> in the XML structure, either by use of child nodes or attributes. Relying
>>> on
>>> the sequence of nodes, if that's what you're doing, is not a good idea,
>>> and
>>>
>>
>> I'm not doing that.  Period, full stop.  The discussion was only about
>> attributes vs. child nodes.
>>
>>
>
> OK, I misread something you wrote, which prompted me to say that. Rereading
> it I realise my error. My apologies.

No problem, no apologies needed.  I guess we do emit nodes like append
plans in the same order that they'd be emitted in text mode.  Right
now we don't emit any additional information beyond putting them in
the same order, but I suppose that could be changed if needs be.

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 06/16/2009 04:32 PM, Robert Haas wrote:
>> Note that even in this case we DON'T rely on the ordering of the
>> nodes.  The inner<plan>  nodes have child nodes which contain their
>> relationship to the parent.

> Not in the case of Append nodes, but I fail to see a problem there, so...

The order of Append child nodes is in fact significant.  If this
representation loses that information then it needs to be fixed.
However, is it really so bad to be relying on node order for this?
        regards, tom lane


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>   
>> On 06/16/2009 04:32 PM, Robert Haas wrote:
>>     
>>> Note that even in this case we DON'T rely on the ordering of the
>>> nodes.  The inner<plan>  nodes have child nodes which contain their
>>> relationship to the parent.
>>>       
>
>   
>> Not in the case of Append nodes, but I fail to see a problem there, so...
>>     
>
> The order of Append child nodes is in fact significant.  If this
> representation loses that information then it needs to be fixed.
> However, is it really so bad to be relying on node order for this?
>
>             
>   

No, if there is a genuine sequence of items then relying on node order 
is just fine. My earlier (mistaken) reference was to possibly relying on 
node order for a non-sequence relationship.

cheers

andrew


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Andres Freund wrote:
>> Anyway, I think what this discussion points out is that we actually need
>> a formal XML Schema for this output.
> Agreed.
>
> If helpful I can create a schema for the current format.
>
>

That will give us a useful starting point.

cheers

andrew


Re: machine-readable explain output

From
Greg Stark
Date:
On Tue, Jun 16, 2009 at 1:53 PM, Andres Freund<andres@anarazel.de> wrote:
> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a fastly contrived example you could have io
> statistics over time like:
> <iostat>
>   <stat time="10" name=pagefault>...</stat>
>   <stat time="20" name=pagefault>...</stat>
>   <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.

Actually that's exactly the kind of example I had in mind to make easier.

I'm picturing adding a new tag, such as <iostats>, or actually I was
thinking of <dtrace>. If we have separate tags for all the estimates
and actual timings then any tags which come with the <iostat> or
<dtrace> option would just get mixed up with the estimates and timing
info.

Each new module would provide a single tag which would have some
attributes and some child tags depending on how much structure it
needs. In cases where there's no structure, just a fixed list of
scalars like the existing expected and actual stats I don't see any
advantage to making each scalar a tag. (There's not much disadvantage
except I would have said it was completely unreadable for a human
given that you would have pages and pages of output for a significant
size plan.)

So your plan might look like

<scan type=...> <expected cost=...></> <actual time=...></> <iostats>     <samples>         <sample time=nnn
value=nnn></>    </samples> </iostats> <dtrace script="foo.d">     <probes>         <probe name=foo result=nnn></>
  <probe name=bar result=nnn></>     </probes> </dtrace> 

That would make it easy for a tool like pgadmin which doesn't know
what to do with the iostats to ignore the whole chunk, rather than
have to dig through a list of stats some of which come from iostats
and some from dtrace and some from the instrumentation and have to
figure out which tags are things it can use and which are things it
can't.

--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf


Re: machine-readable explain output

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I'm picturing adding a new tag, such as <iostats>, or actually I was
> thinking of <dtrace>. If we have separate tags for all the estimates
> and actual timings then any tags which come with the <iostat> or
> <dtrace> option would just get mixed up with the estimates and timing
> info.

FWIW, I like Greg's idea of subdividing the available data this way.
I'm no XML guru, so maybe there is a better way to do it --- but a
very large part of the reason for doing this at all is to have an
extensible format, and part of that IMHO is that client programs should
be able to have some rough idea of what things are even when they
don't know it exactly.

But I'd be just as happy with a naming convention, like
<planner:rowcount> versus <actual:rowcount>, etc.  I don't know
enough about XML usage to understand the benefits and costs of
different ways of providing that kind of structure.
        regards, tom lane


Re: machine-readable explain output

From
Andrew Dunstan
Date:

Tom Lane wrote:
> But I'd be just as happy with a naming convention, like
> <planner:rowcount> versus <actual:rowcount>, etc.  I don't know
> enough about XML usage to understand the benefits and costs of
> different ways of providing that kind of structure.
>
>   
FYI, you probably don't want this. the ':' is not just another character, it separates the namespace designator from
thelocal name. We probably only want one namespace. You can use '-' or '_'  or '.' inside names to give them some
structurebeyond XML semantics.
 


cheers

andrew


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> I'm picturing adding a new tag, such as <iostats>, or actually I was
>> thinking of <dtrace>. If we have separate tags for all the estimates
>> and actual timings then any tags which come with the <iostat> or
>> <dtrace> option would just get mixed up with the estimates and timing
>> info.
>
> FWIW, I like Greg's idea of subdividing the available data this way.
> I'm no XML guru, so maybe there is a better way to do it --- but a
> very large part of the reason for doing this at all is to have an
> extensible format, and part of that IMHO is that client programs should
> be able to have some rough idea of what things are even when they
> don't know it exactly.

I like it too, but I'd like to see us come up with a design that
allows it to be used for all of the output formats (text, XML, and
JSON).  I think it we should be looking for a way to allow modules to
publish abstract objects like property-value mappings, or lists of
strings, rather than thinking strictly in terms of XML.  If we have a
module called foo that emits property bar with value baz and property
bletch with value quux, then in text format we can print:

Module Foo: Bar: Bletch Baz: Quux

In XML we can print:

<Modules> <Module>   <Module-Name>Foo</Module-Name>   <Bar>Bletch</Bar>   <Baz>Quux</Baz> </Module>
</Modules>

(or any of about 10 reasonable alternatives that are functionally identical)

In JSON we can print

"Modules" : [ {   "Module Name" : "Foo",   "Bar": "Bletch",   "Baz": "Quux" }
]

(or any of about 2 reasonable alternatives that are functionally identical)

If we start thinking in terms of "provide an API to insert XML into
the XML-format output", we get back to my original complaint: if the
only way of getting additional data is to piece through the XML
output, then we'll quickly reach the point where users need XSLT and
stylesheets to extract the data they care about.  I think that's an
annoyance that is easily avoidable.

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> FWIW, I like Greg's idea of subdividing the available data this way.

> I like it too, but I'd like to see us come up with a design that
> allows it to be used for all of the output formats (text, XML, and
> JSON).  I think it we should be looking for a way to allow modules to
> publish abstract objects like property-value mappings, or lists of
> strings, rather than thinking strictly in terms of XML.  If we have a
> module called foo that emits property bar with value baz and property
> bletch with value quux, then ...

This seems to be missing the point I was trying to make, which is that
a design like that actually offers no leverage at all: if you don't know
all about foo to start with, you have no idea what to do with either bar
or bletch.  You can *parse* the data, since it's in XML or JSON or
whatever, but you don't know what it is.

The EXPLAIN problem is a fairly constrained universe: there is going to
be a tree of plan nodes, there are going to be some static properties of
each plan node, and there may or may not be various sorts of estimates
and/or measurements attached to each one.  What I'm after is that code
examining the output can know "oh, this is a measurement" even if it
hasn't heard of the particular kind of measurement.

As a concrete example of what I'm thinking about, I'd hope that PgAdmin
would be able to display a graphical summary of a plan tree, and then
pop up measurements associated with one of the nodes when you
right-click on that node.  To do this, it doesn't necessarily have to
know all about each specific measurement that a particular backend
version might emit; but it needs to be able to tell which things are
measurements.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I like Greg's idea of subdividing the available data this way.
>
>> I like it too, but I'd like to see us come up with a design that
>> allows it to be used for all of the output formats (text, XML, and
>> JSON).  I think it we should be looking for a way to allow modules to
>> publish abstract objects like property-value mappings, or lists of
>> strings, rather than thinking strictly in terms of XML.  If we have a
>> module called foo that emits property bar with value baz and property
>> bletch with value quux, then ...
>
> This seems to be missing the point I was trying to make, which is that
> a design like that actually offers no leverage at all: if you don't know
> all about foo to start with, you have no idea what to do with either bar
> or bletch.  You can *parse* the data, since it's in XML or JSON or
> whatever, but you don't know what it is.
>
> The EXPLAIN problem is a fairly constrained universe: there is going to
> be a tree of plan nodes, there are going to be some static properties of
> each plan node, and there may or may not be various sorts of estimates
> and/or measurements attached to each one.  What I'm after is that code
> examining the output can know "oh, this is a measurement" even if it
> hasn't heard of the particular kind of measurement.
>
> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
> would be able to display a graphical summary of a plan tree, and then
> pop up measurements associated with one of the nodes when you
> right-click on that node.  To do this, it doesn't necessarily have to
> know all about each specific measurement that a particular backend
> version might emit; but it needs to be able to tell which things are
> measurements.

*scratches head*

So you're looking for a way to categorize the data that appear in the
output by type, like any given piece of data is either a measurement,
an estimate, or a part of the plan structure?

It seems to me that with a sufficiently powerful API, add-on modules
could emit arbitrary stuff that might not fall into the categories
that you've mentioned.  For example, there was a previous EXPLAIN XML
patch which contained a bunch of code that spit out plans that were
considered but not chosen.  And there could easily be other kinds of
less invasive add-ons that would still want to emit properties that
are formatted as text or lists rather than measurements per se.

I think it's kind of hopeless to think that a third-party module is
going to be able to do much better than to display any unexpected
properties whose value is just text and punt any unexpected properties
whose value is a complex object (nested tags in XML-parlance, hash in
JSON).

I have a feeling I'm still missing the point here...

...Robert


Re: machine-readable explain output

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
>> would be able to display a graphical summary of a plan tree, and then
>> pop up measurements associated with one of the nodes when you
>> right-click on that node.

> It seems to me that with a sufficiently powerful API, add-on modules
> could emit arbitrary stuff that might not fall into the categories
> that you've mentioned.

I don't have a problem with inventing new categories when we need to.
What I'm objecting to is using the above to justify flattening the
design completely, so that the only way to know anything about
a particular datum is to know that type of datum specifically.
There is way more structure in EXPLAIN than that, and we should
design it accordingly.

(Note that any information about rejected plans could not usefully be
attached to the plan tree anyway; it'd have to be put in some other
child of the topmost node.)

> And there could easily be other kinds of
> less invasive add-ons that would still want to emit properties that
> are formatted as text or lists rather than measurements per se.

By "measurement" I did not mean to imply "single number".  Text strings
or lists could be handled very easily, I think, especially since there
are explicit ways to represent those in XML.

The main point here is that we have a pretty good idea of what
general-purpose client code is likely to want to do with the data, and
in a lot of cases that does not translate to having to know each node
type explicitly, so long as it can be categorized.
        regards, tom lane


Re: machine-readable explain output

From
Robert Haas
Date:
On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> The main point here is that we have a pretty good idea of what
> general-purpose client code is likely to want to do with the data, and
> in a lot of cases that does not translate to having to know each node
> type explicitly, so long as it can be categorized.

I agree.  I'm just not seeing the need for an *explicit*
categorization contained within the data itself.  For one thing, AIUI,
that's the job of things like an XML Schema, which Andres Freund has
already agreed to write, and I would expect that would be of some
value to tool-writers, else why are we creating it?  I also think
scalars and lists are recognizable without any particular additional
markup at all, just by introspection of the contents.

Even if we do need some kind of additional markup, I'm reluctant to
try to design it without some feedback from people writing actual
tools about what they find inadequate in the current output.  The good
news is that if this patch gets committed fairly quickly after the
release of 8.4, tool authors should have enough time to discover where
any bodies are buried in time to fix them before 8.5.  But I'm really
unconvinced that any of this minor formatting stuff is going to rise
to the level of a real problem.

...Robert


Re: machine-readable explain output

From
Andres Freund
Date:
On 06/16/2009 09:51 PM, Robert Haas wrote:
> On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>> The main point here is that we have a pretty good idea of what
>> general-purpose client code is likely to want to do with the data,
>> and in a lot of cases that does not translate to having to know
>> each node type explicitly, so long as it can be categorized.
> I agree.  I'm just not seeing the need for an *explicit*
> categorization contained within the data itself.  For one thing,
> AIUI, that's the job of things like an XML Schema, which Andres
> Freund has already agreed to write, and I would expect that would be
> of some value to tool-writers, else why are we creating it?
It defines how exactly the output has to look - thats not easily
readable out of explain.c - so anything that could be created and
validated with that schema should be acceptable by $tool - even if
explain may not create it.
Just like EBNF or similar for other languages.

It does not help categorizing values in planner/execution/whatever
categories automatedly by some tool though.

I attached a simple relaxng schema - if somebody likes another format
that should be generatable out of that (using trang). It surely could
use some more work, but I think its detailed enough for now.

> I also think scalars and lists are recognizable without any
> particular additional markup at all, just by introspection of the
> contents.
That somewhat defies the usage of a strictly structured format? Perhaps
I am misunderstanding you though.

On another note it may be interesting to emit the current options to
explain in xml/json format - although that depends whether the option
syntax will be accepted.


Writing the schema I noticed something else I did not like about the
current format:

<Triggers>
    <Trigger>
        <Trigger>Name</Trigger>
        or:
        <Constraint>ConstraintName</Constraint>
    </Trigger>
</Triggers>

The double usage of "<Trigger/>" seems to be somewhat ugly. Renaming it
to <TriggerName>/<ConstraintName> seems to be a good idea - at least
when staying at the current tag oriented style.

Andres

Attachment

Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
> 1. It didn't seem very wise to go with the approach of trying to do
> EVERYTHING with attributes.  If I did that, then I'd either get really
> long lines that were not easily readable, or I'd have to write some
> kind of complicated line wrapping code (which didn't seem to make a
> lot of sense for a machine-readable format).  The current format isn't
> the most beautiful thing I've ever seen, but you don't need a parser
> to make sense of it, just a bit of patience.

There are obviously a lot of ways to go about defining an XML format, but here 
is another one of them:

A plan is a tree of plan nodes.  Each node has some information attached to 
it, such as row counts and costs.

If you consider an XML document to be a tree of element nodes, then this falls 
into place naturally.  Each plan is an element, and all the other information 
are attributes.

With this, visual explain would be completely trivial.


Re: machine-readable explain output

From
Robert Haas
Date:
On Wed, Jun 17, 2009 at 10:27 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>> 1. It didn't seem very wise to go with the approach of trying to do
>> EVERYTHING with attributes.  If I did that, then I'd either get really
>> long lines that were not easily readable, or I'd have to write some
>> kind of complicated line wrapping code (which didn't seem to make a
>> lot of sense for a machine-readable format).  The current format isn't
>> the most beautiful thing I've ever seen, but you don't need a parser
>> to make sense of it, just a bit of patience.
>
> There are obviously a lot of ways to go about defining an XML format, but here
> is another one of them:
>
> A plan is a tree of plan nodes.  Each node has some information attached to
> it, such as row counts and costs.
>
> If you consider an XML document to be a tree of element nodes, then this falls
> into place naturally.  Each plan is an element, and all the other information
> are attributes.
>
> With this, visual explain would be completely trivial.

So what do you do about things like sort keys and target lists, that
the current code outputs as structured lists?

...Robert


Re: machine-readable explain output

From
Andres Freund
Date:
On 06/17/2009 04:27 PM, Peter Eisentraut wrote:
> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>> 1. It didn't seem very wise to go with the approach of trying to do
>> EVERYTHING with attributes.  If I did that, then I'd either get really
>> long lines that were not easily readable, or I'd have to write some
>> kind of complicated line wrapping code (which didn't seem to make a
>> lot of sense for a machine-readable format).  The current format isn't
>> the most beautiful thing I've ever seen, but you don't need a parser
>> to make sense of it, just a bit of patience.
>
> There are obviously a lot of ways to go about defining an XML format, but here
> is another one of them:
>
> A plan is a tree of plan nodes.  Each node has some information attached to
> it, such as row counts and costs.
> If you consider an XML document to be a tree of element nodes, then this falls
> into place naturally.  Each plan is an element, and all the other information
> are attributes.
So, the only change from the current schema would be to do move all 
additional information into attributes?

> With this, visual explain would be completely trivial.
Only that some attributes may need some more structure than a single 
scalar value.

Also that would need extra handling for each attribute to consider if 
its a information about planning or execution...


Andres


Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Tuesday 16 June 2009 22:51:37 Robert Haas wrote:
> I agree.  I'm just not seeing the need for an *explicit*
> categorization contained within the data itself.  For one thing, AIUI,
> that's the job of things like an XML Schema, which Andres Freund has
> already agreed to write, and I would expect that would be of some
> value to tool-writers, else why are we creating it? 

A schema will just tell which documents are valid, not what they mean.  That 
is the job of XML ontologies, which are about as pie-in-the-sky as the 
semantic web that they are supposed to end up building.



Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Tuesday 16 June 2009 20:21:21 Tom Lane wrote:
> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
> would be able to display a graphical summary of a plan tree, and then
> pop up measurements associated with one of the nodes when you
> right-click on that node.  To do this, it doesn't necessarily have to
> know all about each specific measurement that a particular backend
> version might emit; but it needs to be able to tell which things are
> measurements.

To do this, you pack all "measurements" into a <measurement> element, and then 
tools are just told to display those.

Really, this isn't much different (or at all different) from designing an 
extensible tree data structure in any programming language.


Re: machine-readable explain output

From
Robert Haas
Date:
On Wed, Jun 17, 2009 at 10:40 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 16 June 2009 20:21:21 Tom Lane wrote:
>> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
>> would be able to display a graphical summary of a plan tree, and then
>> pop up measurements associated with one of the nodes when you
>> right-click on that node.  To do this, it doesn't necessarily have to
>> know all about each specific measurement that a particular backend
>> version might emit; but it needs to be able to tell which things are
>> measurements.
>
> To do this, you pack all "measurements" into a <measurement> element, and then
> tools are just told to display those.

I think this is markup for the sake of markup.  Right now, if we were
to add 10 additional options, all they'd need to do is call
ExplainPropertyText() and the right stuff would happen.  If we go this
route, we'll need to worry about getting each property into the right
subgroup, and argue about whether the assignment of properties to
subgroups is correct or whether we need to rearrange the subgroups (is
"sort method" a "measurement"?).  Our chances of us not having to
change this again in the future are a lot better if we just report the
data and let third-party applications worry about categorizing it if
they want to.

Possibly it would make sense to introduce groups for the portions of
the output which are added in response to particular options; for
example, we could have a section called "ANALYZE" that contains the
data that is only present when ANALYZE is used.  But this has the same
complicating effect on the code.  You'd have to get the
explain_tuplesort() stuff into the same sub-node as the analyze times
and loop counts, for example, which would require non-trivial
restructuring of the existing code for no clear benefit.  You'll
quickly get into a situation where you print the same information from
completely different parts of the code depending on whether or not the
output is text format, which is going to make maintaining this a bear.

I think the most common use case for this output format is going to be
to feed it to an XML parser and use xpath against it, or feed it into
a JSON parser and then write things like $plan->{"Actual Rows"} or
plan["Actual Rows"], depending on what language you use to process it
after you parse it.  Or you may have people who iterate over sort keys
%$plan and print all the values for which !ref $plan->{$key}.
Unnecessary levels of nesting just make the xpath expressions (or
perl/python/javascript hash/array dereferences) longer.  Changing tags
to attributes or visca versa changes which xpath expression you use to
get the data you want, but that's about it.

...Robert


Re: machine-readable explain output

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Jun 17, 2009 at 04:32:51PM +0200, Andres Freund wrote:
> On 06/17/2009 04:27 PM, Peter Eisentraut wrote:
>> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>>> 1. It didn't seem very wise to go with the approach of trying to do
>>> EVERYTHING with attributes [...]

>> There are obviously a lot of ways to go about defining an XML format, but 
>> here
>> is another one of them:
>>
>> A plan is a tree of plan nodes.  Each node has some information attached 
>> to
>> it, such as row counts and costs.
>> If you consider an XML document to be a tree of element nodes, then this 
>> falls
>> into place naturally.  Each plan is an element, and all the other 
>> information
>> are attributes.
[...]

> So, the only change from the current schema would be to do move all 
> additional information into attributes?
>
>> With this, visual explain would be completely trivial.
> Only that some attributes may need some more structure than a single scalar 
> value.
>
> Also that would need extra handling for each attribute to consider if its a 
> information about planning or execution...

One of the common pitfalls of XML is that designers think first in terms
of the XML representation before being clear on the abstract structure
of what they want to represent. This might have something to do with the
overly baroque language (e.g. having two hierarchy "dimensions": tag
nesting and attributes, etc.).

So when Peter writes about "attributes", I would tend _not_ to read them
as "XML attributes" but rather as abstract attributes. Whether they be
actually represented as XML attributes or not will be a pragmatic
decision (XML attributes have little expressive power: they are just
strings whithout structure, as Andres noted).

The other way 'round the abstract model will end up tainted with all the
trade-offs you had to do to represent your data in XML. Look at so many
data descriptions in XML out there and you'll know what I mean.

That's why I always say: XML is horrid as a DDL. But no one listens ;-)

I'm an outsider, so just take my opinion with a fist of salt: but I'd
tend ton design first the abstract structure, then have a look at the
JSON representation and _then_ take the final decisions on the mapping
to XML.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKOa35Bcgs9XrR2kYRAgwCAJ0S066FfZ6q+l2Lv51/t9/1hUOUBQCfRK0e
OzCEM44nW8tF0g5SPyR+5YY=
=bPwn
-----END PGP SIGNATURE-----


Re: machine-readable explain output

From
Peter Eisentraut
Date:
On Thursday 18 June 2009 06:01:13 tomas@tuxteam.de wrote:
> One of the common pitfalls of XML is that designers think first in terms
> of the XML representation before being clear on the abstract structure
> of what they want to represent

The other aspect is that designing a useful XML format is pretty much hopeless 
without a clear idea about what processing is expected.

Look at HTML and DocBook.  Both of those might have been a result of a 
community effort to "design an XML format to publish text", but they clearly 
have different processing expectations.  And there is a whole infrastructure 
around each of them to make that work (specifications, DTDs, documented 
processing expectations, CSS, stylesheets, etc.).  Also note that neither HTML 
nor DocBook are for example designed so that a renderer can do something 
useful with elements that it doesn't know about.

The current approach to the XML explain format is an attempt to be quite 
literally everything to everyone.  The only concrete processing target that 
has been vaguely described is some kind of presumably visual display in 
pgAdmin.  The simple element-per-plan-node approach could handle that just 
fine, for example.  (The other requirements that have been mentioned are that 
it should be similar to the JSON format and it should read nicely, which have 
some merit, but aren't really going to help what I'm talking about here.)

Also note that for example DocBook and HTML have versions 1, 2, 3, 4, 5.  So I 
suggest that we do it that way: design something small for concrete 
applications, extend it later, but don't expect applications targeting the old 
versions to magically be able to process the new versions with full power.