Re: machine-readable explain output v4 - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: machine-readable explain output v4
Date
Msg-id 4A7F59FF.4060302@dunslane.net
In response to Re: machine-readable explain output v4  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: machine-readable explain output v4  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

Robert Haas wrote:
> The one significant representational choice that I'm aware of having
> made is to use nested tags rather than attributes in the XML format.
> This seems to me to offer several advantages.  First, it's clearly
> impossible to standardize on attributes, because attributes can only
> be text, and it seems to me that if we're going to try to output
> structured data, we want to take that as far as we can, and we have
> attributes (like sort keys) that are lists rather than scalars.  Using
> tags means that they can have substructure when needed.  Second, it
> seems likely to me that people will want to extend explain further in
> the future: indeed, that was the whole point of the explain-options
> patch which was already committed.  That's pretty simple in the
> current design - just add a few more calls to ExplainPropertyText or
> ExplainPropertyList in the appropriate place, and you're done.  I'm
> pretty sure that splitting things up between attributes and nested
> tags would complicate such modifications.

In general, in XML one uses an attribute for a named property of an 
object that can only have one value at a time. A classic example is the 
dimensions of an object - it can only have one width and height. 
Children (nested tags, particularly) are used for things it can have an 
arbitrary number of, or things which in turn can have children.  The
HTML <p> and <body> elements are (respectively) examples of these.
Attribute values in particular should generally be short - I recently saw
an example that had an entire image hex-encoded in an XML attribute,
which struck me as just horrible. Enumerations, date and time values,
booleans, measurements - these are common types of attribute values. 
Extracting a value from an attribute is no more or less difficult than 
from a nested tag, using the XPath query language.

The XML Schema standard is a language for specifying the structure of a 
given XML document type, and while it is undoubtedly complex, it is also 
much more powerful than the older DTD mechanism. I think we should be 
creating (and publishing) an XML Schema specification for any XML 
documents we are producing. There are a number of members of the 
community who are equipped to help produce these.

There is probably a good case for using an explicit namespace with such 
docs. So we might have something like:
   <pg:explain xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....

BTW, has anyone tried validating the XML at all? I just looked very 
briefly at the patch at 
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and 
I noticed this, which makes me suspicious:

+     if (es.format == EXPLAIN_FORMAT_XML)
+         appendStringInfoString(es.str,
+             "<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");


That ";" after the attribute is almost certainly wrong. This is a classic
case of what I was talking about a month or two ago. Building up XML (or
any structured doc, really; XML is not special in this regard) by ad hoc
methods is horribly error-prone. If you don't want to rely on libxml,
then I think you need to develop a lightweight abstraction rather than
just appending to a StringInfo.
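
To make the idea concrete, here is a minimal sketch of what such a
lightweight abstraction might look like. Everything in it is hypothetical:
XmlBuf is a stand-in for StringInfo, and the xml_* function names are
invented for illustration; none of this is taken from the patch or from
the PostgreSQL tree. The point is only that callers name elements and
attributes, while quoting, escaping and bracket placement happen in one
place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for PostgreSQL's StringInfo. */
typedef struct XmlBuf
{
    char   *data;
    size_t  len;
    size_t  cap;
} XmlBuf;

void xml_init(XmlBuf *b)
{
    b->cap = 64;
    b->len = 0;
    b->data = malloc(b->cap);
    if (b->data == NULL)
        abort();
    b->data[0] = '\0';
}

/* Append s verbatim; only the helpers below should call this. */
void xml_raw(XmlBuf *b, const char *s)
{
    size_t n = strlen(s);

    if (b->len + n + 1 > b->cap)
    {
        while (b->len + n + 1 > b->cap)
            b->cap *= 2;
        b->data = realloc(b->data, b->cap);
        if (b->data == NULL)
            abort();
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the trailing NUL too */
    b->len += n;
}

/* Append s with the five XML special characters escaped. */
void xml_escaped(XmlBuf *b, const char *s)
{
    for (; *s != '\0'; s++)
    {
        switch (*s)
        {
            case '&':  xml_raw(b, "&amp;");  break;
            case '<':  xml_raw(b, "&lt;");   break;
            case '>':  xml_raw(b, "&gt;");   break;
            case '"':  xml_raw(b, "&quot;"); break;
            case '\'': xml_raw(b, "&apos;"); break;
            default:
            {
                char c[2] = {*s, '\0'};
                xml_raw(b, c);
                break;
            }
        }
    }
}

/* Emit "<name"; attributes may follow until xml_begin_done(). */
void xml_begin(XmlBuf *b, const char *name)
{
    assert(name != NULL && name[0] != '\0');
    xml_raw(b, "<");
    xml_raw(b, name);
}

/* Emit one attribute as  name="escaped value". */
void xml_attr(XmlBuf *b, const char *name, const char *value)
{
    assert(name != NULL && name[0] != '\0');
    xml_raw(b, " ");
    xml_raw(b, name);
    xml_raw(b, "=\"");
    xml_escaped(b, value);
    xml_raw(b, "\"");
}

/* Close the start tag opened by xml_begin(). */
void xml_begin_done(XmlBuf *b)
{
    xml_raw(b, ">");
}

/* Emit escaped character data. */
void xml_text(XmlBuf *b, const char *text)
{
    xml_escaped(b, text);
}

/* Emit "</name>". */
void xml_end(XmlBuf *b, const char *name)
{
    xml_raw(b, "</");
    xml_raw(b, name);
    xml_raw(b, ">");
}
```

With helpers along these lines, the root element would be emitted by
calling xml_begin(&buf, "explain"), then xml_attr(&buf, "xmlns",
"http://www.postgresql.org/2009/explain"), then xml_begin_done(&buf), so
there is no hand-written string literal in which a stray character could
end up after the attribute. A real version would also want to track open
elements so that mismatched end tags are caught.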
 





cheers

andrew



