Thread: Explain XML patch v2

Explain XML patch v2

From
Tom Raney
Date:
This is an update to my EXPLAIN XML patch submitted a few days ago.

I've added a documentation patch and modified some of the code per
comments by Gregory Stark.

Because the main consumer of output generated by this patch will
presumably be a machine, I didn't  clutter up the documentation with
matching XML output for every standard example query.  But, I added
enough to hopefully give the user an idea of what to expect.

Regards,

Tom Raney




Attachment

Re: Explain XML patch v2

From
Gregory Stark
Date:
"Tom Raney" <raneyt@cecs.pdx.edu> writes:

> This is an update to my EXPLAIN XML patch submitted a few days ago.
>
> I've added a documentation patch and modified some of the code per  comments by
> Gregory Stark.

You should update the wiki at

http://wiki.postgresql.org/wiki/CommitFest:2008-07

so any reviewers look at the updated patch.

(I certainly don't see any reason to wait two months for the next commit fest)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: Explain XML patch v2

From
Peter Eisentraut
Date:
Am Mittwoch, 2. Juli 2008 schrieb Tom Raney:
> This is an update to my EXPLAIN XML patch submitted a few days ago.

Could you explain how you came up with the XML schema design?  I suppose you
just made something up that went along with the existing XML output.

I would like to see more integration with the spirit of the existing XML
functionality.  For example, instead of things like

"<runtime ms=\"%.3f\" />\n"

we ought to be using XML Schema data types for time intervals and so on.

We might also want to use an XML namespace.

Table and index names should be escaped using the existing escape mechanism
for identifiers.  There might also be encoding issues.

It would also be interesting if EXPLAIN could optionally be a function that
returns a datum of type XML, to allow further processing.

Any thoughts on these issues?

Re: Explain XML patch v2

From
David Fetter
Date:
On Wed, Jul 02, 2008 at 05:57:29PM +0200, Peter Eisentraut wrote:
> It would also be interesting if EXPLAIN could optionally be a
> function that returns a datum of type XML, to allow further
> processing.

It would be better to have a function which allows people to plug in
their own serialization.  A JSON or YAML one, for example, would be
much lighter weight on both ends.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Explain XML patch v2

From
"Dave Page"
Date:
On Wed, Jul 2, 2008 at 4:57 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Am Mittwoch, 2. Juli 2008 schrieb Tom Raney:
>> This is an update to my EXPLAIN XML patch submitted a few days ago.
>
> Could you explain how you came up with the XML schema design?  I suppose you
> just made something up that went along with the existing XML output.

Speaking of schema - I haven't had time to review the patch myself
yet, but does it include schema names for all relations? The current
text output does not (and adding it has been rejected due to
verbosity), but that makes any kind of query tuning or information
gathering tool more or less impossible to write.


--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Re: Explain XML patch v2

From
daveg
Date:
On Wed, Jul 02, 2008 at 09:01:18AM -0700, David Fetter wrote:
> On Wed, Jul 02, 2008 at 05:57:29PM +0200, Peter Eisentraut wrote:
> > It would also be interesting if EXPLAIN could optionally be a
> > function that returns a datum of type XML, to allow further
> > processing.
>
> It would be better to have a function which allows people to plug in
> their own serialization.  A JSON or YAML one, for example, would be
> much lighter weight on both ends.

+1 for either of these.

-dg

--
David Gould       daveg@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

Re: Explain XML patch v2

From
Tom Raney
Date:
Peter Eisentraut wrote:
> Am Mittwoch, 2. Juli 2008 schrieb Tom Raney:
>
>> This is an update to my EXPLAIN XML patch submitted a few days ago.
>>
>
> Could you explain how you came up with the XML schema design?  I suppose you
> just made something up that went along with the existing XML output.
>
Yes, it is based on the existing output.
> I would like to see more integration with the spirit of the existing XML
> functionality.  For example, instead of things like
>
> "<runtime ms=\"%.3f\" />\n"
>
> we ought to be using XML Schema data types for time intervals and so on.
>
The DTD provides only rudimentary document validation but it has no
support for type checking.   So, it may make sense to move to the more
rigorous XML Schema.  There is a 'duration' data type that could be used
for the instance listed above.  Or, we could define our own.
> We might also want to use an XML namespace.
>
Taking the 'ms' field listed above:
xmlns="http://www.postgresql.org/v8.4/ms" or something like this?
> Table and index names should be escaped using the existing escape mechanism
> for identifiers.  There might also be encoding issues.
>
That's a good point.  Or, wrap them with CDATA.
> It would also be interesting if EXPLAIN could optionally be a function that
> returns a datum of type XML, to allow further processing.
>
> Any thoughts on these issues?
>
I am in the process of writing a parser of this XML output for the Red
Hat Visual Explain tool.  I want to see what surprises come up during
implementation.



Re: Explain XML patch v2

From
Tom Raney
Date:
Quoting daveg <daveg@sonic.net>:

> On Wed, Jul 02, 2008 at 09:01:18AM -0700, David Fetter wrote:
>> On Wed, Jul 02, 2008 at 05:57:29PM +0200, Peter Eisentraut wrote:
>> > It would also be interesting if EXPLAIN could optionally be a
>> > function that returns a datum of type XML, to allow further
>> > processing.
>>
>> It would be better to have a function which allows people to plug in
>> their own serialization.  A JSON or YAML one, for example, would be
>> much lighter weight on both ends.
>
> +1 for either of these.
>
> -dg
>

So, this leads me to the idea of assembling the EXPLAIN data
internally in an output-neutral data structure.  At the very end of
processing, one decision statement would decide which output plugin to
use for output.  Sprinkling XML print statements throughout the code
(as currently done in the patch) while functional, is not ideal.  And,
the escaping of XML content should ideally be done in the serializer
anyway.

Of course, this realization didn't occur to me until *after* I had
spent a bit of time coding up the patch in its current form.  Oh well.

Thoughts?

Regarding the XML datum, in order to support that, will all users need
to compile with libxml?  Are there any lighter weight solutions to
serialize other than libxml?

-Tom Raney




Re: Explain XML patch v2

From
Peter Eisentraut
Date:
Am Freitag, 4. Juli 2008 schrieb Tom Raney:
> Regarding the XML datum, in order to support that, will all users need  
> to compile with libxml?  Are there any lighter weight solutions to  
> serialize other than libxml?

You can create XML without libxml.

Re: [HACKERS] Explain XML patch v2

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Freitag, 4. Juli 2008 schrieb Tom Raney:
>> Regarding the XML datum, in order to support that, will all users need �
>> to compile with libxml? �Are there any lighter weight solutions to �
>> serialize other than libxml?

> You can create XML without libxml.

Seems to me that anyone who wants this feature will probably also want
the existing libxml-based features, so they'll be building with libxml
anyway.  So I'd not be in favor of expending any extra code on a
roll-your-own solution.

            regards, tom lane

Re: Explain XML patch v2

From
Simon Riggs
Date:
On Tue, 2008-07-01 at 21:48 -0700, Tom Raney wrote:
> This is an update to my EXPLAIN XML patch submitted a few days ago.

This is an important patch and you've done well to get it this far.

I have much detailed feedback, but these are just emergent requests, now
I can see and understand what you've done.

* If the patch was implemented as an ExplainOneQuery_hook we would be
able to use this with 8.3 also, which might be interesting. So there's
no real need for this to be a patch on core.

* If I understand, the DTD option inlines the DTD into the resulting XML
document, rather than adding a schema definition?

* The DTD should be in a separate file also, so it can be referred to.
We should give that DTD a permanent URL so we can identify it across the
internet. This is the first time the project has published an XML
DTD/Schema, so we need to think through the right way to publish it. The
DTD should have a version number in it, so when this gets updated in the
future we can validate against the correct one.

* The DTD is regrettably a long, long way from where I need it to be.
The PLAN elements are all unrelated to one another, apart from their
sequence within the XML document and their "indent". That definition can
lead to XML documents that match the DTD yet would never be valid plans,
amongst other problems. For sensible analysis of the SQL we need the DTD
to match the actual structure of nodes in the executor. This requires
major DTD changes, regrettably. The "indent" comes from the nesting of
the nodes, and is not merely a visual feature of the complete plan. IMHO
it is critically important that the XML correctly conveys the structure
of the plan and not just the formatting of the current text output.
Otherwise many doors will be closed to us and this whole project wasted.
I want to be able to write xml queries to retrieve plans that contain a
merge join where both the inner and outer are merge joins, or to easily
compare the structure of two complex queries looking for differences.

* The meaning of the two time attributes is somewhat confused. It is
startuptime and totaltime, not time_start and time_end.

* It looks to me like QUALIFIER alone is not enough, since in some cases
nodes have both an index condition and a filter, for example.

* I've made a number of suggested changes below, but the DTD really
needs to follow the structures in src/include/nodes/plannodes.h
In particular you'll see the recursive structure introduced by the DTD
changes below

  * EXPLAIN has been renamed PLAN
  * PLAN has been renamed NODE
  * COST has been renamed to ESTIMATE
  * ANALYZE has been renamed to ACTUAL
  * OUTPUT, SORT and QUALIFIER have been removed for clarity only

<!ELEMENT plan (estimate, runtime?)>

<!ELEMENT node (table?, index?, estimate, actual?, outerpath?, innerpath?, initpath)>
<!ATTLIST plan\n"
    nodetype CDATA     #REQUIRED>

<!ELEMENT outerpath (node)>
<!ELEMENT innerpath (node)>
<!ELEMENT initpath (node)>

<!ELEMENT estimate EMPTY >

<!ATTLIST estimate
    startupcost CDATA  #REQUIRED
    totalcost CDATA    #REQUIRED
    rows CDATA     #REQUIRED
    width CDATA    #REQUIRED>

<!ELEMENT actual EMPTY >
<!ATTLIST actual\n"
    startuptime CDATA #REQUIRED
    totaltime CDATA #REQUIRED
    rows CDATA #REQUIRED
    loops CDATA #REQUIRED>

* I'd much prefer to define this as a Schema, since that's a bit more
flexible in what we can do, plus we can specify datatypes.

* Based upon some of those issues, I'd suggest that further work on the
DTD/Schema should only happen when a reasonable range of plans have been
accumulated to allow full understanding of the possibilities

I'm sorry I've found so much to say about this, but don't be dissuaded.
The basic structure of the patch is there, we just need to get the
details right also, so we can take full.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Explain XML patch v2

From
"Dave Page"
Date:
On Fri, Jul 4, 2008 at 5:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> * If the patch was implemented as an ExplainOneQuery_hook we would be
> able to use this with 8.3 also, which might be interesting. So there's
> no real need for this to be a patch on core.

Would that mean that instead of being an optional format that pgAdmin
could request with appropriate grammar, it would be something that was
only available if the plugin was loaded, and would replace regular
EXPLAIN output? If so, that would be extremely inconvenient, and next
to useless for general purpose utilities.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Re: Explain XML patch v2

From
Simon Riggs
Date:
On Fri, 2008-07-04 at 19:38 +0100, Dave Page wrote:
> On Fri, Jul 4, 2008 at 5:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > * If the patch was implemented as an ExplainOneQuery_hook we would be
> > able to use this with 8.3 also, which might be interesting. So there's
> > no real need for this to be a patch on core.
>
> Would that mean that instead of being an optional format that pgAdmin
> could request with appropriate grammar, it would be something that was
> only available if the plugin was loaded, and would replace regular
> EXPLAIN output? If so, that would be extremely inconvenient, and next
> to useless for general purpose utilities.

It can be optional since plugins can add parameters also.

It wouldn't take long to make up a plugin for 8.3 once this patch has
been committed to core for 8.4, so if you're saying you'd definitely
like it in core then I'm OK with that.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Explain XML patch v2

From
Simon Riggs
Date:
On Fri, 2008-07-04 at 17:20 +0100, Simon Riggs wrote:
> <!ELEMENT plan (estimate, runtime?)>

Sorry, I noticed a few typos in my sample DTD.

<!ELEMENT plan (node+)> with some additional elements for stats

and the attlist below was for the <node> not <plan> element

<!ATTLIST node
        nodetype CDATA     #REQUIRED>


--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Re: Explain XML patch v2

From
"Dave Page"
Date:
On Sat, Jul 5, 2008 at 10:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> It can be optional since plugins can add parameters also.

GUCs I assume you mean, not grammar. Unless I'm misreading the code
though, if the plugin is there it will always run instead of the
regular explain code, so presumabiy that's optional as in XML or
nothing, not XML or standard output.

> It wouldn't take long to make up a plugin for 8.3 once this patch has
> been committed to core for 8.4, so if you're saying you'd definitely
> like it in core then I'm OK with that.

If i's always there it's definitely more useful to pgAdmin, and
doesn't require that we instruct users to install more server side
plugin code to use the features they want.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Re: Explain XML patch v2

From
Simon Riggs
Date:
On Sat, 2008-07-05 at 16:00 +0100, Dave Page wrote:
> On Sat, Jul 5, 2008 at 10:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > It can be optional since plugins can add parameters also.
>
> GUCs I assume you mean, not grammar. Unless I'm misreading the code
> though, if the plugin is there it will always run instead of the
> regular explain code, so presumabiy that's optional as in XML or
> nothing, not XML or standard output.

You can easily make it switchable between XML and normal.

> > It wouldn't take long to make up a plugin for 8.3 once this patch has
> > been committed to core for 8.4, so if you're saying you'd definitely
> > like it in core then I'm OK with that.
>
> If i's always there it's definitely more useful to pgAdmin, and
> doesn't require that we instruct users to install more server side
> plugin code to use the features they want.

As I said, I'm OK with that.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support