Thread: machine-readable explain output
Here we go, XML and JSON output.

You will need to apply explain_refactor-v4.patch and explain_options-v2.patch first, then apply the two patches attached to this message.

http://archives.postgresql.org/pgsql-hackers/2009-06/msg00865.php
http://archives.postgresql.org/pgsql-hackers/2009-06/msg00866.php

The infrastructure patch applies first and is separated only for ease of reviewing. If the infrastructure patch applied by itself changes any user-visible behavior, it's a bug. The main patch does all the heavy lifting. The syntax is:

explain (format xml) ...
explain (format json, analyze) ...
explain (format text) ...    -- same as just plain old explain

If you don't like the syntax, please argue about that on the "generic explain options v2" thread. Let's try to use this thread to discuss the output format, about which I spent a good deal of time agonizing.

I felt that it was important to keep the XML and JSON output as similar to each other as possible. This has the fairly obvious advantage of reducing code complexity and the somewhat less obvious advantage of avoiding expressing information in ways that are overly tied to the syntax of XML. I think the latter is actually a pretty important point; it's hard to know that you've done something genuinely generic unless you go through the exercise of making it apply to two different cases. This code is obviously not completely generic; I did consider the idea that Greg Stark proposed of having some kind of ephemeral internal format with multiple output converters, but I couldn't figure out a way to make it work. Still, I've made a fairly determined effort to minimize the number of places where we switch on the output format. It's still larger than I'd like, but I don't have any good ideas for trimming it down further.

There is an argument to be made that machine-readable output formats shouldn't be afraid to print information that isn't displayed in the regular output, but I haven't gone very far down that road in this patch. I lean toward the view that any additional information that someone wants to have in the machine-readable format should also be an available option for the text format, because I think the question of WHAT you want to display and HOW you want to display it are largely orthogonal (hence options are a separate patch, and this patch just uses that infrastructure to implement an option for format). But there may be some exceptions.

At any rate, if it's possible, I would like to get at least some of this work committed before I go too much further with it, since this patch stack is already four layers deep and my head may explode if it gets too much deeper. If it's helpful to have any of these patches further decomposed for reviewing purposes, see here, where they are broken out into individual commits:

http://git.postgresql.org/gitweb?p=postgresql-rhaas.git;a=shortlog;h=refs/heads/explain_format

(It's probably a bad idea to clone this repository as I am updating the patch set by rebasing, but it's useful for browsing.)

Comments appreciated...

...Robert
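[For anyone who wants to kick the tires once the patches are applied, invocation looks like this; the queries are arbitrary examples, not anything required by the patch:

    explain (format xml) select * from pg_class;
    explain (format json, analyze) select count(*) from pg_class;
    explain (format text) select 1;    -- same as plain explain
]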
On 6/11/09 10:15 PM, Robert Haas wrote:
> Here we go, XML and JSON output.
>
> You will need to apply explain_refactor-v4.patch and
> explain_options-v2.patch first, then apply the two patches attached to
> this message.

Wow, cool. Can this work with auto_explain? That's where I see machine-readable being most useful.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On Fri, Jun 12, 2009 at 11:47 AM, Josh Berkus<josh@agliodbs.com> wrote:
> On 6/11/09 10:15 PM, Robert Haas wrote:
>> Here we go, XML and JSON output.
>>
>> You will need to apply explain_refactor-v4.patch and
>> explain_options-v2.patch first, then apply the two patches attached to
>> this message.
>
> Wow, cool. Can this work with auto_explain? That's where I see
> machine-readable being most useful.

The patch does touch contrib/auto_explain, but just enough to make it keep working the same way it does now. I don't think it would be too hard to improve on that, though; I might work on it if I get bored, but I'm hoping someone else will be motivated enough to do that part. :-)

How would you go about extracting the XML/JSON bits from the rest of what is in the log file? (apologies if this is a question I should already know the answer to)

...Robert
> How would you go about extracting the XML/JSON bits from the rest of
> what is in the log file? (apologies if this is a question I should
> already know the answer to)

If you do CSV output, it's in a field.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
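[To make that concrete, a rough sketch of pulling the output back out of a CSV log, assuming log_destination = 'csvlog' and the postgres_log table layout from the documentation's csvlog example -- the column names come from there, not from this patch:

    COPY postgres_log FROM '/path/to/postgresql-2009-06-12.csv' WITH CSV;

    SELECT message
      FROM postgres_log
     WHERE message LIKE '%<pgexplain%';   -- or whatever the root element ends up being
]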
Josh Berkus wrote:
>> How would you go about extracting the XML/JSON bits from the rest of
>> what is in the log file? (apologies if this is a question I should
>> already know the answer to)
>
> If you do CSV output, it's in a field.

And even if it's not, a well formed XML document would be fairly easy to extract, especially if the root element is well known (e.g. <pg:explain> or some such). And if the patch doesn't produce a well-formed XML doc then it needs work ;-). It might be nice if Robert were to post some samples of the output.

Quick question: does the patch use formal methods using, say, the DOM API to build up the XML, or informal methods (like foo.append('<element>'); foo.append(content); foo.append('</element>'); )

As you can tell, I haven't looked over it yet. But I intend to ;-)

cheers

andrew
On Fri, Jun 12, 2009 at 5:13 PM, Andrew Dunstan<andrew@dunslane.net> wrote:
> Josh Berkus wrote:
>>> How would you go about extracting the XML/JSON bits from the rest of
>>> what is in the log file? (apologies if this is a question I should
>>> already know the answer to)
>>
>> If you do CSV output, it's in a field.
>
> And even if it's not, a well formed XML document would be fairly easy to
> extract, especially if the root element is well known (e.g. <pg:explain> or
> some such). And if the patch doesn't produce a well-formed XML doc then it
> needs work ;-). It might be nice if Robert were to post some samples of the
> output.

<pgexplain>, as it happens... I could post some samples of the output, but it seems like it might be just as well to let those who are curious try it for themselves. I'd rather get opinions from people who care enough to download & test than from those who are just bikeshedding. :-)

> Quick question: does the patch use formal methods using, say, the DOM API to
> build up the XML, or informal methods (like foo.append('<element>');
> foo.append(content); foo.append('</element>'); )
>
> As you can tell, I haven't looked over it yet. But I intend to ;-)

Use the Source, Luke. :-) But, it's informal methods. I don't see a lot of value in doing it the other way, though perhaps I could be convinced otherwise. One thing that's nice about the way it works now is that the only support function it requires is a basic XML-escaping function, which it turns out we already have in the PG sources anyway, though not in a quite usable form (the infrastructure patch deals with the necessary adjustments). So you can explain (format xml) even if you compile without --with-libxml.

If you want to see how the actual XML/JSON stuff works, you might want to start with the last patch in the series (explain_format). If you want to commit it, a course of action to which I can give my unbiased endorsement, then you'll want to start with explain_refactor.

...Robert
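[For readers who haven't opened the patch, the "informal" approach amounts to something like the following sketch built on the backend's StringInfo machinery; escape_xml_append() is a hypothetical stand-in for whatever the escaping helper is actually called in the patch:

    #include "postgres.h"
    #include "lib/stringinfo.h"

    /* emit <tag>escaped-value</tag> into buf */
    static void
    explain_xml_element(StringInfo buf, const char *tag, const char *value)
    {
        appendStringInfo(buf, "<%s>", tag);
        escape_xml_append(buf, value);      /* hypothetical escaping helper */
        appendStringInfo(buf, "</%s>", tag);
    }

The point being that nothing here needs libxml; it is all plain string building plus escaping.]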
On Friday 12 June 2009 08:15:17 Robert Haas wrote:
> Here we go, XML and JSON output.

Could you post some examples of how some plans would look in either format? That would help us judge the particulars.
On Saturday 13 June 2009 01:10:06 Robert Haas wrote:
> <pgexplain>, as it happens... I could post some samples of the
> output, but it seems like it might be just as well to let those who
> are curious try it for themselves. I'd rather get opinions from
> people who care enough to download & test than from those who are just
> bikeshedding. :-)

I recommend, however, that you think about writing a regression test for this, so the interfaces are explicit, and those tweaking them in the future know what they are dealing with.

A couple of comments on the specifics of the output:

For the JSON format:

* Numbers should not be quoted.

For the XML format:

* Instead of <pgexplain>, use <explain> with an XML namespace declaration.

The schema name is missing in either output format. I think that was supposed to be one of the features of this that the objects are unambiguously qualified.

I'm not sure I like element names such as <Node-Type>, instead of say <nodetype>, which is more like HTML and DocBook. (Your way might be more like SOAP, I guess.)

Also, the result type of an EXPLAIN (format xml) should be type xml, not text.

In general, I like this direction very much. There will probably be more tweaks on the output format over time. It's not like the plain EXPLAIN hasn't been tweaked countless times.
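[On the JSON point, the difference is simply that scalar numeric fields come out as JSON numbers rather than strings, e.g. (key names and values are illustrative only):

    "Plan Rows": 72398,
    "Total Cost": 1710.98

rather than "72398" and "1710.98".]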
On Sat, Jun 13, 2009 at 9:08 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Saturday 13 June 2009 01:10:06 Robert Haas wrote:
>> <pgexplain>, as it happens... I could post some samples of the
>> output, but it seems like it might be just as well to let those who
>> are curious try it for themselves. I'd rather get opinions from
>> people who care enough to download & test than from those who are just
>> bikeshedding. :-)
>
> I recommend, however, that you think about writing a regression test for this,
> so the interfaces are explicit, and those tweaking them in the future know
> what they are dealing with.

I would like to have something in this area, but Tom didn't think it was workable.

http://archives.postgresql.org/message-id/603c8f070904151623ne07d744k615edd4aa669a64a@mail.gmail.com

Currently, we don't even have something trivial like "EXPLAIN SELECT 1" in the regression tests, so even if you completely break EXPLAIN so that it core dumps (voice of experience speaking here) make check still passes with flying colors.

One feature I'd like to add is an EXPLAIN-option for "COSTS", so that you can say explain (costs off) .... Then we could at least try a couple of simple examples against the build-farm to see whether the issues that Tom is worried about are problems in practice and to what degree. But I'm a little reluctant to develop that until at least some of my existing work is committed, because at present I have no guarantee either that this patch will be accepted or that it won't be extensively modified by the committer, thus creating merge conflicts for me to resolve. However, assuming the infrastructure in the explain_options patch is accepted in something similar to its current form, it should be a very easy patch to write when the time comes.

> A couple of comments on the specifics of the output:
>
> For the JSON format:
>
> * Numbers should not be quoted.

OK, will fix.

> For the XML format:
>
> * Instead of <pgexplain>, use <explain> with an XML namespace declaration.

Could you specify this a bit further, like write out exactly what you want it to look like? My XML-fu is not very strong.

> The schema name is missing in either output format. I think that was supposed
> to be one of the features of this that the objects are unambiguously
> qualified.

Well, as I said, I'm not sure that this decision should be made based on the selected output format. I think it should be controlled by a separate option that can be used for text, XML, or JSON. Of course, we also don't want to end up with a zillion options. I think maybe the existing VERBOSE option could be pressed into service here. Right now, all it does is print out the output lists for each node, but maybe it could also have the effect of forcing the schema name to be emitted, and any other similarly minor verbosities we run across.

There's other weirdness in this area too: when emitting a qual, we table-qualify column names according to a complex heuristic (for scan quals, when the outer plan is non-NULL or it's a subquery scan; for upper quals, when the length of the range-table list is more than 1). Not sure whether anyone cares about this or not. In a similar vein, in report_triggers(), we omit the constraint name if there is a trigger name. All of these seem like fairly good candidates for things that you might want to behave differently if you ask for "VERBOSE".

> I'm not sure I like element names such as <Node-Type>, instead of say
> <nodetype>, which is more like HTML and DocBook. (Your way might be more like
> SOAP, I guess.)

I'm not sure I like them either. I mostly did it that way because I wanted to maintain consistency with the text output, which uses labels like "Hash Cond" and "Filter". So I just made the JSON format use those same labels, and for the XML format, since tag names can't contain spaces, I just replaced spaces with dashes. Once I made that decision it seemed like everything else should be consistent, so that's what I did. But we could certainly subject them all to some additional regular transformation if we're so inclined. I'm not sure it's really worth the additional code complexity, but I don't care very much.

> Also, the result type of an EXPLAIN (format xml) should be type xml, not text.

Seems reasonable. I'll see if I can figure out how to do that.

> In general, I like this direction very much. There will probably be more
> tweaks on the output format over time. It's not like the plain EXPLAIN hasn't
> been tweaked countless times.

Cool, thanks for the review. I have no illusions it won't get changed further. In all honesty, I'm most interested in the options syntax. The multiple output formats portion is just a demonstration that you can use the options syntax to enable interesting functionality, but I personally have little use for it. I'm hoping, however, that once we have a standard way to add options, people will propose more options that do interesting things; I have a few ideas myself.

...Robert
--On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>> text.
>
> Seems reasonable. I'll see if I can figure out how to do that.

I suppose it's okay then, that the format is not available when the server isn't build with --with-libxml ?

--
Thanks

Bernd
Bernd Helmle <mailings@oopsware.de> writes:
> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>> text.
>>
>> Seems reasonable. I'll see if I can figure out how to do that.
>
> I suppose it's okay then, that the format is not available when the server
> isn't build with --with-libxml ?

I believe we have things set up so that you can still print "xml" data without libxml configured in. We'd need to be sure casting to text works too, but other than that I don't see an issue here.

    regards, tom lane
On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Bernd Helmle <mailings@oopsware.de> writes:
>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>>> text.
>>>
>>> Seems reasonable. I'll see if I can figure out how to do that.
>
>> I suppose it's okay then, that the format is not available when the server
>> isn't build with --with-libxml ?
>
> I believe we have things set up so that you can still print "xml" data
> without libxml configured in. We'd need to be sure casting to text
> works too, but other than that I don't see an issue here.

Hmm, I just tried to do this by modifying ExplainResultDesc to use XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML, and sure enough, explain (format xml) ... fails when --with-libxml is not specified. But maybe that's not the right way to do it - now that I think about it, using that in combination with do_text_output_multiline() seems totally wrong even if we end up deciding not to worry about the output type, since while there are multiple rows when the output is considered as text, there is surely only one row when you look at the whole thing as an XML document. I'm not too sure how to do this though. Help?

In any event, considering that EXPLAIN is a utility statement and can't be embedded within a query, I'm not sure what benefit we get out of returning the data as XML rather than text. This doesn't seem likely to change either, based on Tom's comments here.

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php

...Robert
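[The ExplainResultDesc change being described is essentially a one-liner along these lines -- a sketch, not the patch's actual code:

    /* result column is xml when FORMAT XML was requested, else text */
    TupleDescInitEntry(tupdesc, (AttrNumber) 1, "QUERY PLAN",
                       (stmt->format == EXPLAIN_FORMAT_XML) ? XMLOID : TEXTOID,
                       -1, 0);

The failure without --with-libxml does not come from this declaration itself but from the rows later being fed back through the type's input function (xml_in), which is the thread that gets picked up below.]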
Robert Haas <robertmhaas@gmail.com> writes:
> In any event, considering that EXPLAIN is a utility statement and
> can't be embedded within a query, I'm not sure what benefit we get out
> of returning the data as XML rather than text. This doesn't seem
> likely to change either, based on Tom's comments here.
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php

I think you misinterpreted the point of that example, which is that there already is a way to get the output of EXPLAIN into the system for further processing. Were this not so, we wouldn't be worrying at all what data type it claims to have. But since there is a way, it's important what data type it produces.

    regards, tom lane
On Sat, Jun 13, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> In any event, considering that EXPLAIN is a utility statement and
>> can't be embedded within a query, I'm not sure what benefit we get out
>> of returning the data as XML rather than text. This doesn't seem
>> likely to change either, based on Tom's comments here.
>>
>> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php
>
> I think you misinterpreted the point of that example, which is that
> there already is a way to get the output of EXPLAIN into the system
> for further processing. Were this not so, we wouldn't be worrying
> at all what data type it claims to have. But since there is a way,
> it's important what data type it produces.

Well, if you get the EXPLAIN output into the system by defining a wrapper function, said wrapper function will return the type that it's defined to return, regardless of what EXPLAIN itself returns, no?

I don't have a problem making it return XML; I'm just not exactly sure how to do it. Is it possible to get that working without depending on libxml? How?

...Robert
On Saturday 13 June 2009 22:01:43 Robert Haas wrote:
>> * Instead of <pgexplain>, use <explain> with an XML namespace
>> declaration.
>
> Could you specify this a bit further, like write out exactly what you
> want it to look like? My XML-fu is not very strong.

Just replace your <pgexplain> by

<explain xmlns="http://www.postgresql.org/2009/explain">

The actual URI doesn't matter, as long as it is distinguishing. The value I chose here follows conventions used by W3C.
On Saturday 13 June 2009 22:01:43 Robert Haas wrote:
>> I recommend, however, that you think about writing a regression test for
>> this, so the interfaces are explicit, and those tweaking them in the
>> future know what they are dealing with.
>
> I would like to have something in this area, but Tom didn't think it
> was workable.
>
> http://archives.postgresql.org/message-id/603c8f070904151623ne07d744k615edd4aa669a64a@mail.gmail.com
>
> Currently, we don't even have something trivial like "EXPLAIN SELECT
> 1" in the regression tests, so even if you completely break EXPLAIN so
> that it core dumps (voice of experience speaking here) make check
> still passes with flying colors.

That post described a scenario where you check whether given a data set and ANALYZE, the optimizer produces a certain plan. I agree that that might be tricky.

A regression test for EXPLAIN, however, should primarily check whether the output format is stable. We are planning to offer this as a public interface, after all. You could use faked up statistics and all but one or two plan types turned off, and then the results should be pretty stable. Unless the fundamental cost model changes, but it doesn't do that very often for the simpler plan types anyway.

Things to check for would be checking whether all the fields are there, quoted and escaped correctly, and what happens if statistics are missing or corrupted, etc. Or whether you get any output at all, as you say.
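[A sketch of the kind of test being suggested, using the enable_* GUCs to pin the plan shape; the COSTS option used here is the one proposed upthread and doesn't exist yet, so this is purely illustrative:

    CREATE TEMP TABLE explain_test (a int);
    SET enable_indexscan = off;
    SET enable_bitmapscan = off;
    EXPLAIN (COSTS OFF) SELECT * FROM explain_test WHERE a = 1;
    -- expected output, stable across platforms because no cost numbers appear:
    --  Seq Scan on explain_test
    --    Filter: (a = 1)
    RESET enable_indexscan;
    RESET enable_bitmapscan;
]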
On Sunday 14 June 2009 07:27:19 Robert Haas wrote:
> On Sat, Jun 13, 2009 at 7:42 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> In any event, considering that EXPLAIN is a utility statement and
>>> can't be embedded within a query, I'm not sure what benefit we get out
>>> of returning the data as XML rather than text. This doesn't seem
>>> likely to change either, based on Tom's comments here.
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00969.php
>>
>> I think you misinterpreted the point of that example, which is that
>> there already is a way to get the output of EXPLAIN into the system
>> for further processing. Were this not so, we wouldn't be worrying
>> at all what data type it claims to have. But since there is a way,
>> it's important what data type it produces.
>
> Well, if you get the EXPLAIN output into the system by defining a
> wrapper function, said wrapper function will return the type that it's
> defined to return, regardless of what EXPLAIN itself returns, no?
>
> I don't have a problem making it return XML; I'm just not exactly sure
> how to do it. Is it possible to get that working without depending on
> libxml? How?

Even if this doesn't end up being feasible, I feel it's important that the XML and JSON formats return one datum, not one per line. Otherwise a client that wants to do some processing on the result will have to do about three extra steps to get the result usable.
On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>> text.
>>
>> Seems reasonable. I'll see if I can figure out how to do that.
>
> I suppose it's okay then, that the format is not available when the server
> isn't build with --with-libxml ?

I hope not, otherwise the usefulness of the format is significantly reduced (to practically zero) if tools cannot rely on it being available and have to fall back to something else if it's not available.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
2009/6/14 Dave Page <dpage@pgadmin.org>:
> On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Also, the result type of an EXPLAIN (format xml) should be type xml, not
>>>> text.
>>>
>>> Seems reasonable. I'll see if I can figure out how to do that.
>>
>> I suppose it's okay then, that the format is not available when the server
>> isn't build with --with-libxml ?
>
> I hope not, otherwise the usefulness of the format is significantly
> reduced (to practically zero) if tools cannot rely on it being
> available and have to fall back to something else if it's not
> available.

I think requiring --with-libxml is a good idea. It's nonsense to duplicate the necessary xml code, like xml escaping and similar. And almost all distributed PostgreSQL binaries are compiled with xml support, so this shouldn't cause any problems. When somebody compiles pg without xml support, they know what they're doing.

regards
Pavel Stehule

> --
> Dave Page
> EnterpriseDB UK: http://www.enterprisedb.com
On 6/14/09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/6/14 Dave Page <dpage@pgadmin.org>:
>> On 6/13/09, Bernd Helmle <mailings@oopsware.de> wrote:
>>> --On 13. Juni 2009 15:01:43 -0400 Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>>> Also, the result type of an EXPLAIN (format xml) should be type xml,
>>>>> not text.
>>>>
>>>> Seems reasonable. I'll see if I can figure out how to do that.
>>>
>>> I suppose it's okay then, that the format is not available when the
>>> server isn't build with --with-libxml ?
>>
>> I hope not, otherwise the usefulness of the format is significantly
>> reduced (to practically zero) if tools cannot rely on it being
>> available and have to fall back to something else if it's not
>> available.
>
> I think requiring --with-libxml is a good idea. It's nonsense to duplicate
> the necessary xml code, like xml escaping and similar. And almost all
> distributed PostgreSQL binaries are compiled with xml support, so this
> shouldn't cause any problems. When somebody compiles pg without xml
> support, they know what they're doing.

That will mean we never get to use XML explain in pgAdmin. We're not in the business of writing basic features that might work, if the postgres packager enabled an option. We need to be able to rely on such features always being available.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Dave Page wrote:
>> I think requiring --with-libxml is a good idea. It's nonsense to duplicate
>> the necessary xml code, like xml escaping and similar. And almost all
>> distributed PostgreSQL binaries are compiled with xml support, so this
>> shouldn't cause any problems. When somebody compiles pg without xml
>> support, they know what they're doing.
>
> That will mean we never get to use XML explain in pgAdmin. We're not
> in the business of writing basic features that might work, if the
> postgres packager enabled an option. We need to be able to rely on
> such features always being available.

As a matter of curiosity, do we have any idea what platforms don't support libxml2?

cheers

andrew
Dave Page <dpage@pgadmin.org> writes:
> On 6/14/09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I think requiring --with-libxml is a good idea.

> That will mean we never get to use XML explain in pgAdmin.

Exactly. We are *not* going to make libxml a required piece of infrastructure, and that means that XML-format explain output will be useless to most client tools if it doesn't work without libxml.

    regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes:
> As a matter of curiosity, do we have any idea what platforms don't
> support libxml2?

It's only partially about whether libxml2 is portable enough. A person building Postgres might also have legitimate concerns about how bug-free and/or secure it is. We've already spent nontrivial amounts of time working around libxml bugs; and as for security, google shows at least four CVEs against libxml2 in the past two years, so it's not a negligible risk. I can entirely see people choosing to build without it.

    regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
> A regression test for EXPLAIN, however, should primarily check whether the
> output format is stable. We are planning to offer this as a public interface,
> after all. You could use faked up statistics and all but one or two plan
> types turned off, and then the results should be pretty stable.

You'd be surprised :-(. We've found in the past that queries in the regression tests get different plans across different platforms just because of alignment-rule differences (leading to different numbers of rows per page, etc etc). I think that test cases could be chosen to be relatively stable points in the plan space, but it's hopeless to imagine that the low-order digits of cost estimates will be the same across all platforms.

    regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> I believe we have things set up so that you can still print "xml" data
>> without libxml configured in. We'd need to be sure casting to text
>> works too, but other than that I don't see an issue here.

> Hmm, I just tried to do this by modifying ExplainResultDesc to use
> XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML,
> and sure enough, explain (format xml) ... fails when --with-libxml is
> not specified.

That's because the code goes through BuildTupleFromCStrings, which invokes xml_in in this scenario, and xml_in (as opposed to xml_out) does depend on libxml.

However, using BuildTupleFromCStrings is wasteful/stupid for *both* text and xml output, so it seems like getting rid of it is the thing to do here.

    regards, tom lane
On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Jun 13, 2009 at 6:40 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> I believe we have things set up so that you can still print "xml" data
>>> without libxml configured in. We'd need to be sure casting to text
>>> works too, but other than that I don't see an issue here.
>
>> Hmm, I just tried to do this by modifying ExplainResultDesc to use
>> XMLOID rather than TEXTOID when stmt->format == EXPLAIN_FORMAT_XML,
>> and sure enough, explain (format xml) ... fails when --with-libxml is
>> not specified.
>
> That's because the code goes through BuildTupleFromCStrings, which
> invokes xml_in in this scenario, and xml_in (as opposed to xml_out)
> does depend on libxml.
>
> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
> text and xml output, so it seems like getting rid of it is the thing
> to do here.

Makes sense. However, if we just make that change in do_tup_output(), then we'll break the ability to use that function for non-text datatypes. Currently that doesn't look like a problem, because the only clients are ShowGUCConfigOption(), do_text_output_oneline(), and do_text_output_multiline(),
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>> text and xml output, so it seems like getting rid of it is the thing
>> to do here.

> Makes sense. However, if we just make that change in do_tup_output(),
> then we'll break the ability to use that function for non-text
> datatypes.

I'd envision it taking Datums, so it doesn't really matter. However, as you say, specializing it to text only wouldn't be much of a loss.

    regards, tom lane
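[For concreteness, a Datum-taking do_tup_output() might look roughly like this; the TupOutputState field names are assumptions, and this is a sketch of the idea rather than the patch that eventually got written:

    static void
    do_tup_output(TupOutputState *tstate, Datum *values, bool *isnull)
    {
        TupleTableSlot *slot = tstate->slot;    /* assumed field */
        HeapTuple   tuple;

        /* form the tuple directly from Datums, bypassing any input functions */
        tuple = heap_form_tuple(slot->tts_tupleDescriptor, values, isnull);

        /* store it in the slot and hand it to the destination receiver */
        ExecStoreTuple(tuple, slot, InvalidBuffer, true);
        (*tstate->dest->receiveSlot) (slot, tstate->dest);

        ExecClearTuple(slot);
    }

Callers that used to pass C strings would then do the conversion themselves, e.g. values[0] = CStringGetTextDatum(str), which sidesteps xml_in entirely.]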
On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>> text and xml output, so it seems like getting rid of it is the thing
>>> to do here.
>
>> Makes sense. However, if we just make that change in do_tup_output(),
>> then we'll break the ability to use that function for non-text
>> datatypes.
>
> I'd envision it taking Datums, so it doesn't really matter. However,
> as you say, specializing it to text only wouldn't be much of a loss.

I like the Datum option, so I'll work up a patch for that, unless you want to just do it and spare me the trouble. :-)

...Robert
On Sun, Jun 14, 2009 at 1:04 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>>> text and xml output, so it seems like getting rid of it is the thing
>>>> to do here.
>>
>>> Makes sense. However, if we just make that change in do_tup_output(),
>>> then we'll break the ability to use that function for non-text
>>> datatypes.
>>
>> I'd envision it taking Datums, so it doesn't really matter. However,
>> as you say, specializing it to text only wouldn't be much of a loss.
>
> I like the Datum option, so I'll work up a patch for that, unless you
> want to just do it and spare me the trouble. :-)

Here's an attempt. Is this anything like what you had in mind?

...Robert
I wrote:
> On Sun, Jun 14, 2009 at 1:04 PM, Robert Haas<robertmhaas@gmail.com> wrote:
>> On Sun, Jun 14, 2009 at 1:02 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Sun, Jun 14, 2009 at 11:28 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>>> However, using BuildTupleFromCStrings is wasteful/stupid for *both*
>>>>> text and xml output, so it seems like getting rid of it is the thing
>>>>> to do here.
>>>
>>>> Makes sense. However, if we just make that change in do_tup_output(),
>>>> then we'll break the ability to use that function for non-text
>>>> datatypes.
>>>
>>> I'd envision it taking Datums, so it doesn't really matter. However,
>>> as you say, specializing it to text only wouldn't be much of a loss.
>>
>> I like the Datum option, so I'll work up a patch for that, unless you
>> want to just do it and spare me the trouble. :-)
>
> Here's an attempt. Is this anything like what you had in mind?

Hmm... on further review, I'm thinking this is still a bit wasteful, because we don't really need (I think) to call TupleDescGetAttInMetadata from begin_tup_output_tupdesc. But I'm not sure what the best way is to avoid that. Any thoughts?

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm... on further review, I'm thinking this is still a bit wasteful,
> because we don't really need (I think) to call
> TupleDescGetAttInMetadata from begin_tup_output_tupdesc. But I'm not
> sure what the best way is to avoid that. Any thoughts?

Er, just don't do it? We shouldn't need it if the function is doing heap_form_tuple directly.

    regards, tom lane
On Mon, Jun 15, 2009 at 9:51 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm... on further review, I'm thinking this is still a bit wasteful,
>> because we don't really need (I think) to call
>> TupleDescGetAttInMetadata from begin_tup_output_tupdesc. But I'm not
>> sure what the best way is to avoid that. Any thoughts?
>
> Er, just don't do it? We shouldn't need it if the function is doing
> heap_form_tuple directly.

Oh, I guess that works. I had thought there might be people calling begin_tup_output_tupdesc() who wanted to go on to call BuildTupleFromCStrings(), but it seems that's not the case. In fact, it looks like I can probably rip that member out of TupOutputState altogether.

Will update patch. Does this look like what you were thinking otherwise?

Thanks,

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> it looks like I can probably rip that member out of TupOutputState
> altogether.
> Will update patch. Does this look like what you were thinking otherwise?

Yeah, that's exactly what I was thinking.

    regards, tom lane
Hi,

On 06/12/2009 07:15 AM, Robert Haas wrote:
> If you don't like the syntax, please argue about that on the "generic
> explain options v2" thread. Let's try to use this thread to discuss
> the output format, about which I spent a good deal of time agonizing.

I spent some time playing around with the explain output with various queries. Beside the already raised mild dislike (from Peter Eisentraut I think) of Upper-Case "-" separated tag-names I found mainly one gripe:

<Startup-Cost>1710.98</Startup-Cost>
<Total-Cost>1710.98</Total-Cost>
<Plan-Rows>72398</Plan-Rows>
<Plan-Width>4</Plan-Width>
<Actual-Startup-Time>136.595</Actual-Startup-Time>
<Actual-Total-Time>136.595</Actual-Total-Time>
<Actual-Rows>72398</Actual-Rows>
<Actual-Loops>1</Actual-Loops>

This is a bit inconsistent: for the row estimate you use <Plan-Rows/>, but for <Startup-Cost/> you don't use the "Plan-" prefix, while for the 'analyze' generated variables you use the 'Actual-' prefix consistently.

One approach would be to have two nodes like:

<Plan-Estimates>
  <Startup-Cost>...</Startup-Cost>
  ...
</Plan-Estimates>
<Execution-Cost>
  <Startup-Cost>...</Startup-Cost>
  ...
</Execution-Cost>

This would probably make it easier to write a future-proof parser and it also seems semantically sensible.

As a side issue, it would perhaps be nice (thinking of an index-suggestion tool) to make it possible to have separate estimates for <Index-Cond> and <Filter> - in order not to change the format later, that perhaps has to be considered here. Perhaps the current structure plus some additional tags is also the best here - I just noticed it being a potential issue.

Andres
On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de> wrote:
> <Startup-Cost>1710.98</Startup-Cost>
> <Total-Cost>1710.98</Total-Cost>
> <Plan-Rows>72398</Plan-Rows>
> <Plan-Width>4</Plan-Width>
> <Actual-Startup-Time>136.595</Actual-Startup-Time>
> <Actual-Total-Time>136.595</Actual-Total-Time>
> <Actual-Rows>72398</Actual-Rows>
> <Actual-Loops>1</Actual-Loops>

XML's not really my thing currently but it sure seems strange to me to have *everything* be a separate tag like this. Doesn't XML do attributes too? I would have thought to use child tags like this only for things that have some further structure.

I would have expected something like:

<join
  <scan type=sequential source="foo.bar">
    <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
    <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
  </scan>
  <scan type=function source="foo.bar($1)">
    <parameters>
      <parameter name="$1" expression="...."></>
    </parameters>
  </scan>
</join>

This would allow something like a graphical explain plan to still make sense of a plan even if it finds a node it doesn't recognize. It would still know generally what to do with a "scan" node or a "join" node even if it is a new type of scan or join.

--
greg
http://mit.edu/~gsstark/resume.pdf
On 06/16/2009 02:14 PM, Greg Stark wrote:
> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de> wrote:
>> <Startup-Cost>1710.98</Startup-Cost>
>> <Total-Cost>1710.98</Total-Cost>
>> <Plan-Rows>72398</Plan-Rows>
>> <Plan-Width>4</Plan-Width>
>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>> <Actual-Total-Time>136.595</Actual-Total-Time>
>> <Actual-Rows>72398</Actual-Rows>
>> <Actual-Loops>1</Actual-Loops>
>
> XML's not really my thing currently but it sure seems strange to me to
> have *everything* be a separate tag like this. Doesn't XML do
> attributes too? I would have thought to use child tags like this only
> for things that have some further structure.
>
> I would have expected something like:
>
> <join
>   <scan type=sequential source="foo.bar">
>     <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>     <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>   </scan>
>   <scan type=function source="foo.bar($1)">
>     <parameters>
>       <parameter name="$1" expression="...."></>
>     </parameters>
>   </scan>
> </join>
>
> This would allow something like a graphical explain plan to still make
> sense of a plan even if it finds a node it doesn't recognize. It would
> still know generally what to do with a "scan" node or a "join" node
> even if it is a new type of scan or join.

While that also looks sensible, the more structured variant makes it easier to integrate additional stats which may not easily be pressed into the 'attribute' format. As a hastily contrived example you could have io statistics over time like:

<iostat>
  <stat time="10" name=pagefault>...</stat>
  <stat time="20" name=pagefault>...</stat>
  <stat time="30" name=pagefault>...</stat>
</iostat>

Something like that would be harder with your variant.

Structuring it in tags like suggested above:

<Plan-Estimates>
  <Startup-Cost>...</Startup-Cost>
  ...
</Plan-Estimates>
<Execution-Cost>
  <Startup-Cost>...</Startup-Cost>
  ...
</Execution-Cost>

enables displaying unknown 'scalar' values just like your variant and also allows more structured values.

It would be interesting to get somebody having used the old explain in an automated fashion into this discussion...

Andres
On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund<andres@anarazel.de> wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de> wrote:
>>> <Startup-Cost>1710.98</Startup-Cost>
>>> <Total-Cost>1710.98</Total-Cost>
>>> <Plan-Rows>72398</Plan-Rows>
>>> <Plan-Width>4</Plan-Width>
>>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>>> <Actual-Total-Time>136.595</Actual-Total-Time>
>>> <Actual-Rows>72398</Actual-Rows>
>>> <Actual-Loops>1</Actual-Loops>
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>>
>> I would have expected something like:
>>
>> <join
>>   <scan type=sequential source="foo.bar">
>>     <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>>     <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>>   </scan>
>>   <scan type=function source="foo.bar($1)">
>>     <parameters>
>>       <parameter name="$1" expression="...."></>
>>     </parameters>
>>   </scan>
>> </join>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses <Plan> and <Plans>, you can do this just as well with the current implementation. Each plan node gets a <Plan>. If there are any plans "under" it, it gets a <Plans> child which contains those. Whether you put the additional details into attributes or other tags is irrelevant.

As to why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do EVERYTHING with attributes. If I did that, then I'd either get really long lines that were not easily readable, or I'd have to write some kind of complicated line wrapping code (which didn't seem to make a lot of sense for a machine-readable format). The current format isn't the most beautiful thing I've ever seen, but you don't need a parser to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g. table_to_xml

http://www.postgresql.org/docs/current/interactive/functions-xml.html

> While that also looks sensible, the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed into the
> 'attribute' format. As a hastily contrived example you could have io
> statistics over time like:
>
> <iostat>
>   <stat time="10" name=pagefault>...</stat>
>   <stat time="20" name=pagefault>...</stat>
>   <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
>
> <Plan-Estimates>
>   <Startup-Cost>...</Startup-Cost>
>   ...
> </Plan-Estimates>
> <Execution-Cost>
>   <Startup-Cost>...</Startup-Cost>
>   ...
> </Execution-Cost>
>
> enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs, but times, and the estimated values are not times, but costs. The planner estimates the cost of operations on an arbitrary scale where the cost of a sequential page fetch is 1.0. When we measure actual times, they are in milliseconds. There is no point that I can see in making it appear that those are the same thing. Observe the current output:

explain analyze select 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)
 Total runtime: 0.243 ms
(2 rows)

...Robert
On 06/16/2009 03:22 PM, Robert Haas wrote:
> Well, one problem with this is that the actual values are not costs,
> but times, and the estimated values are not times, but costs. The
> planner estimates the cost of operations on an arbitrary scale where
> the cost of a sequential page fetch is 1.0. When we measure actual
> times, they are in milliseconds. There is no point that I can see in
> making it appear that those are the same thing. Observe the current
> output:

Well - the aim was not to make it possible to use the same name for "<plan-startup-cost>" and "<actual-startup-cost>", but to group them in some way - so you can decide (by prefix or by placement below a distinct node) whether they are related to planning or execution (and thus make it easier to handle unknown tags). That <actual-startup-time/> morphed into <startup-cost/> instead of <startup-time> was just a typo.

Another solution would be to rename <Startup-Cost> into <Plan-Startup-Cost> for consistency. But grouping them by some node seems to be a bit more future-proof.

Andres
Robert Haas wrote:
> 3. We have existing precedent for this design pattern in, e.g. table_to_xml
>
> http://www.postgresql.org/docs/current/interactive/functions-xml.html

Tables are flat, explain output is not.

If there is a relationship between the items then that needs to be expressed in the XML structure, either by use of child nodes or attributes. Relying on the sequence of nodes, if that's what you're doing, is not a good idea, and will make postprocessing the XML using XSLT, for example, quite a bit harder. (Processing a foo that comes after a bar is possible but not as natural as processing a foo that is a child or attribute of a bar)

Anyway, I think what this discussion points out is that we actually need a formal XML Schema for this output.

cheers

andrew
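[To give a sense of what such a schema involves, a skeletal XSD for the structure as currently discussed might look something like this; the element names are taken from the fragments quoted upthread and the namespace from Peter's suggestion, so treat it as an illustration rather than a proposal:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.postgresql.org/2009/explain"
               xmlns:pg="http://www.postgresql.org/2009/explain"
               elementFormDefault="qualified">

      <xs:element name="explain">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="pg:Plan" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <xs:element name="Plan">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Node-Type" type="xs:string"/>
            <xs:element name="Startup-Cost" type="xs:decimal" minOccurs="0"/>
            <xs:element name="Total-Cost" type="xs:decimal" minOccurs="0"/>
            <xs:element name="Plan-Rows" type="xs:decimal" minOccurs="0"/>
            <xs:element name="Plan-Width" type="xs:integer" minOccurs="0"/>
            <xs:element name="Plans" minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <xs:element ref="pg:Plan" maxOccurs="unbounded"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

    </xs:schema>
]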
On 06/16/2009 03:45 PM, Andrew Dunstan wrote:
>> 3. We have existing precedent for this design pattern in, e.g.
>> table_to_xml
>>
>> http://www.postgresql.org/docs/current/interactive/functions-xml.html
>
> Tables are flat, explain output is not.

Comparing Greg's approach with Robert's, it seems to me that Robert's approach isn't flatter than Greg's - it just relies more on nodes.

> If there is a relationship between the items then that needs to be
> expressed in the XML structure, either by use of child nodes or
> attributes. Relying on the sequence of nodes, if that's what you're
> doing, is not a good idea, and will make postprocessing the XML using
> XSLT, for example, quite a bit harder. (Processing a foo that comes
> after a bar is possible but not as natural as processing a foo that is a
> child or attribute of a bar)

How would you model something like:

<plans>
  <plan> ... </plan>
  <plan> ... </plan>
  ...
</plans>

otherwise? There are potentially unlimited number of child nodes - AppendNode for example can have any number of them. Sure, you can give each <plan> node a 'offset=' id, but that doesn't buy much. I don't see how that could be much improved by using child-nodes (or even worse attributes).

That is, as far as I have seen, the only place where the format relies on the sequence of nodes.

> Anyway, I think what this discussion points out is that we actually need
> a formal XML Schema for this output.

Agreed.

If helpful I can create a schema for the current format.

Andres
On Tue, Jun 16, 2009 at 9:45 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
> Robert Haas wrote:
>> 3. We have existing precedent for this design pattern in, e.g.
>> table_to_xml
>>
>> http://www.postgresql.org/docs/current/interactive/functions-xml.html
>
> Tables are flat, explain output is not.
>
> If there is a relationship between the items then that needs to be expressed
> in the XML structure, either by use of child nodes or attributes. Relying on
> the sequence of nodes, if that's what you're doing, is not a good idea, and

I'm not doing that. Period, full stop. The discussion was only about attributes vs. child nodes.

> Anyway, I think what this discussion points out is that we actually need a
> formal XML Schema for this output.

Well, I don't know how to write one, and am not terribly interested in learning. Perhaps someone else would be interested?

...Robert
On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund<andres@anarazel.de> wrote:
> How would you model something like:
>
> <plans>
>   <plan> ... </plan>
>   <plan> ... </plan>
>   ...
> </plans>
>
> otherwise?
>
> There are potentially unlimited number of child nodes - AppendNode for
> example can have any number of them. Sure, you can give each <plan> node a
> 'offset=' id, but that doesn't buy much. I don't see how that could be much
> improved by using child-nodes (or even worse attributes).

Note that even in this case we DON'T rely on the ordering of the nodes. The inner <plan> nodes have child nodes which contain their relationship to the parent.

...Robert
Hi,

On 06/16/2009 04:32 PM, Robert Haas wrote:
> On Tue, Jun 16, 2009 at 10:30 AM, Andres Freund<andres@anarazel.de> wrote:
>> How would you model something like:
>>
>> <plans>
>>   <plan> ... </plan>
>>   <plan> ... </plan>
>>   ...
>> </plans>
>>
>> otherwise?
>>
>> There are potentially unlimited number of child nodes - AppendNode for
>> example can have any number of them. Sure, you can give each <plan> node a
>> 'offset=' id, but that doesn't buy much. I don't see how that could be much
>> improved by using child-nodes (or even worse attributes).
>
> Note that even in this case we DON'T rely on the ordering of the
> nodes. The inner <plan> nodes have child nodes which contain their
> relationship to the parent.

Not in the case of Append nodes, but I fail to see a problem there, so...

Andres
Robert Haas wrote:
>> If there is a relationship between the items then that needs to be expressed
>> in the XML structure, either by use of child nodes or attributes. Relying on
>> the sequence of nodes, if that's what you're doing, is not a good idea, and
>
> I'm not doing that. Period, full stop. The discussion was only about
> attributes vs. child nodes.

OK, I misread something you wrote, which prompted me to say that. Rereading it I realise my error. My apologies.

cheers

andrew
On Tue, Jun 16, 2009 at 10:59 AM, Andrew Dunstan<andrew@dunslane.net> wrote:
> Robert Haas wrote:
>>> If there is a relationship between the items then that needs to be
>>> expressed in the XML structure, either by use of child nodes or
>>> attributes. Relying on the sequence of nodes, if that's what you're
>>> doing, is not a good idea, and
>>
>> I'm not doing that. Period, full stop. The discussion was only about
>> attributes vs. child nodes.
>
> OK, I misread something you wrote, which prompted me to say that. Rereading
> it I realise my error. My apologies.

No problem, no apologies needed. I guess we do emit nodes like append plans in the same order that they'd be emitted in text mode. Right now we don't emit any additional information beyond putting them in the same order, but I suppose that could be changed if needs be.

...Robert
Andres Freund <andres@anarazel.de> writes:
> On 06/16/2009 04:32 PM, Robert Haas wrote:
>> Note that even in this case we DON'T rely on the ordering of the
>> nodes. The inner <plan> nodes have child nodes which contain their
>> relationship to the parent.

> Not in the case of Append nodes, but I fail to see a problem there, so...

The order of Append child nodes is in fact significant. If this representation loses that information then it needs to be fixed. However, is it really so bad to be relying on node order for this?

    regards, tom lane
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 06/16/2009 04:32 PM, Robert Haas wrote:
>>> Note that even in this case we DON'T rely on the ordering of the
>>> nodes. The inner <plan> nodes have child nodes which contain their
>>> relationship to the parent.
>
>> Not in the case of Append nodes, but I fail to see a problem there, so...
>
> The order of Append child nodes is in fact significant. If this
> representation loses that information then it needs to be fixed.
> However, is it really so bad to be relying on node order for this?

No, if there is a genuine sequence of items then relying on node order is just fine. My earlier (mistaken) reference was to possibly relying on node order for a non-sequence relationship.

cheers

andrew
Andres Freund wrote:
>> Anyway, I think what this discussion points out is that we actually need
>> a formal XML Schema for this output.
>
> Agreed.
>
> If helpful I can create a schema for the current format.

That will give us a useful starting point.

cheers

andrew
On Tue, Jun 16, 2009 at 1:53 PM, Andres Freund<andres@anarazel.de> wrote:
> While that also looks sensible, the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed into the
> 'attribute' format. As a hastily contrived example you could have io
> statistics over time like:
>
> <iostat>
>   <stat time="10" name=pagefault>...</stat>
>   <stat time="20" name=pagefault>...</stat>
>   <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.

Actually that's exactly the kind of example I had in mind to make easier. I'm picturing adding a new tag, such as <iostats>, or actually I was thinking of <dtrace>. If we have separate tags for all the estimates and actual timings then any tags which come with the <iostat> or <dtrace> option would just get mixed up with the estimates and timing info.

Each new module would provide a single tag which would have some attributes and some child tags depending on how much structure it needs. In cases where there's no structure, just a fixed list of scalars like the existing expected and actual stats, I don't see any advantage to making each scalar a tag. (There's not much disadvantage except I would have said it was completely unreadable for a human given that you would have pages and pages of output for a significant size plan.)

So your plan might look like

<scan type=...>
  <expected cost=...></>
  <actual time=...></>
  <iostats>
    <samples>
      <sample time=nnn value=nnn></>
    </samples>
  </iostats>
  <dtrace script="foo.d">
    <probes>
      <probe name=foo result=nnn></>
      <probe name=bar result=nnn></>
    </probes>
  </dtrace>

That would make it easy for a tool like pgadmin which doesn't know what to do with the iostats to ignore the whole chunk, rather than have to dig through a list of stats some of which come from iostats and some from dtrace and some from the instrumentation and have to figure out which tags are things it can use and which are things it can't.

--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes:
> I'm picturing adding a new tag, such as <iostats>, or actually I was
> thinking of <dtrace>. If we have separate tags for all the estimates
> and actual timings then any tags which come with the <iostat> or
> <dtrace> option would just get mixed up with the estimates and timing
> info.

FWIW, I like Greg's idea of subdividing the available data this way. I'm no XML guru, so maybe there is a better way to do it --- but a very large part of the reason for doing this at all is to have an extensible format, and part of that IMHO is that client programs should be able to have some rough idea of what things are even when they don't know it exactly.

But I'd be just as happy with a naming convention, like <planner:rowcount> versus <actual:rowcount>, etc. I don't know enough about XML usage to understand the benefits and costs of different ways of providing that kind of structure.

    regards, tom lane
Tom Lane wrote:
> But I'd be just as happy with a naming convention, like
> <planner:rowcount> versus <actual:rowcount>, etc. I don't know
> enough about XML usage to understand the benefits and costs of
> different ways of providing that kind of structure.

FYI, you probably don't want this. The ':' is not just another character; it separates the namespace designator from the local name. We probably only want one namespace. You can use '-' or '_' or '.' inside names to give them some structure beyond XML semantics.

cheers

andrew
On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> I'm picturing adding a new tag, such as <iostats>, or actually I was
>> thinking of <dtrace>. If we have separate tags for all the estimates
>> and actual timings then any tags which come with the <iostat> or
>> <dtrace> option would just get mixed up with the estimates and timing
>> info.
>
> FWIW, I like Greg's idea of subdividing the available data this way.
> I'm no XML guru, so maybe there is a better way to do it --- but a
> very large part of the reason for doing this at all is to have an
> extensible format, and part of that IMHO is that client programs should
> be able to have some rough idea of what things are even when they
> don't know it exactly.

I like it too, but I'd like to see us come up with a design that allows it to be used for all of the output formats (text, XML, and JSON). I think we should be looking for a way to allow modules to publish abstract objects like property-value mappings, or lists of strings, rather than thinking strictly in terms of XML. If we have a module called foo that emits property bar with value baz and property bletch with value quux, then in text format we can print:

Module Foo:
  Bar: Baz
  Bletch: Quux

In XML we can print:

<Modules>
  <Module>
    <Module-Name>Foo</Module-Name>
    <Bar>Baz</Bar>
    <Bletch>Quux</Bletch>
  </Module>
</Modules>

(or any of about 10 reasonable alternatives that are functionally identical)

In JSON we can print:

"Modules" : [ { "Module Name" : "Foo", "Bar": "Baz", "Bletch": "Quux" } ]

(or any of about 2 reasonable alternatives that are functionally identical)

If we start thinking in terms of "provide an API to insert XML into the XML-format output", we get back to my original complaint: if the only way of getting additional data is to piece through the XML output, then we'll quickly reach the point where users need XSLT and stylesheets to extract the data they care about. I think that's an annoyance that is easily avoidable.

...Robert
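[One possible shape for that kind of format-independent API, purely as a sketch -- every name here is hypothetical, not anything in the posted patches:

    /* callers say WHAT to emit; the formatting layer decides HOW,
     * according to the format selected in the ExplainState */
    extern void ExplainOpenGroup(ExplainState *es, const char *label);
    extern void ExplainProperty(ExplainState *es, const char *label,
                                const char *value);
    extern void ExplainCloseGroup(ExplainState *es, const char *label);

A module would then call ExplainOpenGroup(es, "Module Foo"), emit its properties, and close the group, and the same calls would render as the text, XML, or JSON shown above depending on the selected format.]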
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> FWIW, I like Greg's idea of subdividing the available data this way.

> I like it too, but I'd like to see us come up with a design that
> allows it to be used for all of the output formats (text, XML, and
> JSON). I think we should be looking for a way to allow modules to
> publish abstract objects like property-value mappings, or lists of
> strings, rather than thinking strictly in terms of XML. If we have a
> module called foo that emits property bar with value baz and property
> bletch with value quux, then ...

This seems to be missing the point I was trying to make, which is that a design like that actually offers no leverage at all: if you don't know all about foo to start with, you have no idea what to do with either bar or bletch. You can *parse* the data, since it's in XML or JSON or whatever, but you don't know what it is.

The EXPLAIN problem is a fairly constrained universe: there is going to be a tree of plan nodes, there are going to be some static properties of each plan node, and there may or may not be various sorts of estimates and/or measurements attached to each one. What I'm after is that code examining the output can know "oh, this is a measurement" even if it hasn't heard of the particular kind of measurement.

As a concrete example of what I'm thinking about, I'd hope that PgAdmin would be able to display a graphical summary of a plan tree, and then pop up measurements associated with one of the nodes when you right-click on that node. To do this, it doesn't necessarily have to know all about each specific measurement that a particular backend version might emit; but it needs to be able to tell which things are measurements.

			regards, tom lane
On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jun 16, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I like Greg's idea of subdividing the available data this way.
>
>> I like it too, but I'd like to see us come up with a design that
>> allows it to be used for all of the output formats (text, XML, and
>> JSON). I think we should be looking for a way to allow modules to
>> publish abstract objects like property-value mappings, or lists of
>> strings, rather than thinking strictly in terms of XML. If we have a
>> module called foo that emits property bar with value baz and property
>> bletch with value quux, then ...
>
> This seems to be missing the point I was trying to make, which is that
> a design like that actually offers no leverage at all: if you don't know
> all about foo to start with, you have no idea what to do with either bar
> or bletch. You can *parse* the data, since it's in XML or JSON or
> whatever, but you don't know what it is.
>
> The EXPLAIN problem is a fairly constrained universe: there is going to
> be a tree of plan nodes, there are going to be some static properties of
> each plan node, and there may or may not be various sorts of estimates
> and/or measurements attached to each one. What I'm after is that code
> examining the output can know "oh, this is a measurement" even if it
> hasn't heard of the particular kind of measurement.
>
> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
> would be able to display a graphical summary of a plan tree, and then
> pop up measurements associated with one of the nodes when you
> right-click on that node. To do this, it doesn't necessarily have to
> know all about each specific measurement that a particular backend
> version might emit; but it needs to be able to tell which things are
> measurements.

*scratches head*

So you're looking for a way to categorize the data that appear in the output by type, so that any given piece of data is either a measurement, an estimate, or a part of the plan structure?

It seems to me that with a sufficiently powerful API, add-on modules could emit arbitrary stuff that might not fall into the categories that you've mentioned. For example, there was a previous EXPLAIN XML patch which contained a bunch of code that spit out plans that were considered but not chosen. And there could easily be other kinds of less invasive add-ons that would still want to emit properties that are formatted as text or lists rather than measurements per se.

I think it's kind of hopeless to think that a third-party module is going to be able to do much better than to display any unexpected properties whose value is just text, and punt any unexpected properties whose value is a complex object (nested tags in XML parlance, a hash in JSON).

I have a feeling I'm still missing the point here...

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 16, 2009 at 1:21 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
>> would be able to display a graphical summary of a plan tree, and then
>> pop up measurements associated with one of the nodes when you
>> right-click on that node.

> It seems to me that with a sufficiently powerful API, add-on modules
> could emit arbitrary stuff that might not fall into the categories
> that you've mentioned.

I don't have a problem with inventing new categories when we need to. What I'm objecting to is using the above to justify flattening the design completely, so that the only way to know anything about a particular datum is to know that type of datum specifically. There is way more structure in EXPLAIN than that, and we should design it accordingly. (Note that any information about rejected plans could not usefully be attached to the plan tree anyway; it'd have to be put in some other child of the topmost node.)

> And there could easily be other kinds of
> less invasive add-ons that would still want to emit properties that
> are formatted as text or lists rather than measurements per se.

By "measurement" I did not mean to imply "single number". Text strings or lists could be handled very easily, I think, especially since there are explicit ways to represent those in XML.

The main point here is that we have a pretty good idea of what general-purpose client code is likely to want to do with the data, and in a lot of cases that does not translate to having to know each node type explicitly, so long as it can be categorized.

			regards, tom lane
On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> The main point here is that we have a pretty good idea of what
> general-purpose client code is likely to want to do with the data, and
> in a lot of cases that does not translate to having to know each node
> type explicitly, so long as it can be categorized.

I agree. I'm just not seeing the need for an *explicit* categorization contained within the data itself. For one thing, AIUI, that's the job of things like an XML Schema, which Andres Freund has already agreed to write, and I would expect that would be of some value to tool-writers, else why are we creating it? I also think scalars and lists are recognizable without any particular additional markup at all, just by introspection of the contents.

Even if we do need some kind of additional markup, I'm reluctant to try to design it without some feedback from people writing actual tools about what they find inadequate in the current output. The good news is that if this patch gets committed fairly quickly after the release of 8.4, tool authors should have enough time to discover where any bodies are buried in time to fix them before 8.5. But I'm really unconvinced that any of this minor formatting stuff is going to rise to the level of a real problem.

...Robert
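To illustrate what "introspection of the contents" can buy a generic consumer, here is a small Python sketch against a made-up fragment of JSON output; the key names are only illustrative and are not necessarily the exact ones the patch emits.

    # Sketch: a generic consumer can already tell scalars, lists, and nested
    # objects apart from the parsed JSON itself, without extra markup.
    import json

    fragment = json.loads("""
    {
      "Node Type": "Sort",
      "Sort Key": ["a", "b"],
      "Actual Total Time": 1.234,
      "Plans": [{"Node Type": "Seq Scan", "Relation Name": "t"}]
    }
    """)

    for key, value in fragment.items():
        if isinstance(value, list):
            kind = "list"
        elif isinstance(value, dict):
            kind = "nested object"
        else:
            kind = "scalar"
        print("%-18s %s" % (key, kind))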
On 06/16/2009 09:51 PM, Robert Haas wrote:
> On Tue, Jun 16, 2009 at 2:12 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> The main point here is that we have a pretty good idea of what
>> general-purpose client code is likely to want to do with the data,
>> and in a lot of cases that does not translate to having to know
>> each node type explicitly, so long as it can be categorized.
> I agree. I'm just not seeing the need for an *explicit*
> categorization contained within the data itself. For one thing,
> AIUI, that's the job of things like an XML Schema, which Andres
> Freund has already agreed to write, and I would expect that would be
> of some value to tool-writers, else why are we creating it?

It defines how exactly the output has to look - that's not easily readable out of explain.c - so anything that can be created and validated against that schema should be acceptable to $tool, even if EXPLAIN may not actually create it. Just like EBNF or similar for other languages. It does not help a tool categorize values into planner/execution/whatever categories automatically, though.

I attached a simple RELAX NG schema - if somebody likes another format, that should be generatable out of it (using trang). It surely could use some more work, but I think it's detailed enough for now.

> I also think scalars and lists are recognizable without any
> particular additional markup at all, just by introspection of the
> contents.

That somewhat defeats the purpose of a strictly structured format? Perhaps I am misunderstanding you though.

On another note, it may be interesting to emit the current options to EXPLAIN in the XML/JSON output, although that depends on whether the option syntax will be accepted.

Writing the schema I noticed something else I did not like about the current format:

  <Triggers>
    <Trigger>
      <Trigger>Name</Trigger>
      or:
      <Constraint>ConstraintName</Constraint>
    </Trigger>
  </Triggers>

The double usage of "<Trigger/>" seems to be somewhat ugly. Renaming it to <TriggerName>/<ConstraintName> seems to be a good idea - at least when staying with the current tag-oriented style.

Andres
Attachment
On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
> 1. It didn't seem very wise to go with the approach of trying to do
> EVERYTHING with attributes. If I did that, then I'd either get really
> long lines that were not easily readable, or I'd have to write some
> kind of complicated line wrapping code (which didn't seem to make a
> lot of sense for a machine-readable format). The current format isn't
> the most beautiful thing I've ever seen, but you don't need a parser
> to make sense of it, just a bit of patience.

There are obviously a lot of ways to go about defining an XML format, but here is another one of them:

A plan is a tree of plan nodes. Each node has some information attached to it, such as row counts and costs.

If you consider an XML document to be a tree of element nodes, then this falls into place naturally. Each plan node is an element, and all the other information becomes attributes.

With this, visual explain would be completely trivial.
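A minimal Python sketch of the element-per-node, attributes-for-scalars layout described here, and of how little code a tree display needs on top of it; the document and attribute names are invented for illustration, not actual EXPLAIN output.

    # Sketch of the element-per-node layout: every plan node is an element,
    # scalar information lives in attributes, and a tree walk is trivial.
    import xml.etree.ElementTree as ET

    PLAN = """
    <plan node="Hash Join" cost="1234.5" rows="1000">
      <plan node="Seq Scan" relation="orders" cost="500.0" rows="10000"/>
      <plan node="Hash" cost="300.0" rows="100">
        <plan node="Seq Scan" relation="customers" cost="250.0" rows="100"/>
      </plan>
    </plan>
    """

    def show(node, depth=0):
        attrs = ", ".join("%s=%s" % kv for kv in node.attrib.items() if kv[0] != "node")
        print("  " * depth + "%s (%s)" % (node.get("node"), attrs))
        for child in node:
            show(child, depth + 1)

    show(ET.fromstring(PLAN))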
On Wed, Jun 17, 2009 at 10:27 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>> 1. It didn't seem very wise to go with the approach of trying to do
>> EVERYTHING with attributes. If I did that, then I'd either get really
>> long lines that were not easily readable, or I'd have to write some
>> kind of complicated line wrapping code (which didn't seem to make a
>> lot of sense for a machine-readable format). The current format isn't
>> the most beautiful thing I've ever seen, but you don't need a parser
>> to make sense of it, just a bit of patience.
>
> There are obviously a lot of ways to go about defining an XML format, but
> here is another one of them:
>
> A plan is a tree of plan nodes. Each node has some information attached to
> it, such as row counts and costs.
>
> If you consider an XML document to be a tree of element nodes, then this
> falls into place naturally. Each plan node is an element, and all the other
> information becomes attributes.
>
> With this, visual explain would be completely trivial.

So what do you do about things like sort keys and target lists, which the current code outputs as structured lists?

...Robert
On 06/17/2009 04:27 PM, Peter Eisentraut wrote:
> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>> 1. It didn't seem very wise to go with the approach of trying to do
>> EVERYTHING with attributes. If I did that, then I'd either get really
>> long lines that were not easily readable, or I'd have to write some
>> kind of complicated line wrapping code (which didn't seem to make a
>> lot of sense for a machine-readable format). The current format isn't
>> the most beautiful thing I've ever seen, but you don't need a parser
>> to make sense of it, just a bit of patience.
>
> There are obviously a lot of ways to go about defining an XML format, but
> here is another one of them:
>
> A plan is a tree of plan nodes. Each node has some information attached to
> it, such as row counts and costs.
> If you consider an XML document to be a tree of element nodes, then this
> falls into place naturally. Each plan node is an element, and all the other
> information becomes attributes.

So, the only change from the current schema would be to move all additional information into attributes?

> With this, visual explain would be completely trivial.

Except that some attributes may need more structure than a single scalar value.

Also, that would need extra handling for each attribute to decide whether it's information about planning or execution...

Andres
On Tuesday 16 June 2009 22:51:37 Robert Haas wrote:
> I agree. I'm just not seeing the need for an *explicit*
> categorization contained within the data itself. For one thing, AIUI,
> that's the job of things like an XML Schema, which Andres Freund has
> already agreed to write, and I would expect that would be of some
> value to tool-writers, else why are we creating it?

A schema will just tell which documents are valid, not what they mean. That is the job of XML ontologies, which are about as pie-in-the-sky as the semantic web that they are supposed to end up building.
On Tuesday 16 June 2009 20:21:21 Tom Lane wrote:
> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
> would be able to display a graphical summary of a plan tree, and then
> pop up measurements associated with one of the nodes when you
> right-click on that node. To do this, it doesn't necessarily have to
> know all about each specific measurement that a particular backend
> version might emit; but it needs to be able to tell which things are
> measurements.

To do this, you pack all "measurements" into a <measurement> element, and then tools are just told to display those. Really, this isn't much different (or at all different) from designing an extensible tree data structure in any programming language.
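A short Python sketch of what a viewer could do under such a categorized layout: everything under a per-node measurements wrapper is displayed generically, even kinds of measurements the tool has never heard of. The element names below are invented for the sketch; no such layout has been adopted.

    # Sketch: all measurements live under one wrapper, so a viewer can pop
    # them up by tag name without knowing each measurement kind in advance.
    import xml.etree.ElementTree as ET

    NODE = """
    <plan-node type="Index Scan">
      <measurements>
        <actual-time>0.042</actual-time>
        <heap-fetches>17</heap-fetches>
        <some-new-counter>3</some-new-counter>
      </measurements>
    </plan-node>
    """

    node = ET.fromstring(NODE)
    print("Measurements for", node.get("type"))
    for m in node.find("measurements"):
        # Unknown measurement kinds are displayed generically by tag name.
        print("  %s = %s" % (m.tag, m.text))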
On Wed, Jun 17, 2009 at 10:40 AM, Peter Eisentraut<peter_e@gmx.net> wrote:
> On Tuesday 16 June 2009 20:21:21 Tom Lane wrote:
>> As a concrete example of what I'm thinking about, I'd hope that PgAdmin
>> would be able to display a graphical summary of a plan tree, and then
>> pop up measurements associated with one of the nodes when you
>> right-click on that node. To do this, it doesn't necessarily have to
>> know all about each specific measurement that a particular backend
>> version might emit; but it needs to be able to tell which things are
>> measurements.
>
> To do this, you pack all "measurements" into a <measurement> element, and
> then tools are just told to display those.

I think this is markup for the sake of markup. Right now, if we were to add 10 additional options, all they'd need to do is call ExplainPropertyText() and the right stuff would happen. If we go this route, we'll need to worry about getting each property into the right subgroup, and argue about whether the assignment of properties to subgroups is correct or whether we need to rearrange the subgroups (is "sort method" a "measurement"?). Our chances of not having to change this again in the future are a lot better if we just report the data and let third-party applications worry about categorizing it if they want to.

Possibly it would make sense to introduce groups for the portions of the output which are added in response to particular options; for example, we could have a section called "ANALYZE" that contains the data that is only present when ANALYZE is used. But this has the same complicating effect on the code. You'd have to get the explain_tuplesort() stuff into the same sub-node as the analyze times and loop counts, for example, which would require non-trivial restructuring of the existing code for no clear benefit. You'll quickly get into a situation where you print the same information from completely different parts of the code depending on whether or not the output is text format, which is going to make maintaining this a bear.

I think the most common use case for this output format is going to be to feed it to an XML parser and use xpath against it, or to feed it into a JSON parser and then write things like $plan->{"Actual Rows"} or plan["Actual Rows"], depending on what language you use to process it after you parse it. Or you may have people who iterate over sort keys %$plan and print all the values for which !ref $plan->{$key}. Unnecessary levels of nesting just make the xpath expressions (or perl/python/javascript hash/array dereferences) longer. Changing tags to attributes or vice versa changes which xpath expression you use to get the data you want, but that's about it.

...Robert
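For concreteness, here is the consumption pattern just described, sketched in Python: parse the output once, then pull out individual fields with a path expression or a plain dict lookup. The snippets of output below are abridged and invented; element and key names in the real format may differ.

    # Sketch of the "parse it, then dereference it" pattern. The embedded
    # XML/JSON fragments are invented, not literal EXPLAIN output.
    import json
    import xml.etree.ElementTree as ET

    xml_out = '<explain><Plan><Node-Type>Seq Scan</Node-Type><Actual-Rows>98</Actual-Rows></Plan></explain>'
    root = ET.fromstring(xml_out)
    print(root.findtext("./Plan/Actual-Rows"))   # XPath-style path lookup

    json_out = '{"Plan": {"Node Type": "Seq Scan", "Actual Rows": 98}}'
    plan = json.loads(json_out)["Plan"]
    print(plan["Actual Rows"])                   # plain hash/dict dereference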
On Wed, Jun 17, 2009 at 04:32:51PM +0200, Andres Freund wrote:
> On 06/17/2009 04:27 PM, Peter Eisentraut wrote:
>> On Tuesday 16 June 2009 16:22:27 Robert Haas wrote:
>>> 1. It didn't seem very wise to go with the approach of trying to do
>>> EVERYTHING with attributes [...]
>> There are obviously a lot of ways to go about defining an XML format, but
>> here is another one of them:
>>
>> A plan is a tree of plan nodes. Each node has some information attached
>> to it, such as row counts and costs.
>> If you consider an XML document to be a tree of element nodes, then this
>> falls into place naturally. Each plan node is an element, and all the other
>> information becomes attributes. [...]
> So, the only change from the current schema would be to move all
> additional information into attributes?
>
>> With this, visual explain would be completely trivial.
> Except that some attributes may need more structure than a single scalar
> value.
>
> Also, that would need extra handling for each attribute to decide whether
> it's information about planning or execution...

One of the common pitfalls of XML is that designers think first in terms of the XML representation before being clear on the abstract structure of what they want to represent. This might have something to do with the overly baroque language (e.g. having two hierarchy "dimensions": tag nesting and attributes, etc.).

So when Peter writes about "attributes", I would tend _not_ to read them as "XML attributes" but rather as abstract attributes. Whether they are actually represented as XML attributes or not will be a pragmatic decision (XML attributes have little expressive power: they are just strings without structure, as Andres noted).

The other way 'round, the abstract model will end up tainted with all the trade-offs you had to make to represent your data in XML. Look at so many data descriptions in XML out there and you'll know what I mean. That's why I always say: XML is horrid as a DDL. But no one listens ;-)

I'm an outsider, so just take my opinion with a fist of salt: but I'd tend to design the abstract structure first, then have a look at the JSON representation, and _then_ take the final decisions on the mapping to XML.

Regards

-- tomás
On Thursday 18 June 2009 06:01:13 tomas@tuxteam.de wrote:
> One of the common pitfalls of XML is that designers think first in terms
> of the XML representation before being clear on the abstract structure
> of what they want to represent.

The other aspect is that designing a useful XML format is pretty much hopeless without a clear idea about what processing is expected. Look at HTML and DocBook. Both of those might have been the result of a community effort to "design an XML format to publish text", but they clearly have different processing expectations. And there is a whole infrastructure around each of them to make that work (specifications, DTDs, documented processing expectations, CSS, stylesheets, etc.). Also note that neither HTML nor DocBook is designed, for example, so that a renderer can do something useful with elements that it doesn't know about.

The current approach to the XML explain format is an attempt to be quite literally everything to everyone. The only concrete processing target that has been vaguely described is some kind of presumably visual display in pgAdmin. The simple element-per-plan-node approach could handle that just fine, for example. (The other requirements that have been mentioned are that it should be similar to the JSON format and that it should read nicely; those have some merit, but aren't really going to help with what I'm talking about here.)

Also note that DocBook and HTML, for example, have versions 1, 2, 3, 4, 5. So I suggest that we do it that way: design something small for concrete applications, extend it later, but don't expect applications targeting the old versions to magically be able to process the new versions with full power.