Thread: machine-readable explain output v4
OK, here's the updated version of my machine-readable explain output patch. This needed heavy updating as a result of the changes that Tom asked me to make to the explain options patch, and the further changes he made himself. In addition to any regressions I may have introduced during the rebasing process, there is one definite problem here: in the previous version of this patch, explain (format xml) returned XML data; now, it's back to text. The reason for this regression is that Tom asked me to change ExplainStmt to just carry a list of nodes and to do all the parsing in ExplainQuery. Unfortunately, the TupleDesc is constructed by ExplainResultDesc() which can't trivially be changed to take an ExplainState, because UtilityTupleDescriptor() also wants to call it. We could possibly fix this by a hack similar to the one we already added to GetCommandLogLevel(), but I haven't done that here. ...Robert
Attachment
Hi Robert, Hi all, On Thursday 30 July 2009 05:05:48 Robert Haas wrote: > OK, here's the updated version of my machine-readable explain output > patch. This needed heavy updating as a result of the changes that Tom > asked me to make to the explain options patch, and the further changes > he made himself. In addition to any regressions I may have introduced > during the rebasing process, there is one definite problem here: in > the previous version of this patch, explain (format xml) returned XML > data; now, it's back to text. > The reason for this regression is that Tom asked me to change > ExplainStmt to just carry a list of nodes and to do all the parsing in > ExplainQuery. Unfortunately, the TupleDesc is constructed by > ExplainResultDesc() which can't trivially be changed to take an > ExplainState, because UtilityTupleDescriptor() also wants to call it. > We could possibly fix this by a hack similar to the one we already > added to GetCommandLogLevel(), but I haven't done that here. Hm. I think its cleaner to move the option parsing into its own function - its 3 places parsing options then and probably not going to get less. I am not sure this is cleaner than including the parsed options in ExplainStm though... (possibly in a separate struct to avoid changing copy/equal-funcs everytime) Some more comments on the (new) version of the patch: - The regression tests are gone? - Currently a value scan looks like »Values Scan on "*VALUES*"« What about adding its alias at least in verbose mode? This currently is inconsistent with other scans. Also he output columns of a VALUES scan are named column$N even if names as specified like in AS foo(colname) - why do xml/json contain no time units anymore? (e.g. Total Runtime). Admittedly thats already inconsistent in the current text format... - Code patterns like: if (es->format == EXPLAIN_FORMAT_TEXT) appendStringInfo(es->str, "Total runtime: %.3f ms\n", 1000.0 * totaltime); else if (es->format == EXPLAIN_FORMAT_XML) appendStringInfo(es->str, " <Total-Runtime>%.3f</Total-Runtime>\n", 1000.0* totaltime); else if (es->format == EXPLAIN_FORMAT_JSON) appendStringInfo(es->str, ",\n \"Total runtime\": %.3f", 1000.0 * totaltime); or if (es->format == EXPLAIN_FORMAT_TEXT) appendStringInfo(es->str, " for constraint %s", conname); else if (es->format == EXPLAIN_FORMAT_XML) { appendStringInfoString(es->str, " <Constraint-Name>"); escape_xml(es->str, conname); appendStringInfoString(es->str, "</Constraint-Name>\n"); } else if (es->format == EXPLAIN_FORMAT_JSON) { appendStringInfo(es->str,"\n \"Constraint Name\": "); escape_json(es->str, conname); } possibly could be simplified using ExplainPropertyText or a function accepting a format string. At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple places in ExplainOnePlan and report_triggers. On Friday 31 July 2009 23:13:54 Robert Haas wrote: > On Sat, Jul 18, 2009 at 10:29 PM, Andres Freund<andres@anarazel.de> wrote: > > One part where I find the code flow ugly is 'did_boilerplate' in > > report_triggers/its callsites. > > I can see why it is done that way, but its not exactly obvious to read > > when you want to find out how the format looks. > Suggestions? The only idea without building more xml/json infrastructure I have is using a separate stringbuffer inside report_triggers - but thats not much nicer. Thats all I could find right now... Andres
On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote: > Hi Robert, Hi all, > > On Thursday 30 July 2009 05:05:48 Robert Haas wrote: >> OK, here's the updated version of my machine-readable explain output >> patch. This needed heavy updating as a result of the changes that Tom >> asked me to make to the explain options patch, and the further changes >> he made himself. In addition to any regressions I may have introduced >> during the rebasing process, there is one definite problem here: in >> the previous version of this patch, explain (format xml) returned XML >> data; now, it's back to text. > >> The reason for this regression is that Tom asked me to change >> ExplainStmt to just carry a list of nodes and to do all the parsing in >> ExplainQuery. Unfortunately, the TupleDesc is constructed by >> ExplainResultDesc() which can't trivially be changed to take an >> ExplainState, because UtilityTupleDescriptor() also wants to call it. >> We could possibly fix this by a hack similar to the one we already >> added to GetCommandLogLevel(), but I haven't done that here. > > Hm. I think its cleaner to move the option parsing into its own function - its > 3 places parsing options then and probably not going to get less. > I am not sure this is cleaner than including the parsed options in ExplainStm > though... (possibly in a separate struct to avoid changing copy/equal-funcs > everytime) Well, this is why we need Tom to weigh in. > Some more comments on the (new) version of the patch: > - The regression tests are gone? Tom added some that look adequate to me to create_index.sql, as a separate commit, so I don't think I need to do this in my patch any more. Maybe some of those examples should be changed to output JSON or XML, though, but I'd rather leave this up to Tom's discretion on commit because I think he has opinions about this and I think my chances of guessing what they are are low. > - Currently a value scan looks like »Values Scan on "*VALUES*"« What about > adding its alias at least in verbose mode? This currently is inconsistent with > other scans. I don't know why this doesn't work, but I think it's unrelated to this patch. > Also he output columns of a VALUES scan are named column$N even > if names as specified like in AS foo(colname) This is consistent with how other types of scans are treated. rhaas=# explain verbose select x,y,z from (select * from pg_class) a(x,y,z); QUERY PLAN ----------------------------------------------------------------------Seq Scan on pg_catalog.pg_class (cost=0.00..8.44 rows=244width=72) Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype (2 rows) > - why do xml/json contain no time units anymore? (e.g. Total Runtime). > Admittedly thats already inconsistent in the current text format... I'm not sure what you mean by "any more". I don't think any version of these patches I ever submitted did otherwise, nor do I think it's particularly valuable. Costs are implicitly in units of PostgreSQL-costing and times are implicitly in units of milliseconds, just as they are in the text format. Changing this would require clients to strip off the trailing 'ms' before converting to a floating-point number, which seems like an irritation with no corresponding benefit. > - Code patterns like: > if (es->format == EXPLAIN_FORMAT_TEXT) > appendStringInfo(es->str, "Total runtime: %.3f ms\n", > 1000.0 * totaltime); > else if (es->format == EXPLAIN_FORMAT_XML) > appendStringInfo(es->str, > " <Total-Runtime>%.3f</Total-Runtime>\n", > 1000.0 * totaltime); > else if (es->format == EXPLAIN_FORMAT_JSON) > appendStringInfo(es->str, ",\n \"Total runtime\" : %.3f", > 1000.0 * totaltime); > or > if (es->format == EXPLAIN_FORMAT_TEXT) > appendStringInfo(es->str, " for constraint %s", conname); > else if (es->format == EXPLAIN_FORMAT_XML) > { > appendStringInfoString(es->str, " <Constraint-Name>"); > escape_xml(es->str, conname); > appendStringInfoString(es->str, "</Constraint-Name>\n"); > } > else if (es->format == EXPLAIN_FORMAT_JSON) > { > appendStringInfo(es->str, "\n \"Constraint Name\": "); > escape_json(es->str, conname); > } > > possibly could be simplified using ExplainPropertyText or a function accepting > a format string. > At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple places in > ExplainOnePlan and report_triggers. Well, the whole explain output format is pretty idiosyncratic, and I had to work pretty hard to beat it into submission. I think that it would not be totally trivial to do what you're suggesting here because it would require adding code to manage es->indent outside of ExplainPrintPlan(), which we currently don't. I'm not sure whether that works out to a net win. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote: >> - Currently a value scan looks like �Values Scan on "*VALUES*"� What about >> adding its alias at least in verbose mode? This currently is inconsistent with >> other scans. > I don't know why this doesn't work, but I think it's unrelated to this patch. If you mean something like regression=# explain select * from (values(1)) ss; QUERY PLAN -------------------------------------------------------------Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4) (1 row) I think the reason is that the alias applies to a SubqueryScan node that later gets optimized away. The VALUES per se doesn't have an alias. regards, tom lane
On Sunday 02 August 2009 23:34:04 Robert Haas wrote: > On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres@anarazel.de> wrote: > > Hi Robert, Hi all, > > > > On Thursday 30 July 2009 05:05:48 Robert Haas wrote: > >> OK, here's the updated version of my machine-readable explain output > >> patch. This needed heavy updating as a result of the changes that Tom > >> asked me to make to the explain options patch, and the further changes > >> he made himself. In addition to any regressions I may have introduced > >> during the rebasing process, there is one definite problem here: in > >> the previous version of this patch, explain (format xml) returned XML > >> data; now, it's back to text. > >> > >> The reason for this regression is that Tom asked me to change > >> ExplainStmt to just carry a list of nodes and to do all the parsing in > >> ExplainQuery. Unfortunately, the TupleDesc is constructed by > >> ExplainResultDesc() which can't trivially be changed to take an > >> ExplainState, because UtilityTupleDescriptor() also wants to call it. > >> We could possibly fix this by a hack similar to the one we already > >> added to GetCommandLogLevel(), but I haven't done that here. > > Hm. I think its cleaner to move the option parsing into its own function > > - its 3 places parsing options then and probably not going to get less. I > > am not sure this is cleaner than including the parsed options in > > ExplainStm though... (possibly in a separate struct to avoid changing > > copy/equal-funcs everytime) > Well, this is why we need Tom to weigh in. Yes. > > Some more comments on the (new) version of the patch: > > - The regression tests are gone? > Tom added some that look adequate to me to create_index.sql, as a > separate commit, so I don't think I need to do this in my patch any > more. Maybe some of those examples should be changed to output JSON > or XML, though, but I'd rather leave this up to Tom's discretion on > commit because I think he has opinions about this and I think my > chances of guessing what they are are low. Yea, I was referring to ones using xml/json. > > - Currently a value scan looks like »Values Scan on "*VALUES*"« What > > about adding its alias at least in verbose mode? This currently is > > inconsistent with other scans. > I don't know why this doesn't work, but I think it's unrelated to this > patch. True. > > Also he output columns of a VALUES scan are named column$N even > > if names as specified like in AS foo(colname) > This is consistent with how other types of scans are treated. > rhaas=# explain verbose select x,y,z from (select * from pg_class) > a(x,y,z); QUERY PLAN > ---------------------------------------------------------------------- > Seq Scan on pg_catalog.pg_class (cost=0.00..8.44 rows=244 width=72) > Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype > (2 rows) This is someone weird considering since using it directly in relations works different: explain (verbose) SELECT * FROM pg_namespace AS f(a,b,c); QUERY PLAN --------------------------------------------------------------------------- Seq Scan on pg_catalog.pg_namespacef (cost=0.00..1.06 rows=6 width=100) Output: a, b, c(2 rows) Not your "guilt" though. Still its rather strange and looks worth fixable. > > - why do xml/json contain no time units anymore? (e.g. Total Runtime). > > Admittedly thats already inconsistent in the current text format... > I'm not sure what you mean by "any more". I don't think any version > of these patches I ever submitted did otherwise, nor do I think it's > particularly valuable. Costs are implicitly in units of > PostgreSQL-costing and times are implicitly in units of milliseconds, > just as they are in the text format. Changing this would require > clients to strip off the trailing 'ms' before converting to a > floating-point number, which seems like an irritation with no > corresponding benefit. I did not think any of your patches did - it was just a difference between the original output and the new formats I just noted - as I said its not even consistent in the text format. > > - Code patterns like: > > if (es->format == EXPLAIN_FORMAT_TEXT) > > appendStringInfo(es->str, "Total runtime: %.3f > > ms\n", 1000.0 * totaltime); else if (es->format == EXPLAIN_FORMAT_XML) > > appendStringInfo(es->str, > > " > > <Total-Runtime>%.3f</Total-Runtime>\n", 1000.0 * totaltime); else if > > (es->format == EXPLAIN_FORMAT_JSON) > > appendStringInfo(es->str, ",\n \"Total > > runtime\" : %.3f", 1000.0 * totaltime); or > > if (es->format == EXPLAIN_FORMAT_TEXT) > > appendStringInfo(es->str, " for constraint > > %s", conname); else if (es->format == EXPLAIN_FORMAT_XML) { > > appendStringInfoString(es->str, " > > <Constraint-Name>"); escape_xml(es->str, conname); > > appendStringInfoString(es->str, > > "</Constraint-Name>\n"); } > > else if (es->format == EXPLAIN_FORMAT_JSON) > > { > > appendStringInfo(es->str, "\n > > \"Constraint Name\": "); escape_json(es->str, conname); > > } > > > > possibly could be simplified using ExplainPropertyText or a function > > accepting a format string. > > At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple > > places in ExplainOnePlan and report_triggers. > Well, the whole explain output format is pretty idiosyncratic, and I > had to work pretty hard to beat it into submission. I think that it > would not be totally trivial to do what you're suggesting here because > it would require adding code to manage es->indent outside of > ExplainPrintPlan(), which we currently don't. I'm not sure whether > that works out to a net win. Thats why I suggested doing it for JSON/XML only. E.g. like in the attached patch. The result looks simpler for my eyes. While re-checking this I noticed a newly introduced bug in report_triggers() - in case of a trigger/conname==false "Trigger Name" gets printed twice due to a duplicated else - merge glitch? (fixed in attached patch as well)
Robert Haas <robertmhaas@gmail.com> writes: >>> The reason for this regression is that Tom asked me to change >>> ExplainStmt to just carry a list of nodes and to do all the parsing in >>> ExplainQuery. �Unfortunately, the TupleDesc is constructed by >>> ExplainResultDesc() which can't trivially be changed to take an >>> ExplainState, because UtilityTupleDescriptor() also wants to call it. >>> We could possibly fix this by a hack similar to the one we already >>> added to GetCommandLogLevel(), but I haven't done that here. I don't see anything particularly wrong with having ExplainResultDesc do the same kind of thing GetCommandLogLevel is doing. The alternative is to have it call a function that extracts all the parameters into an ExplainState, but the thing I have against that is that it makes for a larger cross-section of user mistakes that will result in throwing an elog() before we actually reach execution. I didn't want GetCommandLogLevel throwing any errors it doesn't actually have to, because that would interfere with logging of statements that could perfectly well get logged normally. I'm not sure if there are any comparable reasons to not want errors from UtilityTupleDescriptor, but there could well be. >> - The regression tests are gone? > Tom added some that look adequate to me to create_index.sql, as a > separate commit, so I don't think I need to do this in my patch any > more. Maybe some of those examples should be changed to output JSON > or XML, though, but I'd rather leave this up to Tom's discretion on > commit because I think he has opinions about this and I think my > chances of guessing what they are are low. Well, of course the existing tests are not going to exercise XML or JSON output format. Dunno how much we care. I had supposed that XML or JSON would always emit all the fields and leave it to the recipient to suppress what they don't want. If we want to have platform-independent regression tests then we'd need to make the COSTS option effective for XML/JSON format --- do we want that? > I'm not sure what you mean by "any more". I don't think any version > of these patches I ever submitted did otherwise, nor do I think it's > particularly valuable. Costs are implicitly in units of > PostgreSQL-costing and times are implicitly in units of milliseconds, > just as they are in the text format. Changing this would require > clients to strip off the trailing 'ms' before converting to a > floating-point number, which seems like an irritation with no > corresponding benefit. I agree with not labeling these numbers. We definitely can't label the costs with anything useful, and labeling runtimes would be a bit inconsistent. regards, tom lane
On Monday 03 August 2009 01:57:48 Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > > - The regression tests are gone? > > Tom added some that look adequate to me to create_index.sql, as a > > separate commit, so I don't think I need to do this in my patch any > > more. Maybe some of those examples should be changed to output JSON > > or XML, though, but I'd rather leave this up to Tom's discretion on > > commit because I think he has opinions about this and I think my > > chances of guessing what they are are low. > Well, of course the existing tests are not going to exercise XML or > JSON output format. Dunno how much we care. I had supposed that > XML or JSON would always emit all the fields and leave it to the > recipient to suppress what they don't want. If we want to have > platform-independent regression tests then we'd need to make the > COSTS option effective for XML/JSON format --- do we want that? Options such as COSTS do effect XML/JSON right now. While not important for COSTS itself, I think its generally good to do so because a certain option might not be done per default efficiencywise and I don't see a reason to specialcase COSTS. Andres
On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Well, of course the existing tests are not going to exercise XML or > JSON output format. Dunno how much we care. I had supposed that > XML or JSON would always emit all the fields and leave it to the > recipient to suppress what they don't want. If we want to have > platform-independent regression tests then we'd need to make the > COSTS option effective for XML/JSON format --- do we want that? Well, as I've said many, many times on these threads, I feel strongly that the choice of output format shouldn't for the most part affect the information which is displayed. Output format should be an option, and the information to display should be an option, and the two should be, as far as possible, separate. So what you're suggesting is the way it works now. http://archives.postgresql.org/pgsql-hackers/2009-05/msg00879.php http://archives.postgresql.org/pgsql-hackers/2009-05/msg00916.php http://archives.postgresql.org/pgsql-hackers/2009-05/msg00917.php http://archives.postgresql.org/pgsql-hackers/2009-05/msg00989.php ...Robert
On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>>> The reason for this regression is that Tom asked me to change >>>> ExplainStmt to just carry a list of nodes and to do all the parsing in >>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by >>>> ExplainResultDesc() which can't trivially be changed to take an >>>> ExplainState, because UtilityTupleDescriptor() also wants to call it. >>>> We could possibly fix this by a hack similar to the one we already >>>> added to GetCommandLogLevel(), but I haven't done that here. > > I don't see anything particularly wrong with having ExplainResultDesc > do the same kind of thing GetCommandLogLevel is doing. After I did this, I thought it would be useful to add a regression test to make sure that it is doing the right thing. So I came up with this: CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$ DECLARE x RECORD; BEGIN EXECUTE 'explain (format ' || $1 || ') select 1' INTO x; RETURN pg_typeof(x."QUERY PLAN"); END $$ LANGUAGE plpgsql; This works the first time you run it in a particular session, but then if change $1 so as to get a different answer, it fails: rhaas=# select test_explain_format('text');test_explain_format ---------------------text (1 row) rhaas=# select test_explain_format('xml'); ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN rhaas=# discard ALL PLANS TEMP rhaas=# discard plans; DISCARD PLANS rhaas=# select test_explain_format('xml'); ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN rhaas=# discard all; DISCARD ALL rhaas=# select test_explain_format('xml'); ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN rhaas=# If I quit psql and start back up again, then it works: rhaas=# select test_explain_format('xml');test_explain_format ---------------------xml (1 row) So I guess that leads me to - (1) How do I make this work? (2) Is it worth making this work? ...Robert
Robert Haas wrote: > On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > >> Robert Haas <robertmhaas@gmail.com> writes: >> >>>>> The reason for this regression is that Tom asked me to change >>>>> ExplainStmt to just carry a list of nodes and to do all the parsing in >>>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by >>>>> ExplainResultDesc() which can't trivially be changed to take an >>>>> ExplainState, because UtilityTupleDescriptor() also wants to call it. >>>>> We could possibly fix this by a hack similar to the one we already >>>>> added to GetCommandLogLevel(), but I haven't done that here. >>>>> >> I don't see anything particularly wrong with having ExplainResultDesc >> do the same kind of thing GetCommandLogLevel is doing. >> > > After I did this, I thought it would be useful to add a regression > test to make sure that it is doing the right thing. So I came up with > this: > > CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$ > DECLARE > x RECORD; > BEGIN > EXECUTE 'explain (format ' || $1 || ') select 1' INTO x; > RETURN pg_typeof(x."QUERY PLAN"); > END > $$ LANGUAGE plpgsql; > > This works the first time you run it in a particular session, but then > if change $1 so as to get a different answer, it fails: > > rhaas=# select test_explain_format('text'); > test_explain_format > --------------------- > text > (1 row) > > rhaas=# select test_explain_format('xml'); > ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan > CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN > rhaas=# discard > ALL PLANS TEMP > rhaas=# discard plans; > DISCARD PLANS > rhaas=# select test_explain_format('xml'); > ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan > CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN > rhaas=# discard all; > DISCARD ALL > rhaas=# select test_explain_format('xml'); > ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan > CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN > rhaas=# > > If I quit psql and start back up again, then it works: > > rhaas=# select test_explain_format('xml'); > test_explain_format > --------------------- > xml > (1 row) > > So I guess that leads me to - > > (1) How do I make this work? > (2) Is it worth making this work? You could have the function create an inner function which it then runs and drops. I have had to perform such gymnastics in the past to get around similar problems. And yes, it's ugly as hell. cheers andrew
On Wed, Aug 5, 2009 at 7:20 AM, Andrew Dunstan<andrew@dunslane.net> wrote: > Robert Haas wrote: >> On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>>>> The reason for this regression is that Tom asked me to change >>>>>> ExplainStmt to just carry a list of nodes and to do all the parsing in >>>>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by >>>>>> ExplainResultDesc() which can't trivially be changed to take an >>>>>> ExplainState, because UtilityTupleDescriptor() also wants to call it. >>>>>> We could possibly fix this by a hack similar to the one we already >>>>>> added to GetCommandLogLevel(), but I haven't done that here. >>>>>> >>> >>> I don't see anything particularly wrong with having ExplainResultDesc >>> do the same kind of thing GetCommandLogLevel is doing. >>> >> >> After I did this, I thought it would be useful to add a regression >> test to make sure that it is doing the right thing. So I came up with >> this: >> >> CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$ >> DECLARE >> x RECORD; >> BEGIN >> EXECUTE 'explain (format ' || $1 || ') select 1' INTO x; >> RETURN pg_typeof(x."QUERY PLAN"); >> END >> $$ LANGUAGE plpgsql; >> >> This works the first time you run it in a particular session, but then >> if change $1 so as to get a different answer, it fails: >> >> rhaas=# select test_explain_format('text'); >> test_explain_format >> --------------------- >> text >> (1 row) >> >> rhaas=# select test_explain_format('xml'); >> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan >> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN >> rhaas=# discard >> ALL PLANS TEMP >> rhaas=# discard plans; >> DISCARD PLANS >> rhaas=# select test_explain_format('xml'); >> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan >> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN >> rhaas=# discard all; >> DISCARD ALL >> rhaas=# select test_explain_format('xml'); >> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan >> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN >> rhaas=# >> >> If I quit psql and start back up again, then it works: >> >> rhaas=# select test_explain_format('xml'); >> test_explain_format >> --------------------- >> xml >> (1 row) >> >> So I guess that leads me to - >> >> (1) How do I make this work? >> (2) Is it worth making this work? > > > You could have the function create an inner function which it then runs and > drops. I have had to perform such gymnastics in the past to get around > similar problems. And yes, it's ugly as hell. <hurls> Well, I guess I could do it like this: CREATE OR REPLACE FUNCTION test_explain_format() RETURNS text[] AS $$ DECLARE xt RECORD; xx RECORD; xj RECORD; BEGIN EXPLAIN (FORMAT TEXT) SELECT 1 INTO xt; EXPLAIN (FORMAT XML) SELECT 1 INTO xx; EXPLAIN (FORMAT JSON)SELECT 1 INTO xj; RETURN ARRAY[ pg_typeof(xt."QUERY PLAN"), pg_typeof(xx."QUERY PLAN"), pg_typeof(xj."QUERY PLAN") ]; END $$ LANGUAGE plpgsql; Fortunately there is not an unlimited space to probe here... ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > (2) Is it worth making this work? No, I don't think so. The odds of such a test ever showing anything interesting seem minimal. plpgsql's inability to cope with the case would be nice to fix, but I'm not holding my breath for it... regards, tom lane
On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres@anarazel.de> wrote: >> Well, the whole explain output format is pretty idiosyncratic, and I >> had to work pretty hard to beat it into submission. I think that it >> would not be totally trivial to do what you're suggesting here because >> it would require adding code to manage es->indent outside of >> ExplainPrintPlan(), which we currently don't. I'm not sure whether >> that works out to a net win. > Thats why I suggested doing it for JSON/XML only. E.g. like in the attached > patch. The result looks simpler for my eyes. I looked at this some more. I think it's a mess. It's probably right to do what you're suggesting here, but this patch only changes pieces of it without making the whole thing consistent. report_triggers(), for example, kludges a value into es->indent but then ignores it when determining how much to indent <Trigger>, etc. I can't even figure out why this works now, let alone being able to maintain it down the line. I'm working on trying to fix this. ...Robert
On Wed, Aug 5, 2009 at 10:48 PM, Robert Haas<robertmhaas@gmail.com> wrote: > On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres@anarazel.de> wrote: >>> Well, the whole explain output format is pretty idiosyncratic, and I >>> had to work pretty hard to beat it into submission. I think that it >>> would not be totally trivial to do what you're suggesting here because >>> it would require adding code to manage es->indent outside of >>> ExplainPrintPlan(), which we currently don't. I'm not sure whether >>> that works out to a net win. >> Thats why I suggested doing it for JSON/XML only. E.g. like in the attached >> patch. The result looks simpler for my eyes. > > I looked at this some more. I think it's a mess. It's probably right > to do what you're suggesting here, but this patch only changes pieces > of it without making the whole thing consistent. report_triggers(), > for example, kludges a value into es->indent but then ignores it when > determining how much to indent <Trigger>, etc. I can't even figure > out why this works now, let alone being able to maintain it down the > line. > > I'm working on trying to fix this. Revised patch attached. I'm not convinced this is as good as it can be, but I've been looking at this patch for so long that I'm starting to get cross-eyed, and I'd like to Tom at least have a look at this and assess it before we run out of CommitFest. ...Robert
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > Revised patch attached. I'm not convinced this is as good as it can > be, but I've been looking at this patch for so long that I'm starting > to get cross-eyed, and I'd like to Tom at least have a look at this > and assess it before we run out of CommitFest. I'm starting to look at this now. I feel unqualified to opine on the quality of the XML/JSON schema design, and given the utter lack of documentation of what that design is, I'm not sure that anyone who could comment on it has done so. Could we have a spec please? Also, the JSON code seems a bit messy/poorly factorized. Is there a reason for that, or just it's not as mature as the XML code? regards, tom lane
On Sun, Aug 9, 2009 at 3:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Revised patch attached. I'm not convinced this is as good as it can >> be, but I've been looking at this patch for so long that I'm starting >> to get cross-eyed, and I'd like to Tom at least have a look at this >> and assess it before we run out of CommitFest. > > I'm starting to look at this now. I feel unqualified to opine on the > quality of the XML/JSON schema design, and given the utter lack of > documentation of what that design is, I'm not sure that anyone who > could comment on it has done so. Could we have a spec please? *scratches head* You're not the first person to make that request, and I'd like to respond to it to well, but I don't really know what to write. Most of the discussion about the XML/JSON output format thus far has been around things like whether we should downcase everything, and even the people offering these comments have mostly labelled them with words to the effect of "I know this is trival but...". I think that the reason for this is that fundamentally explain output is fundamentally a tree, and XML and JSON both have ways of representing a tree with properties hanging off the nodes, and this patch uses those ways. I can't figure out what else there is, so I don't know what I'm explaining why I didn't do. The one significant representational choice that I'm aware of having made is to use nested tags rather than attributes in the XML format. This seems to me to offer several advantages. First, it's clearly impossible to standardize on attributes, because attributes can only be text, and it seems to me that if we're going to try to output structured data, we want to take that as far as we can, and we have attributes (like sort keys) that are lists rather than scalars. Using tags means that they can have substructure when needed. Second, it seems likely to me that people will want to extend explain further in the future: indeed, that was the whole point of the explain-options patch which was already committed. That's pretty simple in the current design - just add a few more calls to ExplainPropertyText or ExplainPropertyList in the appropriate place, and you're done. I'm pretty sure that splitting things up between attributes and nested tags would complicate such modifications. Peter Eisentraut, in an earlier review of this patch, complained about the format as well, saying something along the lines of "this is trying to be all things to all people". I don't want to dismiss that criticism, but neither can I put my finger on the problem. In an ideal world, we'd like to be all things to all people, but it's usually not possible to achieve that in practice. Still, it's not clear to me what need this wouldn't serve. It's possible to generate the text format from the XML or JSON format, so it should be well-suited to graphical presentation of explain output. It's also possible to grope through the output and, say, find the average cost of all your seqscan nodes, or verify the absence of merge joins, or anything of that sort that someone might think that they want to do. In a nutshell, the design is "take all the fields we have now and put XML/JSON markup around them so they're easier to get to". Maybe that's not enough of a design, but I don't have any other ideas. > Also, the JSON code seems a bit messy/poorly factorized. Is there > a reason for that, or just it's not as mature as the XML code? I wrote them together, so it's not a question of code maturity, but I wouldn't rule out me being dumb. I'm open to suggestions... AFAICS, the need to comma-separate list and hash elements is most of the problem. I had thought about replacing es->needs_separator with a list so that we could push/pop elements, but I wasn't totally sure whether that was a good idea. ...Robert
Robert Haas wrote: > The one significant representational choice that I'm aware of having > made is to use nested tags rather than attributes in the XML format. > This seems to me to offer several advantages. First, it's clearly > impossible to standardize on attributes, because attributes can only > be text, and it seems to me that if we're going to try to output > structured data, we want to take that as far as we can, and we have > attributes (like sort keys) that are lists rather than scalars. Using > tags means that they can have substructure when needed. Second, it > seems likely to me that people will want to extend explain further in > the future: indeed, that was the whole point of the explain-options > patch which was already committed. That's pretty simple in the > current design - just add a few more calls to ExplainPropertyText or > ExplainPropertyList in the appropriate place, and you're done. I'm > pretty sure that splitting things up between attributes and nested > tags would complicate such modifications. > > > In general, in XML one uses an attribute for a named property of an object that can only have one value at a time. A classic example is the dimensions of an object - it can only have one width and height. Children (nested tags, particularly) are used for things it can have an arbitrary number of, or things which in turn can have children. the HTML <p> and <body> elements are (respectively) examples of these. Generally, attribute values especially should be short - I recently saw an example that had an entire image hex encoded in an XML attribute, which struck me as just horrible. Enumerations, date and time values, booleans, measurements - these are common types of attribute values. Extracting a value from an attribute is no more or less difficult than from a nested tag, using the XPath query language. The XML Schema standard is a language for specifying the structure of a given XML document type, and while it is undoubtedly complex, it is also much more powerful than the older DTD mechanism. I think we should be creating (and publishing) an XML Schema specification for any XML documents we are producing. There are a number of members of the community who are equipped to help produce these. There is probably a good case for using an explicit namespace with such docs. So we might have something like: <pg:explain xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> .... BTW, has anyone tried validating the XML at all? I just looked very briefly at the patch at <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and I noticed this which makes me suspicious: + if (es.format == EXPLAIN_FORMAT_XML) + appendStringInfoString(es.str, + "<explain xmlns=\"http://www.postgresql.org/2009/explain\" <http://www.postgresql.org/2009/explain%5C%22>;>\n"); That ";" after the attribute is almost certainly wrong. This is a classic case of what I was talking about a month or twoago. Building up XML (or any structured doc, really, XML is not special in this regard) by ad hoc methods is horriblyerror prone. if you don't want to rely on libxml, then I think you need to develop a lightweight abstraction ratherthan just appending to a StringInfo. cheers andrew
On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote: > Robert Haas wrote: > > The one significant representational choice that I'm aware of having > > made is to use nested tags rather than attributes in the XML format. > > This seems to me to offer several advantages. First, it's clearly > > impossible to standardize on attributes, because attributes can only > > be text, and it seems to me that if we're going to try to output > > structured data, we want to take that as far as we can, and we have > > attributes (like sort keys) that are lists rather than scalars. Using > > tags means that they can have substructure when needed. Second, it > > seems likely to me that people will want to extend explain further in > > the future: indeed, that was the whole point of the explain-options > > patch which was already committed. That's pretty simple in the > > current design - just add a few more calls to ExplainPropertyText or > > ExplainPropertyList in the appropriate place, and you're done. I'm > > pretty sure that splitting things up between attributes and nested > > tags would complicate such modifications. > The XML Schema standard is a language for specifying the structure of a > given XML document type, and while it is undoubtedly complex, it is also > much more powerful than the older DTD mechanism. I think we should be > creating (and publishing) an XML Schema specification for any XML > documents we are producing. There are a number of members of the > community who are equipped to help produce these. I produced/mailed a relaxng version for a a bit older version and I plan to refresh and document it once the format seems suitably stable. I am not sure it is yet. If yes, this should not take that long... (Relaxng because you easily can convert it into most other XML schema description languages) > There is probably a good case for using an explicit namespace with such > docs. So we might have something like: > > <pg:explain > xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> .... > > BTW, has anyone tried validating the XML at all? I just looked very > briefly at the patch at > <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and > I noticed this which makes me suspicious: > > + if (es.format == EXPLAIN_FORMAT_XML) > + appendStringInfoString(es.str, > + "<explain xmlns=\"http://www.postgresql.org/2009/explain\" > <http://www.postgresql.org/2009/explain%5C%22>;>\n"); That bug is fixed - as referenced above I wrote a schema and validated it. So, yes, the generated XML was valid at least before the last round of refactoring. And I looked through the output quite a bit so I would surprised if there is such a breakage. > That ";" after the attribute is almost certainly wrong. This is a classic > case of what I was talking about a month or two ago. Building up XML (or > any structured doc, really, XML is not special in this regard) by ad hoc > methods is horribly error prone. if you don't want to rely on libxml, then > I think you need to develop a lightweight abstraction rather than just > appending to a StringInfo. Actually by now a non-insignificant portion already "outsources" this - only some special cases (empty attributes, no newlines wanted, initial element with namespace) do not do this. While it would be possible to add another step inbetween and generate a format neutral tree and generate the different formats out of it I am not sure that this is worthwile. The current text format will need to stay special cased anyway because its far to inconsistent to generate it from anything abstract and I don't see any completely new formats coming (i.e. not just optional parts)? Andres
Andres Freund wrote: >> BTW, has anyone tried validating the XML at all? I just looked very >> briefly at the patch at >> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and >> I noticed this which makes me suspicious: >> >> + if (es.format == EXPLAIN_FORMAT_XML) >> + appendStringInfoString(es.str, >> + "<explain xmlns=\"http://www.postgresql.org/2009/explain\" >> <http://www.postgresql.org/2009/explain%5C%22>;>\n"); >> > That bug is fixed - as referenced above I wrote a schema and validated it. So, > yes, the generated XML was valid at least before the last round of > refactoring. And I looked through the output quite a bit so I would surprised > if there is such a breakage. > then someone needs to update the commitfest page with the lastest patch. That's the link I followed. > > > > Hmm. I wonder how i got the link to an old version of the patch. Anyway, I'm glad it's fixed. cheers andrew
Andrew Dunstan wrote: > > > Andres Freund wrote: >>> BTW, has anyone tried validating the XML at all? I just looked very >>> briefly at the patch at >>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and >>> I noticed this which makes me suspicious: >>> >>> + if (es.format == EXPLAIN_FORMAT_XML) >>> + appendStringInfoString(es.str, >>> + "<explain >>> xmlns=\"http://www.postgresql.org/2009/explain\" >>> <http://www.postgresql.org/2009/explain%5C%22>;>\n"); >>> >> That bug is fixed - as referenced above I wrote a schema and >> validated it. So, yes, the generated XML was valid at least before >> the last round of refactoring. And I looked through the output quite >> a bit so I would surprised if there is such a breakage. >> then someone needs to update the commitfest page with the lastest >> patch. That's the link I followed. >> >> >> > > Hmm. I wonder how i got the link to an old version of the patch. > > Anyway, I'm glad it's fixed. I takle it back. It's still there at <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> posted 3 days ago. cheers andrew
Andres Freund <andres@anarazel.de> writes: > On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote: >> That ";" after the attribute is almost certainly wrong. This is a classic >> case of what I was talking about a month or two ago. Building up XML (or >> any structured doc, really, XML is not special in this regard) by ad hoc >> methods is horribly error prone. if you don't want to rely on libxml, then >> I think you need to develop a lightweight abstraction rather than just >> appending to a StringInfo. > While it would be possible to add another step inbetween and generate a format > neutral tree and generate the different formats out of it I am not sure that > this is worthwile. > The current text format will need to stay special cased anyway because its far > to inconsistent to generate it from anything abstract and I don't see any > completely new formats coming (i.e. not just optional parts)? The patch as-submitted does have a lightweight abstraction layer of sorts, but the main code line feels free to ignore that and hack around it. I agree that we can't avoid special-casing the text output, but I'm trying to improve the encapsulation otherwise. What I've got at the moment is attached. I'd be interested in comments on the grouping notion in particular --- I reverse-engineered that out of what the code was doing, and I'm sure it could use improvement. regards, tom lane /** Explain a property, such as sort keys or targets, that takes the form of* a list of unlabeled items. "data" is a listof C strings.*/ static void ExplainPropertyList(const char *qlabel, List *data, ExplainState *es) {ListCell *lc;bool first = true; switch (es->format){ case EXPLAIN_FORMAT_TEXT: appendStringInfoSpaces(es->str, es->indent * 2); appendStringInfo(es->str," %s: ", qlabel); foreach(lc, data) { if (!first) appendStringInfoString(es->str,", "); appendStringInfoString(es->str, (const char *) lfirst(lc)); first= false; } appendStringInfoChar(es->str, '\n'); break; case EXPLAIN_FORMAT_XML: ExplainXMLTag(qlabel, X_OPENING, es); foreach(lc, data) { char *str; appendStringInfoSpaces(es->str, es->indent * 2 + 2); appendStringInfoString(es->str, "<Item>"); str = escape_xml((const char *) lfirst(lc)); appendStringInfoString(es->str, str); pfree(str); appendStringInfoString(es->str, "</Item>\n"); } ExplainXMLTag(qlabel, X_CLOSING, es); break; case EXPLAIN_FORMAT_JSON: ExplainJSONLineEnding(es); appendStringInfoSpaces(es->str, es->indent * 2); escape_json(es->str, qlabel); appendStringInfoString(es->str, ": ["); foreach(lc, data) { if (!first) appendStringInfoString(es->str, ", "); escape_json(es->str, (const char *) lfirst(lc)); first = false; } appendStringInfoChar(es->str, ']'); break;} } #define ExplainPropertyText(qlabel, value, es) \ExplainProperty(qlabel, value, false, es) /** Explain a simple property.** If "numeric" is true, the value is a number (or other value that* doesn't need quoting inJSON).*/ static void ExplainProperty(const char *qlabel, const char *value, bool numeric, ExplainState *es) {switch (es->format){ case EXPLAIN_FORMAT_TEXT: appendStringInfoSpaces(es->str, es->indent * 2); appendStringInfo(es->str," %s: %s\n", qlabel, value); break; case EXPLAIN_FORMAT_XML: { char *str; appendStringInfoSpaces(es->str, es->indent * 2); ExplainXMLTag(qlabel, X_OPENING | X_NOWHITESPACE,es); str = escape_xml(value); appendStringInfoString(es->str, str); pfree(str); ExplainXMLTag(qlabel, X_CLOSING | X_NOWHITESPACE, es); appendStringInfoChar(es->str, '\n'); } break; case EXPLAIN_FORMAT_JSON: ExplainJSONLineEnding(es); appendStringInfoSpaces(es->str, es->indent * 2); escape_json(es->str, qlabel); appendStringInfoString(es->str, ": "); if (numeric) appendStringInfoString(es->str,value); else escape_json(es->str, value); break;} } /** Explain an integer-valued property.*/ static void ExplainPropertyInteger(const char *qlabel, int value, ExplainState *es) {char buf[32]; snprintf(buf, sizeof(buf), "%d", value);ExplainProperty(qlabel, buf, true, es); } /** Explain a long-integer-valued property.*/ static void ExplainPropertyLong(const char *qlabel, long value, ExplainState *es) {char buf[32]; snprintf(buf, sizeof(buf), "%ld", value);ExplainProperty(qlabel, buf, true, es); } /** Explain a float-valued property, using the specified number of* fractional digits.*/ static void ExplainPropertyFloat(const char *qlabel, double value, int ndigits, ExplainState *es) {char buf[256]; snprintf(buf, sizeof(buf), "%.*f", ndigits, value);ExplainProperty(qlabel, buf, true, es); } /** Open a group of related objects.** objtype is the type of the group object, labelname is its label within* a containingobject (if any).** If labeled is true, the group members will be labeled properties,* while if it's false, they'llbe unlabeled objects.*/ static void ExplainOpenGroup(const char *objtype, const char *labelname, bool labeled, ExplainState *es) {switch (es->format){ case EXPLAIN_FORMAT_TEXT: /* nothing to do */ break; case EXPLAIN_FORMAT_XML: ExplainXMLTag(objtype, X_OPENING, es); break; case EXPLAIN_FORMAT_JSON: ExplainJSONLineEnding(es); appendStringInfoSpaces(es->str, 2 * es->indent); if (labelname) { escape_json(es->str, labelname); appendStringInfoString(es->str, ": "); } appendStringInfoChar(es->str, labeled ? '{' : '['); /* * In JSON mode, the grouping_stack is an integer list. 0 means * we've emitted nothing at thisgrouping level, 1 means we've * emitted something (and so the next item needs a comma). * See ExplainJSONLineEnding(). */ es->grouping_stack = lcons_int(0, es->grouping_stack); break;}es->indent++; } /** Close a group of related objects.* Parameters must match the corresponding ExplainOpenGroup call.*/ static void ExplainCloseGroup(const char *objtype, const char *labelname, bool labeled, ExplainState *es) {es->indent--;switch (es->format){ case EXPLAIN_FORMAT_TEXT: /* nothing to do */ break; case EXPLAIN_FORMAT_XML: ExplainXMLTag(objtype, X_CLOSING, es); break; case EXPLAIN_FORMAT_JSON: appendStringInfoChar(es->str, '\n'); appendStringInfoSpaces(es->str, 2 * es->indent); appendStringInfoChar(es->str, labeled ? '}' : ']'); es->grouping_stack = list_delete_first(es->grouping_stack); break;} } /** Emit opening or closing XML tag.** "flags" must contain X_OPENING or X_CLOSING, which optionally can be* OR'ed with X_NOWHITESPACEto suppress the whitespace we'd normally add.** XML tag names can't contain white space, so we replace anyspaces in* "tagname" with dashes.*/ static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es) {const char *s; if ((flags & X_NOWHITESPACE) == 0) appendStringInfoSpaces(es->str, 2 * es->indent);appendStringInfoCharMacro(es->str,'<');if ((flags & X_CLOSING) != 0) appendStringInfoCharMacro(es->str, '/');for(s = tagname; *s; s++) appendStringInfoCharMacro(es->str, (*s == ' ') ? '-' : *s);appendStringInfoCharMacro(es->str,'>');if ((flags & X_NOWHITESPACE) == 0) appendStringInfoCharMacro(es->str, '\n'); } /** Emit a JSON line ending.** JSON requires a comma after each property but the last. To facilitate this,* in JSON mode,the text emitted for each property begins just prior to the* preceding line-break (and comma, if applicable).*/ static void ExplainJSONLineEnding(ExplainState *es) {Assert(es->format == EXPLAIN_FORMAT_JSON);if (linitial_int(es->grouping_stack) != 0) appendStringInfoChar(es->str, ',');else linitial_int(es->grouping_stack) = 1;appendStringInfoChar(es->str, '\n'); } /** Produce a JSON string literal, properly escaping characters in the text.*/ static void escape_json(StringInfo buf, const char *str) {const char *p; appendStringInfoCharMacro(buf, '\"');for (p = str; *p; p++){ switch (*p) { case '\b': appendStringInfoString(buf,"\\b"); break; case '\f': appendStringInfoString(buf, "\\f"); break; case '\n': appendStringInfoString(buf, "\\n"); break; case '\r': appendStringInfoString(buf, "\\r"); break; case '\t': appendStringInfoString(buf, "\\t"); break; case '"': appendStringInfoString(buf, "\\\""); break; case '\\': appendStringInfoString(buf, "\\\\"); break; default: if ((unsigned char) *p< ' ') appendStringInfo(buf, "\\u%04x", (int) *p); else appendStringInfoCharMacro(buf,*p); break; }}appendStringInfoCharMacro(buf, '\"'); }
On Sun, Aug 9, 2009 at 8:48 PM, Andrew Dunstan<andrew@dunslane.net> wrote: > > > Andrew Dunstan wrote: >> >> >> Andres Freund wrote: >>>> >>>> BTW, has anyone tried validating the XML at all? I just looked very >>>> briefly at the patch at >>>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and >>>> I noticed this which makes me suspicious: >>>> >>>> + if (es.format == EXPLAIN_FORMAT_XML) >>>> + appendStringInfoString(es.str, >>>> + "<explain xmlns=\"http://www.postgresql.org/2009/explain\" >>>> <http://www.postgresql.org/2009/explain%5C%22>;>\n"); >>>> >>> >>> That bug is fixed - as referenced above I wrote a schema and validated >>> it. So, yes, the generated XML was valid at least before the last round of >>> refactoring. And I looked through the output quite a bit so I would >>> surprised if there is such a breakage. >>> then someone needs to update the commitfest page with the lastest patch. >>> That's the link I followed. >>> >>> >>> >> >> Hmm. I wonder how i got the link to an old version of the patch. >> >> Anyway, I'm glad it's fixed. > > > I takle it back. It's still there at > <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> posted 3 > days ago. What the hell? I have every version of that patch I've ever submitted in ~/patch/explain-as-submitted, and that extra semicolon is not there in any of them. Furthermore, when I open up the attachment from my sent mail, the semicolon isn't there either. Yet I see it at the link you provided just as clearly as you do. Is there a bug in the archives code??? ...Robert
On Monday 10 August 2009 02:48:29 Andrew Dunstan wrote: > Andrew Dunstan wrote: > > Andres Freund wrote: > >>> BTW, has anyone tried validating the XML at all? I just looked very > >>> briefly at the patch at > >>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and > >>> I noticed this which makes me suspicious: > >>> > >>> + if (es.format == EXPLAIN_FORMAT_XML) > >>> + appendStringInfoString(es.str, > >>> + "<explain > >>> xmlns=\"http://www.postgresql.org/2009/explain\" > >>> <http://www.postgresql.org/2009/explain%5C%22>;>\n"); > >> > >> That bug is fixed - as referenced above I wrote a schema and > >> validated it. So, yes, the generated XML was valid at least before > >> the last round of refactoring. And I looked through the output quite > >> a bit so I would surprised if there is such a breakage. > >> then someone needs to update the commitfest page with the lastest > >> patch. That's the link I followed. > > > > Hmm. I wonder how i got the link to an old version of the patch. > > > > Anyway, I'm glad it's fixed. > > I takle it back. It's still there at > <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> > posted 3 days ago. That seems to be a failure of the mail interface? I neither see it when opening the attachement manually nor in the git repository? Andres
Andrew Dunstan <andrew@dunslane.net> writes: > I takle it back. It's still there at > <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> > posted 3 days ago. Hmm, I think the archive website must be mangling that somehow. What I have in the code I'm reviewing is if (es.format == EXPLAIN_FORMAT_XML) appendStringInfoString(es.str, "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n"); I was planning to complain about the format of this URL --- shouldn't it be more like http://www.postgresql.org/explain/v1 ? --- but there's no semicolon. regards, tom lane
On Sun, Aug 9, 2009 at 9:03 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> I takle it back. It's still there at >> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> >> posted 3 days ago. > > Hmm, I think the archive website must be mangling that somehow. > What I have in the code I'm reviewing is > > if (es.format == EXPLAIN_FORMAT_XML) > appendStringInfoString(es.str, > "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n"); > > I was planning to complain about the format of this URL --- shouldn't it > be more like http://www.postgresql.org/explain/v1 ? --- but there's > no semicolon. Peter Eisentraut suggested it. Take it up with him because I don't care. :-) ...Robert
On Sun, Aug 9, 2009 at 8:53 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote: >>> That ";" after the attribute is almost certainly wrong. This is a classic >>> case of what I was talking about a month or two ago. Building up XML (or >>> any structured doc, really, XML is not special in this regard) by ad hoc >>> methods is horribly error prone. if you don't want to rely on libxml, then >>> I think you need to develop a lightweight abstraction rather than just >>> appending to a StringInfo. > >> While it would be possible to add another step inbetween and generate a format >> neutral tree and generate the different formats out of it I am not sure that >> this is worthwile. >> The current text format will need to stay special cased anyway because its far >> to inconsistent to generate it from anything abstract and I don't see any >> completely new formats coming (i.e. not just optional parts)? > > The patch as-submitted does have a lightweight abstraction layer of > sorts, but the main code line feels free to ignore that and hack around > it. I agree that we can't avoid special-casing the text output, but > I'm trying to improve the encapsulation otherwise. What I've got at the > moment is attached. I'd be interested in comments on the grouping > notion in particular --- I reverse-engineered that out of what the code > was doing, and I'm sure it could use improvement. I haven't tested it but it looks pretty good to me. I probably should have done something like this to begin with, but I fell into the trap of being too timid about introducing new concepts. One subtle point that isn't documented and probably should be is that JSON can't support a container that behaves partly like a list and partly like a hash, as XML can. So for example in XML a <Plan> tag could have children like <Startup-Cost> (one each) and could also have its inner, outer, and sub-plans in there as <Plan> tags right under the parent <Plan>. I'm not sure this would be good design anyway, but it COULD be done. In JSON, this will crash and burn, because the container is either an array (which precludes labelling the elements) or a hash (which precludes duplicates). I've avoided this problem by introducing an additional layer of grouping whenever this situation comes up; for symmetry it exists in both output formats. So for example in the above-mentioned case the <Plan> has a child called <Plans> which in turn contains each <Plan> that belongs under the parent; in JSON, this maps nicely to a property called "Plans" which points to an array of hashes, each of which represents a plan. We should probably have a long comment somewhere in explain.c explaining all of this. It's the sort of thing that you figure out you need to have when you're writing the code, but it might not be too obvious reading the code after the fact. It's also a reason why I'm really glad I decided to write two alternative output formats; otherwise, I fear that we would have blundered into a bunch of XML-isms that wouldn't have been translatable into anything else. ...Robert
On Monday 10 August 2009 02:53:16 Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote: > >> That ";" after the attribute is almost certainly wrong. This is a > >> classic case of what I was talking about a month or two ago. Building up > >> XML (or any structured doc, really, XML is not special in this regard) > >> by ad hoc methods is horribly error prone. if you don't want to rely on > >> libxml, then I think you need to develop a lightweight abstraction > >> rather than just appending to a StringInfo. > > > > While it would be possible to add another step inbetween and generate a > > format neutral tree and generate the different formats out of it I am not > > sure that this is worthwile. > > The current text format will need to stay special cased anyway because > > its far to inconsistent to generate it from anything abstract and I don't > > see any completely new formats coming (i.e. not just optional parts)? > > The patch as-submitted does have a lightweight abstraction layer of > sorts, but the main code line feels free to ignore that and hack around > it. I agree that we can't avoid special-casing the text output, but > I'm trying to improve the encapsulation otherwise. What I've got at the > moment is attached. I'd be interested in comments on the grouping > notion in particular --- I reverse-engineered that out of what the code > was doing, and I'm sure it could use improvement. Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or handling of X_OPENING|X_CLOSING would allow to handle empty tags like in ExplainOneUtility (<Notify />). Otherwise it seems to look nice. Andres
Andres Freund <andres@anarazel.de> writes: > Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or > handling of X_OPENING|X_CLOSING would allow to handle empty tags like in > ExplainOneUtility (<Notify />). Yeah, I was just wondering what to do with the <Notify /> code. I'm not sure if it's worth trying to fold it into the OpenGroup/CloseGroup code --- it might be easier to have an ExplainEmptyGroup or something like that. regards, tom lane
On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: >> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or >> handling of X_OPENING|X_CLOSING would allow to handle empty tags like in >> ExplainOneUtility (<Notify />). > > Yeah, I was just wondering what to do with the <Notify /> code. I'm > not sure if it's worth trying to fold it into the OpenGroup/CloseGroup > code --- it might be easier to have an ExplainEmptyGroup or something > like that. Well there's no rule that says you can't emit it as <Notify> </Notify> ...Robert
On Monday 10 August 2009 03:34:36 Robert Haas wrote: > On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > > Andres Freund <andres@anarazel.de> writes: > >> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE > >> or handling of X_OPENING|X_CLOSING would allow to handle empty tags like > >> in ExplainOneUtility (<Notify />). > > > > Yeah, I was just wondering what to do with the <Notify /> code. I'm > > not sure if it's worth trying to fold it into the OpenGroup/CloseGroup > > code --- it might be easier to have an ExplainEmptyGroup or something > > like that. > > Well there's no rule that says you can't emit it as > > <Notify> > </Notify> That wont work nicely with json output. Andres
Robert Haas wrote: > One subtle point that isn't documented and probably should be is that > JSON can't support a container that behaves partly like a list and > partly like a hash, as XML can. So for example in XML a <Plan> tag > could have children like <Startup-Cost> (one each) and could also have > its inner, outer, and sub-plans in there as <Plan> tags right under > the parent <Plan>. I'm not sure this would be good design anyway, but > it COULD be done. In JSON, this will crash and burn, because the > container is either an array (which precludes labelling the elements) > or a hash (which precludes duplicates). > > > Right, this is fairly well known, I think. There are methods to map XML to JSON, and it can even be done in such a way that you can make a complete round trip, but in the schemes I've seen the JSON doesn't really look like what you would use if you designed the JSON document from scratch, or if it does then you're losing something. cheers andrew
On Monday 10 August 2009 03:43:22 Andres Freund wrote: > On Monday 10 August 2009 03:34:36 Robert Haas wrote: > > On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > > > Andres Freund <andres@anarazel.de> writes: > > >> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE > > >> or handling of X_OPENING|X_CLOSING would allow to handle empty tags > > >> like in ExplainOneUtility (<Notify />). > > > > > > Yeah, I was just wondering what to do with the <Notify /> code. I'm > > > not sure if it's worth trying to fold it into the OpenGroup/CloseGroup > > > code --- it might be easier to have an ExplainEmptyGroup or something > > > like that. > > > > Well there's no rule that says you can't emit it as > > <Notify> > > </Notify> > That wont work nicely with json output. I have not the slightest clue what I was thinking while typing that... Andres
Robert Haas <robertmhaas@gmail.com> writes: > Revised patch attached. I'm not convinced this is as good as it can > be, but I've been looking at this patch for so long that I'm starting > to get cross-eyed, and I'd like to Tom at least have a look at this > and assess it before we run out of CommitFest. Committed after significant hacking to try to make the format abstraction layer a tad more complete. There are still some open issues: * I still think we need a written spec for the non-text output formats. One of the problems with machine reading of the text format is you have to reverse-engineer what the possibilities are, and this patch hasn't made that better. A list of the possible fields, and the possible values for those fields that have finite domains, would be a start. * There are some decisions that seem a bit questionable to me, like using "Parent Relationship" tags rather than having the child plans as labeled attributes of the parent node. But I can't really evaluate this for lack of experience with designing XML/JSON structures. * As already noted, the URL for the XML schema seems questionable. I think that versioning should go more like v1, v2, ... instead of being tied to a year. * I complained earlier that I thought dumping expressions as text was pretty bogus --- it'll leave anything that's trying to do analysis in depth still having to parse complicated stuff. I don't know exactly what I want instead, but at the very least it seems like the variables used in an expression ought to be more readily available. Anyway, it's committed so that people can play with it. We're a lot more likely to get feedback if people actually try to use the feature. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> I takle it back. It's still there at >> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> >> posted 3 days ago. > > Hmm, I think the archive website must be mangling that somehow. > What I have in the code I'm reviewing is > > if (es.format == EXPLAIN_FORMAT_XML) > appendStringInfoString(es.str, > "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n"); > > I was planning to complain about the format of this URL --- shouldn't it > be more like http://www.postgresql.org/explain/v1 ? --- but there's > no semicolon. that url seems too general anyway - can we do something like http://www.postgresql.org/schema/explain/v1 or http://www.postgresql.org/xml/2009/explain/? Stefan
On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Revised patch attached. I'm not convinced this is as good as it can >> be, but I've been looking at this patch for so long that I'm starting >> to get cross-eyed, and I'd like to Tom at least have a look at this >> and assess it before we run out of CommitFest. > > Committed after significant hacking to try to make the format > abstraction layer a tad more complete. Looks nice, thank you. > There are still some open issues: > > * I still think we need a written spec for the non-text output formats. > One of the problems with machine reading of the text format is you have > to reverse-engineer what the possibilities are, and this patch hasn't > made that better. A list of the possible fields, and the possible > values for those fields that have finite domains, would be a start. Where would we put this in the documentation? Seems like it might need a new section/chapter somewhere. > * There are some decisions that seem a bit questionable to me, like > using "Parent Relationship" tags rather than having the child plans > as labeled attributes of the parent node. But I can't really evaluate > this for lack of experience with designing XML/JSON structures. That would work for XML, but I think it doesn't for JSON. > * As already noted, the URL for the XML schema seems questionable. > I think that versioning should go more like v1, v2, ... instead of > being tied to a year. Or what about being based on the major PostgreSQL major version? Would it be lame to think about something like http://www.postgresql.org/docs/8.5/static/sql-explain.html ? > * I complained earlier that I thought dumping expressions as text > was pretty bogus --- it'll leave anything that's trying to > do analysis in depth still having to parse complicated stuff. > I don't know exactly what I want instead, but at the very least it > seems like the variables used in an expression ought to be more > readily available. > > Anyway, it's committed so that people can play with it. We're a > lot more likely to get feedback if people actually try to use the > feature. Awesome. ...Robert
Andres Freund wrote: > I produced/mailed a relaxng version for a a bit older version and I plan to > refresh and document it once the format seems suitably stable. I am not sure > it is yet. If yes, this should not take that long... > (Relaxng because you easily can convert it into most other XML schema > description languages) > > I don't mind doing both, but I think one should be authoritative, and whatever the relative technical merits are (please, let's not debate that here) the fact after quite some years is that XML Schemas have much more traction. See the thread that starts at <http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For example, Xerces-J supports XML Schemas natively. Anyway, now it's committed I will be having a play with it. cheers andrew
On Monday 10 August 2009 14:39:22 Andrew Dunstan wrote: > Andres Freund wrote: > > I produced/mailed a relaxng version for a a bit older version and I plan > > to refresh and document it once the format seems suitably stable. I am > > not sure it is yet. If yes, this should not take that long... > > (Relaxng because you easily can convert it into most other XML schema > > description languages) > I don't mind doing both, but I think one should be authoritative, and > whatever the relative technical merits are (please, let's not debate > that here) the fact after quite some years is that XML Schemas have much > more traction. See the thread that starts at > <http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For > example, Xerces-J supports XML Schemas natively. I don't really mind which format gets choosen - I just had relaxng one already done. Do you plan to write a XML-Schema Schema? Just to avoid duplicated work... Andres
Andres Freund wrote: > On Monday 10 August 2009 14:39:22 Andrew Dunstan wrote: > >> Andres Freund wrote: >> >>> I produced/mailed a relaxng version for a a bit older version and I plan >>> to refresh and document it once the format seems suitably stable. I am >>> not sure it is yet. If yes, this should not take that long... >>> (Relaxng because you easily can convert it into most other XML schema >>> description languages) >>> >> I don't mind doing both, but I think one should be authoritative, and >> whatever the relative technical merits are (please, let's not debate >> that here) the fact after quite some years is that XML Schemas have much >> more traction. See the thread that starts at >> <http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For >> example, Xerces-J supports XML Schemas natively. >> > I don't really mind which format gets choosen - I just had relaxng one > already done. > Do you plan to write a XML-Schema Schema? Just to avoid duplicated work... > > > I'll see what I can do. cheers andrew
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> There are still some open issues: >> >> * I still think we need a written spec for the non-text output formats. > Where would we put this in the documentation? Seems like it might > need a new section/chapter somewhere. I think it'd be sufficient to put it on the EXPLAIN reference page. IIRC, the COPY data format is documented on COPY's reference page, and this is equally particular to a single SQL command. >> * There are some decisions that seem a bit questionable to me, like >> using "Parent Relationship" tags rather than having the child plans >> as labeled attributes of the parent node. �But I can't really evaluate >> this for lack of experience with designing XML/JSON structures. > That would work for XML, but I think it doesn't for JSON. Why not? Something like "Inner": { ... } fits in JSON AFAICS. I don't know if there are any recognized style guidelines in the structured-document world that would approve or deprecate the way you've done it. I do see the advantage that code can visit all the subplans of a plan without knowing much about the plan tree structure. What's bothering me the most is that member subplans of an Append are mushed together with other child plan types. The ordering of the members is significant. Now the chosen representation does preserve that order, but ISTM mushing all the child plan types together like this makes it *look* like the Plans attribute is unordered; especially when the ordering is in fact not significant for every other child plan type. >> * As already noted, the URL for the XML schema seems questionable. >> I think that versioning should go more like v1, v2, ... instead of >> being tied to a year. > Or what about being based on the major PostgreSQL major version? > Would it be lame to think about something like > http://www.postgresql.org/docs/8.5/static/sql-explain.html ? Yeah. In the first place, I imagine the schema will change a few times before 8.5 is released. In the second, once we do get it right it'll probably be stable across multiple releases. I think we should just have a serial number on them, and not assume that either years or releases are appropriate quantifiers. >> * I complained earlier that I thought dumping expressions as text >> was pretty bogus --- it'll leave anything that's trying to >> do analysis in depth still having to parse complicated stuff. >> I don't know exactly what I want instead, but at the very least it >> seems like the variables used in an expression ought to be more >> readily available. On this point: basically what's bothering me is that by dumping expressions as undifferentiated text blobs, we are making the same mistake in small that this patch is meant to solve in large. I don't really want to do the work of decomposing expressions right now, but I'm not happy that it's impossible to do so without a non-backwards-compatible DTD break. What do you think of emitting expressions as containers, with the text as the only property? Instead of <Filter>(f1 > 0)</Filter> something like <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> This would leave room to add additional properties beside the text, and not break existing clients when we do it. Another issue that bothers me a bit is the "Strategy" field. It may or may not appear depending on "Node Type", and when it does appear, the possible values vary depending on "Node Type". Is that sort of non-orthogonality considered good style? regards, tom lane
On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> There are still some open issues: >>> >>> * I still think we need a written spec for the non-text output formats. > >> Where would we put this in the documentation? Seems like it might >> need a new section/chapter somewhere. > > I think it'd be sufficient to put it on the EXPLAIN reference page. > IIRC, the COPY data format is documented on COPY's reference page, > and this is equally particular to a single SQL command. OK, I was just worried it might be long. >>> * There are some decisions that seem a bit questionable to me, like >>> using "Parent Relationship" tags rather than having the child plans >>> as labeled attributes of the parent node. But I can't really evaluate >>> this for lack of experience with designing XML/JSON structures. > >> That would work for XML, but I think it doesn't for JSON. > > Why not? Something like > > "Inner": { ... } > > fits in JSON AFAICS. I don't know if there are any recognized style > guidelines in the structured-document world that would approve or > deprecate the way you've done it. I do see the advantage that code > can visit all the subplans of a plan without knowing much about the > plan tree structure. What's bothering me the most is that member > subplans of an Append are mushed together with other child plan types. > The ordering of the members is significant. Now the chosen > representation does preserve that order, but ISTM mushing all the > child plan types together like this makes it *look* like the Plans > attribute is unordered; especially when the ordering is in fact > not significant for every other child plan type. Oh, I see what you mean. Yeah, we could do that, and I thought about it. I decided on this, because it would potentially let you recurse down the tree of nodes without having to handle every kind of sub-plan-node separately. >>> * As already noted, the URL for the XML schema seems questionable. >>> I think that versioning should go more like v1, v2, ... instead of >>> being tied to a year. > >> Or what about being based on the major PostgreSQL major version? >> Would it be lame to think about something like >> http://www.postgresql.org/docs/8.5/static/sql-explain.html ? > > Yeah. In the first place, I imagine the schema will change a few times > before 8.5 is released. In the second, once we do get it right it'll > probably be stable across multiple releases. I think we should just > have a serial number on them, and not assume that either years or > releases are appropriate quantifiers. That's fine then. I'm easy; the schema URL is the least interesting part of this IMO. >>> * I complained earlier that I thought dumping expressions as text >>> was pretty bogus --- it'll leave anything that's trying to >>> do analysis in depth still having to parse complicated stuff. >>> I don't know exactly what I want instead, but at the very least it >>> seems like the variables used in an expression ought to be more >>> readily available. > > On this point: basically what's bothering me is that by dumping > expressions as undifferentiated text blobs, we are making the same > mistake in small that this patch is meant to solve in large. > I don't really want to do the work of decomposing expressions right > now, but I'm not happy that it's impossible to do so without a > non-backwards-compatible DTD break. What do you think of emitting > expressions as containers, with the text as the only property? > Instead of > > <Filter>(f1 > 0)</Filter> > > something like > > <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> > > This would leave room to add additional properties beside the text, > and not break existing clients when we do it. Well, there you seem to be adding TWO containers, which is probably overkill. I could see adding one. > Another issue that bothers me a bit is the "Strategy" field. > It may or may not appear depending on "Node Type", and when it > does appear, the possible values vary depending on "Node Type". > Is that sort of non-orthogonality considered good style? It is in the land of Robert, but someone else might prefer something different. I'm not attached to doing it this way, but as a guy who does a lot of database work I tend to prefer to avoid having a multiple, distinct fields for similar information. It makes it had to read the SELECT * FROM table output in my 80-character terminal window. :-) ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> >> >> This would leave room to add additional properties beside the text, >> and not break existing clients when we do it. > Well, there you seem to be adding TWO containers, which is probably > overkill. I could see adding one. Uh, no, I see one container and a property. If we do just <Filter><Expr>(f1 > 0)</Expr></Filter> then where do we put additional information about the expression when the time comes? regards, tom lane
Robert Haas escribió: > What the hell? I have every version of that patch I've ever submitted > in ~/patch/explain-as-submitted, and that extra semicolon is not there > in any of them. Furthermore, when I open up the attachment from my > sent mail, the semicolon isn't there either. Yet I see it at the link > you provided just as clearly as you do. Is there a bug in the > archives code??? Hmm, wow, interesting. The mbox from which the archives page is created does _not_ have a semicolon there. A(nother) Mhonarc bug perhaps? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> >>> >>> This would leave room to add additional properties beside the text, >>> and not break existing clients when we do it. > >> Well, there you seem to be adding TWO containers, which is probably >> overkill. I could see adding one. > > Uh, no, I see one container and a property. If we do just > > <Filter><Expr>(f1 > 0)</Expr></Filter> > > then where do we put additional information about the expression > when the time comes? I would assume you would just write: <Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter> ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Uh, no, I see one container and a property. If we do just >> >> <Filter><Expr>(f1 > 0)</Expr></Filter> >> >> then where do we put additional information about the expression >> when the time comes? > I would assume you would just write: > <Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter> Perhaps the issue would be clearer in JSON notation. We have "Filter": "(f1 > 0)" What I suggest is "Filter": { "Text": "(f1 > 0)" } I don't see where you're going to shoehorn in any additional information without the container, and once you have the container you need to name the property, no? regards, tom lane
On Mon, Aug 10, 2009 at 12:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> Uh, no, I see one container and a property. If we do just >>> >>> <Filter><Expr>(f1 > 0)</Expr></Filter> >>> >>> then where do we put additional information about the expression >>> when the time comes? > >> I would assume you would just write: > >> <Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter> > > Perhaps the issue would be clearer in JSON notation. We have > > "Filter": "(f1 > 0)" > > What I suggest is > > "Filter": { "Text": "(f1 > 0)" } > > I don't see where you're going to shoehorn in any additional information > without the container, and once you have the container you need to name > the property, no? I agree. The JSON looks perfect to me. I may be thick as a post here and say "oh, I'm a moron" when you explain this to me, but I still don't understand why that would require the XML notation to interpose an intermediate node. Why can't "filter" node itself can be the labelled container? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I may be thick as a post here and say "oh, I'm a moron" when you > explain this to me, but I still don't understand why that would > require the XML notation to interpose an intermediate node. Why can't > "filter" node itself can be the labelled container? Filter isn't a node; it's a property of the containing Plan node. The way we have this set up, there's a distinction between properties and groups, which AFAICS we have to have in order to have directly comparable structures in XML and JSON. Didn't you design this yourself? (I think part of the issue is that containers in JSON are anonymous whereas XML wants to assign them a named type. That's fine with me, in fact the JSON approach looks rather impoverished.) regards, tom lane
On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I may be thick as a post here and say "oh, I'm a moron" when you >> explain this to me, but I still don't understand why that would >> require the XML notation to interpose an intermediate node. Why can't >> "filter" node itself can be the labelled container? > > Filter isn't a node; it's a property of the containing Plan node. My use of the word node was poorly chosen, since that word has a specific meaning in the context of PG. > The way we have this set up, there's a distinction between properties > and groups, which AFAICS we have to have in order to have directly > comparable structures in XML and JSON. Didn't you design this > yourself? Yes, I did. But the point is that as far as I can see, the following two things are equivalent: <Filter><Text>(f1 > 0)</Text></Filter>"Filter": { "Text": "(f1 > 0)" } And this is not: <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> The latter would be equivalent to something like this in JSON: "Filter" : { "Expr" : { "Text: "(f1 > 0)" } } or if you intended the <Expr> thing to be an array-type container, then it would be equivalent to this: "Filter" : { [ { "Text: "(f1 > 0)" } ] } Would it be helpful for me to try to reduce this to code? > (I think part of the issue is that containers in JSON are anonymous > whereas XML wants to assign them a named type. That's fine with me, > in fact the JSON approach looks rather impoverished.) That does make things a little tricky, though it has the virtue of mapping nicely onto data structures other than XML. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> The way we have this set up, there's a distinction between properties >> and groups, which AFAICS we have to have in order to have directly >> comparable structures in XML and JSON. �Didn't you design this >> yourself? > Yes, I did. But the point is that as far as I can see, the following > two things are equivalent: > <Filter><Text>(f1 > 0)</Text></Filter> > "Filter": { "Text": "(f1 > 0)" } > And this is not: > <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> Well, in that case we need to redesign the "grouping" abstraction in explain.c, because it's set up to treat a JSON hash as equivalent to a parent node type (or whatever you want to call it) in XML. But I'm not sure I want to do it. It appears to me that what you want to do is dumb down the XML representation so it's just as impoverished as the JSON one; to wit that there won't be any abstract concept of an "Expr" node, and all client programs will have to implicitly know that, say, Filter is an instance of an Expr and therefore can be expected to have thus-and-such fields inside it. This does not seem like an improvement. It puzzles me that you are pushing to make expression containers anonymous and unrecognizable when you wanted the opposite for child plans. regards, tom lane
On Mon, Aug 10, 2009 at 7:57 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> The way we have this set up, there's a distinction between properties >>> and groups, which AFAICS we have to have in order to have directly >>> comparable structures in XML and JSON. Didn't you design this >>> yourself? > >> Yes, I did. But the point is that as far as I can see, the following >> two things are equivalent: > >> <Filter><Text>(f1 > 0)</Text></Filter> >> "Filter": { "Text": "(f1 > 0)" } > >> And this is not: > >> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter> > > Well, in that case we need to redesign the "grouping" abstraction in > explain.c, because it's set up to treat a JSON hash as equivalent to > a parent node type (or whatever you want to call it) in XML. But > I'm not sure I want to do it. It appears to me that what you want > to do is dumb down the XML representation so it's just as impoverished > as the JSON one; to wit that there won't be any abstract concept of > an "Expr" node, and all client programs will have to implicitly know > that, say, Filter is an instance of an Expr and therefore can be > expected to have thus-and-such fields inside it. This does not seem > like an improvement. It puzzles me that you are pushing to make > expression containers anonymous and unrecognizable when you wanted the > opposite for child plans. I think I might be starting to understand what you're getting at here.Let me check: I think what you're saying is that theExpr node is potentially useful to clients for identifying where in the tree the Exprs are, even without specific knowledge about the different types of filter nodes, much as the Plans node is useful for finding all of the subplan objects. If that is correct, then let me make two comments. First, I don't think it should be a goal of this format to be self-documenting. To the extent that it can be so without mucking up the format, great. But self-documenting formats are often a recipe for massive bloat, and in my experience they tend to create more problems than they solve. I think keeping it lean is the way to go. But, second, there might be a way that we can have our cake and eat it too. Instead of decorating each node with properties Filter, Join-Filter, One-Time-Filter, and so on, what we could do is decorate each node with a single property called Filters, which would be a list of hashes: 'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }] <Filters> <Filter> <Filter-Type>Join</Filter-Type> <Text-Expr>a.foo < b.bar</Text-Expr> </Filter> </Filters> Would that address your concerns, or am I still way off base here? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I think I might be starting to understand what you're getting at here. > Let me check: I think what you're saying is that the Expr node is > potentially useful to clients for identifying where in the tree the > Exprs are, even without specific knowledge about the different types > of filter nodes, much as the Plans node is useful for finding all of > the subplan objects. Well, my point is that there are a bunch of different places that expressions can appear in a plan tree --- filters, join filters, sort keys, hash clauses, merge clauses, index clauses, output expressions, and I'm not sure I remembered them all, and I fully expect that some more might appear in future. I don't think it's appropriate to require clients to keep track of exactly which tree properties are expressions, especially not when we can easily label them. > If that is correct, then let me make two comments. First, I don't > think it should be a goal of this format to be self-documenting. It isn't. But I think it should be a goal that a client can, say, find all the references to a given variable without a huge amount of hard-wired knowledge about specific properties of specific node types. So it should be able to find all the expressions with a relatively dumb search of the tree (eg, XPath?). > But, second, there might be a way that we can have our cake and eat it > too. Instead of decorating each node with properties Filter, > Join-Filter, One-Time-Filter, and so on, what we could do is decorate > each node with a single property called Filters, which would be a list > of hashes: > 'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }] I still say that's a bad idea for child Plans; and it's a worse one for expressions. The point about retaining knowledge of child order is absolutely critical for many of the places where exprs appear, such as sort keys and output columns. You'd have to seriously uglify the notation in order to do it like this. One way that we could handle this, I guess, is to embed more information in the property names --- eg use "Expr Text" where I just had "Text". But it's not apparent to me why that should be considered better style than the other way. regards, tom lane
On Tue, Aug 11, 2009 at 12:11 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think I might be starting to understand what you're getting at here. >> Let me check: I think what you're saying is that the Expr node is >> potentially useful to clients for identifying where in the tree the >> Exprs are, even without specific knowledge about the different types >> of filter nodes, much as the Plans node is useful for finding all of >> the subplan objects. > > Well, my point is that there are a bunch of different places that > expressions can appear in a plan tree --- filters, join filters, sort > keys, hash clauses, merge clauses, index clauses, output expressions, > and I'm not sure I remembered them all, and I fully expect that some > more might appear in future. I don't think it's appropriate to require > clients to keep track of exactly which tree properties are expressions, > especially not when we can easily label them. > >> If that is correct, then let me make two comments. First, I don't >> think it should be a goal of this format to be self-documenting. > > It isn't. But I think it should be a goal that a client can, say, > find all the references to a given variable without a huge amount of > hard-wired knowledge about specific properties of specific node types. > So it should be able to find all the expressions with a relatively > dumb search of the tree (eg, XPath?). > >> But, second, there might be a way that we can have our cake and eat it >> too. Instead of decorating each node with properties Filter, >> Join-Filter, One-Time-Filter, and so on, what we could do is decorate >> each node with a single property called Filters, which would be a list >> of hashes: > >> 'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }] > > I still say that's a bad idea for child Plans; and it's a worse one for > expressions. The point about retaining knowledge of child order is > absolutely critical for many of the places where exprs appear, such as > sort keys and output columns. You'd have to seriously uglify the > notation in order to do it like this. > > One way that we could handle this, I guess, is to embed more information > in the property names --- eg use "Expr Text" where I just had "Text". > But it's not apparent to me why that should be considered better style > than the other way. Unfortunately, I have to admit to total confusion. The idea in the last paragraph seems reasonable to me, but since I don't understand the other alternative, I can't say whether it's better or worse. I wonder if we would be better off waiting for feedback from actual tool authors. I have no illusions that the format is perfect... ...Robert
On Tue, 11 Aug 2009 13:11:47 -0400 Robert Haas <robertmhaas@gmail.com> wrote: > Unfortunately, I have to admit to total confusion. The idea in the > last paragraph seems reasonable to me, but since I don't understand > the other alternative, I can't say whether it's better or worse. I > wonder if we would be better off waiting for feedback from actual tool > authors. I have no illusions that the format is perfect... Have any tool authors stepped up and committed resources to utilizing this feature in the near term? This is one of the features I have been most interested in, and I believe it has the potential to greatly improve PostgreSQL's "reputation" of being hard to optimize if used correctly. I envision a web site where users can paste the XML explain output and have it return some pretty specific information and fancy graphs describing exactly whats going on with their query and possible ideas on how to further optimize the query itself and even PostgreSQL configuration settings. The reason I would like to provide this tool in a web-based form is that no additional software installation would be necessary for the user, reducing any hurdles to using it to zero. I'm guessing that my vision likely exceeds the scope of this feature in its initial form at least, but assuming no one else has stepped up, I'm more than willing to start committing resources as early as this weekend with the understanding that this feature is still in development and likely will change several times before or if its finally committed for 8.5. -- Mike
Mike <ipso@snappymail.ca> writes: > Have any tool authors stepped up and committed resources to utilizing > this feature in the near term? I don't think anyone's promised much. If you want to have a go at using it, we'd be very happy. > I'm guessing that my vision likely exceeds the scope of this feature in > its initial form at least, but assuming no one else has stepped up, I'm > more than willing to start committing resources as early as this > weekend with the understanding that this feature is still in > development and likely will change several times before or if its > finally committed for 8.5. It's definitely committed for 8.5, but the exact format of the output is (obviously) still subject to change. regards, tom lane
Tom Lane wrote: > Mike <ipso@snappymail.ca> writes: > >> Have any tool authors stepped up and committed resources to utilizing >> this feature in the near term? >> > > I don't think anyone's promised much. If you want to have a go at using > it, we'd be very happy. > > >> I'm guessing that my vision likely exceeds the scope of this feature in >> its initial form at least, but assuming no one else has stepped up, I'm >> more than willing to start committing resources as early as this >> weekend with the understanding that this feature is still in >> development and likely will change several times before or if its >> finally committed for 8.5. >> > > It's definitely committed for 8.5, but the exact format of the output > is (obviously) still subject to change. > > > Good. I had a look at this for a little while yesterday. I built it, did an install, loaded auto_explain and then ran the regression tests. I didn't like the output much. It looks like the XML has been dumbed down to fit a data model that works for JSON as well, particularly in the lack of use of attributes. An XML processor won't care that much, but humans will certainly find it more tiresome to read. In effect we are swapping horizontal expansion for vertical expansion. It would be nicer to be able to fit a plan into a screen. I also took the last relaxng spec I could find and used a nice little tool called Trang to translate it into an XML Schema spec. The good news is that that worked. The bad news is that the spec almost certainly needs some tightening, especially around those elements that probably should be XML attributes. cheers andrew
On Tuesday 11 August 2009 21:59:48 Andrew Dunstan wrote: > Tom Lane wrote: > > Mike <ipso@snappymail.ca> writes: > >> Have any tool authors stepped up and committed resources to utilizing > >> this feature in the near term? > > > > I don't think anyone's promised much. If you want to have a go at using > > it, we'd be very happy. > > > >> I'm guessing that my vision likely exceeds the scope of this feature in > >> its initial form at least, but assuming no one else has stepped up, I'm > >> more than willing to start committing resources as early as this > >> weekend with the understanding that this feature is still in > >> development and likely will change several times before or if its > >> finally committed for 8.5. > > > > It's definitely committed for 8.5, but the exact format of the output > > is (obviously) still subject to change. > Good. I had a look at this for a little while yesterday. I built it, did > an install, loaded auto_explain and then ran the regression tests. I > didn't like the output much. It looks like the XML has been dumbed down > to fit a data model that works for JSON as well, particularly in the > lack of use of attributes. An XML processor won't care that much, but > humans will certainly find it more tiresome to read. In effect we are > swapping horizontal expansion for vertical expansion. It would be nicer > to be able to fit a plan into a screen. The problem is that nobody yet came up with one that is easy to generate and liked by many people... Aesthetically I do not like the current schema as well, but personally I don't think it matters that much. Everything fancy has the problem of requiring relatively much code... > I also took the last relaxng spec I could find and used a nice little > tool called Trang to translate it into an XML Schema spec. The good news > is that that worked. The bad news is that the spec almost certainly > needs some tightening, especially around those elements that probably > should be XML attributes. Unrelated to the other issues the relaxng schema has some missing pieces if I remember it correctly (constraint => constraint-name, trigger => trigger-name) I think). It is also by far not strict enough yet... Andres
On Tue, Aug 11, 2009 at 3:59 PM, Andrew Dunstan<andrew@dunslane.net> wrote: > Good. I had a look at this for a little while yesterday. I built it, did an > install, loaded auto_explain and then ran the regression tests. I didn't > like the output much. It looks like the XML has been dumbed down to fit a > data model that works for JSON as well, particularly in the lack of use of > attributes. An XML processor won't care that much, but humans will certainly > find it more tiresome to read. In effect we are swapping horizontal > expansion for vertical expansion. It would be nicer to be able to fit a plan > into a screen. Isn't that what text format is for? Not that the XML format is perfect; I haven't heard one person say that they like anything about it except that it is already implemented. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > Andrew Dunstan<andrew@dunslane.net> wrote: >> find it more tiresome to read. In effect we are swapping horizontal >> expansion for vertical expansion. It would be nicer to be able to >> fit a plan into a screen. > > Isn't that what text format is for? In my experience XML which represents anything of any real complexity is tiresome to read, period. I would worry more about it accurately representing the data and being easy to massage with software. That's not an opinion on any particular choice here, just on what should matter in making a choice. -Kevin
Robert Haas wrote: > On Tue, Aug 11, 2009 at 3:59 PM, Andrew Dunstan<andrew@dunslane.net> wrote: > >> Good. I had a look at this for a little while yesterday. I built it, did an >> install, loaded auto_explain and then ran the regression tests. I didn't >> like the output much. It looks like the XML has been dumbed down to fit a >> data model that works for JSON as well, particularly in the lack of use of >> attributes. An XML processor won't care that much, but humans will certainly >> find it more tiresome to read. In effect we are swapping horizontal >> expansion for vertical expansion. It would be nicer to be able to fit a plan >> into a screen. >> > > Isn't that what text format is for? > > Not that the XML format is perfect; I haven't heard one person say > that they like anything about it except that it is already > implemented. > > > Well, I don't think that the fact that we are producing machine-readable output means we can just ignore the human side of it. It is more than likely that such output will be read by both machines and humans. Obviously, we need to make sure that it meets machine processing needs first, but once we have done that we should not ignore the human requirements. BTW, I think it's great that we have machine-readable formats. This opens the door to a lot of tools. Well done. No doubt there will be disagreements about the formats. Our community certainly has a tendency to argue over matters of appearance, sometimes to the point of silliness. My understanding was that the idea of getting this in now was to let people have a look, and play to some extent. cheers andrew
On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote: > Well, I don't think that the fact that we are producing machine-readable > output means we can just ignore the human side of it. It is more than > likely that such output will be read by both machines and humans. > Obviously, we need to make sure that it meets machine processing needs > first, but once we have done that we should not ignore the human > requirements. XML is easy to transform to about anything else. I would vote for the XML output to only focus on machine readability and completeness, and then if needed provide style sheets to make it human readable. Then anybody could have his preferred style to look at it. If there would be a tool to format the XML according to some style sheet (to make it easy for those who don't care about XML and style sheets), I would volunteer to provide the XSL style sheets for different formats on request. Cheers, Csaba.
Csaba Nagy wrote: > On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote: > >> Well, I don't think that the fact that we are producing machine-readable >> output means we can just ignore the human side of it. It is more than >> likely that such output will be read by both machines and humans. >> Obviously, we need to make sure that it meets machine processing needs >> first, but once we have done that we should not ignore the human >> requirements. >> > > XML is easy to transform to about anything else. I would vote for the > XML output to only focus on machine readability and completeness, and > then if needed provide style sheets to make it human readable. Then > anybody could have his preferred style to look at it. If there would be > a tool to format the XML according to some style sheet (to make it easy > for those who don't care about XML and style sheets), I would volunteer > to provide the XSL style sheets for different formats on request. > > > Have you actually looked at a logfile with this in it? A simple stylesheet won't do at all. What you get is not an XML document but a text document with little bits of XML embedded in it. So you would need a program to parse that file and either turn it into a single legal XML document or pass each piece of XML individually to your XSLT processor. Bleah. And all this because you pose a false dichotomy between correctness and completeness on one hand and human readability on the other. I don't accept that at all. I think we can and should improve human readability without sacrificing anything on the correctness and completeness front. In fact, that also needs improving, and we can do them both at the same time. I want to be able to have machine readable explain output, but I also want to be able to browse the logs without wasting more brain cells than necessary and without having to use external tools other than by standard text browser (less). As Pooh Bear said, "Both please!" One thing I have noticed that we should talk about is that the explain XML output doesn't contain the query that is being explained. That's unfortunate - it means that any logfile processor will need to extract the statement from the surrounding text rather than just pulling out the XML and passing it to an XML processor. Another design issue is this: The root node of an XML document is ideally a distinguished element that can't occur within itself. auto-explain doesn't seem to be doing this. It outputs fragments that have <Plan> as the root element rather than <explain> Here's a tiny fragment of my logfile with auto_explain.log_min_duration = 0 and auto_explain.log_format = 'xml' settings: STATEMENT: SELECT 1 AS one; LOG: duration: 0.008 ms plan: <Plan> <Node-Type>Result</Node-Type> <Startup-Cost>0.00</Startup-Cost> <Total-Cost>0.01</Total-Cost> <Plan-Rows>1</Plan-Rows> <Plan-Width>0</Plan-Width> </Plan> But a <Plan> node can appear as the decendant of a <Plan> node, so this violates the design principle I enunciated above. (Anecdote: The very first use I even made of Postgres was at the end of 2002, for an Application I designed called "30 minute App" which let people design online and generate a questionnaire application for small devices (Palm, Blackberry, PocketPC) and stored the application metadata as a piece of XML in a text field in Postgres.) cheers andrew
On Wed, Aug 12, 2009 at 9:42 AM, Andrew Dunstan<andrew@dunslane.net> wrote: > Csaba Nagy wrote: >> >> On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote: >> >>> >>> Well, I don't think that the fact that we are producing machine-readable >>> output means we can just ignore the human side of it. It is more than likely >>> that such output will be read by both machines and humans. Obviously, we >>> need to make sure that it meets machine processing needs first, but once we >>> have done that we should not ignore the human requirements. >>> >> >> XML is easy to transform to about anything else. I would vote for the >> XML output to only focus on machine readability and completeness, and >> then if needed provide style sheets to make it human readable. Then >> anybody could have his preferred style to look at it. If there would be >> a tool to format the XML according to some style sheet (to make it easy >> for those who don't care about XML and style sheets), I would volunteer >> to provide the XSL style sheets for different formats on request. >> >> >> > > Have you actually looked at a logfile with this in it? A simple stylesheet > won't do at all. What you get is not an XML document but a text document > with little bits of XML embedded in it. So you would need a program to parse > that file and either turn it into a single legal XML document or pass each > piece of XML individually to your XSLT processor. Bleah. > > And all this because you pose a false dichotomy between correctness and > completeness on one hand and human readability on the other. I don't accept > that at all. I think we can and should improve human readability without > sacrificing anything on the correctness and completeness front. In fact, > that also needs improving, and we can do them both at the same time. > > I want to be able to have machine readable explain output, but I also want > to be able to browse the logs without wasting more brain cells than > necessary and without having to use external tools other than by standard > text browser (less). As Pooh Bear said, "Both please!" > > One thing I have noticed that we should talk about is that the explain XML > output doesn't contain the query that is being explained. That's unfortunate > - it means that any logfile processor will need to extract the statement > from the surrounding text rather than just pulling out the XML and passing > it to an XML processor. > > Another design issue is this: The root node of an XML document is ideally a > distinguished element that can't occur within itself. auto-explain doesn't > seem to be doing this. It outputs fragments that have <Plan> as the root > element rather than <explain> > > Here's a tiny fragment of my logfile with auto_explain.log_min_duration = 0 > and auto_explain.log_format = 'xml' settings: > > > STATEMENT: SELECT 1 AS one; > LOG: duration: 0.008 ms plan: > <Plan> > <Node-Type>Result</Node-Type> > <Startup-Cost>0.00</Startup-Cost> > <Total-Cost>0.01</Total-Cost> > <Plan-Rows>1</Plan-Rows> > <Plan-Width>0</Plan-Width> > </Plan> > > > But a <Plan> node can appear as the decendant of a <Plan> node, so this > violates the design principle I enunciated above. auto_explain sort of hooks into the middle of the main EXPLAIN machinery in a strange way. It's not comparable to a real EXPLAIN ANALYZE because, for example, it won't report_triggers(). But we could still put a container around it of some sort by modifying contrib/auto_explain. Perhaps you'd care to propose a patch? ...Robert
Andrew Dunstan <andrew@dunslane.net> writes: > Another design issue is this: The root node of an XML document is > ideally a distinguished element that can't occur within itself. > auto-explain doesn't seem to be doing this. Huh? I get (for "explain 2+2") <explain xmlns="http://www.postgresql.org/2009/explain"> <Query> <Plan> <Node-Type>Result</Node-Type> <Startup-Cost>0.00</Startup-Cost> <Total-Cost>0.01</Total-Cost> <Plan-Rows>1</Plan-Rows> <Plan-Width>0</Plan-Width> </Plan> </Query></explain> regards, tom lane
On Wed, Aug 12, 2009 at 10:01 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Another design issue is this: The root node of an XML document is >> ideally a distinguished element that can't occur within itself. >> auto-explain doesn't seem to be doing this. > > Huh? I get (for "explain 2+2") > > <explain xmlns="http://www.postgresql.org/2009/explain"> > <Query> > <Plan> > <Node-Type>Result</Node-Type> > <Startup-Cost>0.00</Startup-Cost> > <Total-Cost>0.01</Total-Cost> > <Plan-Rows>1</Plan-Rows> > <Plan-Width>0</Plan-Width> > </Plan> > </Query> > </explain> He's talking about *auto* explain. ...Robert
On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote: > Have you actually looked at a logfile with this in it? A simple > stylesheet won't do at all. What you get is not an XML document but a > text document with little bits of XML embedded in it. So you would need > a program to parse that file and either turn it into a single legal XML > document or pass each piece of XML individually to your XSLT processor. > Bleah. I'm pretty sure you will never find a human readable format which is easily extracted from the logs by a program. But if you format the XML in a (very human unreadable) one-line-without-breaks format then it will be a lot easier extracted by a program and formatted at your will. > And all this because you pose a false dichotomy between correctness and > completeness on one hand and human readability on the other. I don't > accept that at all. I think we can and should improve human readability > without sacrificing anything on the correctness and completeness front. > In fact, that also needs improving, and we can do them both at the same > time. I really really doubt that. I would go here on the UNIX approach of piping the things through the right tools, each one doing a simple and good job for it's single and well defined purpose. So let the explain spit out a line of XML without much thought about formatting but focusing on completeness, making it easy for tools to get that line, and then let the tools do the formatting depending on what you want to do with the information. Each part will be simpler than you would put in a directly human readable XML (if that's possible at all) approach, which will anyway not cover all the uses and tastes. > I want to be able to have machine readable explain output, but I also > want to be able to browse the logs without wasting more brain cells than > necessary and without having to use external tools other than by > standard text browser (less). As Pooh Bear said, "Both please!" I argue that a sufficiently complicated explain output will never be easily navigated in a text browser, however much you would like it. If you do a where clause with 100 nested ANDs (which occasionally happens here), I don't think you'll be able to cleanly show that in a text browser - it will simply not fit in no matter how you format it. But using the right GUI tool (even a generic XML one) it would be easy to just temporarily collapse the whole top AND and have a clean view of the rest. Cheers, Csaba.
On Wed, 2009-08-12 at 16:51 +0200, Csaba Nagy wrote: > I argue that a sufficiently complicated explain output will never be > easily navigated in a text browser, however much you would like it. If > you do a where clause with 100 nested ANDs (which occasionally happens > here), I don't think you'll be able to cleanly show that in a text > browser - it will simply not fit in no matter how you format it. But > using the right GUI tool (even a generic XML one) it would be easy to > just temporarily collapse the whole top AND and have a clean view of the > rest. Just a small note: I don't know how the machine-readable auto-explain output actually looks like, so I just assumed individual conditions will have their own XML node. But even if they don't have and the AND thing would be flat in the explain output, the same argument applies to even a few 10s of joins or sub-selects (and we do have that too). Sometimes collapsing parts of the plan would help big time in having a clean picture, which (starting with a certain complexity of the query) you will have a hard time to do in plain text regardless of the formatting. I do have plenty of explain outputs (from 8.2 postgres) which don't fit on my screen in text form even without further details I gather you can get in the machine readable form. Cheers, Csaba.
Csaba Nagy wrote: > On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote: > >> Have you actually looked at a logfile with this in it? A simple >> stylesheet won't do at all. What you get is not an XML document but a >> text document with little bits of XML embedded in it. So you would need >> a program to parse that file and either turn it into a single legal XML >> document or pass each piece of XML individually to your XSLT processor. >> Bleah. >> > > I'm pretty sure you will never find a human readable format which is > easily extracted from the logs by a program. But if you format the XML > in a (very human unreadable) one-line-without-breaks format then it will > be a lot easier extracted by a program and formatted at your will. > That will just make things worse. And it will break if the XML includes any expression that contains a line break. > >> And all this because you pose a false dichotomy between correctness and >> completeness on one hand and human readability on the other. I don't >> accept that at all. I think we can and should improve human readability >> without sacrificing anything on the correctness and completeness front. >> In fact, that also needs improving, and we can do them both at the same >> time. >> > > I really really doubt that. I would go here on the UNIX approach of > piping the things through the right tools, each one doing a simple and > good job for it's single and well defined purpose. So let the explain > spit out a line of XML without much thought about formatting but > focusing on completeness, making it easy for tools to get that line, and > then let the tools do the formatting depending on what you want to do > with the information. Each part will be simpler than you would put in a > directly human readable XML (if that's possible at all) approach, which > will anyway not cover all the uses and tastes. > I think we're going to have to agree to disagree on this. I think you're going in precisely the wrong direction. I repeat, I want to be able to have a log file that is both machine processable and not utterly unreadable by a human. And I do not accept at all that this is impossible. Nor do I accept I should need some extra processing tool to read the machine processable output without suffering brain damage. If we were to adopt your approach I bet you would find that nobody in their right mind would use the machine readable formats. I sure wouldn't. cheers andrew
Andrew Dunstan escribió: > STATEMENT: SELECT 1 AS one; > LOG: duration: 0.008 ms plan: > <Plan> > <Node-Type>Result</Node-Type> > <Startup-Cost>0.00</Startup-Cost> > <Total-Cost>0.01</Total-Cost> > <Plan-Rows>1</Plan-Rows> > <Plan-Width>0</Plan-Width> > </Plan> I think what this says is that auto-explain should not be sending its output to the regular logfile, but somewhere else. The format you show above is not the most usable thing in the world -- not for machine parsing, and neither for human consumption. Maybe it should fill its own file with something like this: <autoexplain> <query>select 1 as one;</query> <duration>0.008</duration> <Plan> <Node-Type>Result</Node-Type> <Startup-Cost>0.00</Startup-Cost> <Total-Cost>0.01</Total-Cost> <Plan-Rows>1</Plan-Rows> <Plan-Width>0</Plan-Width></Plan> </autoexplain> or some such. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > Andrew Dunstan escribió: > > >> STATEMENT: SELECT 1 AS one; >> LOG: duration: 0.008 ms plan: >> <Plan> >> <Node-Type>Result</Node-Type> >> <Startup-Cost>0.00</Startup-Cost> >> <Total-Cost>0.01</Total-Cost> >> <Plan-Rows>1</Plan-Rows> >> <Plan-Width>0</Plan-Width> >> </Plan> >> > > I think what this says is that auto-explain should not be sending its > output to the regular logfile, but somewhere else. The format you show > above is not the most usable thing in the world -- not for machine > parsing, and neither for human consumption. > > Maybe it should fill its own file with something like this: > > <autoexplain> > <query>select 1 as one;</query> > <duration>0.008</duration> > <Plan> > <Node-Type>Result</Node-Type> > <Startup-Cost>0.00</Startup-Cost> > <Total-Cost>0.01</Total-Cost> > <Plan-Rows>1</Plan-Rows> > <Plan-Width>0</Plan-Width> > </Plan> > </autoexplain> > > or some such. > > With a format like this, pulling them out of the log file would be trivial, so I don't see why it couldn't go in the log file. It needs a bit of tweaking, but the idea is right. And to answer Robert's question - yes, I will be submitting a patch or two in due course. cheers andrew
On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote: > That will just make things worse. And it will break if the XML includes > any expression that contains a line break. Then escape the expressions using CDATA or such... I'm sure it would be possible to make sure it's one line and rely on that. That's part of being machine readable, being able to rely on getting it at all without too much parsing magic... > I repeat, I want to be able to have a log file that is both machine > processable and not utterly unreadable by a human. And I do not accept > at all that this is impossible. Nor do I accept I should need some extra > processing tool to read the machine processable output without suffering > brain damage. If we were to adopt your approach I bet you would find > that nobody in their right mind would use the machine readable formats. Then why you bother calling it "machine readable" at all ? Would you really read your auto-explain output on the DB server ? I doubt that's the common usage scenario, I would expect that most people would let a tool extract/summarize it and definitely process it somewhere else than on the DB machine, with the proper tool set. For ad-hoc explain analysis which I think it's the typical use case for on-the-DB-server inspection of text-format explain output you would surely use something else than what is called "machine readable" format... Cheers, Csaba.
On Wed, 12 Aug 2009 09:42:00 -0400 Andrew Dunstan <andrew@dunslane.net> wrote: > One thing I have noticed that we should talk about is that the > explain XML output doesn't contain the query that is being explained. > That's unfortunate - it means that any logfile processor will need to > extract the statement from the surrounding text rather than just > pulling out the XML and passing it to an XML processor. For the purposes I propose using the XML output for, not having the original query in the XML itself is almost a show stopper. I assume the original design of this feature had tools in mind that would actually connect to PostgreSQL directly with commands being issued through the tool itself. I would like to create a website where users can paste or upload just the XML output to and have enough information to do a full analysis of the query that way. Ideally this would include the raw query itself, column statistics targets and ideally even some of the more important query performance related GUC settings. I realize that could be a lot of information that may not be needed otherwise, so if its only included in the VERBOSE explain output, that would be more than fine, but just having this information available increases the usefulness of such an "outside" tool many times. Again though, at the very least I think the raw query needs to be included. -- Mike
On Wed, 2009-08-12 at 17:31 +0200, Csaba Nagy wrote: > On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote: > > That will just make things worse. And it will break if the XML includes > > any expression that contains a line break. > > Then escape the expressions using CDATA or such... I'm sure it would be > possible to make sure it's one line and rely on that. That's part of > being machine readable, being able to rely on getting it at all without > too much parsing magic... Looks like today I'm talking consistently stupid... CDATA of course won't help in avoiding line breaks, but it would be still possible to define an XML entity for that. This starts to be too complicated anyway, so probably not worth it. In any case I still think making it easy for a tool to extract/parse the information should be higher priority than human readability - because the information itself is not really human readable without external help except for simple queries, and I doubt the main use case is simple queries. Cheers, Csaba.
Csaba Nagy wrote: > Then why you bother calling it "machine readable" at all ? Would you > really read your auto-explain output on the DB server ? I doubt that's > the common usage scenario, I would expect that most people would let a > tool extract/summarize it and definitely process it somewhere else than > on the DB machine, with the proper tool set. > Sure I would. I look at log files almost every day to find out things. Why should I have to wade through a pile of utterly unreadable crap to find it? Auto-explain lets you have *one* output format. To follow your approach, I will have to change that, and have two log files, one machine processable and one human readable. Triple bleah. I have not suggested anything that would break the machine readability. You seem to think that making the machine readable output remotely human friendly is somehow going to detract from its machine processability. But that's just silly, frankly. I do not want and should not have to choose between a format that is machine readable and one that is to some extent human readable. cheers andrew
On Wed, 2009-08-12 at 17:41 +0200, Andrew Dunstan wrote: > > Csaba Nagy wrote: > > Then why you bother calling it "machine readable" at all ? Would you > > really read your auto-explain output on the DB server ? I doubt that's > > the common usage scenario, I would expect that most people would let a > > tool extract/summarize it and definitely process it somewhere else than > > on the DB machine, with the proper tool set. > Sure I would. I look at log files almost every day to find out things. > Why should I have to wade through a pile of utterly unreadable crap to > find it? I look at log files every day too (not almost, but literally every day), but I really can't imagine myself looking at an explain output directly in the log file. Even with the output of an 8.2 server which has less detail than I think the new formats have, I always copy the text from the psql prompt to some friendlier tool where I can play around with it, delete parts of it if needed for the sake of clear overview and where I can easily switch between line-wrap or not and such. I simply don't believe that a remotely human presentation of the thing worths the slightest compromise in machine readability. That said, I would like to finish this discussion here because it gets pointless, I agree to let us disagree :-) > Auto-explain lets you have *one* output format. To follow your approach, > I will have to change that, and have two log files, one machine > processable and one human readable. Triple bleah. By ad-hoc I didn't mean reading the auto-explain log, that's surely no ad-hoc operation... I would make that a mandatory daily routine which is better handled by a tool which serves you directly the plans of the worst performing queries sorted by runtime for example and highlighting the obvious planner mis-estimates. By ad-hoc I mean a query you examine on the psql command line, and there you can expect human readable of course without considerations to machine readability. > I have not suggested anything that would break the machine readability. > You seem to think that making the machine readable output remotely human > friendly is somehow going to detract from its machine processability. > But that's just silly, frankly. I do not want and should not have to > choose between a format that is machine readable and one that is to some > extent human readable. OK, if you can do that it's fine... but I'm afraid that will lead to compromises on the machine readability side and will only delay the whole thing. Cheers, Csaba.
Csaba Nagy wrote: > On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote: > >> That will just make things worse. And it will break if the XML includes >> any expression that contains a line break. >> > > Then escape the expressions using CDATA or such... I'm sure it would be > possible to make sure it's one line and rely on that. That's part of > being machine readable, being able to rely on getting it at all without > too much parsing magic... > > > Well, the right solution would actually be NOT to use CDATA but to replace a literal linefeed with the XML numeric escape , but I really don't think it's necessary. The extraction tools will be simple whether or not we put everything on one line. Assuming we adopt Alvaro's suggestion of an <auto-explain> root, here's how it would work without putting everything on one line: { echo "<explain-root>"; sed -n '!<auto-explain>!,!</auto-explain>!p' logfile ; echo "</explain-root>"; } > explain-plans.xml Putting everything on one line, this becomes: { echo "<explain-root>"; sed -n '!<auto-explain>!p' logfile ; echo "</explain-root>"; } > explain-plans.xml Not very hard either way, is it? cheers andrew
On Wed, 2009-08-12 at 18:07 +0200, Andrew Dunstan wrote: > > Csaba Nagy wrote: > > On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote: > Well, the right solution would actually be NOT to use CDATA but to > replace a literal linefeed with the XML numeric escape , but I > really don't think it's necessary. Yes you're right, with the extraction too... > The extraction tools will be simple whether or not we put everything on > one line. Well I normally avoid anything containing caffeine, and today I had my first coffee of the year and 2 cokes too. I would say ignore the whole discussion of on the grounds of abnormal hyperactivity on my part... Cheers, Csaba.
On Tue, 11 Aug 2009, Mike wrote: > Have any tool authors stepped up and committed resources to utilizing > this feature in the near term? Even before the easier to read format was available, there were already multiple EXPLAIN analysis tools floating around, some of them web-based like you're considering; a list is at http://wiki.postgresql.org/wiki/Using_EXPLAIN You might expect some of those tool authors would do the appropriate overhaul to import the new format data, and perhaps make things more powerful or simple in the process. You might want to collaborate within someone writing one of those existing applications rather than start over on your own. > The reason I would like to provide this tool in a web-based form is that > no additional software installation would be necessary for the user, > reducing any hurdles to using it to zero. I personally hate only having a web-based tool for this style of application, because I'm always dealing with data I can't paste into somebody else's site for EXPLAIN output--that's a common hurdle that's impossible to clear given all the regulatory and business secret restrictions people work under nowadays. Even when the source code is available for the web app, that puts you back to needing to install the tool locally, and I've found web apps tend to be more complicated to get running than a typical standalone app. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD