Thread: Re: [BUGS] Invalid YAML output from EXPLAIN
Dean Rasheed wrote:
...
> So the current code in escape_yaml() is inadequate for producing valid
> YAML. I think it would have to also consider at least the following
> characters as special "-" ":" "[" "]" "{" "}" "," "\"" "'"
> "|" "*" "&". Technically, it would also need to trap empty strings,
> and strings with leading or trailing whitespace.
>
> Making escape_yaml() completely bulletproof with this approach would
> be quite difficult, and (IMO) not worth the effort
...

Doesn't seem like a lot of effort to me. You've already laid out most of
the exceptions above, although they require a few tweaks. The rules
should be:

Requires quoting only if the first character:
& * ! | > ' " % @ ` #

Same as above, but no quoting if the second character is "safe":
- ? :

Always requires quoting:
":<space>" "<space>#" aka ': ' ' #'

Always requires quoting:
, [ ] { }

Always require quoting:
(leading space) (trailing space) (empty string)

See:
http://yaml.org/spec/1.2/spec.html section 5.3 and 7.3.3

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201006070943
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
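The rule list in this message can be read as a single predicate. The following is only a minimal Python sketch of those rules as listed, not PostgreSQL's escape_yaml() (which is C) and not a validated YAML emitter. The name needs_quoting is hypothetical, and the message does not define a "safe" second character; the sketch assumes it means "a following character exists and is not whitespace".

```python
# Illustrative sketch of the quoting rules listed in the message above.
# needs_quoting() is a hypothetical helper, not PostgreSQL's escape_yaml().

FIRST_CHAR_INDICATORS = set("&*!|>'\"%@`#")  # quote only when in first position
FIRST_CHAR_UNLESS_SAFE = set("-?:")          # quote when first, unless next char is "safe"
ALWAYS_INDICATORS = set(",[]{}")             # quote wherever they appear


def needs_quoting(value: str) -> bool:
    """Return True if the rules from the message say the value must be quoted."""
    if value == "":
        return True                          # empty string
    if value[0] == " " or value[-1] == " ":
        return True                          # leading or trailing space
    if value[0] in FIRST_CHAR_INDICATORS:
        return True
    if value[0] in FIRST_CHAR_UNLESS_SAFE:
        # Assumption: "safe" = a second character exists and is not whitespace.
        if len(value) == 1 or value[1].isspace():
            return True
    if ": " in value or " #" in value:
        return True
    return any(ch in ALWAYS_INDICATORS for ch in value)


if __name__ == "__main__":
    for sample in ["zip", "(customerid > 1000)", "*star", "- item", "a: b", ""]:
        print(repr(sample), needs_quoting(sample))
```

Under these rules a value such as (customerid > 1000) stays unquoted, because ">" only matters in first position.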
"Greg Sabino Mullane" <greg@turnstep.com> writes: > The rules should be: > Requires quoting only if the first character: > & * ! | > ' " % @ ` # > Same as above, but no quoting if the second character is "safe": > - ? : > Always requires quoting: > ":<space>" "<space>#" aka ': ' ' #' > Always requires quoting: > , [ ] { } > Always require quoting: > (leading space) (trailing space) (empty string) Egad ... this is supposed to be an easily machine-generatable format? If it's really as broken as the above suggests, I think we should rip it out while we still can. regards, tom lane
Tom Lane wrote:
...
> Egad ... this is supposed to be an easily machine-generatable format?
>
> If it's really as broken as the above suggests, I think we should
> rip it out while we still can.

Heh ... not like you to shrink from a challenge. ;)

I don't think the above would be particularly hard to implement myself,
but if it becomes a really big deal, we can certainly punt by simply
quoting anything containing an indicator (the special characters above).
It will still be 100% valid YAML, just with some excess quoting for the
very rare case when a value contains one of the special characters.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201006071035
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
> Tom Lane wrote:
> I don't think the above would be particularly hard to implement myself,
> but if it becomes a really big deal, we can certainly punt by simply
> quoting anything containing an indicator (the special characters above).
> It will still be 100% valid YAML, just with some excess quoting for the
> very rare case when a value contains one of the special characters.

Since you're the main advocate of this feature, I think you should
implement it rather than leaving it to Tom or I.

The reason why I was initially skeptical of adding a YAML output
format is that JSON is a subset of YAML. Therefore, the JSON output
format ought to be perfectly sufficient for anyone using a YAML
parser. If it's not, that's because their YAML processor is broken,
and they should get a new one, or because the YAML spec is defective.
The YAML format got voted in by consensus because people thought that
it would also make a nice alternative to the text format for human
readable output. I don't believe that (it uses way too much vertical
space) but even if you accept the argument, the more we make the YAML
format look like the JSON format, the less water that argument holds.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
> On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
>
>> Tom Lane wrote:
>> I don't think the above would be particularly hard to implement myself,
>> but if it becomes a really big deal, we can certainly punt by simply
>> quoting anything containing an indicator (the special characters above).
>> It will still be 100% valid YAML, just with some excess quoting for the
>> very rare case when a value contains one of the special characters.
>>
>
> Since you're the main advocate of this feature, I think you should
> implement it rather than leaving it to Tom or I.
>

Or anyone else :-)

> The reason why I was initially skeptical of adding a YAML output
> format is that JSON is a subset of YAML. Therefore, the JSON output
> format ought to be perfectly sufficient for anyone using a YAML
> parser.
>

There is some debate on this point, IIRC.

cheers

andrew
"Greg Sabino Mullane" <greg@turnstep.com> writes: > I don't think the above would be particularly hard to implement myself, > but if it becomes a really big deal, we can certainly punt by simply > quoting anything containing an indicator (the special characters above). I would go with that. The quoting rules you proposed previously seem way too complicated --- meaning potentially buggy, and even if they're not buggy, the behavior would seem unpredictable to most users. regards, tom lane
On 7 June 2010 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Greg Sabino Mullane" <greg@turnstep.com> writes:
>> I don't think the above would be particularly hard to implement myself,
>> but if it becomes a really big deal, we can certainly punt by simply
>> quoting anything containing an indicator (the special characters above).
>
> I would go with that. The quoting rules you proposed previously seem
> way too complicated --- meaning potentially buggy, and even if they're
> not buggy, the behavior would seem unpredictable to most users.
>

Well actually it's not just everything containing a special character,
it's also anything with leading or trailing whitespace, and empty
strings (not sure that can ever happen in practice).

It's because of the potential for bugs in this area, that I'd propose
just quoting everything (except numeric values) as in my original
patch.

Regards,
Dean
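For comparison, the "quote everything except numeric values" approach needs almost no rules at all. The following is a minimal Python sketch of that idea, not the patch referred to in the message. The function name yaml_value and the numeric flag are assumptions for illustration (the caller is assumed to know which values are numeric), and the escape set is limited to backslash, double quote, newline and tab.

```python
# Hypothetical helper illustrating "quote everything except numeric values".
# The caller states whether the value is numeric; no content sniffing is done.

def yaml_value(text: str, numeric: bool = False) -> str:
    """Render a scalar: numerics verbatim, everything else double-quoted."""
    if numeric:
        return text
    escaped = (
        text.replace("\\", "\\\\")   # escape backslashes first
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\t", "\\t")
    )
    return '"' + escaped + '"'


if __name__ == "__main__":
    print("Node Type: " + yaml_value("Seq Scan"))
    print("Total Cost: " + yaml_value("726.00", numeric=True))
    print("Filter: " + yaml_value("(customerid > 1000)"))
```

The trade-off discussed in the thread is visible here: the code is trivially predictable, but every string value carries quotes, which moves the output closer to the JSON format.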
* Tom Lane:

> Egad ... this is supposed to be an easily machine-generatable format?

Perhaps you could surround all strings with "" in the generator, and
escape all potentially special characters (which seems to include some
whitespace even in quoted strings, unfortunately)?

It has been claimed before that YAML is a superset of JSON, so why
can't the YAML folks use the existing JSON output instead?

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Florian Weimer wrote:
> It has been claimed before that YAML is a superset of JSON, so why
> can't the YAML folks use the existing JSON output instead?
>

Because JSON just crosses the line where it feels like there's so much
markup that people expect a tool is necessary to read it, which has
always been the issue with XML too--bad human readability. I was on
the fence about YAML until I used it for a client issue over the
weekend. I was able to hack together a quick tool to work on the issue
that parsed enough YAML *without using an external library* well enough
for my purposes in an hour, one that was still far more robust than a
similar hack trying to read plain old text format for EXPLAIN. And the
client was able to follow what was going on as I passed YAML output
back and forth with them. Just having every field labeled clearly cut
off all the usual "which of these is the startup cost again?" questions
I'm used to getting.

The complaints about YAML taking up too much vertical space are
understandable, but completely opposite of what I care about. I can
e-mail a customer a YAML plan and it will survive to the other side and
even in a reply back to me. Whereas any non-trivial text format one is
guaranteed to be utterly destroyed by line wrapping along the way.

I think this thread could use a fresh example to remind anyone who
hasn't played with the current YAML format what it looks like.
Here's one from a query against the Dell Store 2 database:

EXPLAIN SELECT * FROM customers WHERE customerid>1000 ORDER BY zip;
QUERY PLAN
----------
 Sort  (cost=4449.30..4496.80 rows=19000 width=268)
   Sort Key: zip
   ->  Seq Scan on customers  (cost=0.00..726.00 rows=19000 width=268)
         Filter: (customerid > 1000)

EXPLAIN (FORMAT YAML) SELECT * FROM customers WHERE customerid>1000
ORDER BY zip;
             QUERY PLAN
-------------------------------------
 - Plan:                            +
     Node Type: Sort                +
     Startup Cost: 4449.30          +
     Total Cost: 4496.80            +
     Plan Rows: 19000               +
     Plan Width: 268                +
     Sort Key:                      +
       - zip                        +
     Plans:                         +
       - Node Type: Seq Scan        +
         Parent Relationship: Outer +
         Relation Name: customers   +
         Alias: customers           +
         Startup Cost: 0.00         +
         Total Cost: 726.00         +
         Plan Rows: 19000           +
         Plan Width: 268            +
         Filter: (customerid > 1000)

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
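The message mentions a quick tool that parsed "enough YAML" without an external library. That tool is not shown in the thread; the following Python sketch only illustrates how little is needed for output shaped like the plan above. The names parse_plan_lines and values_for are hypothetical, the indentation in SAMPLE is assumed, and the approach handles neither quoting nor any other YAML feature.

```python
# Minimal line-oriented reader for EXPLAIN (FORMAT YAML)-style output.
# It understands "Key: value", "Key:" and "- item" lines only.

def parse_plan_lines(text):
    """Return a list of (indent, key, value); key is None for bare list items."""
    rows = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("+"):            # psql's continuation marker
            line = line[:-1].rstrip()
        stripped = line.lstrip(" ")
        if not stripped:
            continue
        indent = len(line) - len(stripped)
        if stripped.startswith("- "):     # list marker counts as indentation
            stripped = stripped[2:]
            indent += 2
        if stripped.endswith(":"):
            rows.append((indent, stripped[:-1], ""))
        elif ": " in stripped:
            key, value = stripped.split(": ", 1)
            rows.append((indent, key, value))
        else:
            rows.append((indent, None, stripped))
    return rows


def values_for(rows, key):
    """Collect every value recorded under the given key, in document order."""
    return [value for _, k, value in rows if k == key]


SAMPLE = """\
- Plan:                      +
    Node Type: Sort          +
    Sort Key:                +
      - zip                  +
    Plans:                   +
      - Node Type: Seq Scan  +
        Filter: (customerid > 1000)
"""

if __name__ == "__main__":
    rows = parse_plan_lines(SAMPLE)
    print(values_for(rows, "Node Type"))
```

Because every field is labeled, pulling out all node types or all costs is a one-line lookup, which is the property the message credits for making the hack more robust than scraping the text format.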
Greg Smith <greg@2ndquadrant.com> writes:
> The complaints about YAML taking up too much vertical space are
> understandable, but completely opposite of what I care about. I can
> e-mail a customer a YAML plan and it will survive to the other side and
> even in a reply back to me. Whereas any non-trivial text format one is
> guaranteed to be utterly destroyed by line wrapping along the way.

> I think this thread could use a fresh example to remind anyone who
> hasn't played with the current YAML format what it looks like.

So? This doesn't look amazingly unlike the current JSON output,
and to the extent that we have to add more quoting to it, it's
going to look even more like the JSON output.

Given the lack of any field separators other than newlines, I'm also
finding myself extremely doubtful about the claim that it survives
line-wrapping mutilations well. For instance this bit:

    - Node Type: Seq Scan
      Parent Relationship: Outer

doesn't appear to have anything but whitespace to distinguish it from

    - Node Type: Seq Scan
    Parent Relationship: Outer

regards, tom lane
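The point about whitespace can be checked mechanically. In the Python sketch below, the two fragments are assumed to differ only in the indentation of the second line (the archive collapsed the original spacing, so the exact indents are a reconstruction). Once leading whitespace is lost, as happens when a mail client reflows text, the two become indistinguishable. The helper names are hypothetical.

```python
# Two fragments that differ only in indentation (assumed reconstruction).
nested = ["- Node Type: Seq Scan", "  Parent Relationship: Outer"]
flattened = ["- Node Type: Seq Scan", "Parent Relationship: Outer"]


def without_indent(lines):
    """Simulate a mail client that drops leading whitespace."""
    return [line.lstrip() for line in lines]


def indents(lines):
    """Leading-space count per line: the only place the structure lives."""
    return [len(line) - len(line.lstrip()) for line in lines]


if __name__ == "__main__":
    print(indents(nested), indents(flattened))
    print(without_indent(nested) == without_indent(flattened))
```

In the first fragment the second key belongs to the list item; in the second it reads as a sibling of the list, so a mangled plan can still look plausible while meaning something different.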
> It's because of the potential for bugs in this area, that I'd propose
> just quoting everything (except numeric values) as in my original
> patch.

I don't see a problem with this. I supported YAML output because I find
it easier to read and copy&paste than the other outputs. This is still
the case even with quoting.

And it's not exactly a hugely intrusive patch.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Tom Lane wrote:
> This doesn't look amazingly unlike the current JSON output,
> and to the extent that we have to add more quoting to it, it's
> going to look even more like the JSON output.
>

I don't know about that; here's the JSON one:

EXPLAIN (FORMAT JSON) SELECT * FROM customers WHERE customerid>1000
ORDER BY zip;
                 QUERY PLAN
-------------------------------------------
 [                                        +
   {                                      +
     "Plan": {                            +
       "Node Type": "Sort",               +
       "Startup Cost": 4449.30,           +
       "Total Cost": 4496.80,             +
       "Plan Rows": 19000,                +
       "Plan Width": 268,                 +
       "Sort Key": ["zip"],               +
       "Plans": [                         +
         {                                +
           "Node Type": "Seq Scan",       +
           "Parent Relationship": "Outer",+
           "Relation Name": "customers",  +
           "Alias": "customers",          +
           "Startup Cost": 0.00,          +
           "Total Cost": 726.00,          +
           "Plan Rows": 19000,            +
           "Plan Width": 268,             +
           "Filter": "(customerid > 1000)"+
         }                                +
       ]                                  +
     }                                    +
   }                                      +
 ]

From the perspective of how that's less useful as a human form of
output, it's longer, wider, and has redundant punctuation that gets in
the way.

I think that YAML quoting will need to respect one of the special cases
to keep from ruining its readability: "Requires quoting only if the
first character" for " will make its current format look terrible if
that rule is applied to the whole line instead. That sounds like a
necessary special case to include: don't quote any quote characters
that appear unless they're the first character on the line. Everything
else could switch back to really aggressive quoting in every spot and
that wouldn't hurt the readability of the format very much IMHO.

> Given the lack of any field separators other than newlines, I'm also
> finding myself extremely doubtful about the claim that it survives
> line-wrapping mutilations well.

All I was claiming there is that the output is dramatically less wide
than the standard text format of the same plan, and therefore far less
likely to get nailed by a mail client that wraps at normal line widths.
Agreed that once wrapping does occur, it has serious problems too.
Here are the stats for this plan, leaving off the QUERY PLAN header
from each:

TEXT: 4 vertical, 69 horizontal
YAML: 18 vertical, 36 horizontal
JSON: 25 vertical, 43 horizontal
XML[1]: 27 vertical, 60 horizontal

Quote the TEXT line with "> " or get a plan with one more line of
indentation, and you're likely to get wrapped badly at the 72 character
line limit some clients use. Quite a bit more headroom before the YAML
format will wrap like that; JSON is in the middle.

I now see plenty of use for YAML when exchanging plans over e-mail, and
it's a bonus that it should survive that format to be parseable on the
other side. JSON and XML are certainly the preferred way to feed plans
into analysis tools unambiguously.

[1] Might as well make this a complete example:

 <explain xmlns="http://www.postgresql.org/2009/explain">  +
   <Query>                                                 +
     <Plan>                                                +
       <Node-Type>Sort</Node-Type>                         +
       <Startup-Cost>4449.30</Startup-Cost>                +
       <Total-Cost>4496.80</Total-Cost>                    +
       <Plan-Rows>19000</Plan-Rows>                        +
       <Plan-Width>268</Plan-Width>                        +
       <Sort-Key>                                          +
         <Item>zip</Item>                                  +
       </Sort-Key>                                         +
       <Plans>                                             +
         <Plan>                                            +
           <Node-Type>Seq Scan</Node-Type>                 +
           <Parent-Relationship>Outer</Parent-Relationship>+
           <Relation-Name>customers</Relation-Name>        +
           <Alias>customers</Alias>                        +
           <Startup-Cost>0.00</Startup-Cost>               +
           <Total-Cost>726.00</Total-Cost>                 +
           <Plan-Rows>19000</Plan-Rows>                    +
           <Plan-Width>268</Plan-Width>                    +
           <Filter>(customerid > 1000)</Filter>            +
         </Plan>                                           +
       </Plans>                                            +
     </Plan>                                               +
   </Query>                                                +
 </explain>

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
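The "vertical" and "horizontal" figures above are simply line count and widest line. As a rough Python sketch of how such numbers, and the effect of "> " quoting, could be computed: the function names are hypothetical, and the 72-character limit and two characters per quoting level are taken from the message's own description rather than from any mail standard.

```python
# Measure a plan's height and width, and estimate whether quoting it in a
# reply (two extra characters per "> " level) would exceed a wrap limit.

def plan_dimensions(plan):
    """Return (number of lines, length of the widest line)."""
    lines = plan.splitlines()
    return len(lines), max((len(line) for line in lines), default=0)


def wraps_when_quoted(plan, depth=1, limit=72):
    """True if the widest line, prefixed with depth levels of '> ', exceeds limit."""
    _, width = plan_dimensions(plan)
    return width + 2 * depth > limit


if __name__ == "__main__":
    text_plan = "x" * 69          # stand-in for a 69-column TEXT plan line
    yaml_plan = "y" * 36          # stand-in for a 36-column YAML plan line
    for depth in (1, 2, 3):
        print(depth, wraps_when_quoted(text_plan, depth),
              wraps_when_quoted(yaml_plan, depth))
```

With these stand-in widths, the 69-column line passes the limit at the second level of quoting, while the 36-column line has room for many more rounds of replies.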
* Greg Smith:

> Florian Weimer wrote:
>> It has been claimed before that YAML is a superset of JSON, so why
>> can't the YAML folks use the existing JSON output instead?
>>
>
> Because JSON just crosses the line where it feels like there's so much
> markup that people expect a tool is necessary to read it, which has
> always been the issue with XML too--bad human readability.

But YAML is not human-readable. There are human-readable subsets of
it, but the general serializers do not produce them, and specific
serializers are difficult to get right (as we've seen).

> EXPLAIN (FORMAT YAML) SELECT * FROM customers WHERE customerid>1000
> ORDER BY zip;
>              QUERY PLAN
> -------------------------------------
>  - Plan:                            +
>      Node Type: Sort                +
>      Startup Cost: 4449.30          +
>      Total Cost: 4496.80            +
>      Plan Rows: 19000               +
>      Plan Width: 268                +
>      Sort Key:                      +
>        - zip                        +
>      Plans:                         +
>        - Node Type: Seq Scan        +
>          Parent Relationship: Outer +
>          Relation Name: customers   +
>          Alias: customers           +
>          Startup Cost: 0.00         +
>          Total Cost: 726.00         +
>          Plan Rows: 19000           +
>          Plan Width: 268            +
>          Filter: (customerid > 1000)

What does your parser do with this (equivalent but shorter) YAML
output?

- Plan: !!map
    &0 Node Type: Sort
    &1 Startup Cost: 4449.30
    &2 Total Cost: 4496.80
    &3 Plan Rows: &5 19000
    &4 Plan Width: &6 268
    Sort Key: ["zip"]
    Plans: !!seq
      - *0: Seq Scan
        Parent Relationship: Outer
        Relation Name: &7 customers
        Alias: *7
        *1: 0.00
        *2: 726.00
        *3: *5
        *4: *6
        Filter: (customerid > 1000)

Looking at the spec, it's rather difficult to come up with a readable
subset which can be parsed easily and is general in the sense that it
can express empty strings, strings with embedded newlines, and so on.
YAML's rules for dealing with whitespace are fairly complex, but are
probably needed to get a more compact notation than JSON.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
> But YAML is not human-readable. There are human-readable subsets of
> it, but the general serializers do not produce them, and specific
> serializers are difficult to get right (as we've seen).

No, it *is* human readable. Indeed, that's one of the things that
differentiates it from JSON: readability is the main goal, whereas
JSON's goals are different. The readability necessarily makes the
parsing rules more complex, but that's the implicit tradeoff. (Did you
miss the part where the other Greg is sending explain plans via email?)

> What does your parser do with this (equivalent but shorter)
> YAML output?
>
> - Plan: !!map
>     &0 Node Type: Sort
>     &1 Startup Cost: 4449.30
>     &2 Total Cost: 4496.80
>     &3 Plan Rows: &5 19000
>     &4 Plan Width: &6 268
>     Sort Key: ["zip"]
>     Plans: !!seq
>       - *0: Seq Scan
>         Parent Relationship: Outer
>         Relation Name: &7 customers
>         Alias: *7
>         *1: 0.00
>         *2: 726.00
>         *3: *5
>         *4: *6
>         Filter: (customerid > 1000)

But we're not using alias nodes (nor would we ever want to), so I'm not
sure what the point of your contrived example is. That's shorter, but
certainly not easier to read by human /or/ machine.

> Looking at the spec, it's rather difficult to come up with a readable
> subset which can be parsed easily and is general in the sense that it
> can express empty strings, strings with embedded newlines, and so on.
> YAML's rules for dealing with whitespace are fairly complex, but are
> probably needed to get a more compact notation than JSON.

I'll state that both embedded newlines and column names and values with
funny characters like '*' and '|' are rare events, and the great
majority of things you'll see in an explain plan are plain ol' ASCII,
in which YAML produces a very good representation. But you are right
that we need to make sure we are handling the whitespace correctly.

When I get some free time, I'll make a patch to implement as much of
the spec as we sanely can.
As I said before, I don't think we need to strive for putting
everything we possibly can into "plain scalar" objects, as we can cover
99% of the cases easy enough and fall back to 'when in doubt, quote'
for the rest.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201006080931
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Tue, Jun 8, 2010 at 9:37 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
> When I get some free time, I'll make a patch to implement as much of
> the spec as we sanely can.

Saying that you'll fix it but not on any particular timetable is
basically equivalent to saying that you're not willing to fix it at
all. We are trying to get a release out the door. I'm not trying to
be rude, but it's frustrating to me when people object to having their
code ripped out but also won't commit to getting it fixed in a timely
fashion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:

>> When I get some free time, I'll make a patch to implement as
>> much of the spec as we sanely can.

> Saying that you'll fix it but not on any particular timetable is
> basically equivalent to saying that you're not willing to fix it at
> all.

It's not equivalent at all. If I wasn't willing to fix it at all, I'd
say so.

> We are trying to get a release out the door. I'm not trying to
> be rude, but it's frustrating to me when people object to having their
> code ripped out but also won't commit to getting it fixed in a timely
> fashion.

You might not be trying, but you are coming across as quite rude. The
bug was only reported Monday morning, and you are yelling at me on a
Tuesday night for not being willing to drop everything I'm doing and
fix it right now? Yes, we're heading towards 9.0 and yes, I'd sure hate
to see YAML ripped out (especially now that it's been listed near and
far as one of our new features), but I've got bills to pay and writing
a patch is a volunteer effort for me.

Since you seem so keen on telling other people what they should be
doing, here's some of your own medicine: why not focus on something
other than YAML, which myself and many other people can write, and
work more on the 9.0 open issues that your energy and expertise
would be more suited for?

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201006091156
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
On Wed, Jun 9, 2010 at 11:57 AM, Greg Sabino Mullane <greg@turnstep.com> wrote:
> The bug was only reported Monday morning, and you are yelling at me
> on a Tuesday night for not being willing to drop everything I'm doing
> and fix it right now?

I am not saying and have not said that you needed to drop everything
you were doing and fix it right now. Had you said, "I will get a patch
for this out this week", that would have been fine with me. What I
did and do object to is that your commitment to fix it was completely
open-ended. From reading your email, there's no way for someone to
know whether you'll get to this in two days or a month, and a month,
at least in my opinion, is too long, maybe at any time but certainly
at this point in the release cycle.

> Since you seem so keen on telling other people what they should be
> doing, here's some of your own medicine: why not focus on something
> other than YAML, which myself and many other people can write, and
> work more on the 9.0 open issues that your energy and expertise
> would be more suited for?

I think this comment is a little snide, but it deserves a serious
response. I spent most of yesterday afternoon and evening working on
every open item that I had a clue about, and another two hours this
morning. Most of the remaining items are either things that I am not
qualified to fix or for which there are currently patches out for
comment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company