On Wed, 2011-08-24 at 23:28 -0400, Josh Kupershmidt wrote:
> I found myself rewriting the ./src/tools/find_gt_lt script in Perl
> this evening, since the existing script was quite broken (the main
> problem is that it's not capable of understanding CDATA or SGML comment
> sections, and hence produces a bunch of noise).
>
> The rewritten version picked up a few stylistic inconsistencies in the
> SGML, such as:
> * breaking the trailing '>' of an SGML marker across lines. AFAIK
> this is legal, but is a bit inconsistent and just confuses simplistic
> tools like find_gt_lt
The cases you show don't appear to be terribly useful, but I think on
occasion this can be necessary to work around some arcane whitespace
rules in SGML or XML. (Just look at the generated HTML; it uses this
technique throughout.)
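A checker that matches tags against the whole file instead of line by line accepts a '>' that has been pushed onto the next line. Here is a minimal Python sketch of that idea. The `find_tags` helper and `TAG_RE` pattern are hypothetical and not taken from find_gt_lt or its Perl rewrite. The sketch also assumes no attribute value contains a literal '>'.

```python
import re

# Hypothetical pattern: '<', an optional '/', a name, then anything up to
# the next '>'. [^<>]* also matches newlines, so a '>' that was moved onto
# the following line still closes the tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)>")


def find_tags(text):
    """Return (is_end_tag, name, line_number) for every tag in the text."""
    tags = []
    for m in TAG_RE.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        tags.append((m.group(1) == "/", m.group(2), line))
    return tags


sample = '<para>See <xref linkend="sql-select"\n> for details.</para>\n'
print(find_tags(sample))
# -> [(False, 'para', 1), (False, 'xref', 1), (True, 'para', 2)]
```

The design point is that the scan runs over the whole buffer. A line-oriented tool sees an opening '<' with no '>' on line 1 and a stray '>' on line 2.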
> * using single quotes instead of double quotes to surround a node
> attribute, as in <orderedlist numeration='loweralpha'>
It would be better if the tool could handle that, because you sometimes
need single quotes when the value itself contains double quotes.
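Accepting either quote style is a small change to the attribute pattern: each alternative stops only at its own closing quote character. A minimal Python sketch follows. `ATTR_RE` and `parse_attrs` are hypothetical names used only for illustration.

```python
import re

# Hypothetical attribute pattern: name = "value" or name = 'value'.
# The double-quoted branch stops only at '"', the single-quoted branch
# only at "'", so each may contain the other kind of quote.
ATTR_RE = re.compile(r"""([A-Za-z][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_attrs(tag_body):
    """Map attribute names to values for the text inside one tag."""
    attrs = {}
    for m in ATTR_RE.finditer(tag_body):
        # Exactly one of the two value groups participates in a match.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        attrs[m.group(1)] = value
    return attrs


print(parse_attrs("orderedlist numeration='loweralpha'"))
# -> {'numeration': 'loweralpha'}
print(parse_attrs("""literal role='say "hi"'"""))
# -> {'role': 'say "hi"'}
```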
> as well as seemingly invalid SGML, such as using '>' unescaped inside
> normal SGML entries.
Unescaped > is valid, AFAIK.
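For comparison, XML also allows a literal '>' in character data, except in the sequence ']]>' outside a CDATA section, while a literal '<' must be escaped. A checker along those lines would ignore '>' entirely and report only a '<' that does not start markup. It would also skip comments and CDATA sections first, which was the original script's main source of noise. Below is a minimal Python sketch under those assumptions. `stray_lt_lines` and its patterns are hypothetical and are not the actual find_gt_lt logic. Like the other sketches, it assumes no attribute value contains a literal '>'.

```python
import re

# Hypothetical checker, not the real find_gt_lt. Comments and CDATA
# sections are blanked out first (newlines kept so line numbers stay
# correct), then tags are blanked, and any '<' left over is reported.
# A bare '>' in text is deliberately never reported.
SKIP_RE = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>", re.DOTALL)
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def blank(match):
    """Replace a match with spaces, preserving its newlines."""
    return re.sub(r"[^\n]", " ", match.group(0))


def stray_lt_lines(text):
    """Return the line numbers of '<' characters that do not start markup."""
    text = SKIP_RE.sub(blank, text)
    text = TAG_RE.sub(blank, text)
    return [text.count("\n", 0, m.start()) + 1 for m in re.finditer("<", text)]


sample = (
    "<para>a -> b is fine</para>\n"
    "<!-- if a < b -->\n"
    "<programlisting><![CDATA[if (a < b)]]></programlisting>\n"
    "<para>but a < b is not</para>\n"
)
print(stray_lt_lines(sample))  # -> [4]
```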