Large SGML Cleanup - Mailing list pgsql-docs

From Josh Kupershmidt
Subject Large SGML Cleanup
Msg-id AANLkTi=1Sm9N3Khiued9UiMfdd_TKLimMiO9mCfHtL39@mail.gmail.com
Responses Re: Large SGML Cleanup  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Large SGML Cleanup  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Large SGML Cleanup  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-docs
[Resending without large attachment, looks like the previous attempt
isn't going to make it]

Hi all,

I've gone through the SGML documentation, trying to push the output
HTML towards HTML 4.01 compliance. By far the most common problem I
found was lists and tables nested inside <para> elements, which
results in invalid HTML.

A common idiom I encountered was SGML like this:

<para>
...
 <simplelist>
  ...
 </simplelist>
...
</para>

This SGML would then produce HTML which looked like this:

<p>
...
   <table>
   ...
   </table>
...
</p>

This HTML fails validation: HTML 4.01 allows only inline content
inside <p>, so a block-level element such as <table> cannot be nested
there. The patch linked below fixes all the instances of this I could
find, by closing out <para> elements before beginning lists and tables.
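
To spot this pattern locally before running a full validator, something
like the following rough Python sketch could help. It is not part of the
patch. The names BLOCK_TAGS, ParaNestingChecker and find_bad_nesting are
made up for illustration, the tag list is only partial, and the sketch
assumes every <p> is closed explicitly with </p>, as in the output shown
above.

```python
from html.parser import HTMLParser

# Block-level elements that HTML 4.01 does not allow inside <p>.
# Partial list, chosen for illustration only.
BLOCK_TAGS = {"table", "ul", "ol", "dl", "div", "pre", "blockquote", "p"}


class ParaNestingChecker(HTMLParser):
    """Record block-level start tags seen while a <p> is still open.

    Assumption: every <p> has an explicit </p>. HTML 4.01 lets the end
    tag be omitted, which this simple counter does not model.
    """

    def __init__(self):
        super().__init__()
        self.open_paras = 0   # number of <p> tags not yet closed
        self.problems = []    # (line, column, tag) for each offender

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this handler.
        if self.open_paras and tag in BLOCK_TAGS:
            line, col = self.getpos()
            self.problems.append((line, col, tag))
        if tag == "p":
            self.open_paras += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.open_paras:
            self.open_paras -= 1


def find_bad_nesting(html):
    """Return a list of (line, column, tag) for block tags inside <p>."""
    checker = ParaNestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    broken = "<p>\ntext\n<table>\n<tr><td>x</td></tr>\n</table>\nmore\n</p>\n"
    for line, col, tag in find_bad_nesting(broken):
        print(f"line {line}, column {col}: <{tag}> inside <p>")
```

Fed the HTML fragment above, this should flag the <table>. The same
fragment with </p> moved ahead of <table> should come back clean.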

I used the w3c-markup-validator package and the web service at
validator.w3.org to test HTML validity. A handy Perl package I found
for this was WebService::Validator, which includes the example script
"validate_files_in_dir.pl" to easily validate a directory full of HTML
files. With this patch, the number of invalid HTML files drops from
many dozens to 16.

Patch at:
http://kupershmidt.org/pg/sgml_fixup.patch.gz

Josh
