XML: Single root element - Mailing list pgsql-docs
From | Jürgen Purtz |
---|---|
Subject | XML: Single root element |
Date | |
Msg-id | f8b59177-1251-8813-541f-73383aa744f5@purtz.de |
List | pgsql-docs |
Some time ago we upgraded our documentation from SGML to XML in one large step. Most of the resulting files are well-formed, but not all. The well-formedness criterion is violated by files that contain more than one root element. You can locate such files with the command:

    xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)

In itself this is not a serious problem. But for further XML processing (parsing, the DocBook upgrade to version 5.x, use of an XML editor, XInclude, XPath, namespaces, ...) it is necessary, or at least very helpful, to convert every single file in a manual step into a *well-formed* XML file, in particular one with a single root element. The attached patch applies several strategies to achieve this aim.

Strategy 1: Move the element of the outer file in which the 'calling' entity reference resides into the included file, where it becomes the single top-level element wrapping the previous content.

Example 'legal.sgml':

Current situation
=================

postgres.sgml:

    <book id="postgres">
     <title>PostgreSQL &version; Documentation</title>

     <bookinfo>
      <corpauthor>The PostgreSQL Global Development Group</corpauthor>
      <productname>PostgreSQL</productname>
      <productnumber>&version;</productnumber>
      &legal;
     </bookinfo>
    ...

legal.sgml:

    <date>2019</date>

    <copyright>
     <year>1996-2019</year>
     <holder>The PostgreSQL Global Development Group</holder>
    </copyright>

    <legalnotice id="legalnotice">
     ...
    </legalnotice>
    -- End of File --

New situation
=============

postgres.sgml:

    <book id="postgres">
     <title>PostgreSQL &version; Documentation</title>
     &legal;
    ...

legal.sgml:

    <bookinfo>
     <corpauthor>The PostgreSQL Global Development Group</corpauthor>
     <productname>PostgreSQL</productname>
     <productnumber>&version;</productnumber>

     <date>2019</date>

     <copyright>
      <year>1996-2019</year>
      <holder>The PostgreSQL Global Development Group</holder>
     </copyright>

     <legalnotice id="legalnotice">
      ...
     </legalnotice>
    </bookinfo>
    -- End of File --

Individual files change, but the intermediate document that results from resolving all entities (whether written to a file or held in main memory) remains unchanged. This resolved document is the basis for all further steps such as validation or output generation.

Strategy 2: The release notes files consist of many sect1 elements at the top level. To overcome this, one could change sect1 to sect2, sect2 to sect3, ..., and use a new sect1 element as a wrapper around the complete file. However, the chain of sect<n> sections is limited to five levels, and in some cases we already use all of them. Therefore it is necessary to change the markup from sect<n> elements to section elements, which can be nested recursively without limit.

This strategy changes the visual representation of the TOC, because every title element shifts one level down. In my opinion this is an improvement, because:

a) after clicking on 'Release Notes' we currently get 372 items plus their sub-items. This will be reduced to one item per major release: 11, 10, 9.6, 9.5, ...
b) the acknowledgement element is shown, as intended, per complete major release, not only with the very first version of a release.

Furthermore, the standard HTML output then contains exactly one HTML file per major release.

Strategy 3: Split huge files into smaller files (contrib, xfunc) and/or move some sections to the calling file. From the perspective of a git user, or of someone who translates the documentation into another language, this is not pleasant, but I hope that it will be accepted.

PS_1: When testing, don't forget the make target 'errcodes-table.sgml'.

PS_2: The remaining files version.sgml, filelist.sgml and ref/allfiles.sgml, which contain nothing but entity definitions, will possibly change or become superfluous with the migration to DocBook 5.x.

Kind regards
Jürgen Purtz
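The two mechanical parts of the work described above, finding fragments with more than one top-level element and renaming sect<n> markup to nested section elements (Strategy 2), can be sketched in Python. This is a hypothetical sketch, not the code behind the attached patch: the function names, the `wrapper_id`/`title` parameters and the trick of blanking out undefined entity references (such as `&version;`) before parsing are all assumptions. It only handles fragments without an XML declaration, DOCTYPE or `<!ENTITY>` declarations (so not version.sgml, filelist.sgml or ref/allfiles.sgml), and the regular-expression rename does not distinguish tags inside comments or CDATA.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: entity references such as &version; or &legal; are defined in
# postgres.sgml, so a stand-alone fragment cannot resolve them. This pattern
# matches every named entity reference except the five predefined XML ones,
# so they can be blanked out before parsing (numeric refs like &#169; stay).
_UNDEFINED_ENTITY = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z_][\w.-]*;")


def count_top_level_elements(fragment: str) -> int:
    """Return how many top-level elements a documentation fragment has.

    Assumes no XML declaration, DOCTYPE or <!ENTITY ...> declarations,
    so the fragment can be wrapped in a dummy root element and parsed.
    """
    cleaned = _UNDEFINED_ENTITY.sub("", fragment)
    wrapper = ET.fromstring("<wrapper>" + cleaned + "</wrapper>")
    # The default ElementTree parser drops comments and processing
    # instructions, so only element children are counted.
    return len(wrapper)


# Opening and closing sect1 ... sect5 tags. The lookahead keeps the pattern
# from matching longer element names such as <sect1info>.
_SECT_TAG = re.compile(r"<(/?)sect[1-5](?=[\s/>])")


def sectn_to_section(fragment: str, wrapper_id: str, title: str) -> str:
    """Rename sect1..sect5 to section and wrap everything in one section.

    wrapper_id and title are hypothetical parameters; they are inserted
    verbatim and are assumed not to contain markup characters like & or <.
    """
    body = _SECT_TAG.sub(r"<\1section", fragment)
    return f'<section id="{wrapper_id}">\n<title>{title}</title>\n{body}\n</section>\n'
```

A file is a candidate for one of the strategies whenever `count_top_level_elements` returns more than 1; unlike the xmllint pipeline, this does not depend on the wording of xmllint's error messages.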