
XML: Single root element - Mailing list pgsql-docs

From	Jürgen Purtz
Subject	XML: Single root element
Date	January 30, 2019 10:34:12
Msg-id	f8b59177-1251-8813-541f-73383aa744f5@purtz.de
List	pgsql-docs


Some time ago we converted our documentation from SGML to XML in one large 
step. Most of the resulting files are well-formed, but not all. The 
well-formedness criterion is violated by files that contain more than 
one root element. You can locate such files with the command:

xmllint --noout *.sgml ref/*.sgml 2> >(grep Extra)
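The `xmllint` command reports such files indirectly. Below is a minimal Python sketch that instead lists the top-level elements of each file, using only the standard library.

It rests on two assumptions:

- The files carry no XML declaration or DOCTYPE. This is true of the included chapter files, but not of postgres.sgml.
- Named entity references such as `&version;` or `&legal;` are simply dropped, because their definitions live in other files.

The script is an illustration, not part of the attached patch.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Named entity references such as &version; or &legal;, excluding the five
# predefined XML entities (character references like &#160; never match).
ENTITY_REF = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z_][\w.-]*;")


def _wrap(text: str) -> ET.Element:
    """Parse a file's content under a dummy root so that several roots parse."""
    body = ENTITY_REF.sub("", text)  # assumption: definitions live elsewhere
    return ET.fromstring("<wrapper>" + body + "</wrapper>")


def top_level_elements(text: str) -> list[str]:
    """Return the tag names of the file's top-level elements, in order."""
    return [child.tag for child in _wrap(text)]


def has_single_root(text: str) -> bool:
    """True if the content is exactly one element (plus comments/whitespace)."""
    wrapper = _wrap(text)
    children = list(wrapper)
    stray_text = (wrapper.text or "").strip() or any(
        (child.tail or "").strip() for child in children
    )
    return len(children) == 1 and not stray_text


def main(paths: list[str]) -> int:
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        try:
            if not has_single_root(text):
                roots = top_level_elements(text)
                print(f"{path}: {len(roots)} top-level elements: {' '.join(roots)}")
                status = 1
        except ET.ParseError as err:
            print(f"{path}: not parseable as a fragment: {err}")
            status = 1
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

It would be run like `python3 check_roots.py *.sgml ref/*.sgml`; the script name is hypothetical. Files are reported by their top-level tag names rather than by parser error text.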

At the moment this is not a serious problem. But for further XML processing 
(parsing, upgrading to DocBook 5.x, using an XML editor, 
XInclude, XPath, namespaces, ...) it is necessary, or at least very 
helpful, to turn every single file, in a manual step, into 
a *well-formed* XML file, in particular one with a single root element. The 
attached patch applies several different strategies to achieve 
this aim.

Strategy 1: Move the element of the outer file that contains the 'calling' 
entity reference into the included file, where it wraps the former 
top-level elements and becomes the single root element. 
Example 'legal.sgml':

Current situation
=================
postgres.sgml:
<book id="postgres">
  <title>PostgreSQL &version; Documentation</title>

  <bookinfo>
   <corpauthor>The PostgreSQL Global Development Group</corpauthor>
   <productname>PostgreSQL</productname>
   <productnumber>&version;</productnumber>
   &legal;
  </bookinfo>
  ...


legal.sgml:
<date>2019</date>

<copyright>
  <year>1996-2019</year>
  <holder>The PostgreSQL Global Development Group</holder>
</copyright>

<legalnotice id="legalnotice">
...
</legalnotice>
-- End of File --

New situation
=============
postgres.sgml:
<book id="postgres">
  <title>PostgreSQL &version; Documentation</title>

  &legal;
  ...


legal.sgml:
<bookinfo>
  <corpauthor>The PostgreSQL Global Development Group</corpauthor>
  <productname>PostgreSQL</productname>
  <productnumber>&version;</productnumber>
  <date>2019</date>

  <copyright>
   <year>1996-2019</year>
   <holder>The PostgreSQL Global Development Group</holder>
  </copyright>

  <legalnotice id="legalnotice">
  ...
  </legalnotice>
</bookinfo>
-- End of File --

Individual files change, but the intermediate file (or the in-memory 
document) that results after resolving all entities stays unchanged. This 
resolved document is the basis for all further steps such as validation or 
output generation.
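That claim can be checked mechanically. The sketch below expands entities textually in the 'legal.sgml' example, in both the current and the new arrangement, and then compares the resolved documents in canonical XML form.

It makes three assumptions:

- `&version;` expands to a hypothetical value "11".
- The elided `<legalnotice>` is left out.
- Plain string substitution stands in for real entity resolution.

Only indentation differs between the two arrangements, and `strip_text=True` discards it.

```python
import re
import xml.etree.ElementTree as ET

VERSION = "11"  # hypothetical expansion of &version;

OLD_MAIN = """<book id="postgres">
  <title>PostgreSQL &version; Documentation</title>
  <bookinfo>
   <corpauthor>The PostgreSQL Global Development Group</corpauthor>
   <productname>PostgreSQL</productname>
   <productnumber>&version;</productnumber>
   &legal;
  </bookinfo>
</book>"""

OLD_LEGAL = """<date>2019</date>
<copyright>
  <year>1996-2019</year>
  <holder>The PostgreSQL Global Development Group</holder>
</copyright>"""

NEW_MAIN = """<book id="postgres">
  <title>PostgreSQL &version; Documentation</title>
  &legal;
</book>"""

NEW_LEGAL = """<bookinfo>
  <corpauthor>The PostgreSQL Global Development Group</corpauthor>
  <productname>PostgreSQL</productname>
  <productnumber>&version;</productnumber>
  <date>2019</date>
  <copyright>
   <year>1996-2019</year>
   <holder>The PostgreSQL Global Development Group</holder>
  </copyright>
</bookinfo>"""

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text: str, entities: dict[str, str]) -> str:
    """Replace &name; with the entity's text, recursively.

    Unknown references (e.g. &amp;) are left for the XML parser.
    """
    def replace(match):
        name = match.group(1)
        if name in entities:
            return expand(entities[name], entities)
        return match.group(0)
    return ENTITY_REF.sub(replace, text)


def resolve(main: str, legal: str) -> str:
    """Expand all entities, then serialize in canonical form (C14N 2.0)."""
    entities = {"legal": legal, "version": VERSION}
    return ET.canonicalize(expand(main, entities), strip_text=True)


if __name__ == "__main__":
    print(resolve(OLD_MAIN, OLD_LEGAL) == resolve(NEW_MAIN, NEW_LEGAL))  # True
```

`ET.canonicalize` requires Python 3.8 or later. Canonical form is used instead of comparing raw strings so that attribute order and indentation cannot cause false differences.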

Strategy 2: The release notes files consist of many 
sect1 elements at the top level. To overcome this, one could 
change sect1 to sect2, sect2 to sect3, ... and add a new sect1 
element as a wrapper around the complete file. However, the chain of 
sect<n> elements is limited to 5 levels, and in some cases we use all of 
them. Therefore it's necessary to change the markup from sect<n> elements 
to section elements, which can be nested recursively without limit.
This strategy changes the visual representation of the TOC, 
because every title element shifts one level down. In my opinion this 
is an improvement, because (a) after clicking on 'Release Notes' we 
currently get 372 items plus their sub-items; this will be reduced to 
one item per major release: 11, 10, 9.6, 9.5, ...; and (b) the 
acknowledgement element is shown, as intended, per complete major 
release, not only with the very first version of a release. Furthermore, 
the standard HTML output will contain exactly one HTML file per major 
release.
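The renaming described in Strategy 2 is mechanical. Below is a minimal sketch that wraps a release-notes fragment in one new `section` with its own title and renames every nested `sect1`–`sect5` to `section`. Because `section` nests recursively, the hierarchy is preserved one level deeper.

Assumptions:

- The fragment contains no unresolved named entities.
- The wrapper's `id` and title are hypothetical.
- The real patch may differ in detail; this is not taken from it.

```python
import re
import xml.etree.ElementTree as ET

SECT_N = re.compile(r"sect[1-5]")  # DocBook's fixed-depth section elements


def wrap_in_section(fragment: str, new_id: str, new_title: str) -> ET.Element:
    """Wrap a multi-root fragment in one <section> and rename sect<n>."""
    root = ET.fromstring("<section>" + fragment + "</section>")
    root.set("id", new_id)
    for elem in root.iter():
        if SECT_N.fullmatch(elem.tag):
            elem.tag = "section"  # nesting depth now comes from structure alone
    title = ET.Element("title")
    title.text = new_title
    root.insert(0, title)  # the new title precedes the former sect1 elements
    return root


if __name__ == "__main__":
    fragment = """
<sect1 id="release-9-6-1"><title>Release 9.6.1</title>
 <sect2><title>Changes</title></sect2>
</sect1>
<sect1 id="release-9-6"><title>Release 9.6</title></sect1>
"""
    # "release-9-6-all" and "Release 9.6.x" are hypothetical names.
    wrapped = wrap_in_section(fragment, "release-9-6-all", "Release 9.6.x")
    print(ET.tostring(wrapped, encoding="unicode"))
```

Renaming to `section` rather than shifting sect1 to sect2 avoids the 5-level ceiling mentioned above.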

Strategy 3: Split huge files into smaller files (contrib, xfunc) and/or 
move some sections into the calling file. From the perspective of a git 
user, or of someone who translates the documentation into another 
language, this is not pleasant, but I hope that it will be accepted.
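Splitting can be partly mechanized. Below is a minimal sketch that turns each top-level element of a multi-root file into its own single-root string. It also emits matching external entity declarations of the `<!ENTITY name SYSTEM "file.sgml">` kind kept in filelist.sgml.

Assumptions:

- Every top-level element carries a unique `id`, which is reused as a hypothetical file and entity name.
- Comments between elements are dropped.

```python
import xml.etree.ElementTree as ET


def split_top_level(fragment: str) -> dict[str, str]:
    """Map each top-level element's id to that element serialized alone."""
    wrapper = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    parts = {}
    for child in wrapper:
        child.tail = None  # drop inter-element whitespace from the output
        parts[child.get("id")] = ET.tostring(child, encoding="unicode")
    return parts


def entity_declarations(names) -> str:
    """One external parsed entity per new file (naming is hypothetical)."""
    return "\n".join(f'<!ENTITY {name} SYSTEM "{name}.sgml">' for name in names)


if __name__ == "__main__":
    big_file = """<sect1 id="part-a"><title>A</title></sect1>
<sect1 id="part-b"><title>B</title></sect1>"""
    parts = split_top_level(big_file)
    print(entity_declarations(parts))
    # The calling file would then reference each part as &part-a; &part-b;
    for name, text in parts.items():
        print(f"--- {name}.sgml ---\n{text}")
```

Each resulting file has exactly one root element, which is the property the patch aims for. The cost is the one named above: every split shows up as a file move in git history and in translation workflows.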


PS_1: For tests, don't forget the make target 'errcodes-table.sgml'.
PS_2: The remaining files version.sgml, filelist.sgml and 
ref/allfiles.sgml, which contain nothing but entity definitions, may 
change or become superfluous with the migration to DocBook 5.x.

Kind regards
Jürgen Purtz

Attachment

XmlWellFormed.patch

