Re: UTF-8 docs - Mailing list pgsql-docs

From Jürgen Purtz
Subject Re: UTF-8 docs
Date
Msg-id a65e7fdf-c7a9-6106-307d-2fab50981c74@purtz.de
In response to Re: UTF-8 docs  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses Re: UTF-8 docs
List pgsql-docs

In the previous mails we have seen some statements about the source format of the PostgreSQL documentation and others about the formats derived from it. In the following I am speaking only about the source format. With that premise, I want to second Victor Wagner, who wrote on pgsql-hackers:

> Really, what change we need, it is conversion from SGML to XML format.
> It would solve some real problems, such as ability to include diagrams
> in the docs, and also let everyone to explicitely specify encoding in
> XML declaration (and probably cause switch to UTF-8 as side effect,
> because most XML-based tools use UTF-8 as default).

The truly fundamental step is the switch from SGML to XML. It consists of more than a change in the markup syntax (giving up SGML features such as OMITTAG and SHORTTAG). We must also replace the SGML tools for parsing, validating, and generating the various output formats, such as HTML or PDF, with modern XML tools. And we need additional XSLT steps or modifications of the CSS files to replace the DSSSL scripts. This work is in progress.
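
To illustrate the two points above (an explicit encoding in the XML declaration, and the loss of OMITTAG-style shortcuts), here is a minimal sketch using Python's standard xml.etree.ElementTree. The fragment is a hypothetical DocBook-like snippet, not taken from our sources, and the Python parser only stands in for whatever XML tools we end up using.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook-like fragment; the XML declaration names the encoding.
BODY = "<para>Written by Jürgen: <emphasis>every</emphasis> tag is closed.</para>"

utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>\n' + BODY).encode("utf-8")

# The same text stored as ISO-8859-1; only the declaration and the bytes differ.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>\n' + BODY).encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
utf8_text = "".join(ET.fromstring(utf8_doc).itertext())
latin1_text = "".join(ET.fromstring(latin1_doc).itertext())

# SGML with OMITTAG lets an author leave out end tags; an XML parser
# rejects such input because it is not well-formed.
omitted_end_tag = b"<para>first paragraph<para>second paragraph</para>"
try:
    ET.fromstring(omitted_end_tag)
    accepted = True
except ET.ParseError:
    accepted = False
```

Both documents yield the same text once parsed, while the fragment with the missing end tag is refused. This is why the existing SGML sources need a real conversion and not just a renamed file extension.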

Once we have got rid of all SGML-related parts, we can profit from current XML tools and standards, e.g.:

- DocBook itself is moving from 4.x to 5.x on the basis of XML. (For now I don't recommend this additional step because of some incompatibilities in the migration to 5.x, see: https://lists.oasis-open.org/archives/docbook/201606/msg00007.html )

- The common attribute "xml:lang" for translations

- Extensions like XInclude, SVG, MathML, ...

- ...
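
As a rough sketch of two of the items above, the following Python snippet resolves an XInclude reference and then reads the "xml:lang" attribute of each paragraph. The file name intro.xml, the in-memory FILES table, and the helper load_from_memory are made up for illustration; a real build would read files from disk and would more likely use xmllint or xsltproc than Python.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# "xml:lang" lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical in-memory "files" standing in for documentation sources.
FILES = {
    "intro.xml": '<para xml:lang="en">Hello</para>',
}

MAIN = (
    '<chapter xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="intro.xml"/>'
    '<para xml:lang="de">Hallo</para>'
    '</chapter>'
)


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that serves documents from FILES."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


chapter = ET.fromstring(MAIN)

# Replace each xi:include element with the content it points to.
ElementInclude.include(chapter, loader=load_from_memory)

# Map each paragraph's language code to its text.
paras_by_lang = {p.get(XML_LANG): p.text for p in chapter.iter("para")}
```

After the include step the chapter contains two plain para elements, and a translation tool could pick the paragraphs it needs by language code.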





On 23.08.2016 00:51, Tatsuo Ishii wrote:
From: Alexander Law <exclusion@gmail.com>
Subject: UTF-8 docs
Date: Mon, 22 Aug 2016 16:36:14 +0300
Message-ID: <7fbf2e80-9507-0521-d0e9-913ab81a58df@gmail.com>

Hello,
I've just seen a discussion about docs encoding in pgsql-hackers.

https://www.postgresql.org/message-id/20160822.141645.655870136709055853.t-ishii%40sraoss.co.jp
Can we continue the discussion in this mailing list?
We (at Postgres Pro) have developed the whole build chain (with
support for l10n) so we can just share it.
I have just subscribed to the pgsql-docs list.
Here is the last conversation with Peter at pgsql-hackers.

On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
I don't know what kind of problem you are seeing with encoding
handling, but at least UTF-8 is working for Japanese, French and
Russian.
Those translations are using DocBook XML.
But in the meantime I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


