Thread: UTF8 for docs

UTF8 for docs

From
Bruce Momjian
Date:
Our release.sgml contains these lines, that I wrote:

        we cannot use UTF8 because SGML Docbook does not support it

        do not use numeric _UTF_ numeric character escapes (&#nnn;),
        we can only use Latin1

        Example: Alvaro Herrera is &Aacute;lvaro Herrera

Should this be changed now that we are using XML for head?  It cannot be
changed for back branch releases since those are still SGML, so I
suggest we keep this restriction.  I have updated the doc comments.
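
A minimal sketch of the conversion rule quoted above, assuming it means
"replace each non-ASCII Latin1 character with its HTML4 named entity".
The function name to_named_entities is hypothetical; the mapping comes
from Python's standard html.entities.codepoint2name table, not from any
PostgreSQL tooling.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII Latin1 characters with HTML4 named entities.

    Assumption: anything that is not ASCII and has no Latin1 named
    entity is rejected rather than passed through.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is copied unchanged.
            out.append(ch)
        elif cp <= 0xFF and cp in codepoint2name:
            # Latin1 character with a named entity, e.g. U+00C1 -> &Aacute;
            out.append("&" + codepoint2name[cp] + ";")
        else:
            raise ValueError(
                f"character U+{cp:04X} has no Latin1 named entity"
            )
    return "".join(out)


# "\u00c1" is the A-acute from the example in release.sgml.
print(to_named_entities("\u00c1lvaro Herrera"))  # prints &Aacute;lvaro Herrera
```

Raising an error for anything outside Latin1 is a design choice of this
sketch; it simply makes the "we can only use Latin1" restriction visible
instead of silently emitting a numeric escape.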

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: UTF8 for docs

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Our release.sgml contains these lines, that I wrote:
>         we cannot use UTF8 because SGML Docbook does not support it
>         do not use numeric _UTF_ numeric character escapes (&#nnn;),
>         we can only use Latin1

> Should this be changed now that we are using XML for head?  It cannot be
> changed for back branch releases since those are still SGML, so I
> suggest we keep this restriction.  I have updated the doc comments.

I might be wrong, but I was under the impression that restricting the
character set was still a good idea because of downstream restrictions
on rendering of the docs.  For instance, pretty much every web browser
can render Latin1 characters, but I wouldn't bet on Klingon working.

Maybe we could go a little further than the standard named-entity
characters, but it'd take some research to figure out what is safe.
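
As a rough illustration of how the current restriction could be checked
mechanically, here is a sketch that scans text for the two things the
release.sgml comment forbids. It assumes "violation" means either a
numeric character escape (&#nnn; or &#xhh;) or a raw character beyond
the Latin1 range; the name find_violations and the message wording are
hypothetical, and this is not an existing project script.

```python
import re

# Decimal (&#193;) or hexadecimal (&#xC1;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")


def find_violations(text):
    """Return (line_number, description) pairs for each violation found."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Numeric escapes are reported first for each line.
        for match in NUMERIC_REF.finditer(line):
            problems.append((lineno, f"numeric escape {match.group(0)}"))
        # Then any raw character above U+00FF, i.e. outside Latin1.
        for ch in line:
            if ord(ch) > 0xFF:
                problems.append(
                    (lineno, f"character outside Latin1 U+{ord(ch):04X}")
                )
    return problems


sample = "fine\nbad &#193; here\nalso bad \u0141\n"
for lineno, message in find_violations(sample):
    print(lineno, message)
```

Deciding which characters beyond Latin1 are actually safe would mean
loosening the second test, which is exactly the research mentioned above.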

            regards, tom lane


Re: UTF8 for docs

From
Bruce Momjian
Date:
On Tue, May  1, 2018 at 10:14:48AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Our release.sgml contains these lines, that I wrote:
> >         we cannot use UTF8 because SGML Docbook does not support it
> >         do not use numeric _UTF_ numeric character escapes (&#nnn;),
> >         we can only use Latin1
> 
> > Should this be changed now that we are using XML for head?  It cannot be
> > changed for back branch releases since those are still SGML, so I
> > suggest we keep this restriction.  I have updated the doc comments.
> 
> I might be wrong, but I was under the impression that restricting the
> character set was still a good idea because of downstream restrictions
> on rendering of the docs.  For instance, pretty much every web browser
> can render Latin1 characters, but I wouldn't bet on Klingon working.

Oh, uh, I was unsure whether those SGML specifications were passed unchanged
into the output.

> Maybe we could go a little further than the standard named-entity
> characters, but it'd take some research to figure out what is safe.

Yeah.  I have added this as a doc comment so we don't forget.
