Thread: UTF8 for docs
Our release.sgml contains these lines, that I wrote:

	we cannot use UTF8 because SGML Docbook
	does not support it

	do not use numeric _UTF_ numeric character escapes (&#nnn;),
	we can only use Latin1

	Example: Alvaro Herrera is &Aacute;lvaro Herrera

Should this be changed now that we are using XML for head?  It cannot be
changed for back branch releases since those are still SGML, so I
suggest we keep this restriction.  I have updated the doc comments.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes:
> Our release.sgml contains these lines, that I wrote:
> 	we cannot use UTF8 because SGML Docbook does not support it
> 	do not use numeric _UTF_ numeric character escapes (&#nnn;),
> 	we can only use Latin1

> Should this be changed now that we are using XML for head?  It cannot be
> changed for back branch releases since those are still SGML, so I
> suggest we keep this restriction.  I have updated the doc comments.

I might be wrong, but I was under the impression that restricting the
character set was still a good idea because of downstream restrictions
on rendering of the docs.  For instance, pretty much every web browser
can render Latin1 characters, but I wouldn't bet on Klingon working.

Maybe we could go a little further than the standard named-entity
characters, but it'd take some research to figure out what is safe.

			regards, tom lane
On Tue, May 1, 2018 at 10:14:48AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Our release.sgml contains these lines, that I wrote:
> > 	we cannot use UTF8 because SGML Docbook does not support it
> > 	do not use numeric _UTF_ numeric character escapes (&#nnn;),
> > 	we can only use Latin1
>
> > Should this be changed now that we are using XML for head?  It cannot be
> > changed for back branch releases since those are still SGML, so I
> > suggest we keep this restriction.  I have updated the doc comments.
>
> I might be wrong, but I was under the impression that restricting the
> character set was still a good idea because of downstream restrictions
> on rendering of the docs.  For instance, pretty much every web browser
> can render Latin1 characters, but I wouldn't bet on Klingon working.

Oh, uh, I was unclear if those SGML specifications were passed unchanged
into the output.

> Maybe we could go a little further than the standard named-entity
> characters, but it'd take some research to figure out what is safe.

Yeah.  I have added this as a doc comment so we don't forget.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
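
As an illustration of the release.sgml rule quoted in this thread (Latin1
only, written as named entities rather than raw UTF8 or numeric escapes),
here is a minimal Python sketch. It is not part of the PostgreSQL tree: the
function name to_named_entities is made up for this example, and it assumes
that Python's standard html.entities.codepoint2name table (HTML4 entity
names) is an acceptable stand-in for the "standard named-entity characters"
mentioned above.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Rewrite non-ASCII Latin1 characters as HTML4 named entities.

    Hypothetical helper for illustration only. Characters beyond Latin1
    (code points above 0xFF), or Latin1 code points with no named entity,
    are rejected rather than silently passed through.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFF and cp in codepoint2name:
            # Latin1 character with an HTML4 name, e.g. U+00C1 -> &Aacute;
            out.append(f"&{codepoint2name[cp]};")
        else:
            raise ValueError(
                f"U+{cp:04X} is outside Latin1 or has no named entity"
            )
    return "".join(out)


if __name__ == "__main__":
    print(to_named_entities("\u00c1lvaro Herrera"))
```

Raising an error for anything beyond Latin1 mirrors the conservative stance
taken in the thread; widening the accepted set would need the research into
what renders safely that Tom mentions.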