Re: Doc: typo in config.sgml - Mailing list pgsql-hackers
From: Bruce Momjian
Subject: Re: Doc: typo in config.sgml
Msg-id: Zzur1CWDmDe6lroJ@momjian.us
In response to: Re: Doc: typo in config.sgml (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
> On Tue, 5 Nov 2024 10:08:17 +0100
> Peter Eisentraut <peter@eisentraut.org> wrote:
>
> > >> So you convert LATIN1 characters to HTML entities so that it's easier
> > >> to detect non-LATIN1 characters in the SGML docs? If my
> > >> understanding is correct, it can also be achieved by using some tools
> > >> like:
> > >>
> > >>     iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
> > >>
> > >> If there are some non-LATIN1 characters in release-17.sgml,
> > >> it will complain like:
> > >>
> > >>     iconv: illegal input sequence at position 175
> > >>
> > >> An advantage of this is that we don't need to convert each LATIN1
> > >> character to an HTML entity, which makes SGML file authors' lives a
> > >> little bit easier.
> > > I think the iconv approach is an idea worth checking out.
> >
> > It's also not necessarily true that the set of characters provided by
> > the built-in PDF fonts is exactly the set of characters in Latin 1. It
> > appears to be close enough, but I'm not sure, and I haven't found any
> > authoritative information on that.
>
> I found a description in the FAQ on Apache FOP [1] that explains that
> some glyphs for the Latin1 character set are not contained in the
> standard text fonts:
>
>     The standard text fonts supplied with Acrobat Reader have mostly
>     glyphs for characters from the ISO Latin 1 character set. For a
>     variety of reasons, even those are not completely guaranteed to
>     work, for example you can't use the fi ligature from the standard
>     serif font.

So, I think the failure of ligatures is usually caused by not using the
right Adobe Font Metric (AFM) file. I have seen faulty ligature
rendering in PDFs but was always able to fix it by using the right AFM
file. Odds are the failure is caused by using a standard Latin1 AFM
file rather than the AFM file that matches the font being used.

> [1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
>
> However, it seems that using iconv to detect non-Latin1 characters may
> still be useful, because these are likely not displayed in the PDF.
> For example, we can do this in "make check", as in the attached patch
> 0002. It cannot show the filename where one is found, though.

I was thinking of something like:

    grep -l --recursive -P '[\x80-\xFF]' . | while read -r FILE
    do
        # Discard the converted output; only iconv's exit status matters.
        iconv -f UTF-8 -t ISO-8859-1 "$FILE" > /dev/null || exit 1
    done

This only checks files that contain non-ASCII characters.

> > Another approach for a fix would be to get FOP to produce the
> > required warnings or errors more reliably. I know it has a bunch of
> > logging settings (ultimately via log4j), so there might be some
> > possibilities.
>
> When a character that cannot be displayed in the PDF is found, a
> warning "Glyph ... not available in font ..." is output in fop's log.
> We can prevent such characters from being contained in the PDF by
> checking for that message, as in the attached patch 0001. However,
> this check happens after the PDF is generated, since I could not find
> a way to terminate the generation immediately when such a character is
> detected.

So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means
  "Am I going to die soon?"
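One possible way around the localization concern raised above is to pin
the JVM locale to English before invoking fop. The following is a
minimal sketch of that idea, not part of either attached patch: it
assumes fop's wrapper script passes FOP_OPTS through to the JVM, that
warnings go to stderr, and uses illustrative file names (postgres-US.fo,
postgres-US.pdf, fop.log).

    # Hypothetical sketch, not from patch 0001: force an English JVM
    # locale so the warning text is predictable, build the PDF, then
    # fail if fop's log contains the missing-glyph warning.
    FOP_OPTS="-Duser.language=en -Duser.country=US" \
        fop -fo postgres-US.fo -pdf postgres-US.pdf 2> fop.log

    # FOP's default logger writes to stderr, so fop.log should contain
    # any "Glyph ... not available in font ..." warnings.
    if grep -q 'not available in font' fop.log; then
        echo 'undisplayable character in PDF; see fop.log' >&2
        exit 1
    fi

Whether fop's own messages are localized at all is unclear; if they are
plain English strings, the grep alone may already be reliable.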