Re: Doc: typo in config.sgml - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Doc: typo in config.sgml
Date
Msg-id Zzur1CWDmDe6lroJ@momjian.us
In response to Re: Doc: typo in config.sgml  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
> On Tue, 5 Nov 2024 10:08:17 +0100
> Peter Eisentraut <peter@eisentraut.org> wrote:
> 
> 
> > >> So you convert LATIN1 characters to HTML entities so that it's easier
> > >> to detect non-LATIN1 characters in the SGML docs? If my
> > >> understanding is correct, it can be also achieved by using some tools
> > >> like:
> > >>
> > >> iconv -t ISO-8859-1 -f UTF-8 release-17.sgml
> > >>
> > >> If there are some non-LATIN1 characters in release-17.sgml,
> > >> it will complain like:
> > >>
> > >> iconv: illegal input sequence at position 175
> > >>
> > >> An advantage of this is that we don't need to convert each LATIN1
> > >> character to HTML entities, which makes the SGML file authors' lives
> > >> a little bit easier.
> 
> > I think the iconv approach is an idea worth checking out.
> > 
> > It's also not necessarily true that the set of characters provided by 
> > the built-in PDF fonts is exactly the set of characters in Latin 1.  It 
> > appears to be close enough, but I'm not sure, and I haven't found any 
> > authoritative information on that.  
> 
> I found a description in the Apache FOP FAQ [1] that explains that some
> glyphs in the Latin1 character set are not contained in the standard
> text fonts.
> 
>  The standard text fonts supplied with Acrobat Reader have mostly glyphs for
>  characters from the ISO Latin 1 character set. For a variety of reasons, even
>  those are not completely guaranteed to work, for example you can't use the fi
>  ligature from the standard serif font.

So, I think the failure of ligatures is usually caused by not using the
right Adobe Font Metric (AFM) file.  I have seen faulty ligature
rendering in PDFs but was always able to fix it by using the right AFM
file.  Odds are the failure is caused by using a standard Latin1 AFM
file rather than the AFM file that matches the font being used.

> [1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
> 
> However, it seems that using iconv to detect non-Latin1 characters may
> still be useful because these are likely not displayed in PDF. For
> example, we can do this in make check as the attached patch 0002. It
> cannot show the filename where one is found, though.

I was thinking something like:

    grep -l --recursive -P '[\x80-\xFF]' . |
    while read FILE
    do  iconv -f UTF-8 -t ISO-8859-1 "$FILE" > /dev/null || exit 1
    done

This only checks files with non-ASCII characters.
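
A fleshed-out variant of that idea, wrapped in a function so it can name
the offending file, might look like the following (a sketch; the function
name is mine, and it assumes GNU grep's -P and --recursive options plus a
glibc iconv):

```shell
# latin1_check DIR: for every file under DIR containing bytes outside
# 7-bit ASCII, verify it converts cleanly from UTF-8 to Latin-1.
# Names the first offending file and returns non-zero on failure.
# LC_ALL=C forces grep to match raw bytes rather than codepoints.
latin1_check() {
    LC_ALL=C grep -l --recursive -P '[\x80-\xFF]' "$1" |
    while read -r file
    do  if ! iconv -f UTF-8 -t ISO-8859-1 "$file" > /dev/null 2>&1
        then echo "non-Latin1 character in: $file" >&2
             return 1
        fi
    done
}
```

Because the while loop runs in a pipeline subshell, the `return 1` exits
that subshell with status 1, which becomes the function's exit status.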

> > Another approach for a fix would be 
> > to get FOP produce the required warnings or errors more reliably.  I 
> > know it has a bunch of logging settings (ultimately via log4j), so there 
> > might be some possibilities.
> 
> When a character that cannot be displayed in PDF is found, a warning
> "Glyph ... not available in font ...." is output in fop's log. We can
> prevent such characters from being contained in the PDF by checking for
> the message, as in the attached patch 0001. However, this check runs
> after the PDF is generated, since I could not find a way to terminate
> the generation immediately when such a character is detected.
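
For reference, the log-scraping side of that patch-0001 approach could be
sketched as a small shell function (the function and file names are mine;
the matched text follows the warning quoted above, but the exact wording
may vary by FOP version or locale):

```shell
# glyph_check LOGFILE: fail if the FOP build log contains a
# "Glyph ... not available in font ..." warning, i.e. the PDF has
# characters the embedded fonts cannot display.
glyph_check() {
    if grep -q 'not available in font' "$1"
    then echo "missing glyph reported in $1" >&2
         return 1
    fi
}
```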

So, are we sure this will be the message even for non-English users? I
thought checking for warning message text was too fragile.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"


