On Tue, Nov 19, 2024 at 11:29:07AM +0900, Yugo NAGATA wrote:
> On Mon, 18 Nov 2024 16:04:20 -0500
> > So, the failure of ligatures is usually caused by not using the right
> > Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
> > rendering in PDFs but was always able to fix it by using the right AFM
> > file. Odds are the failure is caused by using a standard Latin1 AFM file
> > and not the AFM file that matches the font being used.
> >
> > > [1] https://xmlgraphics.apache.org/fop/faq.html#pdf-characters
> > >
> > > However, it seems that using iconv to detect non-Latin1 characters may still
> > > be useful because these are likely not displayed in the PDF. For example, we
> > > can do this in "make check", as in the attached patch 0002. It cannot show
> > > the filename where one is found, though.
> >
> > I was thinking something like:
> >
> > grep -l --recursive -P '[\x80-\xFF]' . |
> > while read FILE
> > do iconv -f UTF-8 -t ISO-8859-1 "$FILE" || exit 1
> > done
> >
> > This only checks files with non-ASCII characters.
>
> Checking for non-Latin1 after non-ASCII characters seems like a good idea.
> I attached an updated patch (0002) that uses perl instead of grep
> because non-GNU grep may not support hex escape sequences.
Yes, good point.
> > So, are we sure this will be the message even for non-English users? I
> > thought checking for warning message text was too fragile.
>
> I am not sure whether fop has non-English messages, although I've never
> seen it output Japanese messages.
>
> I wonder if we can get consistent results by executing it with LANG=C.
> The updated patch 0001 takes this approach.
Yes, good idea.
> + @ ( $(PERL) -ne '/[\x80-\xFF]/ and `${ICONV} -t ISO-8859-1 -f UTF-8 "$$ARGV" 2>/dev/null` and print("$$ARGV:$$_"),$$n++;END {exit($$n>0)}' \
I am thinking we should put -f before -t because the order is from/to.
I like this approach.
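As a sketch of the whole check with -f before -t (file names and sample
content below are invented for illustration, not from the patch):

```shell
# Hypothetical test files in a temp directory.
dir=$(mktemp -d)
printf 'plain ascii\n' > "$dir/ok.sgml"
printf 'caf\303\251\n' > "$dir/latin1.sgml"          # U+00E9 maps to Latin-1
printf 'ellipsis \342\200\246\n' > "$dir/bad.sgml"   # U+2026 has no Latin-1 mapping

# Flag files containing characters that iconv cannot map to ISO-8859-1;
# iconv exits nonzero on the first unconvertible character.
found=$(
    for f in "$dir"/*.sgml; do
        iconv -f UTF-8 -t ISO-8859-1 "$f" >/dev/null 2>&1 || echo "$f"
    done
)
echo "$found"
rm -rf "$dir"
```

This prints only the file containing the non-Latin1 character, which also
addresses the earlier complaint that the check could not show the filename.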
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"