Re: New Chinese FAQ - Mailing list pgsql-www

From Bruce Momjian
Subject Re: New Chinese FAQ
Date
Msg-id 200505170312.j4H3CRY25080@candle.pha.pa.us
In response to Re: New Chinese FAQ  ("Magnus Hagander" <mha@sollentuna.net>)
Responses Re: New Chinese FAQ  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-www
Magnus Hagander wrote:
> >> ok, I'll fix the html tag problem ASAP.
> >>
> >
> >I fixed the tag problem and it now verifies fine:
> >
> >
> >http://validator.w3.org/check?uri=http%3A%2F%2Fwwwmaster.postgresql.org%2Fdocs%2Ffaqs.FAQ_chinese.html&charset=gb2312+%28Chinese%2C+simplified%29
> >
> >The only problem reported is that it says the encoding is incorrect for
> >a large number of lines.  The above encoding forces it to be gb2312.  If
> >I make it Unicode I get even more failures.  However, I remember iconv
> >doing the conversion to UTF8 just fine, so maybe something is wrong with
> >how we are validating it.
>
> The output should be UTF8, and it should autodetect it. The output from
> the *website* should *not* validate as gb2312, because it is no longer
> in that encoding.
>
> The reason that's the only error you get may be that it doesn't validate
> the document because of encoding errors. So this doesn't prove (or
> disprove for that matter) that the tags are fixed.
>

Yes, I was using the HTML 4.0 doctype when I tested; the file was only
validated as XHTML Transitional once it was on the web site.

> >Anyway, the HTML is OK so it seems we just have an encoding issue now.
> >The current version in CVS is all fixed up so please submit updates
> >based on that version.  Thanks.
>
> I'm sorry to say, but there are invalid characters in it again :-(
> On svr2:
> svr2# iconv -f gb2312 -t utf-8 FAQ_chinese.html >/dev/null
> iconv: FAQ_chinese.html: cannot convert
>
>
> On developer.pgadmin.org:
> mha@developer:~/ext/faqs$ iconv -f gb2312 -t utf-8 FAQ_chinese.html -o /dev/null
> iconv: illegal input sequence at position 8182
>
>
> Could it be cvs that messes the encoding up? Can you mail me the file as
> you see it before you commit and I can see if that makes a difference?
>

The problem is that the document is clearly not XHTML, but when I use
htmltidy -raw -asxhtml to convert it to XHTML, it somehow messes up the
encodings and then iconv fails.  So, I either have to manually fix the
HTML file to be XHTML, or I have to figure out why htmltidy is changing
the encoded text even though I am using -raw.
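
To narrow down where the bytes go wrong, something like the following
might help. It is only a rough sketch in Python, not what I actually ran:
the function names are made up for illustration, and it assumes Python's
built-in gb2312 codec rejects the same bytes iconv does, which may not
hold in every case. It reports the byte offset of the first sequence that
fails to decode, plus a few surrounding bytes, so the copy from before
htmltidy can be compared with the copy after it.

```python
from typing import Optional


def first_invalid_offset(data: bytes, encoding: str = "gb2312") -> Optional[int]:
    """Return the byte offset of the first undecodable sequence, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec rejected.
        return exc.start
    return None


def context_around(data: bytes, offset: int, width: int = 16) -> bytes:
    """Return up to `width` bytes on each side of `offset` for inspection."""
    start = max(0, offset - width)
    return data[start:offset + width]


def report(path: str, encoding: str = "gb2312") -> None:
    """Print where (if anywhere) the file stops being valid in `encoding`."""
    with open(path, "rb") as handle:
        data = handle.read()
    offset = first_invalid_offset(data, encoding)
    if offset is None:
        print(f"{path}: decodes cleanly as {encoding}")
    else:
        print(f"{path}: invalid sequence at byte {offset}: "
              f"{context_around(data, offset)!r}")
```

Calling report("FAQ_chinese.html") on the file before and after the
htmltidy step (or with encoding="utf-8" on the converted output) should
show whether the bad bytes are already in the source or only appear after
the conversion.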

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
