Thread: Re: [BUGS] BUG #4279: Bad codepage in our web-site

Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Bruce Momjian
Date:
Vitold S wrote:
> 
> The following bug has been logged online:
> 
> Bug reference:      4279
> Logged by:          Vitold S
> Email address:      vit1251@mail.ru
> PostgreSQL version: 8.3.3
> Operating system:   Windows XP
> Description:        Bad codepage in our web-site
> Details: 
> 
> Page http://www.postgresql.org/docs/faqs.FAQ_russian.html contains header
> information declaring this page as UTF-8, but the page is actually saved in
> koi8-r. Please either convert the page or change the header. The page is
> unreadable.

I can confirm this bug.  I can read the file in
CVS doc/src/FAQ/FAQ_russian.html just fine;  I see Cyrillic characters
in Firefox 3, and the encoding in the HTML says:
   <META http-equiv="Content-Type" content="text/html; charset=koi8-r">

But going to the FAQ on our web site:
http://www.postgresql.org/docs/faqs.FAQ_russian.html

shows only black diamonds for all characters.
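A minimal Python sketch of the symptom, assuming the browser trusts the UTF-8 declaration: KOI8-R stores the basic Cyrillic letters as single bytes in the 0xC0 to 0xFF range, which do not form valid UTF-8 sequences here, so a lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER for each one. Browsers typically draw that character as a black diamond with a question mark. The sample word is illustrative, not taken from the FAQ.

```python
# Illustrative Cyrillic text; not quoted from the actual FAQ.
original = "Привет"

# The bytes as they would be stored in a KOI8-R file.
koi8_bytes = original.encode("koi8-r")

# What a browser effectively does when told the page is UTF-8:
# decode leniently, replacing each invalid sequence with U+FFFD.
shown = koi8_bytes.decode("utf-8", errors="replace")

print(shown)
print(set(shown) == {"\ufffd"})  # every character became a replacement char
```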

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Andrew Sullivan
Date:
On Fri, Aug 22, 2008 at 01:37:42PM -0400, Bruce Momjian wrote:
> I can confirm this bug.  I can read the file in
> CVS doc/src/FAQ/FAQ_russian.html just fine;  I see Cyrillic characters
> in Firefox 3, and the encoding in the HTML says:
> 
>     <META http-equiv="Content-Type" content="text/html; charset=koi8-r">

Not in the HTML I just looked at:
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8"   />

Which is likely the problem.  Convert the content of the page to UTF-8
so that it can live in the web page environment.
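A minimal Python sketch of that conversion, assuming the source charset is known (as it is for the CVS files) and that the meta tag carries an unquoted `charset=...` value like the FAQ files do; `convert_to_utf8` is a hypothetical helper name. Outside Python, `iconv -f KOI8-R -t UTF-8` performs the byte conversion, but the meta declaration still has to be edited to match.

```python
import re

# Simplified: matches the first charset=... parameter, e.g. in
# content="text/html; charset=koi8-r". Assumes an unquoted value.
CHARSET_RE = re.compile(rb"(charset=)[A-Za-z0-9_-]+", re.IGNORECASE)

def convert_to_utf8(data: bytes, source_charset: str) -> bytes:
    """Re-encode an HTML page from source_charset to UTF-8 and make
    its meta charset declaration say utf-8."""
    text = data.decode(source_charset)  # old bytes -> Unicode text
    utf8 = text.encode("utf-8")         # Unicode text -> UTF-8 bytes
    return CHARSET_RE.sub(rb"\g<1>utf-8", utf8, count=1)

page = ('<META http-equiv="Content-Type" content="text/html; charset=koi8-r">'
        "<p>Привет</p>").encode("koi8-r")
converted = convert_to_utf8(page, "koi8-r")
print(converted.decode("utf-8"))
```

Decoding before re-encoding matters: the text has to pass through Unicode once, otherwise the bytes would merely be relabeled, which is exactly the bug reported here.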

A
--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Bruce Momjian
Date:
Andrew Sullivan wrote:
> On Fri, Aug 22, 2008 at 01:37:42PM -0400, Bruce Momjian wrote:
> > I can confirm this bug.  I can read the file in
> > CVS doc/src/FAQ/FAQ_russian.html just fine;  I see Cyrillic characters
> > in Firefox 3, and the encoding in the HTML says:
> > 
> >     <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
> 
> Not in the HTML I just looked at:
> 
>     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"
>     />

You saw that in our CVS HTML file that is part of the source
distribution?  That is the master copy.

> Which is likely the problem.  Convert the content of the page to UTF-8
> so that it can live in the web page environment.

I see lots of FAQs in CVS that are not UTF8:
    FAQ.html:    <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
    FAQ_MINGW.html:  <meta http-equiv="content-type"
    FAQ_brazilian.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    FAQ_czech.html:<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    FAQ_czech.html:<meta http-equiv="Content-language" content="cs">
    FAQ_farsi.html:<META http-equiv=Content-Type content="text/html; charset=utf-8"></HEAD>
    FAQ_french.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    FAQ_german.html:<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    FAQ_hungarian.html: http-equiv="content-type">
    FAQ_japanese.html:<META http-equiv="Content-Type" content="text/html; charset=utf-8">
    FAQ_polish.html:         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
    FAQ_russian.html:    <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
    FAQ_turkish.html:       <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">


Do they all have to be converted?



Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Andrew Sullivan
Date:
On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote:
> >     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"
> >     />
> 
> You saw that in our CVS HTML file that is part of the source
> distribution?  That is the master copy.

No, that's my point.  When I look at the web page source, it's UTF-8.
And since that's the only realistic way to have the framework document
around the FAQ, it seems reasonable to me.

> I see lots of FAQs in CVS that are not UTF8:
> 
>     FAQ.html:    <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">

This doesn't matter, because US-ASCII is a proper subset of UTF-8.
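A quick Python check of that claim: every byte from 0x00 to 0x7F means the same character in US-ASCII and in UTF-8, so a pure-ASCII file is already valid UTF-8 byte for byte.

```python
# All 128 US-ASCII byte values.
ascii_bytes = bytes(range(0x80))

# Decode as ASCII, re-encode as UTF-8: the bytes come back unchanged.
same = ascii_bytes.decode("ascii").encode("utf-8") == ascii_bytes
print(same)  # True
```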

>     FAQ_brazilian.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This doesn't need to change because the bottom range of Unicode was made
the same as ISO 8859-1 in order to make the transition somewhat easier.

>     FAQ_polish.html:         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
>     FAQ_russian.html:    <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
>     FAQ_turkish.html:       <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">
> 
> Do they all have to be converted?

Those three do, I expect.  They sure look like it on the page.  All
sorts of "?" in there.

A


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Bruce Momjian
Date:
Andrew Sullivan wrote:
> On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote:
> > >     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"
> > >     />
> > 
> > You saw that in our CVS HTML file that is part of the source
> > distribution?  That is the master copy.
> 
> No, that's my point.  When I look at the web page source, it's UTF-8.
> And since that's the only realistic way to have the framework document
> around the FAQ, it seems reasonable to me.
> 
> > I see lots of FAQs in CVS that are not UTF8:
> > 
> >     FAQ.html:    <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
> 
> This doesn't matter, because US-ASCII is a proper subset of UTF-8.
> 
> >     FAQ_brazilian.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
> 
> This doesn't need to change because the bottom range of Unicode was made
> the same as ISO 8859-1 in order to make the transition somewhat easier.
> 
> >     FAQ_polish.html:         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
> >     FAQ_russian.html:    <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
> >     FAQ_turkish.html:       <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">
> > 
> > Do they all have to be converted?
> 
> Those three do, I expect.  They sure look like it on the page.  All
> sorts of "?" in there.

Well, something is converting them to UTF8 headings on our web site and
I thought the HTML encoding was converted at that stage.



Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Tino Wildenhain
Date:
Bruce Momjian wrote:
...
>>> Do they all have to be converted?
>> Those three do, I expect.  They sure look like it on the page.  All
>> sorts of "?" in there.
> 
> Well, something is converting them to UTF8 headings on our web site and
> I thought the HTML encoding was converted at that stage.

Probably the server, i.e. the real Content-Type header. Putting the
charset into the file again via a meta tag only really helps if you
are loading it from the filesystem. Of course, there should be no
mismatch between the server, meta, and content encodings. Having
everything in UTF-8 should be preferred, IMHO.
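Browsers give the charset in the HTTP Content-Type header precedence over a meta declaration, so the header, the meta tag, and the actual bytes all have to agree. A minimal Python sketch of such a consistency check, assuming the header's charset parameter has already been extracted as a string and that the meta value is unquoted as in the FAQ files; `canonical` and `charset_report` are hypothetical helper names.

```python
import codecs
import re

# Simplified: finds the first charset=... in the page (unquoted value).
META_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_-]+)", re.IGNORECASE)

def canonical(name):
    """Canonical codec name (e.g. 'UTF8' -> 'utf-8'), or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

def charset_report(header_charset: str, body: bytes) -> dict:
    """Compare the HTTP header charset, the meta-tag charset, and whether
    the body bytes actually decode under the header charset."""
    m = META_CHARSET_RE.search(body)
    meta = canonical(m.group(1).decode("ascii")) if m else None
    header = codecs.lookup(header_charset).name  # LookupError if unknown
    try:
        body.decode(header)
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    return {"header": header, "meta": meta,
            "header_matches_meta": header == meta,
            "body_decodes_as_header": decodes}

page = ('<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">'
        "<p>Привет</p>").encode("koi8-r")
report = charset_report("utf-8", page)  # the situation reported in this bug
print(report)
```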

Regards
Tino


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Peter Eisentraut
Date:
On Friday 22 August 2008 22:13:16 Andrew Sullivan wrote:
> >       FAQ_brazilian.html:    <META http-equiv="Content-Type"
> > content="text/html; charset=iso-8859-1">
>
> This doesn't need to change because the bottom range of Unicode was made
> the same as ISO 8859-1 in order to make the transition somewhat easier.

The bottom range of Unicode codepoints is the same as ISO 8859-1, but not the
bottom range of UTF-8 encoded bytes.
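Peter's distinction in a short Python example, using U+00E9 (é) as an arbitrary sample character: its code point value equals its ISO 8859-1 byte, but UTF-8 encodes it as two bytes, and the single Latin-1 byte is not valid UTF-8 on its own.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

latin1 = ch.encode("iso-8859-1")  # one byte, equal to the code point
utf8 = ch.encode("utf-8")         # two bytes

print(ord(ch) == latin1[0])  # True: code point value == Latin-1 byte
print(latin1, utf8)          # b'\xe9' b'\xc3\xa9'

def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(latin1))  # False: a Latin-1 file is not UTF-8
print(is_valid_utf8(utf8))    # True
```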


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Magnus Hagander
Date:
Bruce Momjian wrote:
> I see lots of FAQs in CVS that are not UTF8:
> 
>     FAQ.html:    <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
>     FAQ_MINGW.html:  <meta http-equiv="content-type"
>     FAQ_brazilian.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>     FAQ_czech.html:<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>     FAQ_czech.html:<meta http-equiv="Content-language" content="cs">
>     FAQ_farsi.html:<META http-equiv=Content-Type content="text/html; charset=utf-8"></HEAD>
>     FAQ_french.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>     FAQ_german.html:<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
>     FAQ_hungarian.html: http-equiv="content-type">
>     FAQ_japanese.html:<META http-equiv="Content-Type" content="text/html; charset=utf-8">
>     FAQ_polish.html:         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
>     FAQ_russian.html:    <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
>     FAQ_turkish.html:       <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">
> 
> Do they all have to be converted?
> 

I wouldn't put too much work into that. Wouldn't it be better to migrate
them to the wiki instead?

//Magnus


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Andrew Sullivan
Date:
On Sat, Aug 23, 2008 at 03:45:09PM +0300, Peter Eisentraut wrote:
> The bottom range of Unicode codepoints is the same as ISO 8859-1, but not the 
> bottom range of UTF-8 encoded bytes.

Oh, right, silly me.  Well, then, those'll need fixing, too.  

I'm not sure I see why this is such a problem.  It's obvious that if
you're going to support on-the-fly internationalisation on your site
(as postgresql.org does), then you need to pick the one encoding that
allows you to do that. 
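A small Python sketch applying the corrected rule to the charsets in Bruce's grep output: only files declared as UTF-8 or US-ASCII can be served as UTF-8 unchanged. FAQ_MINGW.html and FAQ_hungarian.html are left out because the grep output cuts off their declarations; `needs_conversion` is a hypothetical helper name.

```python
import codecs

# Charsets declared by the FAQ files in Bruce's grep output.
DECLARED = {
    "FAQ.html": "US-ASCII",
    "FAQ_brazilian.html": "iso-8859-1",
    "FAQ_czech.html": "utf-8",
    "FAQ_farsi.html": "utf-8",
    "FAQ_french.html": "iso-8859-1",
    "FAQ_german.html": "UTF-8",
    "FAQ_japanese.html": "utf-8",
    "FAQ_polish.html": "iso-8859-2",
    "FAQ_russian.html": "koi8-r",
    "FAQ_turkish.html": "iso8859-9",
}

# Canonical codec names whose byte streams are already valid UTF-8.
UTF8_COMPATIBLE = {"utf-8", "ascii"}

def needs_conversion(charset: str) -> bool:
    """True if a file declared in `charset` must be re-encoded to be
    served as UTF-8 (judged from the declaration alone)."""
    return codecs.lookup(charset).name not in UTF8_COMPATIBLE

for name, cs in sorted(DECLARED.items()):
    if needs_conversion(cs):
        print(name, cs)
```

Normalizing through `codecs.lookup` means spellings such as `UTF-8`, `utf-8`, and `US-ASCII` are compared by codec rather than by string.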

A



Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Peter Eisentraut
Date:
On Monday 25 August 2008 11:00:12 Magnus Hagander wrote:
> I wouldn't put too much work into that. Wouldn't it be better to migrate
> them to the wiki instead?

I'm all in favor of the wiki, but consider that this is prominent user-facing 
stuff here.  If we put the FAQ in the wiki, why not the entire web site?  
Perhaps the web team should take over these files and apply the existing 
translation framework to it.


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Magnus Hagander
Date:
Peter Eisentraut wrote:
> On Monday 25 August 2008 11:00:12 Magnus Hagander wrote:
>> I wouldn't put too much work into that. Wouldn't it be better to migrate
>> them to the wiki instead?
> 
> I'm all in favor of the wiki, but consider that this is prominent user-facing 
> stuff here.  If we put the FAQ in the wiki, why not the entire web site?  
> Perhaps the web team should take over these files and apply the existing 
> translation framework to it.

That we could certainly do, at least if we can get pgweb set up to
allow specific permissions to update these things - I would like to rid
us of the bottleneck of having to write up an actual patch and send it
to a list, to have somebody else apply it, and only then realize it's
broken HTML etc. The translators should be able to do this on their own.

I don't think the current web team can just "take them over". Sure, they
could take over the part that Bruce does today which is apply patches
for people, but it doesn't solve the issue in itself.

//Magnus


Re: [BUGS] BUG #4279: Bad codepage in our web-site

From
Peter Eisentraut
Date:
Bruce Momjian wrote:
> Andrew Sullivan wrote:
>> On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote:
>>>>     <meta http-equiv="Content-Type" content="text/html; charset=utf-8"
>>>>     />
>>> You saw that in our CVS HTML file that is part of the source
>>> distribution?  That is the master copy.
>> No, that's my point.  When I look at the web page source, it's UTF-8.
>> And since that's the only realistic way to have the framework document
>> around the FAQ, it seems reasonable to me.
>>
>>> I see lots of FAQs in CVS that are not UTF8:
>>>
>>>     FAQ.html:    <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
>> This doesn't matter, because US-ASCII is a proper subset of UTF-8.
>>
>>>     FAQ_brazilian.html:    <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>> This doesn't need to change because the bottom range of Unicode was made
>> the same as ISO 8859-1 in order to make the transition somewhat easier.
>>
>>>     FAQ_polish.html:         <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
>>>     FAQ_russian.html:    <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
>>>     FAQ_turkish.html:       <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">
>>>
>>> Do they all have to be converted?
>> Those three do, I expect.  They sure look like it on the page.  All
>> sorts of "?" in there.
> 
> Well, something is converting them to UTF8 headings on our web site and
> I thought the HTML encoding was converted at that stage.

Whatever process is taking the FAQ documents and integrating them with 
the web layout around it needs to run an encoding conversion.
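A sketch of such an integration step in Python, under loud assumptions: `LAYOUT` is a made-up stand-in for the real pgweb template (not shown in this thread), the FAQ input is treated as an HTML fragment rather than a full document, and the meta tag is located with a byte-level regex, which only works because every charset in the FAQ list is ASCII-compatible. `wrap_faq` is a hypothetical name.

```python
import re

# Matches a meta tag carrying a charset=... value (simplified, unquoted value).
META_CT_RE = re.compile(rb"<meta[^>]*charset=([A-Za-z0-9_-]+)[^>]*>", re.IGNORECASE)

# Hypothetical site layout; the real pgweb templates are not shown here.
LAYOUT = ('<html><head><meta http-equiv="Content-Type" '
          'content="text/html; charset=utf-8"></head>'
          "<body>{content}</body></html>")

def wrap_faq(raw: bytes) -> bytes:
    """Decode a FAQ fragment using its own declared charset, drop its
    meta tag, and embed the text in the UTF-8 layout."""
    m = META_CT_RE.search(raw)
    charset = m.group(1).decode("ascii") if m else "utf-8"  # assume UTF-8 if undeclared
    body = META_CT_RE.sub(b"", raw, count=1).decode(charset)
    return LAYOUT.format(content=body).encode("utf-8")

raw = ('<META http-equiv="Content-Type" content="text/html; charset=koi8-r">'
       "<p>Привет</p>").encode("koi8-r")
print(wrap_faq(raw).decode("utf-8"))
```

The key design point is that the conversion happens when the page is assembled, so the served bytes, the HTTP header, and the single remaining meta tag all say UTF-8.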