Thread: Re: [BUGS] BUG #4279: Bad codepage in our web-site
Vitold S wrote:
> The following bug has been logged online:
>
> Bug reference: 4279
> Logged by: Vitold S
> Email address: vit1251@mail.ru
> PostgreSQL version: 8.3.3
> Operating system: Windows XP
> Description: Bad codepage in our web-site
> Details:
>
> The page http://www.postgresql.org/docs/faqs.FAQ_russian.html declares
> UTF-8 in its header, but the page is actually saved in koi8-r. Please
> either convert it or change the header. The page is unreadable.

I can confirm this bug. I can read the file in CVS doc/src/FAQ/FAQ_russian.html just fine; I see Cyrillic characters in Firefox 3, and the encoding in the HTML says:

<META http-equiv="Content-Type" content="text/html; charset=koi8-r">

But going to the FAQ on our web site:

http://www.postgresql.org/docs/faqs.FAQ_russian.html

shows only black diamonds for all characters.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
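The "black diamonds" symptom can be reproduced in a few lines. The following is only a minimal sketch: the sample word is an illustrative stand-in for the FAQ text, not taken from the actual file, and the browser behaviour is approximated with Python's `errors="replace"` decoding.

```python
# Illustrative sample only: a short Russian word (meaning "Questions")
# standing in for the real FAQ content.
text = "\u0412\u043e\u043f\u0440\u043e\u0441\u044b"

# What the CVS file holds: the text stored as KOI8-R bytes.
koi8_bytes = text.encode("koi8_r")

# What happens when the page is labelled UTF-8: the same bytes are
# decoded as UTF-8, and every invalid sequence becomes U+FFFD, which
# browsers typically draw as a black diamond with a question mark.
garbled = koi8_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually saved in works.
recovered = koi8_bytes.decode("koi8_r")

print(repr(garbled))
print(recovered == text)
```

KOI8-R places the Cyrillic letters used here in the byte range 0xC0 to 0xFF, and none of them is a UTF-8 continuation byte, which is why no character survives the wrong decoding.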
On Fri, Aug 22, 2008 at 01:37:42PM -0400, Bruce Momjian wrote: > I can confirme this bug. I can read the file in > CVS doc/src/FAQ/FAQ_russian.html just fine; I see Cyrillic characters > in Firefox 3, and the encoding in the HTML says: > > <META http-equiv="Content-Type" content="text/html; charset=koi8-r"> Not in the HTML I just looked at: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Which is likely the problem. Convert the content of the page to UTF-8 so that it can live in the web page environment. A - Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Andrew Sullivan wrote:
> On Fri, Aug 22, 2008 at 01:37:42PM -0400, Bruce Momjian wrote:
> > I can confirm this bug. I can read the file in
> > CVS doc/src/FAQ/FAQ_russian.html just fine; I see Cyrillic characters
> > in Firefox 3, and the encoding in the HTML says:
> >
> > <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
>
> Not in the HTML I just looked at:
>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

You saw that in our CVS HTML file that is part of the source distribution? That is the master copy.

> Which is likely the problem. Convert the content of the page to UTF-8
> so that it can live in the web page environment.

I see lots of FAQs in CVS that are not UTF8:

FAQ.html: <META http-equiv="Content-Type" content="text/html; charset=US-ASCII">
FAQ_MINGW.html: <meta http-equiv="content-type"
FAQ_brazilian.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
FAQ_czech.html:<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
FAQ_czech.html:<meta http-equiv="Content-language" content="cs">
FAQ_farsi.html:<META http-equiv=Content-Type content="text/html; charset=utf-8"></HEAD>
FAQ_french.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
FAQ_german.html:<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
FAQ_hungarian.html: http-equiv="content-type">
FAQ_japanese.html:<META http-equiv="Content-Type" content="text/html; charset=utf-8">
FAQ_polish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
FAQ_russian.html: <META http-equiv="Content-Type" content="text/html; charset=koi8-r">
FAQ_turkish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9">

Do they all have to be converted?

--
Bruce Momjian <bruce@momjian.us>
On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote: > > <meta http-equiv="Content-Type" content="text/html; charset=utf-8" > > /> > > You saw that in our CVS HTML file that is part of the source > distribution? That is the master copy. No, that's my point. When I look at the web page source, it's UTF-8. And since that's the only realistic way to have the framework document around the FAQ, it seems reasonable to me. > I see lots of FAQs in CVS that are not UTF8: > > FAQ.html: <META http-equiv="Content-Type" content="text/html; charset=US-ASCII"> This doesn't matter, because US-ASCII is a proper subset of UTF-8. > FAQ_brazilian.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> This doesn't need to change because the bottom range of Unicode was made the same as ISO 8859-1 in order to make the transition somewhat easier. > FAQ_polish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"> > FAQ_russian.html: <META http-equiv="Content-Type" content="text/html; charset=koi8-r"> > FAQ_turkish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9"> > > Do they all have to be converted? Those three do, I expect. They sure look like it on the page. All sorts of "?" in there. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
Andrew Sullivan wrote: > On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote: > > > <meta http-equiv="Content-Type" content="text/html; charset=utf-8" > > > /> > > > > You saw that in our CVS HTML file that is part of the source > > distribution? That is the master copy. > > No, that's my point. When I look at the web page source, it's UTF-8. > And since that's the only realistic way to have the framework document > around the FAQ, it seems reasonable to me. > > > I see lots of FAQs in CVS that are not UTF8: > > > > FAQ.html: <META http-equiv="Content-Type" content="text/html; charset=US-ASCII"> > > This doesn't matter, because US-ASCII is a proper subset of UTF-8. > > > FAQ_brazilian.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> > > This doesn't need to change because the bottom range of Unicode was made > the same as ISO 8859-1 in order to make the transition somewhat easier. > > > FAQ_polish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"> > > FAQ_russian.html: <META http-equiv="Content-Type" content="text/html; charset=koi8-r"> > > FAQ_turkish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9"> > > > > Do they all have to be converted? > > Those three do, I expect. They sure look like it on the page. All > sorts of "?" in there. Well, something is converting them to UTF8 headings on our web site and I thought the HTML encoding was converted at that stage. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
...
>>> Do they all have to be converted?
>> Those three do, I expect. They sure look like it on the page. All
>> sorts of "?" in there.
>
> Well, something is converting them to UTF8 headings on our web site and
> I thought the HTML encoding was converted at that stage.

Probably the server, i.e. the real Content-Type HTTP header. Repeating the encoding inside the file via a meta tag only really helps when the file is loaded from the filesystem.

Of course, there should be no mismatch between the server header, the meta tag, and the encoding of the content itself. Having everything in UTF-8 should be preferred, IMHO.

Regards
Tino
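The three-way agreement described above (server header, meta tag, actual bytes) can be checked mechanically. This is a rough sketch, not how postgresql.org does it: `check_page`, `declared_charset`, and the sample page are hypothetical names and data made up for illustration, and the charset detection is a plain regular expression rather than a real HTML parser.

```python
import re

# Hypothetical helper pattern: picks the charset name out of either an
# HTTP Content-Type value or a <meta http-equiv="Content-Type"> tag.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(text):
    """Return the first declared charset in text, lowercased, or None."""
    match = CHARSET_RE.search(text)
    return match.group(1).lower() if match else None


def check_page(content_type_header, body):
    """Return a list of encoding problems found for one page."""
    problems = []
    header_cs = declared_charset(content_type_header)
    # The meta tag itself is plain ASCII, so scanning the body as
    # ISO 8859-1 (which accepts every byte value) is safe.
    meta_cs = declared_charset(body.decode("iso-8859-1"))

    if header_cs and meta_cs and header_cs != meta_cs:
        problems.append(f"header says {header_cs}, meta tag says {meta_cs}")

    # Browsers give the HTTP header precedence over the meta tag.
    effective = header_cs or meta_cs
    if effective:
        try:
            body.decode(effective)
        except (UnicodeDecodeError, LookupError):
            problems.append(f"body is not valid {effective}")
    return problems


# Made-up page in the shape of the Russian FAQ: KOI8-R bytes with a
# koi8-r meta tag.
page = (
    '<META http-equiv="Content-Type" content="text/html; charset=koi8-r">'
    "<p>\u0412\u043e\u043f\u0440\u043e\u0441\u044b</p>"
).encode("koi8_r")

print(check_page("text/html; charset=utf-8", page))
print(check_page("text/html; charset=koi8-r", page))
```

Served with a UTF-8 header the sample page reports both a declaration mismatch and undecodable content; served with a matching koi8-r header it reports nothing.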
On Friday 22 August 2008 22:13:16 Andrew Sullivan wrote: > > FAQ_brazilian.html: <META http-equiv="Content-Type" > > content="text/html; charset=iso-8859-1"> > > This doesn't need to change because the bottom range of Unicode was made > the same as ISO 8859-1 in order to make the transition somewhat easier. The bottom range of Unicode codepoints is the same as ISO 8859-1, but not the bottom range of UTF-8 encoded bytes.
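The distinction between code points and encoded bytes can be seen with a single accented letter. A small sketch, assuming nothing beyond the Python standard library; the letter chosen is only an example of a character that ISO 8859-1 FAQs such as the Brazilian or French one would contain.

```python
# U+00E9, LATIN SMALL LETTER E WITH ACUTE.
e_acute = "\u00e9"

codepoint = ord(e_acute)                     # Unicode code point
latin1_bytes = e_acute.encode("iso-8859-1")  # one byte, same value as the code point
utf8_bytes = e_acute.encode("utf-8")         # two bytes in UTF-8

print(hex(codepoint), latin1_bytes, utf8_bytes)

# Pure ASCII text really is byte-for-byte identical in UTF-8.
ascii_same = "FAQ".encode("ascii") == "FAQ".encode("utf-8")

# The ISO 8859-1 byte on its own is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(ascii_same, latin1_is_valid_utf8)
```

So an ISO 8859-1 file served as UTF-8 stays readable only as long as it contains nothing but ASCII; the first accented character breaks it.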
Bruce Momjian wrote: > I see lots of FAQs in CVS that are not UTF8: > > FAQ.html: <META http-equiv="Content-Type" content="text/html; charset=US-ASCII"> > FAQ_MINGW.html: <meta http-equiv="content-type" > FAQ_brazilian.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> > FAQ_czech.html:<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> > FAQ_czech.html:<meta http-equiv="Content-language" content="cs"> > FAQ_farsi.html:<META http-equiv=Content-Type content="text/html; charset=utf-8"></HEAD> > FAQ_french.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> > FAQ_german.html:<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> > FAQ_hungarian.html: http-equiv="content-type"> > FAQ_japanese.html:<META http-equiv="Content-Type" content="text/html; charset=utf-8"> > FAQ_polish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"> > FAQ_russian.html: <META http-equiv="Content-Type" content="text/html; charset=koi8-r"> > FAQ_turkish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9"> > > Do they all have to be converted? > I wouldn't put too much work into that. Wouldn't it be better to migrate them to the wiki instead? //Magnus
On Sat, Aug 23, 2008 at 03:45:09PM +0300, Peter Eisentraut wrote: > The bottom range of Unicode codepoints is the same as ISO 8859-1, but not the > bottom range of UTF-8 encoded bytes. Oh, right, silly me. Well, then, those'll need fixing, too. I'm not sure I see why this is such a problem. It's obvious that if you're going to support on-the-fly internationalisation on your site (as postgresql.org does), then you need to pick the one encoding that allows you to do that. A -- Andrew Sullivan ajs@commandprompt.com +1 503 667 4564 x104 http://www.commandprompt.com/
On Monday 25 August 2008 11:00:12 Magnus Hagander wrote: > I wouldn't put too much work into that. Wouldn't it be better to migrate > them to the wiki instead? I'm all in favor of the wiki, but consider that this is prominent user-facing stuff here. If we put the FAQ in the wiki, why not the entire web site? Perhaps the web team should take over these files and apply the existing translation framework to it.
Peter Eisentraut wrote: > On Monday 25 August 2008 11:00:12 Magnus Hagander wrote: >> I wouldn't put too much work into that. Wouldn't it be better to migrate >> them to the wiki instead? > > I'm all in favor of the wiki, but consider that this is prominent user-facing > stuff here. If we put the FAQ in the wiki, why not the entire web site? > Perhaps the web team should take over these files and apply the existing > translation framework to it. That we could certainly do. At least if we can get pgweb set up to allow specific permissions to update these things - I would like to rid us of the bottleneck of having to write up an actual patch and send it to a list, to have somebody else apply it, and only then realize it's broken HTML etc. The translators should be able to do this on their own. I don't think the current web team can just "take them over". Sure, they could take over the part that Bruce does today which is apply patches for people, but it doesn't solve the issue in itself. //Magnus
Bruce Momjian wrote: > Andrew Sullivan wrote: >> On Fri, Aug 22, 2008 at 02:03:49PM -0400, Bruce Momjian wrote: >>>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >>>> /> >>> You saw that in our CVS HTML file that is part of the source >>> distribution? That is the master copy. >> No, that's my point. When I look at the web page source, it's UTF-8. >> And since that's the only realistic way to have the framework document >> around the FAQ, it seems reasonable to me. >> >>> I see lots of FAQs in CVS that are not UTF8: >>> >>> FAQ.html: <META http-equiv="Content-Type" content="text/html; charset=US-ASCII"> >> This doesn't matter, because US-ASCII is a proper subset of UTF-8. >> >>> FAQ_brazilian.html: <META http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> >> This doesn't need to change because the bottom range of Unicode was made >> the same as ISO 8859-1 in order to make the transition somewhat easier. >> >>> FAQ_polish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"> >>> FAQ_russian.html: <META http-equiv="Content-Type" content="text/html; charset=koi8-r"> >>> FAQ_turkish.html: <meta http-equiv="Content-Type" content="text/html; charset=iso8859-9"> >>> >>> Do they all have to be converted? >> Those three do, I expect. They sure look like it on the page. All >> sorts of "?" in there. > > Well, something is converting them to UTF8 headings on our web site and > I thought the HTML encoding was converted at that stage. Whatever process is taking the FAQ documents and integrating them with the web layout around it needs to run an encoding conversion.
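Such a conversion step could look roughly like the sketch below. It is not the actual postgresql.org build process, which is not shown in this thread: `convert_to_utf8` is a hypothetical helper, the source encoding is passed in by the caller rather than detected, and the meta tag is rewritten with a simple regular expression instead of an HTML parser.

```python
import re

# Matches "charset=<name>" (optionally with an opening quote) so that
# the declared name can be swapped for utf-8.
META_CHARSET_RE = re.compile(r"(charset\s*=\s*[\"']?)[A-Za-z0-9_-]+", re.IGNORECASE)


def convert_to_utf8(raw, source_encoding):
    """Decode raw from source_encoding and return UTF-8 bytes.

    The first charset declaration in the document is rewritten to
    utf-8 so that the meta tag matches the new byte encoding.
    """
    text = raw.decode(source_encoding)
    text = META_CHARSET_RE.sub(r"\g<1>utf-8", text, count=1)
    return text.encode("utf-8")


# Made-up sample in the shape of FAQ_russian.html.
sample = (
    '<META http-equiv="Content-Type" content="text/html; charset=koi8-r">'
    "<p>\u0412\u043e\u043f\u0440\u043e\u0441\u044b</p>"
).encode("koi8_r")

converted = convert_to_utf8(sample, "koi8-r")
print(converted.decode("utf-8"))
```

Passing the source encoding explicitly is a deliberate choice here: as the list of FAQ files earlier in the thread shows, several files have incomplete or unusual declarations, so guessing from the meta tag alone would not be reliable.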