Thread: database encoding "WIN" -- Western or Cyrillic?

database encoding "WIN" -- Western or Cyrillic?

From: Preston Landers

The PG documentation on database character set encodings, such as this page:

http://www.postgresql.org/docs/8.0/interactive/multibyte.html#CHARSET-TABLE

say that the "WIN" encoding in Postgresql means cp1251.  According to
Microsoft's code page list:

http://www.microsoft.com/globaldev/reference/wincp.mspx

cp1251 is Cyrillic / Russian:

http://www.microsoft.com/globaldev/reference/sbcs/1251.htm

cp1252, on the other hand, is Western, also called "Windows Latin-1",
"Windows ANSI", etc.

http://www.microsoft.com/globaldev/reference/sbcs/1252.htm
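To see why the distinction matters, here is a minimal Python sketch that decodes the same bytes under both code pages, using only the standard library's `cp1251` and `cp1252` codecs. The helper name `read_under` and the sample word are illustrative choices, not anything from PostgreSQL.

```python
def read_under(data: bytes, codepages=("cp1251", "cp1252")) -> dict:
    """Decode the same bytes under each code page and return the results."""
    return {cp: data.decode(cp) for cp in codepages}


# A Russian greeting as a Windows client using cp1251 would store it.
raw = "Привет".encode("cp1251")
results = read_under(raw)
# raw               == b'\xcf\xf0\xe8\xe2\xe5\xf2'
# results["cp1251"] == 'Привет'  (Cyrillic, as intended)
# results["cp1252"] == 'Ïðèâåò'  (the same bytes read as Western accented letters)
```

Both decodes succeed without error, which is why a mislabeled database produces silently wrong text rather than an obvious failure.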

If the postgresql encoding "WIN" is intended to be Cyrillic 1251, then
it should be labeled as such in the docs to avoid confusion.  However,
that leaves the problem of how to create a "Western" 1252 encoded
database in Postgresql, since no encoding is specified for 1252.

In practice you can just use LATIN1 (ISO-8859-1) as if it were
Windows-1252, as long as all your clients are Windows and respect that
convention.  The extra characters in Windows-1252 sit in bytes
0x80-0x9F, where ISO-8859-1 has no printable characters (only C1
control codes).  Where you might get into trouble is when
the web app says "OK, I see the database is using Latin-1 aka
ISO-8859-1, so I'm going to tell the client web browser that it's
ISO-8859-1."  That may prevent the web browser from showing the correct
glyphs until the user manually selects codepage 1252.  I'm planning to
have my web app fudge it and always report LATIN1 as windows-1252 for
now.
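The point that Windows-1252's extra characters only occupy a range where Latin-1 has nothing printable can be checked with Python's built-in `latin-1` and `cp1252` codecs. The sketch also shows the kind of label fudge described above; `http_charset_for` and the `HTTP_CHARSET` table are hypothetical names for illustration, not part of PostgreSQL or any database driver.

```python
def latin1_vs_cp1252() -> dict:
    """Return {byte: (latin1_char, cp1252_char_or_None)} for every byte that differs."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")  # maps every byte to the code point of the same value
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # a few bytes (e.g. 0x81) are unassigned in cp1252
        if latin1 != win:
            diffs[b] = (latin1, win)
    return diffs


# Hypothetical mapping from a PostgreSQL server encoding name to the charset
# label a web app sends to browsers.  LATIN1 is deliberately reported as
# windows-1252 (the "fudge"); WIN is cp1251 per the PG docs table.
HTTP_CHARSET = {
    "LATIN1": "windows-1252",
    "WIN": "windows-1251",
}


def http_charset_for(server_encoding: str) -> str:
    # Raises KeyError for names this sketch's table does not cover.
    return HTTP_CHARSET[server_encoding.upper()]


diffs = latin1_vs_cp1252()
# All 32 differing bytes lie in 0x80-0x9F; 0x00-0x7F and 0xA0-0xFF agree.
```

Because every byte outside 0x80-0x9F decodes identically, labeling LATIN1 data as windows-1252 can only change how those C1 control positions are displayed.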

If the postgresql encoding "WIN" really is intended to be Western
codepage 1252, then the docs (and possibly the code?) obviously need to
be fixed, and a separate Cyrillic WIN1251 encoding created.

thanks,
Preston Landers

(pibble @t yahoo dot com)



Re: database encoding "WIN" -- Western or Cyrillic?

From: Peter Eisentraut

On Saturday, 12 February 2005 23:32, Preston Landers wrote:
> If the postgresql encoding "WIN" is intended to be Cyrillic 1251, then
> it should be labeled as such in the docs to avoid confusion.

Well, isn't it?  You pointed to the place in the documentation yourself.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: database encoding "WIN" -- Western or Cyrillic?

From: Peter Eisentraut

On Tuesday, 15 February 2005 00:09, Preston Landers wrote:
> Assuming that it is indeed supposed to be Cyrillic, that leaves the
> question of how to create a Postgresql database using Windows CP 1252
> a.k.a. "Western" or "ANSI".

You don't.  That encoding is not supported (mostly because no one has bothered
to implement it until now).

> I remain mystified why the "WIN" encoding would default to a fairly
> obscure Cyrillic encoding considering most Russian users that I know
> are using either KOI8 or Unicode.

Legacy.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: database encoding "WIN" -- Western or Cyrillic?

From: "Serguei A. Mokhov"

> Date: Tue, 15 Feb 2005 15:47:14 +0100
> From: Peter Eisentraut <peter_e@gmx.net>
>
> > I remain mystified why the "WIN" encoding would default to a fairly
> > obscure Cyrillic encoding considering most Russian users that I know
> > are using either KOI8 or Unicode.
>
> Legacy.

I wouldn't call it legacy _yet_: it's used about as often as KOI8-R,
since "WIN" is the default on Windows for many apps.  And now that the
Win32 port is out, it's going to be used even more.  KOI8 was primarily
used on Unixen.  Unicode hasn't caught on much yet among Windows-minded
people.
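KOI8-R and WIN (cp1251) are two separate 8-bit encodings of the same Cyrillic alphabet, so the server has to know which one a client is using. A minimal Python sketch with the standard `koi8_r` and `cp1251` codecs (the sample word is an arbitrary choice):

```python
word = "Привет"  # a Russian greeting

koi8 = word.encode("koi8_r")  # as a Unix client using KOI8-R would send it
win = word.encode("cp1251")   # as a Windows client using cp1251 would send it

# Same text, same length, different bytes: the encodings are not interchangeable.
assert koi8 != win

# Reading KOI8-R bytes as if they were cp1251 still yields Cyrillic letters,
# just the wrong ones, which is what a mislabeled client encoding produces.
garbled = koi8.decode("cp1251")
```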

--
Serguei A. Mokhov            |  /~\    The ASCII
Computer Science Department  |  \ / Ribbon Campaign
Concordia University         |   X    Against HTML
Montreal, Quebec, Canada     |  / \      Email!

Re: database encoding "WIN" -- Western or Cyrillic?

From: Preston Landers

Peter Eisentraut wrote:

>On Saturday, 12 February 2005 23:32, Preston Landers wrote:
>> If the postgresql encoding "WIN" is intended to be Cyrillic 1251, then
>> it should be labeled as such in the docs to avoid confusion.

>Well, isn't it?  You pointed to the place in the documentation yourself.

Pardon me, but did you read the rest of my original email?

The "WIN" encoding is labeled "Windows CP1251".  It doesn't say
anything about Cyrillic or Russian, although WIN1256 is labeled
"Windows CP1256 (Arabic)".

Assuming that it is indeed supposed to be Cyrillic, that leaves the
question of how to create a Postgresql database using Windows CP 1252
a.k.a. "Western" or "ANSI".

I remain mystified why the "WIN" encoding would default to a fairly
obscure Cyrillic encoding considering most Russian users that I know
are using either KOI8 or Unicode.  It should at least be labeled
"Cyrillic" to avoid confusion.

I've seen a number of other websites that mistakenly refer to CP1251
Cyrillic when they really mean CP1252 Western.  That's why I assumed
this mistake was made here, especially since the word "Cyrillic" isn't
mentioned on the PG doc page, and there is no indication of how to use
CP1252.