
Re: which charset use for cyrilic? - Mailing list pgsql-general

From	Tino Wildenhain
Subject	Re: which charset use for cyrilic?
Date	October 29, 2005 12:39:40
Msg-id	1130600370.9938.14.camel@Andrea.peacock.de
In response to	which charset use for cyrilic? (Zet <skyer@on.kg>)
List	pgsql-general


On Saturday, 29.10.2005, at 13:11 +0400, Zet wrote:
> Hi
>
> Which charset needs to be set in the database for Cyrillic?
>
> Until now I've used WIN, but today I found a problem:

win?
>
> for example:
>
> SELECT *
> FROM table
> WHERE a = 'слово'
>
> returns me a record, where a = 'фраза'
>
> Then I tried UNICODE,
> but for most Cyrillic words PG gives an error like
> "invalid byte sequence for encoding "UNICODE": ..."

Well, for Cyrillic you have the following options:

cp1251 (Windows code page)
koi8-r (traditional charset)
utf-8  (universal; use it if you want Latin characters to
       coexist with Cyrillic. This is also what you get with
       the UNICODE setting in PG)

You should use the same encoding in the database as
you use in your application to make things easier.
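To see why the three encodings are not interchangeable, here is a small Python sketch. Python's codec names cp1251, koi8_r and utf-8 stand in for PostgreSQL's WIN (WIN1251), KOI8 and UNICODE, and the helper names are made up for illustration. It shows the bytes of the word 'слово' from the question in each encoding, and that cp1251 bytes are not valid UTF-8, which is most likely the kind of mismatch behind the "invalid byte sequence" error.

```python
# Python codec names standing in for PostgreSQL's WIN, KOI8 and UNICODE.
CODECS = ("cp1251", "koi8_r", "utf-8")


def byte_forms(word: str) -> dict:
    """Hypothetical helper: the byte representation of `word` in each codec."""
    return {codec: word.encode(codec) for codec in CODECS}


def is_valid_utf8(raw: bytes) -> bool:
    """True if `raw` could be accepted as UTF-8 (PostgreSQL's UNICODE)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Same five letters, three different byte sequences.
    for codec, raw in byte_forms("слово").items():
        print(f"{codec:7} {raw.hex(' ')}")
    # Bytes written by a cp1251 (WIN) client but labelled as UTF-8 are rejected.
    print("cp1251 bytes valid as UTF-8?", is_valid_utf8("слово".encode("cp1251")))
```

The single-byte encodings use one byte per letter while UTF-8 uses two, so the database and the client must agree on which one is in use before any conversion can be correct.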

Now you already have some data in your database.
So if you want to change the encoding, you need
to recode your char, varchar and text columns.

1.) Find out the encoding of your database:
     SHOW server_encoding;

If this matches what you want, you are done with
this step.

If you get something like SQL_ASCII, then you don't
know which charset was actually used; in this case,
inspect your application to see which encoding it
used to store text.
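If the application itself gives no clear answer, a heuristic on a sample of the stored bytes can help narrow it down. The following is a minimal Python sketch, not a PostgreSQL feature: `guess_cyrillic_encoding` is a hypothetical helper, and it assumes the sample is ordinary, mostly lowercase Russian text. It relies on the fact that cp1251 and KOI8-R place upper- and lowercase Cyrillic letters in swapped byte ranges, so decoding with the wrong one turns most letters uppercase.

```python
def guess_cyrillic_encoding(raw: bytes) -> str:
    """Hypothetical helper: guess whether raw Russian text is UTF-8, cp1251 or KOI8-R."""
    # Pure ASCII is identical in all three encodings.
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # cp1251 or KOI8-R Cyrillic text is almost never valid UTF-8 by accident.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # cp1251 (C0-DF upper, E0-FF lower) and KOI8-R (C0-DF lower, E0-FF upper)
    # swap the case ranges; prose is mostly lowercase, so pick the codec
    # that yields more lowercase Cyrillic letters.
    def lowercase_count(codec: str) -> int:
        text = raw.decode(codec, errors="replace")
        return sum(1 for ch in text if "а" <= ch <= "я" or ch == "ё")

    return max(("cp1251", "koi8_r"), key=lowercase_count)


if __name__ == "__main__":
    sample = "Привет, как дела? Это простой тест."
    for codec in ("cp1251", "koi8_r", "utf-8"):
        print(codec, "->", guess_cyrillic_encoding(sample.encode(codec)))
```

Short samples, all-uppercase text or text mixing several encodings can fool this heuristic, so treat the result as a hint to confirm against the application.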

Make a complete backup, then check your
lc_* variables:

SHOW LC_COLLATE; SHOW LC_CTYPE; (and so on)

If it's not something like

ru_RU.UTF-8 (if UNICODE is what you want to use)

then you had better run initdb again with the
correct locale settings. This is important
for lower(), upper(), ILIKE, ORDER BY, etc.
to work.
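The check above can be sketched in Python. This is an illustration only: `locale_matches` and its mapping table are hypothetical, cover just the encodings discussed in this thread, and assume glibc-style locale names of the form language_territory.codeset@modifier (for example ru_RU.UTF-8), as reported by SHOW LC_CTYPE.

```python
# Hypothetical mapping from PostgreSQL encoding names (SHOW server_encoding)
# to the normalized codeset part of a glibc-style locale name.
PG_TO_CODESET = {
    "UNICODE": "utf8",
    "UTF8": "utf8",
    "WIN": "cp1251",
    "WIN1251": "cp1251",
    "KOI8": "koi8r",
}


def normalize_codeset(codeset: str) -> str:
    # "UTF-8" / "utf8" -> "utf8", "KOI8-R" -> "koi8r", "CP1251" -> "cp1251"
    return codeset.lower().replace("-", "").replace("_", "")


def locale_matches(locale_name: str, server_encoding: str) -> bool:
    """True if the locale's codeset agrees with the database encoding."""
    base = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in base:
        # "C" or "ru_RU": no explicit codeset, so Cyrillic case folding and
        # sorting cannot be relied on; report a mismatch.
        return False
    codeset = normalize_codeset(base.split(".", 1)[1])
    expected = PG_TO_CODESET.get(server_encoding.upper())
    return expected is not None and codeset == expected


if __name__ == "__main__":
    for loc in ("ru_RU.UTF-8", "ru_RU.CP1251", "C"):
        print(loc, locale_matches(loc, "UNICODE"))
```

A mismatch here is the situation where re-running initdb with the right locale is worthwhile.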

Recreate your database with the encoding UNICODE
(or whatever you want to use; it should match the
codeset of your locale).

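Create a text dump from your backup with
pg_restore (it's recommended to back up using pg_dump -Fc)
and replace the occurrences of

SET CLIENT_ENCODING TO '...'; (this is what your
original database had)

with the encoding the text in the dump is really in,
i.e. the one your application used (for an SQL_ASCII
database this is the line that needs fixing). Only write

SET CLIENT_ENCODING TO 'UNICODE';

if you have also converted the dump file itself to
UTF-8 (e.g. with iconv).

(The replacement can be done with sed if you don't want
to load the whole dump into your editor.)

Restore the database with the new script.
PostgreSQL will convert from the client encoding
declared in the script to the new server encoding.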
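Setting the script's client encoding to 'UNICODE' is only safe if the bytes that follow really are UTF-8; a dump taken from a WIN (cp1251) database is not. Here is a minimal Python sketch that recodes the text dump and rewrites the SET line in one pass, as an alternative to sed plus iconv. The function name is hypothetical, and it assumes the whole dump is consistently in one source encoding (cp1251 by default); it accepts both the `SET ... TO '...'` and `SET ... = '...'` spellings.

```python
import re

# Matches e.g. "SET client_encoding = 'WIN';" or "SET CLIENT_ENCODING TO 'SQL_ASCII';"
SET_ENCODING = re.compile(
    r"^[ \t]*SET[ \t]+client_encoding[ \t]*(?:TO|=)[ \t]*'[^']*'[ \t]*;",
    re.IGNORECASE | re.MULTILINE,
)


def recode_dump(raw: bytes, source_codec: str = "cp1251") -> bytes:
    """Hypothetical helper: turn a dump in `source_codec` into a UTF-8 dump."""
    # Strict decoding: fail loudly if some bytes do not fit the assumed encoding.
    text = raw.decode(source_codec)
    # Declare the new encoding so PostgreSQL interprets the bytes correctly.
    text = SET_ENCODING.sub("SET client_encoding = 'UNICODE';", text)
    return text.encode("utf-8")


if __name__ == "__main__":
    dump = "SET client_encoding = 'WIN';\nINSERT INTO t VALUES ('слово');\n"
    print(recode_dump(dump.encode("cp1251")).decode("utf-8"))
```

If the decode step raises, the dump is not uniformly in the assumed encoding, which is common with SQL_ASCII databases that received text from several clients; those rows need to be found and fixed before restoring.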
