Thread: which charset use for cyrilic?

which charset use for cyrilic?

From
Zet
Date:
Hi

Which charset needs to be set in the database for Cyrillic?

Until now I have used WIN, but today I found a problem.

for example:

SELECT *
FROM table
WHERE a = 'слово'

returns a record where a = 'фраза'.

After that I tried UNICODE,
but for most Cyrillic words PG gives an error like
"invalid byte sequence for encoding "UNICODE":..."

Regards,
Zet


Re: which charset use for cyrilic?

From
Oleg Bartunov
Date:
Zet,

By the way, there is also the pgsql-ru-general list (Russian).
See http://www.postgresql.org/community/lists/subscribe for
subscription info.

You didn't give us enough info and examples (cut'n'paste from psql would be
nice).



     Oleg
On Sat, 29 Oct 2005, Zet wrote:

> Hi
>
> Which charset needs to be set in the database for Cyrillic?
>
> Until now I have used WIN, but today I found a problem.
>
> for example:
>
> SELECT *
> FROM table
> WHERE a = 'слово'
>
> returns a record where a = 'фраза'.
>
> After that I tried UNICODE,
> but for most Cyrillic words PG gives an error like
> "invalid byte sequence for encoding "UNICODE":..."
>
> Regards,
> Zet
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: which charset use for cyrilic?

From
Tino Wildenhain
Date:
On Saturday, 29.10.2005, at 13:11 +0400, Zet wrote:
> Hi
>
> Which charset needs to be set in the database for Cyrillic?
>
> Until now I have used WIN, but today I found a problem.

win?
>
> for example:
>
> SELECT *
> FROM table
> WHERE a = 'слово'
>
> returns a record where a = 'фраза'.
>
> After that I tried UNICODE,
> but for most Cyrillic words PG gives an error like
> "invalid byte sequence for encoding "UNICODE":..."

Well for cyrillic, you have the following options:

cp-1251 (windows codepage)
koi-8 (traditional charset)
utf-8 (universal, if you want to have Latin characters coexist
       with Cyrillic. This is also what you get with the
       UNICODE setting in PG)
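
To make the difference between these three options concrete, here is a
minimal Python sketch. Python is used only for illustration, the helper
name encodings_that_fit is made up for this example, and the codec names
cp1251, koi8_r and utf-8 are Python's names, assumed here as stand-ins
for the charsets listed above. It shows how many bytes the same word
takes in each charset, and which charsets can hold a given text at all.

```python
# Candidate charsets from the list above, under their Python codec names.
CANDIDATES = ["cp1251", "koi8_r", "utf-8"]


def encodings_that_fit(text):
    """Return the candidate encodings that can represent every character of text."""
    fitting = []
    for name in CANDIDATES:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character does not exist in this charset
        fitting.append(name)
    return fitting


if __name__ == "__main__":
    word = "слово"
    for name in CANDIDATES:
        # The 8-bit charsets use one byte per letter, UTF-8 uses two per
        # Cyrillic letter, and the byte values differ in all three.
        print(name, len(word.encode(name)), word.encode(name))
    # Accented Latin letters are missing from both 8-bit Cyrillic charsets,
    # so only UTF-8 remains for mixed text like this.
    print(encodings_that_fit("café слово"))
```

Plain ASCII Latin letters fit into all three; it is accented Latin and
other scripts that need utf-8.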

You should use the same encoding in the database as
you use in your application to make things easier.
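
A rough Python sketch of what goes wrong when the two sides disagree.
This only mimics the idea and is not how the server implements its
check; is_valid_utf8 is a hypothetical helper name. The assumption is
an application that sends cp-1251 bytes to a database that expects
UTF-8.

```python
def is_valid_utf8(raw):
    """Return True if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the application writes cp-1251 bytes.
    sent_by_app = "слово".encode("cp1251")

    # A UTF-8 (UNICODE) database has to reject these bytes, which is the
    # same kind of failure as the "invalid byte sequence" error quoted above.
    print(is_valid_utf8(sent_by_app))  # False

    # Read back with the wrong 8-bit charset, nothing fails at all;
    # the text is silently wrong instead.
    print(sent_by_app.decode("koi8_r") == "слово")  # False

    # Same encoding on both sides: the round trip is clean.
    print(sent_by_app.decode("cp1251") == "слово")  # True
```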

Now you have some data already in your database.
So if you want to change the encoding, you need
to recode your char, varchar and text columns.

1.) Find out the encoding of your database:
     SHOW server_encoding;

If this matches what you want, you are done with
this step.

If you get something like SQL_ASCII, then you don't
know which charset actually got used. In this case,
inspect your application to find out which encoding
it used to store text.
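
If inspecting the application is not conclusive, a sample of the stored
bytes can give a hint. The following Python sketch is a crude heuristic
and not a reliable detector; guess_encoding and
count_lowercase_cyrillic are hypothetical names, and it assumes the
data is Russian text in cp-1251, koi8-r or utf-8. It relies on the fact
that the lowercase letters of one 8-bit charset land on uppercase
letters in the other, while real text is mostly lowercase.

```python
def count_lowercase_cyrillic(text):
    """Count lowercase Russian letters in text."""
    return sum(1 for ch in text if "а" <= ch <= "я" or ch == "ё")


def guess_encoding(raw):
    """Guess whether raw bytes are utf-8, cp1251 or koi8_r (heuristic only)."""
    try:
        raw.decode("utf-8")
        return "utf-8"  # well-formed UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        pass
    scores = {}
    for name in ("cp1251", "koi8_r"):
        # errors="replace" because a few byte values are undefined in cp1251.
        decoded = raw.decode(name, errors="replace")
        scores[name] = count_lowercase_cyrillic(decoded)
    # The decoding that yields the most lowercase letters wins;
    # on a tie the first candidate (cp1251) is returned.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for codec in ("cp1251", "koi8_r", "utf-8"):
        sample = "слово и фраза".encode(codec)
        print(codec, "->", guess_encoding(sample))
```

Short samples or text in capitals will fool this, so treat the result
as a hint to confirm by eye, not as an answer.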

Make a complete backup, then check your
lc_* variables:

SHOW lc_collate; SHOW lc_ctype; (and so on)

If it's not something like

ru_RU.UTF-8 (if it's UNICODE you want to use)

then you had better run initdb again with the
correct locale settings. This is important
for lower(), upper(), ILIKE, ORDER BY, etc.
to work. A locale that does not match the
encoding can even make different strings
compare as equal.

Recreate your DB with encoding UNICODE (or
whatever you want to use - same as with the
locales).

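Create a text dump out of your backup via
pg_restore (it's recommended to back up using pg_dump -Fc).
In it, find the line

SET CLIENT_ENCODING TO '...'; (this is what your
original database had)

This line must name the encoding your text is actually
stored in, because that is what PG converts from when
loading into the new database. If the original database
was declared correctly (e.g. WIN), leave the line alone.
If it says SQL_ASCII, replace it with the real encoding
of the data, for example:

SET CLIENT_ENCODING TO 'WIN';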

(this can be done with sed if you don't want
to load the whole dump into your editor)
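
For those without sed at hand, the same edit can be sketched in Python.
This is a minimal sketch under assumptions: rewrite_client_encoding is
a hypothetical helper, the dump is assumed to contain the line in
either the "TO" or the "=" form, and the value passed in should name
the encoding the dumped bytes are really in, since that is what
client_encoding tells the server. It works on bytes so that it never
has to decode the dump itself.

```python
import re

# Matches both "SET client_encoding TO '...';" and "SET client_encoding = '...';".
_SET_ENCODING = re.compile(
    rb"^(\s*SET\s+client_encoding\s*(?:TO|=)\s*)'[^']*'(\s*;.*)$",
    re.IGNORECASE,
)


def rewrite_client_encoding(lines, new_encoding):
    """Yield dump lines (bytes), replacing the value in the first
    SET client_encoding line and leaving everything else untouched."""
    value = new_encoding.encode("ascii")
    done = False
    for line in lines:
        if not done:
            new_line, count = _SET_ENCODING.subn(
                lambda m: m.group(1) + b"'" + value + b"'" + m.group(2), line
            )
            if count:
                done = True  # only the first match is changed, so table data is safe
                line = new_line
        yield line


if __name__ == "__main__":
    # Tiny in-memory stand-in for a text dump; a real run would iterate
    # over a file opened in binary mode ("rb") and write with "wb".
    sample = [
        b"SET client_encoding TO 'SQL_ASCII';\n",
        b"COPY t (a) FROM stdin;\n",
    ]
    for out_line in rewrite_client_encoding(sample, "WIN"):
        print(out_line)
```

Processing the file line by line keeps memory use flat, which is the
same reason for preferring sed over an editor.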

Restore the database with the new script.
Postgres will take care of the charset conversion.