Thread: Re: unicode

Re: unicode

From
Tatsuo Ishii
Date:
The actual checking is done in INSERT/UPDATE/COPY. However, the
checking is currently very limited: every byte of a multibyte character
must be greater than 0x7f.
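
A minimal sketch of the kind of check described above, in Python and for illustration only. It is not the backend's code: the function names are hypothetical, and deriving the character length from the UTF-8 lead byte is an assumption of this sketch. It suggests why such a check cannot catch wrongly encoded input: non-ASCII text in a single-byte charset such as KOI8-R has the high bit set in every byte, so it can pass even though it is not valid UTF-8.

```python
def utf8_char_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (assumed model)."""
    if lead < 0x80:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    return 1  # stray continuation byte: treated as length 1 in this sketch


def passes_high_bit_check(data: bytes) -> bool:
    """Accept if every byte of each multibyte character is > 0x7f."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])
        chunk = data[i:i + n]
        if n > 1 and (len(chunk) < n or any(b <= 0x7F for b in chunk)):
            return False
        i += n
    return True


def is_valid_utf8(data: bytes) -> bool:
    """Strict validation, using Python's own UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


koi8_sample = "да".encode("koi8_r")   # bytes 0xc4 0xc1
utf8_sample = "да".encode("utf-8")    # bytes 0xd0 0xb4 0xd0 0xb0

print(passes_high_bit_check(koi8_sample), is_valid_utf8(koi8_sample))
print(passes_high_bit_check(utf8_sample), is_valid_utf8(utf8_sample))
```

Under these assumptions the KOI8-R sample passes the high-bit check but fails strict UTF-8 validation, while the real UTF-8 sample passes both.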

> Tatsuo,
> 
> do I understand correctly that there is no checking for
> conversion between local charset and unicode in insert and
> checking is done only in select ?
> 
> test=# create table qq (a text);
> CREATE TABLE
> test=# \encoding koi8
> test=# insert into qq values('бартунов');
> INSERT 24617 1
> test=# \encoding unicode
> test=# select * from qq;
>     a
> ----------
>  п�п�я�������п�п�
> (1 row)
> 
> test=# \encoding unicode
> test=# insert into qq values('бартунов');
> INSERT 24618 1
> test=# select * from qq;
>     a
> ----------
>  п�п�я�������п�п�
> 
> (2 rows)
> 
> test=# \encoding koi8
> test=# select * from qq;
> WARNING:  UtfToLocal: could not convert UTF-8 (0xc2c1). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xd2d4). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xd5ce). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xcfd7). Ignored
>     a
> ----------
>  бартунов
> 
> (2 rows)
> 
> 
> 
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
> 
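
The session quoted above can be partly reproduced outside the database. The following Python sketch assumes that the terminal kept sending KOI8-R bytes after `\encoding unicode`, which the thread does not state explicitly. Under that assumption, the hex values in the WARNING lines are the KOI8-R bytes of the inserted word, read two at a time as if they were two-byte UTF-8 sequences.

```python
word = "бартунов"
koi8_bytes = word.encode("koi8_r")

# Read the KOI8-R bytes in pairs, as a UTF-8 decoder would attempt
# for lead bytes in the 0xc2..0xdf range.
pairs = [koi8_bytes[i:i + 2].hex() for i in range(0, len(koi8_bytes), 2)]
print(pairs)  # ['c2c1', 'd2d4', 'd5ce', 'cfd7']

# The same bytes are not valid UTF-8, so converting them back fails.
try:
    koi8_bytes.decode("utf-8")
    stored_is_valid_utf8 = True
except UnicodeDecodeError:
    stored_is_valid_utf8 = False
print(stored_is_valid_utf8)  # False

# Correct UTF-8 shown on a KOI8-R terminal looks garbled and starts with 'п'.
garbled = word.encode("utf-8").decode("koi8_r", errors="replace")
print(garbled[0])  # п
```

The four pairs match the four values reported by UtfToLocal, which is consistent with the second row having been stored as raw KOI8-R bytes in a UNICODE database.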

Re: unicode

From
Hannu Krosing
Date:
Tatsuo Ishii wrote on Thu, 26.09.2002 at 03:37:
> The actual checking is done in INSERT/UPDATE/COPY. However, the
> checking is currently very limited: every byte of a multibyte character
> must be greater than 0x7f.

Where can I read about basic tech details of Unicode / Charset
Conversion / ...

I'd like to find answers to the following (for a database created using
UNICODE)

1. Where exactly are conversions between national charsets done ?

2. What is converted (whole SQL statements or just data) ?

3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
UTF-16, UTF-32, ...)

4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ?

5. Are LIKE/SIMILAR aware of locale stuff ?

-------------
Hannu


Re: unicode

From
Tatsuo Ishii
Date:
> Where can I read about basic tech details of Unicode / Charset
> Conversion / ...
> 
> I'd like to find answers to the following (for a database created using
> UNICODE)
> 
> 1. Where exactly are conversions between national charsets done ?

PostgreSQL has no concept of a "national charset". I assume you want to
know where frontend/backend encoding conversion happens. It is handled
by pg_server_to_client (converts BE to FE) and
pg_client_to_server (FE to BE). These functions are called by the
communication subsystem (backend/libpq) and by COPY. In summary, in most
cases the encoding conversion is done before the parser runs and after the
executor produces the final result.

> 2. What is converted (whole SQL statements or just data) ?

Whole statement.
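
As a rough model of the above, here is a Python sketch of a whole-statement conversion in both directions. The two functions are hypothetical stand-ins for pg_client_to_server and pg_server_to_client; they do not reproduce the real C signatures, and the KOI8-R client with a UTF-8 server is only an example configuration.

```python
def client_to_server_model(message: bytes, client_enc: str, server_enc: str) -> bytes:
    """FE to BE: re-encode the entire message, not only the string literals."""
    return message.decode(client_enc).encode(server_enc)


def server_to_client_model(message: bytes, server_enc: str, client_enc: str) -> bytes:
    """BE to FE: re-encode the result before it is sent back."""
    return message.decode(server_enc).encode(client_enc)


statement = "insert into qq values('бартунов')"

# What a KOI8-R client would put on the wire.
wire = statement.encode("koi8_r")

# Converted once, before parsing; ASCII bytes are unchanged by the conversion.
on_server = client_to_server_model(wire, "koi8_r", "utf-8")
print(on_server.startswith(b"insert into qq values('"))

# Converting back yields the original client bytes.
print(server_to_client_model(on_server, "utf-8", "koi8_r") == wire)
```

Because the conversion is applied to the whole statement, anything non-ASCII in it is converted along with the data values.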

> 3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
> UTF-16, UTF-32, ...)

"format"? I assume you are talking about the encoding.

It is exactly the same as the database encoding. For a UNICODE database,
we use UTF-8, not UCS-2 or UCS-4.
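
To make the difference between these encodings concrete, the following Python sketch compares byte lengths. It uses Python's "utf-16-be" codec as a stand-in for UCS-2 (the two coincide for characters in the Basic Multilingual Plane) and "utf-32-be" for UCS-4; both substitutions are assumptions of this example, not something taken from PostgreSQL.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of the same text in three Unicode encodings."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UCS-2 (as UTF-16BE)": len(text.encode("utf-16-be")),
        "UCS-4 (as UTF-32BE)": len(text.encode("utf-32-be")),
    }


# ASCII text: UTF-8 needs one byte per character.
print(encoded_sizes("select"))

# Cyrillic text: UTF-8 needs two bytes per character.
print(encoded_sizes("бартунов"))

# In UTF-8 the ASCII part of a statement is byte-identical to plain ASCII.
print("select".encode("utf-8") == "select".encode("ascii"))
```

UTF-8 is variable-width, so ASCII text keeps its one-byte form while the Cyrillic sample takes two bytes per character.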

> 4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ?

Ditto.

> 5. Are LIKE/SIMILAR aware of locale stuff ?

I don't know about SIMILAR, but I believe LIKE is not locale aware, and
that this is correct from the standard's point of view...
--
Tatsuo Ishii