Thread: Re: unicode
The actual checking is done in INSERT/UPDATE/COPY. However, the
checking is currently very limited: every byte of a multibyte character
must be greater than 0x7f.

> Tatsuo,
>
> do I understand correctly that there is no checking for
> conversion between local charset and unicode in insert and
> checking is done only in select ?
>
> test=# create table qq (a text);
> CREATE TABLE
> test=# \encoding koi8
> test=# insert into qq values('бартунов');
> INSERT 24617 1
> test=# \encoding unicode
> test=# select * from qq;
>         a
> ----------
>  п�п�я�������п�п�
> (1 row)
>
> test=# \encoding unicode
> test=# insert into qq values('бартунов');
> INSERT 24618 1
> test=# select * from qq;
>         a
> ----------
>  п�п�я�������п�п�
>
> (2 rows)
>
> test=# \encoding koi8
> test=# select * from qq;
> WARNING:  UtfToLocal: could not convert UTF-8 (0xc2c1). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xd2d4). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xd5ce). Ignored
> WARNING:  UtfToLocal: could not convert UTF-8 (0xcfd7). Ignored
>         a
> ----------
>  бартунов
>
> (2 rows)
>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
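The following Python sketch illustrates why the second insert in the session above is accepted. It is not PostgreSQL code: `passes_limited_check` and `utf8_char_len` are hypothetical helpers that only model the rule stated above (every byte of a multibyte character must be greater than 0x7f), and the way character length is derived from the lead byte is an assumption. It also assumes the terminal sent KOI8-R bytes while the client encoding was set to unicode.

```python
def utf8_char_len(lead: int) -> int:
    """Guess the character length from the lead byte (simplified, assumed)."""
    if lead < 0x80:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    return 1  # anything else is treated as a single byte in this sketch


def passes_limited_check(data: bytes) -> bool:
    """Model of the limited rule: all bytes of a multibyte char are > 0x7f."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data[i])
        chunk = data[i:i + n]
        if n > 1 and (len(chunk) < n or any(b <= 0x7F for b in chunk)):
            return False
        i += n
    return True


def is_valid_utf8(data: bytes) -> bool:
    """Strict validation, using Python's own UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes a KOI8-R terminal sends for the string used in the session.
koi8_bytes = "бартунов".encode("koi8_r")

print(koi8_bytes.hex())                   # c2c1d2d4d5cecfd7
print(passes_limited_check(koi8_bytes))   # True: the limited rule lets it in
print(is_valid_utf8(koi8_bytes))          # False: it is not valid UTF-8
```

The byte pairs 0xc2c1, 0xd2d4, 0xd5ce and 0xcfd7 are the same ones reported in the UtfToLocal warnings, which is consistent with raw KOI8-R bytes having been stored unchanged in the UNICODE database.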
Tatsuo Ishii wrote on Thu, 26.09.2002 at 03:37:
> The actual checking is done in INSERT/UPDATE/COPY. However, the
> checking is currently very limited: every byte of a multibyte character
> must be greater than 0x7f.

Where can I read about the basic technical details of Unicode / charset
conversion / ... ?

I'd like to find answers to the following (for a database created using
UNICODE):

1. Where exactly are conversions between national charsets done?
2. What is converted (whole SQL statements or just data)?
3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
   UTF-16, UTF-32, ...)?
4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...)?
5. Are LIKE/SIMILAR aware of locale stuff?

-------------
Hannu
> Where can I read about basic tech details of Unicode / Charset
> Conversion / ...
>
> I'd like to find answers to the following (for database created using
> UNICODE)
>
> 1. Where exactly are conversions between national charsets done

There is no "national charset" in PostgreSQL. I assume you want to know
where the frontend/backend encoding conversion happens. It is handled
by pg_server_to_client (converts BE to FE) and pg_client_to_server
(FE to BE). These functions are called by the communication subsystem
(backend/libpq) and by COPY. In summary, in most cases the encoding
conversion is done before the parser runs and after the executor
produces the final result.

> 2. What is converted (whole SQL statements or just data)

The whole statement.

> 3. What format is used for processing in memory (UCS-2, UCS-4, UTF-8,
> UTF-16, UTF-32, ...)

"Format"? I assume you are talking about the encoding. It is exactly the
same as the database encoding. For a UNICODE database, we use UTF-8,
not UCS-2 or UCS-4.

> 4. What format is used when saving to disk (UCS-*, UTF-*, SCSU, ...) ?

Ditto.

> 5. Are LIKE/SIMILAR aware of locale stuff ?

I don't know about SIMILAR, but I believe LIKE is not locale aware, and
that this is correct from the standard's point of view...
--
Tatsuo Ishii
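As a rough illustration of the flow described above, the Python sketch below transcodes a whole statement on the way in and on the way out. The functions `client_to_server` and `server_to_client` are hypothetical stand-ins named after the backend functions mentioned in the message; they are not the real implementation, and the choice of KOI8-R as the client encoding is only an example.

```python
# Assumption: the database was created with the UNICODE encoding,
# which according to the message above means UTF-8 in memory and on disk.
SERVER_ENCODING = "utf-8"


def client_to_server(raw: bytes, client_encoding: str) -> bytes:
    """FE to BE: convert the whole statement before it reaches the parser."""
    return raw.decode(client_encoding).encode(SERVER_ENCODING)


def server_to_client(raw: bytes, client_encoding: str) -> bytes:
    """BE to FE: convert the result after the executor has produced it."""
    return raw.decode(SERVER_ENCODING).encode(client_encoding)


statement = "insert into qq values('бартунов');"

# What a KOI8-R client puts on the wire: the entire statement, not just
# the string literal, is in the client encoding.
wire = statement.encode("koi8_r")

# What the backend works with and stores.
stored = client_to_server(wire, "koi8_r")
print(stored == statement.encode("utf-8"))        # True

# Sending it back to the same client restores the original bytes.
print(server_to_client(stored, "koi8_r") == wire)  # True
```

Because the conversion applies to the whole statement, a client encoding setting that does not match what the terminal actually sends affects the literals in that statement as well.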