Thread: BUG #1721: mutiple bytes character string comaprison error
The following bug has been logged online: Bug reference: 1721 Logged by: Chii-Tung Liu Email address: cdliou@mail.cyut.edu.tw PostgreSQL version: 8.0.3 Operating system: Windows XP SP2 Description: mutiple bytes character string comaprison error Details: When compare two UTF-8 encoded string that contains Chinese words, the result is always TRUE 1. create a database test with encoding set to unicode CREATE DATABASE test WITH OWNER = postgres ENCODING = 'UNICODE' TABLESPACE = pg_default; 2. insert data with Chinese words INSERT into node set title='1 ä¸æ' 3. SELECT title from node where title > '1.1 ' would return '1 ä¸æ' 4. Both SELECT '1 ä¸æ' > '1.1' and SELECT '1.1' > '1 ä¸æ' return FALSE
"Chii-Tung Liu" <cdliou@mail.cyut.edu.tw> writes: > PostgreSQL version: 8.0.3 > Operating system: Windows XP SP2 > When compare two UTF-8 encoded string that contains Chinese words, the > result is always TRUE Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). Use some other database encoding. regards, tom lane
On Sun, 19 Jun 2005, Tom Lane wrote: > "Chii-Tung Liu" <cdliou@mail.cyut.edu.tw> writes: > > PostgreSQL version: 8.0.3 > > Operating system: Windows XP SP2 > > > When compare two UTF-8 encoded string that contains Chinese words, the > > result is always TRUE > > Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > Use some other database encoding. > Shouldn't we forbid its creation then? At least a strongly worded warning? We see these complaints too often. Kris Jurka
Kris Jurka <books@ejurka.com> writes: > On Sun, 19 Jun 2005, Tom Lane wrote: >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). >> Use some other database encoding. > Shouldn't we forbid its creation then? There was serious discussion of that before the 8.0 release, but we decided not to forbid it. Check the archives; I don't recall the reasoning at the moment. > We see these complaints too often. There are lots of complaints we see way too often ;-) ... but distressingly, there are still only 24 hours in a day. regards, tom lane
> The following bug has been logged online: > > Bug reference: 1721 > Logged by: Chii-Tung Liu > Email address: cdliou@mail.cyut.edu.tw > PostgreSQL version: 8.0.3 > Operating system: Windows XP SP2 > Description: mutiple bytes character string comaprison error > Details: > > When compare two UTF-8 encoded string that contains Chinese words, the > result is always TRUE > 1. create a database test with encoding set to unicode > CREATE DATABASE test > WITH OWNER = postgres > ENCODING = 'UNICODE' > TABLESPACE = pg_default; > 2. insert data with Chinese words > INSERT into node set title='1 $BCfJ8(B' > > 3. SELECT title from node where title > '1.1 ' > would return '1 $BCfJ8(B' > > 4. Both SELECT '1 $BCfJ8(B' > '1.1' and SELECT '1.1' > '1 $BCfJ8(B' return > FALSE I think you need to use C locale. -- Tatsuo Ishii
Tom Lane wrote: > Kris Jurka <books@ejurka.com> writes: > > On Sun, 19 Jun 2005, Tom Lane wrote: > >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > >> Use some other database encoding. > > > Shouldn't we forbid its creation then? > > There was serious discussion of that before the 8.0 release, but > we decided not to forbid it. Check the archives; I don't recall > the reasoning at the moment. UTF8 encoding works with the C locale assuming you don't care about ordering of the character set, e.g. Japanese. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>=20 > UTF8 encoding works with the C locale assuming you don't care=20 > about ordering of the character set, e.g. Japanese. >=20 Has anyone with the ability to compile postgresql on windows tested the ICU patch? ... John
> > UTF8 encoding works with the C locale assuming you don't care about=20 > > ordering of the character set, e.g. Japanese. > >=20 >=20 > Has anyone with the ability to compile postgresql on windows=20 > tested the ICU patch? Yes. See http://archives.postgresql.org/pgsql-hackers/2005-05/msg00662.php //Magnus
> Tom Lane wrote: > > Kris Jurka <books@ejurka.com> writes: > > > On Sun, 19 Jun 2005, Tom Lane wrote: > > >> Sorry, but UTF-8 encoding doesn't work properly on Windows (yet). > > >> Use some other database encoding. > > > > > Shouldn't we forbid its creation then? > > > > There was serious discussion of that before the 8.0 release, but > > we decided not to forbid it. Check the archives; I don't recall > > the reasoning at the moment. > > UTF8 encoding works with the C locale assuming you don't care about > ordering of the character set, e.g. Japanese. No, sometimes Japanese needs char ordering too and I think this is not a Windows only problem. The real problem is Unicode defines char orderes in totally random manner because Chinese/Japanese/Korean Kanji characters are "Unified" in Unicode. To solve the problem, we can use convert UTF8 to EUC_JP using CONVERT. See archives for more details. Or you can use Unicode locale only if your platform's locale database is not broken and you only use single locale. -- Tatsuo Ishii