Thread: Different result when using '=' and 'like' in unicode mode
I used PostgreSQL 7.2.1 on redhat 7.3, installed using the rpm from redhat,. When I createdb -E UNICODE, and import data using \copy command, after converted all the imported text files to UTF-8 encoding. It looks well, I can get the data from several platforms in different locale. But, a strange problem, when i using '=' in where clause, to match the varchar value, the result is not correct, for example, the table structure is Table "tbl_test" Column | Type | Modifiers -------------+-----------------------+----------- id | integer | name | character varying(40) | the sql is select * from tbl_test where name = 'ç³ç°'; but the result contains other data, such as ç³çº, æ³çº, æ³ç° and I changed the sql to select * from tbl_test where name like 'ç³ç°'; select * from tbl_test where name ilike 'ç³ç°'; the results both correct. I don't know the reason of this problem, and in my application it not allowed to use 'like' replaced '=', the data amount is too large. Is this a bug of PostgreSQL? or only a problem in redhat environment? and how to fix it? Thx, Jiang Sheng
Jiang Sheng writes: > the sql is > select * from tbl_test where name = 'ç³ç°'; > > but the result contains other data, such as ç³çº, æ³çº, æ³ç° > > and I changed the sql to > select * from tbl_test where name like 'ç³ç°'; > select * from tbl_test where name ilike 'ç³ç°'; > > the results both correct. The LIKE operator uses a character-by-character comparison (which could quite possibly behave oddly if your strings contain combining characters, but that is a different issue). The = operator uses the host operating system's locale facilities to do a locale-aware comparison. So in general LIKE and = are not interchangeable. If the result you get for = is wrong even under some locale, then you probably don't have the right locale set on your server. Recent glibc systems require you to set your locale to xx_YY.utf8 (rather than just xx_YY) if you want to use Unicode. -- Peter Eisentraut peter_e@gmx.net
Thanks a lot! In your opinion, should I re-compile the PostgreSQL after set the $LANG to ja_JP.utf8? or I need to set other variables' value.. I changed the $LANG value, but the result still not correct, and not found anything about it in PostgrSQL's document. Thx Jiang Sheng ----- Original Message ----- éä¿¡è : "Peter Eisentraut" <peter_e@gmx.net> å®å : "Jiang Sheng" <jiangsheng@sis.sh.cn> Cc: <pgsql-bugs@postgresql.org> éä¿¡æ¥æ : 2002å¹´11æ18æ¥ 6:21 件å : Re: [BUGS] Different result when using '=' and 'like' in unicode mode > Jiang Sheng writes: > > > the sql is > > select * from tbl_test where name = 'ç³ç°'; > > > > but the result contains other data, such as ç³çº, æ³çº, æ³ç° > > > > and I changed the sql to > > select * from tbl_test where name like 'ç³ç°'; > > select * from tbl_test where name ilike 'ç³ç°'; > > > > the results both correct. > > The LIKE operator uses a character-by-character comparison (which could > quite possibly behave oddly if your strings contain combining characters, > but that is a different issue). The = operator uses the host operating > system's locale facilities to do a locale-aware comparison. So in general > LIKE and = are not interchangeable. > > If the result you get for = is wrong even under some locale, then you > probably don't have the right locale set on your server. Recent glibc > systems require you to set your locale to xx_YY.utf8 (rather than just > xx_YY) if you want to use Unicode. > > -- > Peter Eisentraut peter_e@gmx.net >
Jiang Sheng writes: > In your opinion, should I re-compile the PostgreSQL after set the $LANG > to ja_JP.utf8? or I need to set other variables' value.. Recompiling has nothing to do with that. Please see the Administrator's Guide chapter on localization for all the information. -- Peter Eisentraut peter_e@gmx.net