Thread: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic
The following bug has been logged on the website: Bug reference: 9722 Logged by: Martin Vassilev Email address: mv@netsurf.bg PostgreSQL version: 9.3.4 Operating system: Linux Slackware x86_64 Description: test1=# \d test1 Table "public.test1" Column | Type | Modifiers --------+---------+------------------------------------------------------ ID | integer | not null default nextval('"test1_ID_seq"'::regclass) TEXT | text | test1=# select * from test1 where "TEXT" ilike 'ТеÑÑ%'; ID | TEXT ----+--------------- 3 | ТеÑÑов Ð·Ð°Ð¿Ð¸Ñ 4 | ТеÑÑов запиÑ1 (2 rows) test1=# select * from test1 where "TEXT" ilike 'ÑеÑÑ%'; ID | TEXT ----+------ (0 rows) Тhese two queries should return the same result. I don't had that problem in version 8.4.X. I want to thanks pgsql team for great work on this DB project. regards, Martin
mv@netsurf.bg writes: > test1=# select * from test1 where "TEXT" ilike 'ТеÑÑ%'; > ID | TEXT > ----+--------------- > 3 | ТеÑÑов Ð·Ð°Ð¿Ð¸Ñ > 4 | ТеÑÑов запиÑ1 > (2 rows) > test1=# select * from test1 where "TEXT" ilike 'ÑеÑÑ%'; > ID | TEXT > ----+------ > (0 rows) > Тhese two queries should return the same result. > I don't had that problem in version 8.4.X. Perhaps you don't have the database locale/encoding set correctly? (Look at the lc_collate and lc_ctype settings, and compare to what you were using in 8.4.) Another possibility is that you changed machines and the new OS doesn't have the right Cyrillic locale definition. PG mostly depends on the OS' locale features to implement things like case folding. regards, tom lane
Martin Vassilev <mv@netsurf.bg> writes: > Hi Tom, >> Perhaps you don't have the database locale/encoding set correctly? >> (Look at the lc_collate and lc_ctype settings, and compare to what >> you were using in 8.4.) > the 8.4 setup is gone :( That's not a reason not to identify what you've got now. > Here is my locale settings: > ~# locale > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" > LC_COLLATE=C This is your shell environment, which may or may not have much to do with what Postgres is using: you need to look at the lc_collate and lc_ctype parameters for each database. However, it would not surprise me at all if an en_US setting doesn't case-fold Cyrillic characters --- and a C setting most certainly wouldn't. regards, tom lane
On Wednesday, March 26, 2014 09:58:35 Tom Lane wrote: > Martin Vassilev <mv@netsurf.bg> writes: > > Hi Tom, > > > >> Perhaps you don't have the database locale/encoding set correctly? > >> (Look at the lc_collate and lc_ctype settings, and compare to what > >> you were using in 8.4.) > > > > the 8.4 setup is gone :( > > That's not a reason not to identify what you've got now. > > > Here is my locale settings: > > > > ~# locale > > LANG=en_US.UTF-8 > > LC_CTYPE="en_US.UTF-8" > > LC_COLLATE=C > > This is your shell environment, which may or may not have much to do with > what Postgres is using: you need to look at the lc_collate and lc_ctype > parameters for each database. However, it would not surprise me at all > if an en_US setting doesn't case-fold Cyrillic characters --- and a C > setting most certainly wouldn't. > I just recreated DB with --lc-collate=bg_BG.utf8 --lc-ctype=bg_BG.utf8 and it is fine now. Thanks for support. -- Best regards , Martin Vassilev Network/System Administrator Net-Surf.net Ltd. phone: + 359 96 399994 e-mail: mv@netsurf.bg http://www.netsurf.bg
Hi Tom, thanks for reply. > Perhaps you don't have the database locale/encoding set correctly? test1=# show server_encoding; server_encoding ----------------- UTF8 (1 row) > (Look at the lc_collate and lc_ctype settings, and compare to what > you were using in 8.4.) the 8.4 setup is gone :( > > Another possibility is that you changed machines and the new OS > doesn't have the right Cyrillic locale definition. PG mostly depends > on the OS' locale features to implement things like case folding. Here is my locale settings: ~# locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE=C LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= -- Best regards , Martin Vassilev Network/System Administrator Net-Surf.net Ltd. phone: + 359 96 399994 e-mail: mv@netsurf.bg http://www.netsurf.bg