Thread: Lower case
Hi! Sorry for my English.. I want to do case-insensitivity search, like this ... WHERE lower (column_name) LIKE lower (%value%); This work fine for English.. But i need search for Russian words, lower() operator does not work with Russian (non-English) chars, but ORDER works fine... ??????????????? What's wrong? -- It is better to travel hopefully than to fly Continental. Now playing: 16 - ?????????.mp3 AutoGenerated by fortune & xmms...
Hi, On Wed, 2005-01-26 at 12:01 +0000, Vladimir S. Petukhov wrote: > Hi! > > Sorry for my English.. > > I want to do case-insensitivity search, like this > ... WHERE lower (column_name) LIKE lower (%value%); > This work fine for English.. > But i need search for Russian words, lower() operator does not work with > Russian (non-English) chars, but ORDER works fine... > ??????????????? > What's wrong? lower(), upper() and case insensitive search highly depend on the correct locale. The locale is currently set on initdb time, that is, when your whole cluster gets created. If you dont want to recreate your cluster and your db, you probably need to write a function to lower() according to your datas locale. Regards Tino -- Tino Wildenhain <tino@wildenhain.de>
On Wed, 26 Jan 2005 12:01:49 +0000, Vladimir S. Petukhov <vladimir@sycore.org> wrote: > I want to do case-insensitivity search, like this > ... WHERE lower (column_name) LIKE lower (%value%); > This work fine for English.. > But i need search for Russian words, lower() operator does not work with > Russian (non-English) chars, but ORDER works fine... > ??????????????? > What's wrong? Welllllll... Have you made an initdb with apropriate locale setting? Try: pg_controldata /var/lib/postgresql/data (or wherever your db is) You should see lines like: LC_COLLATE: C LC_CTYPE: C If you are using Unicode these should be ru_RU.UTF-8, if not then ru_RU.KOI8-R or something. If you see 'C', or 'en_EN' or similar then you won't have Russian lower/upper support (and ORDER BY was just a "luck" :)). What you can do: pg_dump the database, initdb --locale=ru_RU; pg_restore the database. Ohhh, and since you're at it, there is one thing which makes me wonder and if you don't mind, please try it. :) This will require plperlu language, so 'createlang plperlu' and that you use unicode encoding in your database. Try if this function: CREATE OR REPLACE FUNCTION lc(text) RETURNS text AS $$ utf8::decode($_[0]); return lc($_[0]); $$ LANGUAGE plperlu STABLE; Used as select lc(your_text_column) from your_table; works for Russian alphabet. :) I'm just cuuurious! :) Regards, Dawid
Tino Wildenhain wrote: > Hi, > > On Wed, 2005-01-26 at 12:01 +0000, Vladimir S. Petukhov wrote: > >>Hi! >> >>Sorry for my English.. >> >>I want to do case-insensitivity search, like this >>... WHERE lower (column_name) LIKE lower (%value%); >>This work fine for English.. >>But i need search for Russian words, lower() operator does not work with >>Russian (non-English) chars, but ORDER works fine... >>??????????????? >>What's wrong? what about ILIKE operator? miso
Vladimir, there is pgsql-ru-general mailing list for russian speaking people Oleg On Wed, 26 Jan 2005, Vladimir S. Petukhov wrote: > Hi! > > Sorry for my English.. > > I want to do case-insensitivity search, like this > ... WHERE lower (column_name) LIKE lower (%value%); > This work fine for English.. > But i need search for Russian words, lower() operator does not work with > Russian (non-English) chars, but ORDER works fine... > ??????????????? > What's wrong? > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pg_controldata /var/pgsql/data ... LC_COLLATE: ru_RU LC_CTYPE: ru_RU bash-2.05b# psql -l List of databases Name | Owner | Encoding -----------+----------+---------- testdb | postgres | UNICODE And LIKE, ILIKE, ~ do not recognize upper/lower case.. SELECT ... ORDER BY do something like that (in English Alphabet): a AAAA aaaaa Tast12 tes test Test12 test12 ?:( On Wednesday 26 January 2005 10:15, Dawid Kuroczko wrote: > On Wed, 26 Jan 2005 12:01:49 +0000, Vladimir S. Petukhov > > <vladimir@sycore.org> wrote: > > I want to do case-insensitivity search, like this > > ... WHERE lower (column_name) LIKE lower (%value%); > > This work fine for English.. > > But i need search for Russian words, lower() operator does not work with > > Russian (non-English) chars, but ORDER works fine... > > ??????????????? > > What's wrong? > > Welllllll... Have you made an initdb with apropriate locale setting? > Try: > pg_controldata /var/lib/postgresql/data > (or wherever your db is) > You should see lines like: > LC_COLLATE: C > LC_CTYPE: C > > If you are using Unicode these should be ru_RU.UTF-8, if not then > ru_RU.KOI8-R or something. If you see 'C', or 'en_EN' or similar > then you won't have Russian lower/upper support (and ORDER BY > was just a "luck" :)). > > What you can do: > pg_dump the database, initdb --locale=ru_RU; pg_restore the database. > > Ohhh, and since you're at it, there is one thing which makes me > wonder and if you don't mind, please try it. :) This will require > plperlu language, so 'createlang plperlu' and that you use > unicode encoding in your database. > > Try if this function: > CREATE OR REPLACE FUNCTION lc(text) RETURNS text AS $$ > utf8::decode($_[0]); > return lc($_[0]); > $$ LANGUAGE plperlu STABLE; > > Used as select lc(your_text_column) from your_table; > works for Russian alphabet. :) I'm just cuuurious! :) > > Regards, > Dawid -- Скунсу и не надо быть красивым. Его и так все уважают. Now playing: AutoGenerated by fortune & xmms...
"Vladimir S. Petukhov" <vladimir@sycore.org> writes: > pg_controldata /var/pgsql/data > ... > LC_COLLATE: ru_RU > LC_CTYPE: ru_RU > bash-2.05b# psql -l > List of databases > Name | Owner | Encoding > -----------+----------+---------- > testdb | postgres | UNICODE > And LIKE, ILIKE, ~ do not recognize upper/lower case.. What character encoding is implied by those LC_ settings on your machine? If it's different from the database encoding (here utf8) these things won't actually work right. Also, before PG 8.0 upper/lower simply don't work on multibyte characters, which means you'd have to use a single-byte encoding to make it work (I think koi8 is single-byte but not sure). regards, tom lane
On Wednesday 26 January 2005 20:01, you wrote: > "Vladimir S. Petukhov" <vladimir@sycore.org> writes: > > pg_controldata /var/pgsql/data > > ... > > LC_COLLATE: ru_RU > > LC_CTYPE: ru_RU > > > > bash-2.05b# psql -l > > List of databases > > Name | Owner | Encoding > > -----------+----------+---------- > > testdb | postgres | UNICODE > > > > And LIKE, ILIKE, ~ do not recognize upper/lower case.. > > What character encoding is implied by those LC_ settings on your machine? > If it's different from the database encoding (here utf8) these things > won't actually work right. LANG=ru_RU.koi8r LC_ALL=ru_RU.koi8r But how it act on lower/upper cases? Client use utf-8 encoding... > > Also, before PG 8.0 upper/lower simply don't work on multibyte characters, > which means you'd have to use a single-byte encoding to make it work > (I think koi8 is single-byte but not sure). > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- You feel a whole lot more like you do now than you did when you used to. Now playing: AutoGenerated by fortune & xmms...
On Thu, 27 Jan 2005 00:16:14 +0000, Vladimir S. Petukhov <vladimir@sycore.org> wrote: > > > LC_COLLATE: ru_RU > > > LC_CTYPE: ru_RU > > > Name | Owner | Encoding > > > -----------+----------+---------- > > > testdb | postgres | UNICODE > > > And LIKE, ILIKE, ~ do not recognize upper/lower case.. > > > > What character encoding is implied by those LC_ settings on your machine? > > If it's different from the database encoding (here utf8) these things > > won't actually work right. > LANG=ru_RU.koi8r > LC_ALL=ru_RU.koi8r > But how it act on lower/upper cases? Client use utf-8 encoding... The client uses utf-8 encoding, so does server. Texts are stored using UTF-8. However when you call a lower() function from PostgreSQL it does more or less following: -- it retrieves text row from database. This text is in UTF-8 encoding. -- it calls strxfrm function upon this text. -- strxfrm function sees that current locale is ru_RU.koi8r -- strxfrm then takes utf-8 encoded text and treats it as koi8r -- strxfrm "skips over" characters it does not recognize (utf-8 chars) -- strxfrm returns transformed text -- PostgreSQL takes the resulting text, believing it is still in utf-8. In other words, probably only latin characters were subject to lower() functions, any "unknown" Russian UTF-8 characters were at best skipped. Please note that PostgreSQL does not do implicit utf8->koi8r->utf8 conversion while calling function lower(). AFAIK it does not even know (or care) if current locale setting ("ru_RU") is for different encoding than current database's. It is DB Admin's duty to make sure cluster locale (done in initdb) is compatible with database encoding (done in createdb). Regards, Dawid