Thread: Lower case

Lower case

From
"Vladimir S. Petukhov"
Date:
Hi!

Sorry for my English..

I want to do case-insensitivity search, like this
... WHERE lower (column_name) LIKE lower (%value%);
This work fine for English..
But i need search for Russian words, lower() operator does not work with
Russian (non-English) chars, but ORDER works fine...
???????????????
What's wrong?

--
It is better to travel hopefully than to fly Continental.

Now playing: 16 - ?????????.mp3
   AutoGenerated by fortune & xmms...

Re: Lower case

From
Tino Wildenhain
Date:
Hi,

On Wed, 2005-01-26 at 12:01 +0000, Vladimir S. Petukhov wrote:
> Hi!
>
> Sorry for my English..
>
> I want to do case-insensitivity search, like this
> ... WHERE lower (column_name) LIKE lower (%value%);
> This work fine for English..
> But i need search for Russian words, lower() operator does not work with
> Russian (non-English) chars, but ORDER works fine...
> ???????????????
> What's wrong?

lower(), upper() and case insensitive search highly depend on
the correct locale. The locale is currently set on initdb
time, that is, when your whole cluster gets created.

If you dont want to recreate your cluster and your db,
you probably need to write a function to lower()
according to your datas locale.

Regards
Tino
--
Tino Wildenhain <tino@wildenhain.de>


Re: Lower case

From
Dawid Kuroczko
Date:
On Wed, 26 Jan 2005 12:01:49 +0000, Vladimir S. Petukhov
<vladimir@sycore.org> wrote:
> I want to do case-insensitivity search, like this
> ... WHERE lower (column_name) LIKE lower (%value%);
> This work fine for English..
> But i need search for Russian words, lower() operator does not work with
> Russian (non-English) chars, but ORDER works fine...
> ???????????????
> What's wrong?

Welllllll...  Have you made an initdb with apropriate locale setting?
Try:
  pg_controldata /var/lib/postgresql/data
(or wherever your db is)
You should see lines like:
LC_COLLATE:                           C
LC_CTYPE:                             C

If you are using Unicode these should be ru_RU.UTF-8, if not then
ru_RU.KOI8-R or something.  If you see 'C', or 'en_EN' or similar
then you won't have Russian lower/upper support (and ORDER BY
was just a "luck" :)).

What you can do:
   pg_dump the database, initdb --locale=ru_RU; pg_restore the database.

Ohhh, and since you're at it, there is one thing which makes me
wonder and if you don't mind, please try it. :)  This will require
plperlu language, so 'createlang plperlu' and that you use
unicode encoding in your database.

Try if this function:
CREATE OR REPLACE FUNCTION lc(text) RETURNS text AS $$
        utf8::decode($_[0]);
        return lc($_[0]);
$$ LANGUAGE plperlu STABLE;

Used as select lc(your_text_column) from your_table;
works for Russian alphabet. :)  I'm just cuuurious! :)

   Regards,
      Dawid

Re: Lower case

From
Michal Hlavac
Date:
Tino Wildenhain wrote:
> Hi,
>
> On Wed, 2005-01-26 at 12:01 +0000, Vladimir S. Petukhov wrote:
>
>>Hi!
>>
>>Sorry for my English..
>>
>>I want to do case-insensitivity search, like this
>>... WHERE lower (column_name) LIKE lower (%value%);
>>This work fine for English..
>>But i need search for Russian words, lower() operator does not work with
>>Russian (non-English) chars, but ORDER works fine...
>>???????????????
>>What's wrong?

what about ILIKE operator?

miso

Re: Lower case

From
Oleg Bartunov
Date:
Vladimir,

there is pgsql-ru-general mailing list for russian speaking people

     Oleg
On Wed, 26 Jan 2005, Vladimir S. Petukhov wrote:

> Hi!
>
> Sorry for my English..
>
> I want to do case-insensitivity search, like this
> ... WHERE lower (column_name) LIKE lower (%value%);
> This work fine for English..
> But i need search for Russian words, lower() operator does not work with
> Russian (non-English) chars, but ORDER works fine...
> ???????????????
> What's wrong?
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Lower case

From
"Vladimir S. Petukhov"
Date:
pg_controldata /var/pgsql/data
...
LC_COLLATE:                           ru_RU
LC_CTYPE:                             ru_RU

bash-2.05b# psql -l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 testdb   | postgres | UNICODE

And LIKE, ILIKE, ~ do not recognize upper/lower case..
SELECT ... ORDER BY do something like that (in English Alphabet):
a
AAAA
aaaaa
Tast12
tes
test
Test12
test12

?:(

On Wednesday 26 January 2005 10:15, Dawid Kuroczko wrote:
> On Wed, 26 Jan 2005 12:01:49 +0000, Vladimir S. Petukhov
>
> <vladimir@sycore.org> wrote:
> > I want to do case-insensitivity search, like this
> > ... WHERE lower (column_name) LIKE lower (%value%);
> > This work fine for English..
> > But i need search for Russian words, lower() operator does not work with
> > Russian (non-English) chars, but ORDER works fine...
> > ???????????????
> > What's wrong?
>
> Welllllll...  Have you made an initdb with apropriate locale setting?
> Try:
>   pg_controldata /var/lib/postgresql/data
> (or wherever your db is)
> You should see lines like:
> LC_COLLATE:                           C
> LC_CTYPE:                             C
>
> If you are using Unicode these should be ru_RU.UTF-8, if not then
> ru_RU.KOI8-R or something.  If you see 'C', or 'en_EN' or similar
> then you won't have Russian lower/upper support (and ORDER BY
> was just a "luck" :)).
>
> What you can do:
>    pg_dump the database, initdb --locale=ru_RU; pg_restore the database.
>
> Ohhh, and since you're at it, there is one thing which makes me
> wonder and if you don't mind, please try it. :)  This will require
> plperlu language, so 'createlang plperlu' and that you use
> unicode encoding in your database.
>
> Try if this function:
> CREATE OR REPLACE FUNCTION lc(text) RETURNS text AS $$
>         utf8::decode($_[0]);
>         return lc($_[0]);
> $$ LANGUAGE plperlu STABLE;
>
> Used as select lc(your_text_column) from your_table;
> works for Russian alphabet. :)  I'm just cuuurious! :)
>
>    Regards,
>       Dawid

--
Скунсу и не надо быть красивым. Его и так все уважают.

Now playing:
   AutoGenerated by fortune & xmms...

Re: Lower case

From
Tom Lane
Date:
"Vladimir S. Petukhov" <vladimir@sycore.org> writes:
> pg_controldata /var/pgsql/data
> ...
> LC_COLLATE:                           ru_RU
> LC_CTYPE:                             ru_RU

> bash-2.05b# psql -l
>         List of databases
>    Name    |  Owner   | Encoding
> -----------+----------+----------
>  testdb   | postgres | UNICODE

> And LIKE, ILIKE, ~ do not recognize upper/lower case..

What character encoding is implied by those LC_ settings on your machine?
If it's different from the database encoding (here utf8) these things
won't actually work right.

Also, before PG 8.0 upper/lower simply don't work on multibyte characters,
which means you'd have to use a single-byte encoding to make it work
(I think koi8 is single-byte but not sure).

            regards, tom lane

Re: Lower case

From
"Vladimir S. Petukhov"
Date:
On Wednesday 26 January 2005 20:01, you wrote:
> "Vladimir S. Petukhov" <vladimir@sycore.org> writes:
> > pg_controldata /var/pgsql/data
> > ...
> > LC_COLLATE:                           ru_RU
> > LC_CTYPE:                             ru_RU
> >
> > bash-2.05b# psql -l
> >         List of databases
> >    Name    |  Owner   | Encoding
> > -----------+----------+----------
> >  testdb   | postgres | UNICODE
> >
> > And LIKE, ILIKE, ~ do not recognize upper/lower case..
>
> What character encoding is implied by those LC_ settings on your machine?
> If it's different from the database encoding (here utf8) these things
> won't actually work right.
LANG=ru_RU.koi8r
LC_ALL=ru_RU.koi8r
But how it act on lower/upper cases? Client use utf-8 encoding...

>
> Also, before PG 8.0 upper/lower simply don't work on multibyte characters,
> which means you'd have to use a single-byte encoding to make it work
> (I think koi8 is single-byte but not sure).
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly

--
You feel a whole lot more like you do now than you did when you used to.

Now playing:
   AutoGenerated by fortune & xmms...

Re: Lower case

From
Dawid Kuroczko
Date:
On Thu, 27 Jan 2005 00:16:14 +0000, Vladimir S. Petukhov
<vladimir@sycore.org> wrote:
> > > LC_COLLATE:                           ru_RU
> > > LC_CTYPE:                             ru_RU
> > >    Name    |  Owner   | Encoding
> > > -----------+----------+----------
> > >  testdb   | postgres | UNICODE
> > > And LIKE, ILIKE, ~ do not recognize upper/lower case..
> >
> > What character encoding is implied by those LC_ settings on your machine?
> > If it's different from the database encoding (here utf8) these things
> > won't actually work right.
> LANG=ru_RU.koi8r
> LC_ALL=ru_RU.koi8r
> But how it act on lower/upper cases? Client use utf-8 encoding...

The client uses utf-8 encoding, so does server.  Texts are stored
using UTF-8.  However when you call a lower() function from
PostgreSQL it does more or less following:
  -- it retrieves text row from database.  This text is in UTF-8 encoding.
  -- it calls strxfrm function upon this text.
     -- strxfrm function sees that current locale is ru_RU.koi8r
     -- strxfrm then takes utf-8 encoded text and treats it as koi8r
     -- strxfrm "skips over" characters it does not recognize (utf-8 chars)
     -- strxfrm returns transformed text
  -- PostgreSQL takes the resulting text, believing it is still in utf-8.
In other words, probably only latin characters were subject to lower()
functions, any "unknown" Russian UTF-8 characters were at best
skipped.

Please note that PostgreSQL does not do implicit utf8->koi8r->utf8
conversion while calling function lower().  AFAIK it does not even
know (or care) if current locale setting ("ru_RU") is for different
encoding than current database's.  It is DB Admin's duty to make
sure cluster locale (done in initdb) is compatible with database
encoding (done in createdb).

  Regards,
      Dawid