Thread: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

From
mv@netsurf.bg
Date:
The following bug has been logged on the website:

Bug reference:      9722
Logged by:          Martin Vassilev
Email address:      mv@netsurf.bg
PostgreSQL version: 9.3.4
Operating system:   Linux Slackware x86_64
Description:

test1=# \d test1
                          Table "public.test1"
 Column |  Type   |                      Modifiers
--------+---------+------------------------------------------------------
 ID     | integer | not null default nextval('"test1_ID_seq"'::regclass)
 TEXT   | text    |


test1=# select * from test1 where "TEXT" ilike 'Тест%';
 ID |     TEXT
----+---------------
  3 | Тестов запис
  4 | Тестов запис1
(2 rows)

test1=# select * from test1 where "TEXT" ilike 'тест%';
 ID | TEXT
----+------
(0 rows)


Тhese two queries should return the same result.
I don't had that problem in version 8.4.X.
I want to thanks pgsql team for great work on this DB project.

regards,
Martin

Re: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

From
Tom Lane
Date:
mv@netsurf.bg writes:
> test1=# select * from test1 where "TEXT" ilike 'Тест%';
>  ID |     TEXT
> ----+---------------
>   3 | Тестов запис
>   4 | Тестов запис1
> (2 rows)

> test1=# select * from test1 where "TEXT" ilike 'тест%';
>  ID | TEXT
> ----+------
> (0 rows)

> Тhese two queries should return the same result.
> I don't had that problem in version 8.4.X.

Perhaps you don't have the database locale/encoding set correctly?
(Look at the lc_collate and lc_ctype settings, and compare to what
you were using in 8.4.)

Another possibility is that you changed machines and the new OS
doesn't have the right Cyrillic locale definition.  PG mostly depends
on the OS' locale features to implement things like case folding.

            regards, tom lane

Re: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

From
Tom Lane
Date:
Martin Vassilev <mv@netsurf.bg> writes:
> Hi Tom,
>> Perhaps you don't have the database locale/encoding set correctly?
>> (Look at the lc_collate and lc_ctype settings, and compare to what
>> you were using in 8.4.)
> the 8.4 setup is gone :(

That's not a reason not to identify what you've got now.

> Here is my locale settings:

> ~# locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_COLLATE=C

This is your shell environment, which may or may not have much to do with
what Postgres is using: you need to look at the lc_collate and lc_ctype
parameters for each database.  However, it would not surprise me at all
if an en_US setting doesn't case-fold Cyrillic characters --- and a C
setting most certainly wouldn't.

            regards, tom lane

Re: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

From
Martin Vassilev
Date:
On Wednesday, March 26, 2014 09:58:35 Tom Lane wrote:
> Martin Vassilev <mv@netsurf.bg> writes:
> > Hi Tom,
> >
> >> Perhaps you don't have the database locale/encoding set correctly?
> >> (Look at the lc_collate and lc_ctype settings, and compare to what
> >> you were using in 8.4.)
> >
> > the 8.4 setup is gone :(
>
> That's not a reason not to identify what you've got now.
>
> > Here is my locale settings:
> >
> > ~# locale
> > LANG=en_US.UTF-8
> > LC_CTYPE="en_US.UTF-8"
> > LC_COLLATE=C
>
> This is your shell environment, which may or may not have much to do with
> what Postgres is using: you need to look at the lc_collate and lc_ctype
> parameters for each database.  However, it would not surprise me at all
> if an en_US setting doesn't case-fold Cyrillic characters --- and a C
> setting most certainly wouldn't.
>
I just recreated DB with --lc-collate=bg_BG.utf8 --lc-ctype=bg_BG.utf8
and it is fine now.
Thanks for support.

--

Best regards ,
Martin Vassilev
Network/System Administrator

Net-Surf.net Ltd.
phone: + 359 96 399994
e-mail: mv@netsurf.bg
http://www.netsurf.bg

Re: BUG #9722: select ILIKE is not case insensitive in UTF8 cyrillic

From
Martin Vassilev
Date:
Hi Tom,
thanks for reply.
> Perhaps you don't have the database locale/encoding set correctly?

test1=# show server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

> (Look at the lc_collate and lc_ctype settings, and compare to what
> you were using in 8.4.)
the 8.4 setup is gone :(
>
> Another possibility is that you changed machines and the new OS
> doesn't have the right Cyrillic locale definition.  PG mostly depends
> on the OS' locale features to implement things like case folding.
Here is my locale settings:

~# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=



--

Best regards ,
Martin Vassilev
Network/System Administrator

Net-Surf.net Ltd.
phone: + 359 96 399994
e-mail: mv@netsurf.bg
http://www.netsurf.bg