Thread: Different result when using '=' and 'like' in unicode mode

Different result when using '=' and 'like' in unicode mode

From
Jiang Sheng
Date:
I used PostgreSQL 7.2.1 on redhat 7.3, installed using the rpm from redhat,.
When I createdb -E UNICODE, and import data using \copy command, after
converted  all the imported text files to UTF-8 encoding. It looks well, I
can get
the data from several platforms in different locale. But, a strange problem,
when i using '=' in where clause, to match the varchar value, the result is
not correct,
for example, the table structure is

             Table "tbl_test"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 id          | integer               |
 name        | character varying(40) |

the sql is
select * from tbl_test where name = '石田';

but the result contains other data, such as 石町, 柳町, 柳田

and I changed the sql to
select * from tbl_test where name like '石田';
select * from tbl_test where name ilike '石田';

the results both correct.

I don't know the reason of this problem, and in my application it not
allowed
to use 'like' replaced '=', the data amount is too large.

Is this a bug of PostgreSQL? or only a problem in redhat environment? and
how to fix it?

Thx,
Jiang Sheng

Re: Different result when using '=' and 'like' in unicode

From
Peter Eisentraut
Date:
Jiang Sheng writes:

> the sql is
> select * from tbl_test where name = '石田';
>
> but the result contains other data, such as 石町, 柳町, 柳田
>
> and I changed the sql to
> select * from tbl_test where name like '石田';
> select * from tbl_test where name ilike '石田';
>
> the results both correct.

The LIKE operator uses a character-by-character comparison (which could
quite possibly behave oddly if your strings contain combining characters,
but that is a different issue).  The = operator uses the host operating
system's locale facilities to do a locale-aware comparison.  So in general
LIKE and = are not interchangeable.

If the result you get for = is wrong even under some locale, then you
probably don't have the right locale set on your server.  Recent glibc
systems require you to set your locale to xx_YY.utf8 (rather than just
xx_YY) if you want to use Unicode.

--
Peter Eisentraut   peter_e@gmx.net

Re: Different result when using '=' and 'like' in unicode mode

From
Jiang Sheng
Date:
Thanks a lot!
In your opinion, should I re-compile the PostgreSQL after set the $LANG
to ja_JP.utf8? or I need to set other variables' value..
I changed the $LANG value, but the result still not correct, and not found
anything about it in PostgrSQL's document.

Thx
Jiang Sheng


----- Original Message -----
送信者 : "Peter Eisentraut" <peter_e@gmx.net>
宛先 : "Jiang Sheng" <jiangsheng@sis.sh.cn>
Cc: <pgsql-bugs@postgresql.org>
送信日時 : 2002年11月18日 6:21
件名 : Re: [BUGS] Different result when using '=' and 'like' in unicode mode


> Jiang Sheng writes:
>
> > the sql is
> > select * from tbl_test where name = '石田';
> >
> > but the result contains other data, such as 石町, 柳町, 柳田
> >
> > and I changed the sql to
> > select * from tbl_test where name like '石田';
> > select * from tbl_test where name ilike '石田';
> >
> > the results both correct.
>
> The LIKE operator uses a character-by-character comparison (which could
> quite possibly behave oddly if your strings contain combining characters,
> but that is a different issue).  The = operator uses the host operating
> system's locale facilities to do a locale-aware comparison.  So in general
> LIKE and = are not interchangeable.
>
> If the result you get for = is wrong even under some locale, then you
> probably don't have the right locale set on your server.  Recent glibc
> systems require you to set your locale to xx_YY.utf8 (rather than just
> xx_YY) if you want to use Unicode.
>
> --
> Peter Eisentraut   peter_e@gmx.net
>

Re: Different result when using '=' and 'like' in unicode

From
Peter Eisentraut
Date:
Jiang Sheng writes:

> In your opinion, should I re-compile the PostgreSQL after set the $LANG
> to ja_JP.utf8? or I need to set other variables' value..

Recompiling has nothing to do with that.  Please see the Administrator's
Guide chapter on localization for all the information.

--
Peter Eisentraut   peter_e@gmx.net