Thread: unicode and =

unicode and =

From
"Grant Morgan"
Date:
= is not working on a char(30) column for me.
I want to find rows with equal name.
I have my database set to unicode.

SQL1
SELECT h1.key,h1.name,h2.key,h2.name
 FROM table1 as h1, table1 as h2
WHERE h1.name=h2.name
and h1.OID = 730716

This produces result rows where name does not match.
name contains multibyte UTF-8 values.

SQL2
SELECT h1.key,h1.name,h2.key,h2.name
 FROM table1 as h1, table1 as h2
WHERE h1.key=h2.key
and h1.OID = 730716

This produces correct results.
key contains only single-byte UTF-8 values (digits only).

I have a hash index on name; I dropped it and got a different but still wrong result.
key is part of a multicolumn primary key.

Version: PostgreSQL 8.0.3, gcc 3.4.3, Fedora 3

Any suggestions on how to match multibyte characters? Do I need to use a different comparison operator?

Thanks,
Grant

Re: unicode and =

From
Tom Lane
Date:
"Grant Morgan" <grant@ryuuguu.com> writes:
> = is not working on a char(30) column for me.
> I want to find rows with equal name.
> I have my database set to unicode.

I'll bet you are running the postmaster in a locale that isn't expecting
utf-8 encoding.  The locale and encoding have to match or you're going
to get very strange behavior.

            regards, tom lane
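
A minimal sketch of how the mismatch described above could be checked, assuming a psql session connected to the affected database. It only reads settings; the values to expect depend on how initdb was run and are not shown here.

```sql
-- Encoding the database stores text in (the poster set this to UNICODE).
SHOW server_encoding;

-- Locale settings fixed for the cluster at initdb time.
-- lc_collate governs string comparison and sort order,
-- lc_ctype governs character classification.
SHOW lc_collate;
SHOW lc_ctype;
```

If lc_collate names a locale whose character set is not UTF-8 while server_encoding is UNICODE, that would be the kind of mismatch being suspected.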

Re: unicode and =

From
"Grant Morgan"
Date:
I am not sure what locale I was running in, as I had not set it when doing initdb.
I created a new DB with --locale=en_US.utf8 -E UNICODE
and imported my data from the original source (not copied from the old DB), and I still have the same problem: UNICODE strings
with double-byte characters that are not equal get selected as equal.

To test things further:
md5(h1.name)=md5(h2.name)
works and only matches equal values.
h1.name=h2.name
matches unequal values.
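
A sketch of the two comparisons combined into one diagnostic query, reusing the table, columns, and OID from the first message. It is meant to list only the suspicious pairs: rows that = treats as equal but whose md5 digests differ.

```sql
-- Pairs that "=" reports as equal although their md5 digests differ.
-- An empty result would mean "=" and md5() agree for this row.
SELECT h1.key, h1.name, h2.key, h2.name
  FROM table1 AS h1, table1 AS h2
 WHERE h1.name = h2.name
   AND md5(h1.name) <> md5(h2.name)
   AND h1.OID = 730716;
```

Swapping the name comparison for md5(h1.name) = md5(h2.name) in the original query is a possible stopgap when only equality matters, but it does not address the underlying comparison problem.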

Does anyone have any other ideas? Or is en_US.utf8 not a proper UTF-8 locale? (I got the name by running locale -a.)
I am not so concerned about sorting on this project, just equality, but a general solution would be appreciated.

Thanks,
Grant

On Mon, 20 Jun 2005 10:13:39 +0900, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Grant Morgan" <grant@ryuuguu.com> writes:
>> = is not working on a char(30) column for me.
>> I want to find rows with equal name.
>> I have my database set to unicode.
>
> I'll bet you are running the postmaster in a locale that isn't expecting
> utf-8 encoding.  The locale and encoding have to match or you're going
> to get very strange behavior.
>
>             regards, tom lane
>
>