Thread: problems when comparing text with special chars

problems when comparing text with special chars

From

Date:

03 September 2005, 18:26:25

hi list

i got a weird problem. somehow, for pg-sql, this statement is not true:

SELECT lower('zwiebelbäcker') = 'zwiebelbäcker'
=> false (which is the wrong answer, obviously)

amazingly, this one works fine:

SELECT lower('zwiebelbacker') = 'zwiebelbacker'
=> true

thus, the following query fails:

SELECT login FROM users.users WHERE lower(login) = 'zwiebelbäcker'
=> 0 Result (not correct!!)

workaround:

SELECT login FROM users.users WHERE lower(login) = lower('zwiebelbäcker')
=> 1 Result (correct)

why is this and how can it be circumvented? my workaround of using lower()
only works as so long as i don't want case-insensitive checks... i've
already tried to cast it to text by using ::text, didn't help.

thanks,
thomas

Re: problems when comparing text with special chars

From

Tom Lane

Date:

04 September 2005, 19:49:42

<me@alternize.com> writes:
> SELECT lower('zwiebelb�cker') = 'zwiebelb�cker'
> => false (which is the wrong answer, obviously)

If you think that's the wrong answer, then you need to initdb in a
locale in which it's the right answer.  Also take care that you are
using the database encoding that the locale expects.  There is some
discussion of these issues in the "localization" chapter of the
manual.

            regards, tom lane

Re: problems when comparing text with special chars

From

Date:

04 September 2005, 20:12:09

well, all other functions (at least) seem to work just fine. the data is
correctly stored including the umlaut and other special chars, only when
applying lower() / upper() they are being stripped.

the installer (yes, its the v8.0.3 windows version) had me choose the
following db encoding, which also matches the server's win2k3 environement.
this seems right to me:
LC_COLLATE:                           German_Switzerland.1252
LC_CTYPE:                             German_Switzerland.1252

the DB was created with the characterset UNICODE to have full support for
foreign characters (i.e. japanese and stuff).

           List of databases
    Name    |     Owner     | Encoding
------------+---------------+-----------
 outnow     | db_outnow     | UNICODE
 template0  | sa            | SQL_ASCII
 template1  | sa            | SQL_ASCII

the mentioned localisation chapter in the documentation didn't give me a
hint what's wrong here, but i also must admit i'm new to this localisation
stuff - mssql didn't need to be configured in that aspect ;-)
so, any pointers what encoding and locale types i must set to get the system
to run smooth would be very appreciated... what is the correct encoding to
choose to have the most flexibilities when storing western as well as
eastern unicode characters?

thanks in advance,
thomas

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: <me@alternize.com>
Cc: <pgsql-novice@postgresql.org>
Sent: Monday, September 05, 2005 12:49 AM
Subject: Re: [NOVICE] problems when comparing text with special chars

> <me@alternize.com> writes:
>> SELECT lower('zwiebelbäcker') = 'zwiebelbäcker'
>> => false (which is the wrong answer, obviously)
>
> If you think that's the wrong answer, then you need to initdb in a
> locale in which it's the right answer.  Also take care that you are
> using the database encoding that the locale expects.  There is some
> discussion of these issues in the "localization" chapter of the
> manual.
>
> regards, tom lane
>

Re: problems when comparing text with special chars

From

Tom Lane

Date:

04 September 2005, 20:33:17

<me@alternize.com> writes:
> the installer (yes, its the v8.0.3 windows version) had me choose the
> following db encoding, which also matches the server's win2k3 environement.
> this seems right to me:
> LC_COLLATE:                           German_Switzerland.1252
> LC_CTYPE:                             German_Switzerland.1252

OK ...

> the DB was created with the characterset UNICODE to have full support for
> foreign characters (i.e. japanese and stuff).

Wrong answer --- the name of the locale implies that it wants Win1252
encoding.  In general you don't get to choose locale and encoding
independently.

Our Unicode support is pretty broken under Windows anyway in 8.0
(8.1 should be better).  You want to stick with that locale and use
the corresponding encoding rather than trying to switch to a Unicode
locale.

            regards, tom lane