Thread: problems when comparing text with special chars
hi list i got a weird problem. somehow, for pg-sql, this statement is not true: SELECT lower('zwiebelbäcker') = 'zwiebelbäcker' => false (which is the wrong answer, obviously) amazingly, this one works fine: SELECT lower('zwiebelbacker') = 'zwiebelbacker' => true thus, the following query fails: SELECT login FROM users.users WHERE lower(login) = 'zwiebelbäcker' => 0 Result (not correct!!) workaround: SELECT login FROM users.users WHERE lower(login) = lower('zwiebelbäcker') => 1 Result (correct) why is this and how can it be circumvented? my workaround of using lower() only works as so long as i don't want case-insensitive checks... i've already tried to cast it to text by using ::text, didn't help. thanks, thomas
<me@alternize.com> writes: > SELECT lower('zwiebelb�cker') = 'zwiebelb�cker' > => false (which is the wrong answer, obviously) If you think that's the wrong answer, then you need to initdb in a locale in which it's the right answer. Also take care that you are using the database encoding that the locale expects. There is some discussion of these issues in the "localization" chapter of the manual. regards, tom lane
well, all other functions (at least) seem to work just fine. the data is correctly stored including the umlaut and other special chars, only when applying lower() / upper() they are being stripped. the installer (yes, its the v8.0.3 windows version) had me choose the following db encoding, which also matches the server's win2k3 environement. this seems right to me: LC_COLLATE: German_Switzerland.1252 LC_CTYPE: German_Switzerland.1252 the DB was created with the characterset UNICODE to have full support for foreign characters (i.e. japanese and stuff). List of databases Name | Owner | Encoding ------------+---------------+----------- outnow | db_outnow | UNICODE template0 | sa | SQL_ASCII template1 | sa | SQL_ASCII the mentioned localisation chapter in the documentation didn't give me a hint what's wrong here, but i also must admit i'm new to this localisation stuff - mssql didn't need to be configured in that aspect ;-) so, any pointers what encoding and locale types i must set to get the system to run smooth would be very appreciated... what is the correct encoding to choose to have the most flexibilities when storing western as well as eastern unicode characters? thanks in advance, thomas ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: <me@alternize.com> Cc: <pgsql-novice@postgresql.org> Sent: Monday, September 05, 2005 12:49 AM Subject: Re: [NOVICE] problems when comparing text with special chars > <me@alternize.com> writes: >> SELECT lower('zwiebelbäcker') = 'zwiebelbäcker' >> => false (which is the wrong answer, obviously) > > If you think that's the wrong answer, then you need to initdb in a > locale in which it's the right answer. Also take care that you are > using the database encoding that the locale expects. There is some > discussion of these issues in the "localization" chapter of the > manual. > > regards, tom lane >
<me@alternize.com> writes: > the installer (yes, its the v8.0.3 windows version) had me choose the > following db encoding, which also matches the server's win2k3 environement. > this seems right to me: > LC_COLLATE: German_Switzerland.1252 > LC_CTYPE: German_Switzerland.1252 OK ... > the DB was created with the characterset UNICODE to have full support for > foreign characters (i.e. japanese and stuff). Wrong answer --- the name of the locale implies that it wants Win1252 encoding. In general you don't get to choose locale and encoding independently. Our Unicode support is pretty broken under Windows anyway in 8.0 (8.1 should be better). You want to stick with that locale and use the corresponding encoding rather than trying to switch to a Unicode locale. regards, tom lane