Thread: BUG #3316: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE
BUG #3316: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE
From
"Florian Wunderlich"
Date:
The following bug has been logged online:

Bug reference: 3316
Logged by: Florian Wunderlich
Email address: fwunderlich@factor3.de
PostgreSQL version: 8.2.4
Operating system: Linux 2.6.15.6 (debian)
Description: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE
Details:

The database cluster has been initialized with locale=de_DE. SHOW ALL shows all lc_ variables as "de_DE".

There are two databases: temp which has been created with encoding='utf-8', and temp2 with encoding='iso-8859-1'. Both databases are completely empty.

The console is running with encoding iso-8859-1. The following commands are used in a file encoded in iso-8859-1:

    set client_encoding='utf-8';
    select upper('äöü');

In case the argument to upper() does not come out as expected: it is an a-umlaut, o-umlaut and u-umlaut.

The following command is then used:

    iconv -f iso-8859-1 -t utf-8 | psql temp | iconv -f utf-8 -t iso-8859-1

This converts the iso-8859-1 encoded file from above to utf-8 and converts the psql output back to iso-8859-1.

For database temp, this yields "äöü" (lower case letters), while for temp2, it yields "ÄÖÜ" (upper case letters), which is correct.

I did not find a bug report for this problem on pgsql-bugs or with Google. It seems that there have been problems in the past with multibyte database, but for 8.1, they have been fixed and using a multibyte database should work fine.
Re: BUG #3316: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE
From
Tom Lane
Date:
"Florian Wunderlich" <fwunderlich@factor3.de> writes: > The following commands are > used in a file encoded in iso-8859-1: > set client_encoding='utf-8'; > select upper('äöü'); Isn't that pilot error, plain and simple? You told the machine your input is in utf8, not latin1. regards, tom lane
Re: BUG #3316: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE
From
Florian Wunderlich
Date:
Tom Lane wrote:
> "Florian Wunderlich" <fwunderlich@factor3.de> writes:
>> The following commands are
>> used in a file encoded in iso-8859-1:
>
>> set client_encoding='utf-8';
>> select upper('äöü');
>
> Isn't that pilot error, plain and simple? You told the machine your
> input is in utf8, not latin1.
>
> regards, tom lane

I used iconv to convert the iso-8859-1 to utf-8. This comes a few lines below those you have quoted.

The file is encoded in iso-8859-1, but contains instructions to set the client_encoding to utf-8. The whole file is then converted to utf-8 (iconv -f iso-8859-1 -t utf-8 converts from iso-8859-1 to utf-8) and piped into psql, so this is actually correct.

Besides, if this was the problem, then it should not work with either database, but it does work with the second database which has iso-8859-1 encoding.

To make this a bit clearer: SELECT upper(some umlauts) with the same encoding and client_encoding does not work with a database with encoding='utf-8', but does work with a database with encoding='iso-8859-1'. Note that at no point data is actually read from the database; the upper() function is applied to user supplied input, which is the same for both databases.

If this is all too confusing I will write a simple test case as bash script.

Thanks for the quick reply,
F. Wunderlich
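
A minimal sketch of what such a test case might look like, assuming a POSIX shell with iconv and od available, and the two databases temp and temp2 from the report. The temporary file and the RUN_PSQL switch are illustrative and not part of the original report. The first part only checks that the bytes piped into psql really are UTF-8; the psql part runs only when RUN_PSQL=1 is set, since it needs the cluster described above.

```shell
#!/bin/sh
# Sketch only: the temporary file and the RUN_PSQL switch are illustrative
# assumptions, not taken from the original report.

# Write the test file in ISO-8859-1: octal 344, 366, 374 are the single-byte
# codes for a-umlaut, o-umlaut and u-umlaut.
sql_file=$(mktemp)
printf "set client_encoding='utf-8';\nselect upper('\344\366\374');\n" > "$sql_file"

# Hex dump of the string literal after conversion, i.e. what psql receives.
utf8_hex=$(printf '\344\366\374' | iconv -f iso-8859-1 -t utf-8 | od -An -tx1 | tr -d ' \n')
echo "bytes sent to the server: $utf8_hex"

# Feed the same converted input to both databases and convert the output back.
if [ "${RUN_PSQL:-0}" = 1 ]; then
    for db in temp temp2; do
        echo "== $db =="
        iconv -f iso-8859-1 -t utf-8 < "$sql_file" | psql "$db" | iconv -f utf-8 -t iso-8859-1
    done
fi

rm -f "$sql_file"
```

The hex dump should read c3a4c3b6c3bc, the two-byte UTF-8 sequences for the three umlauts, so the input matches the declared client_encoding. Any difference between the two databases would then come from how upper() treats that input on the server side.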