Thread: BUG #3316: upper() does not convert to upper case on database with encoding utf-8 and locale de_DE

The following bug has been logged online:

Bug reference:      3316
Logged by:          Florian Wunderlich
Email address:      fwunderlich@factor3.de
PostgreSQL version: 8.2.4
Operating system:   Linux 2.6.15.6 (debian)
Description:        upper() does not convert to upper case on database with
encoding utf-8 and locale de_DE
Details:

The database cluster has been initialized with locale=de_DE.

SHOW ALL shows all lc_ variables as "de_DE".

There are two databases: temp which has been created with encoding='utf-8',
and temp2 with encoding='iso-8859-1'.

Both databases are completely empty.

The console is running with encoding iso-8859-1. The following commands are
used in a file encoded in iso-8859-1:

set client_encoding='utf-8';
select upper('äöü');

In case the argument to upper() does not come out as expected: it is an
a-umlaut, o-umlaut and u-umlaut.

The following command is then used:

iconv -f iso-8859-1 -t utf-8 | psql temp | iconv -f utf-8 -t iso-8859-1

This converts the iso-8859-1 encoded file from above to utf-8 and converts
the psql output back to iso-8859-1.
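
As a minimal sketch of what the two iconv stages do to the three umlauts (the octal escapes are an assumption standing in for the iso-8859-1 bytes in the file, and psql is left out so the snippet runs without a database):

```shell
# ISO-8859-1 bytes for a-umlaut, o-umlaut, u-umlaut: 0xE4 0xF6 0xFC.
latin1_bytes='\344\366\374'

# First stage: convert to UTF-8 and dump as hex; each umlaut becomes two bytes.
utf8_hex=$(printf "$latin1_bytes" | iconv -f iso-8859-1 -t utf-8 \
    | od -An -tx1 | tr -d ' \n')
echo "$utf8_hex"        # c3a4c3b6c3bc

# Last stage: converting back to ISO-8859-1 restores the original bytes.
roundtrip_hex=$(printf "$latin1_bytes" | iconv -f iso-8859-1 -t utf-8 \
    | iconv -f utf-8 -t iso-8859-1 | od -An -tx1 | tr -d ' \n')
echo "$roundtrip_hex"   # e4f6fc
```

So what psql receives is valid utf-8, matching the client_encoding set in the file.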

For database temp, this yields "äöü" (lower case letters), while for
temp2, it yields "ÄÖÜ" (upper case letters), which is correct.

I did not find a bug report for this problem on pgsql-bugs or with Google.
It seems that there were problems with multibyte databases in the past,
but that they were fixed as of 8.1, and that using a multibyte database
should now work fine.
"Florian Wunderlich" <fwunderlich@factor3.de> writes:
> The following commands are
> used in a file encoded in iso-8859-1:

> set client_encoding='utf-8';
> select upper('äöü');

Isn't that pilot error, plain and simple?  You told the machine your
input is in utf8, not latin1.

            regards, tom lane
Tom Lane wrote:
> "Florian Wunderlich" <fwunderlich@factor3.de> writes:
>> The following commands are
>> used in a file encoded in iso-8859-1:
>
>> set client_encoding='utf-8';
>> select upper('äöü');
>
> Isn't that pilot error, plain and simple?  You told the machine your
> input is in utf8, not latin1.
>
>             regards, tom lane

I used iconv to convert the iso-8859-1 file to utf-8. This comes a few
lines below those you have quoted.

The file is encoded in iso-8859-1, but contains instructions to set the
client_encoding to utf-8. The whole file is then converted to utf-8
(iconv -f iso-8859-1 -t utf-8 converts from iso-8859-1 to utf-8) and
piped into psql, so this is actually correct.

Besides, if this were the problem, then it should not work with either
database, but it does work with the second database, which has
iso-8859-1 encoding.

To make this a bit clearer:

SELECT upper(some umlauts) with the same encoding and client_encoding
does not work with a database with encoding='utf-8', but does work with
a database with encoding='iso-8859-1'.

Note that at no point is data actually read from the database; the
upper() function is applied to user-supplied input, which is the same
for both databases.
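
A minimal sketch of that comparison, assuming the databases temp and temp2 from the original report exist and psql can connect to them without prompting (the psql options -q, -A and -t only trim the output; the loop is skipped if psql is not installed):

```shell
# Build the two statements from ISO-8859-1 bytes (0xE4 0xF6 0xFC) and
# convert them to UTF-8, as in the original pipeline.
sql=$(printf "set client_encoding='utf-8';\nselect upper('\344\366\374');\n" \
    | iconv -f iso-8859-1 -t utf-8)

# Feed the identical input to both databases and convert the result back.
for db in temp temp2; do
    if command -v psql >/dev/null 2>&1; then
        printf '%s: ' "$db"
        printf '%s\n' "$sql" | psql -qAt "$db" | iconv -f utf-8 -t iso-8859-1
    fi
done
```

With the behaviour described above, temp would print the umlauts unchanged and temp2 would print them in upper case.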

If this is all too confusing, I will write a simple test case as a bash script.

Thanks for the quick reply,

F. Wunderlich