Thread: BUG #3378: UTF-8 upper() and lower() don't work

BUG #3378: UTF-8 upper() and lower() don't work

From
"Kenneth Christensen"
Date:
The following bug has been logged online:

Bug reference:      3378
Logged by:          Kenneth Christensen
Email address:      kec@mediatorsystems.com
PostgreSQL version: 8.2
Operating system:   Mac OS 10.4.9
Description:        UTF-8 upper() and lower() don't work
Details:

I have a DB where encoding is set to UTF-8.

The DB have some tables where some of the columns (varchar) contains danish
chars.

It looks like lower() and upper() ignores the danish chars when I try to
convert to lowercase or uppercase.

E.g.

Case 1:
-------

Column 'name' contains 'Æble, tørret':

select upper(food_name.name) from food_name

Result: ÆBLE, TøRRET
Expected result: ÆBLE, TØRRET

or

select lower(food_name.name) from food_name

Result: Æble, tørret
Expected result: æble, tørret

Case 2:
-------

Column 'name' contains 'æøå':

select upper(food_name.name) from food_name

Result: æøå
Expected result: ÆØÅ

Case 3:
-------

Column 'name' contains 'ÆØÅ':

select lower(food_name.name) from food_name

Result: ÆØÅ
Expected result: æøå

---

I can see I'm not alone with this kind of bug. This bug is really a big
problem for me.

I really don't want to replace PostgreSQL with MySQL !
Will this bug be fixed very soon?

Best regards

Kenneth Christensen

Re: BUG #3378: UTF-8 upper() and lower() don't work

From
"Pavel Stehule"
Date:
SGVsbG8sCgpZb3UgaGF2ZSB0byB3ZWxsIGluaXRpYWxpemVkIGRhdGFiYXNl
IGNsdXN0ZXIgd2l0aCBjb3JyZWN0IGxvY2FsZXMuCgpJIGRvbid0IGtub3cg
Z29vZCBkYW5pc2ggbG9jYWxlcywgYnV0IEkgZXhwZWN0IHNvIGl0IHdpbGwg
YmUgc2ltaWxhciB3aXRoIGN6ZWNoLgoKbXkgZGF0YWJhc2UgY2x1c3RlciB3
YXMgaW5pdGlhbGlzZWQgd2l0aCBjc19DWi5VVEYtOCBhbmQgZGVmYXVsdApl
bmNvZGluZyBpcyBVVEY4LgoKCiBwb3N0Z3Jlcz0jIHNlbGVjdCBsb3dlcign
xb1MVVTDnSBLxa7FhycpLCB1cHBlcignxb5sdXTDvSBrxa/FiCcpOwogICBs
b3dlciAgIHwgICB1cHBlcgotLS0tLS0tLS0tLSstLS0tLS0tLS0tLQogxb5s
dXTDvSBrxa/FiCB8IMW9TFVUw50gS8WuxYcKCgpDaGVjayB5b3VyIGxvY2Fs
ZXMsIGlmIGlzIFVURjguCgpwb3N0Z3Jlcz0jIFNIT1cgbGNfY29sbGF0ZSA7
CiBsY19jb2xsYXRlCi0tLS0tLS0tLS0tLS0KIGNzX0NaLlVURi04CigxIHJv
dykKClJlZ2FycwpQYXZlbCBTdGVodWxlCgoKMjAwNy82LzEwLCBLZW5uZXRo
IENocmlzdGVuc2VuIDxrZWNAbWVkaWF0b3JzeXN0ZW1zLmNvbT46Cj4KPiBU
aGUgZm9sbG93aW5nIGJ1ZyBoYXMgYmVlbiBsb2dnZWQgb25saW5lOgo+Cj4g
QnVnIHJlZmVyZW5jZTogICAgICAzMzc4Cj4gTG9nZ2VkIGJ5OiAgICAgICAg
ICBLZW5uZXRoIENocmlzdGVuc2VuCj4gRW1haWwgYWRkcmVzczogICAgICBr
ZWNAbWVkaWF0b3JzeXN0ZW1zLmNvbQo+IFBvc3RncmVTUUwgdmVyc2lvbjog
OC4yCj4gT3BlcmF0aW5nIHN5c3RlbTogICBNYWMgT1MgMTAuNC45Cj4gRGVz
Y3JpcHRpb246ICAgICAgICBVVEYtOCB1cHBlcigpIGFuZCBsb3dlcigpIGRv
bid0IHdvcmsKPiBEZXRhaWxzOgo+Cj4gSSBoYXZlIGEgREIgd2hlcmUgZW5j
b2RpbmcgaXMgc2V0IHRvIFVURi04Lgo+Cj4gVGhlIERCIGhhdmUgc29tZSB0
YWJsZXMgd2hlcmUgc29tZSBvZiB0aGUgY29sdW1ucyAodmFyY2hhcikgY29u
dGFpbnMgZGFuaXNoCj4gY2hhcnMuCj4KPiBJdCBsb29rcyBsaWtlIGxvd2Vy
KCkgYW5kIHVwcGVyKCkgaWdub3JlcyB0aGUgZGFuaXNoIGNoYXJzIHdoZW4g
SSB0cnkgdG8KPiBjb252ZXJ0IHRvIGxvd2VyY2FzZSBvciB1cHBlcmNhc2Uu
Cj4KPiBFLmcuCj4KPiBDYXNlIDE6Cj4gLS0tLS0tLQo+Cj4gQ29sdW1uICdu
YW1lJyBjb250YWlucyAnw4ZibGUsIHTDuHJyZXQnOgo+Cj4gc2VsZWN0IHVw
cGVyKGZvb2RfbmFtZS5uYW1lKSBmcm9tIGZvb2RfbmFtZQo+Cj4gUmVzdWx0
OiDDhkJMRSwgVMO4UlJFVAo+IEV4cGVjdGVkIHJlc3VsdDogw4ZCTEUsIFTD
mFJSRVQKPgo+IG9yCj4KPiBzZWxlY3QgbG93ZXIoZm9vZF9uYW1lLm5hbWUp
IGZyb20gZm9vZF9uYW1lCj4KPiBSZXN1bHQ6IMOGYmxlLCB0w7hycmV0Cj4g
RXhwZWN0ZWQgcmVzdWx0OiDDpmJsZSwgdMO4cnJldAo+Cj4gQ2FzZSAyOgo+
IC0tLS0tLS0KPgo+IENvbHVtbiAnbmFtZScgY29udGFpbnMgJ8Omw7jDpSc6
Cj4KPiBzZWxlY3QgdXBwZXIoZm9vZF9uYW1lLm5hbWUpIGZyb20gZm9vZF9u
YW1lCj4KPiBSZXN1bHQ6IMOmw7jDpQo+IEV4cGVjdGVkIHJlc3VsdDogw4bD
mMOFCj4KPiBDYXNlIDM6Cj4gLS0tLS0tLQo+Cj4gQ29sdW1uICduYW1lJyBj
b250YWlucyAnw4bDmMOFJzoKPgo+IHNlbGVjdCBsb3dlcihmb29kX25hbWUu
bmFtZSkgZnJvbSBmb29kX25hbWUKPgo+IFJlc3VsdDogw4bDmMOFCj4gRXhw
ZWN0ZWQgcmVzdWx0OiDDpsO4w6UKPgo+IC0tLQo+Cj4gSSBjYW4gc2VlIEkn
bSBub3QgYWxvbmUgd2l0aCB0aGlzIGtpbmQgb2YgYnVnLiBUaGlzIGJ1ZyBp
cyByZWFsbHkgYSBiaWcKPiBwcm9ibGVtIGZvciBtZS4KPgo+IEkgcmVhbGx5
IGRvbid0IHdhbnQgdG8gcmVwbGFjZSBQb3N0Z3JlU1FMIHdpdGggTXlTUUwg
IQo+IFdpbGwgdGhpcyBidWcgYmUgZml4ZWQgdmVyeSBzb29uPwo+Cj4gQmVz
dCByZWdhcmRzCj4KPiBLZW5uZXRoIENocmlzdGVuc2VuCj4KPiAtLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0oZW5kIG9mIGJyb2FkY2FzdCktLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0KPiBUSVAgOTogSW4gdmVyc2lvbnMgYmVs
b3cgOC4wLCB0aGUgcGxhbm5lciB3aWxsIGlnbm9yZSB5b3VyIGRlc2lyZSB0
bwo+ICAgICAgICBjaG9vc2UgYW4gaW5kZXggc2NhbiBpZiB5b3VyIGpvaW5p
bmcgY29sdW1uJ3MgZGF0YXR5cGVzIGRvIG5vdAo+ICAgICAgICBtYXRjaAo+
Cg==

Re: BUG #3378: UTF-8 upper() and lower() don't work

From
"Thomas H."
Date:
hi kenneth

these special characters work fine here:

select lower('ÆØÅ'), upper('æøå'), lower('Æble, tørret'), upper('Æble,
tørret');

result: æøå    ÆØÅ    æble, tørret    ÆBLE, TØRRET

as pavel hinted, you probably aren't using the proper locale settings

cheers,
thomas

-------- Original Message --------
Subject: Re:[BUGS] BUG #3378: UTF-8 upper() and lower() don't work
From: Pavel Stehule <pavel.stehule@gmail.com>
To: Kenneth Christensen <kec@mediatorsystems.com>
Date: 10.06.2007 15:36

> Hello,
>
> You have to well initialized database cluster with correct locales.
>
> I don't know good danish locales, but I expect so it will be similar
> with czech.
>
> my database cluster was initialised with cs_CZ.UTF-8 and default
> encoding is UTF8.
>
>
> postgres=# select lower('ŽLUTÝ KŮŇ'), upper('žlutý kůň');
>   lower   |   upper
> -----------+-----------
> žlutý kůň | ŽLUTÝ KŮŇ
>
>
> Check your locales, if is UTF8.
>
> postgres=# SHOW lc_collate ;
> lc_collate
> -------------
> cs_CZ.UTF-8
> (1 row)
>
> Regars
> Pavel Stehule
>
>
> 2007/6/10, Kenneth Christensen <kec@mediatorsystems.com>:
>>
>> The following bug has been logged online:
>>
>> Bug reference:      3378
>> Logged by:          Kenneth Christensen
>> Email address:      kec@mediatorsystems.com
>> PostgreSQL version: 8.2
>> Operating system:   Mac OS 10.4.9
>> Description:        UTF-8 upper() and lower() don't work
>> Details:
>>
>> I have a DB where encoding is set to UTF-8.
>>
>> The DB have some tables where some of the columns (varchar) contains
>> danish
>> chars.
>>
>> It looks like lower() and upper() ignores the danish chars when I try to
>> convert to lowercase or uppercase.
>>
>> E.g.
>>
>> Case 1:
>> -------
>>
>> Column 'name' contains 'Æble, tørret':
>>
>> select upper(food_name.name) from food_name
>>
>> Result: ÆBLE, TøRRET
>> Expected result: ÆBLE, TØRRET
>>
>> or
>>
>> select lower(food_name.name) from food_name
>>
>> Result: Æble, tørret
>> Expected result: æble, tørret
>>
>> Case 2:
>> -------
>>
>> Column 'name' contains 'æøå':
>>
>> select upper(food_name.name) from food_name
>>
>> Result: æøå
>> Expected result: ÆØÅ
>>
>> Case 3:
>> -------
>>
>> Column 'name' contains 'ÆØÅ':
>>
>> select lower(food_name.name) from food_name
>>
>> Result: ÆØÅ
>> Expected result: æøå
>>
>> ---
>>
>> I can see I'm not alone with this kind of bug. This bug is really a big
>> problem for me.
>>
>> I really don't want to replace PostgreSQL with MySQL !
>> Will this bug be fixed very soon?
>>
>> Best regards
>>
>> Kenneth Christensen
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings