Thread: BUG #4451: initcap() function capitalizes incorrectly
The following bug has been logged online:

Bug reference: 4451
Logged by: Scott V
Email address: datagenic@gmail.com
PostgreSQL version: 8.3.1
Operating system: Mac OS X 10.5.4
Description: initcap() function capitalizes incorrectly
Details:

initcap() capitalizes incorrectly when passed strings containing certain
two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

The function appears to be incorrectly interpreting the two-byte chars as
non-alphanumeric characters. They are in fact alphanumerics; they just have
diacritical markings.
Scott V wrote:
> The following bug has been logged online:
>
> Bug reference: 4451
> Logged by: Scott V
> Email address: datagenic@gmail.com
> PostgreSQL version: 8.3.1
> Operating system: Mac OS X 10.5.4
> Description: initcap() function capitalizes incorrectly
> Details:
>
> initcap() capitalizes incorrectly when passing strings containing certain
> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
> The function appears to be incorrectly interpreting the two-byte chars as
> non-alphanumeric characters. They are in fact alphanumerics, they just have
> diacritical markings.

What's your setting for lc_collate?

//Magnus
Magnus Hagander <magnus@hagander.net> writes:
> Scott V wrote:
>> PostgreSQL version: 8.3.1
>> Operating system: Mac OS X 10.5.4

>> initcap() capitalizes incorrectly when passing strings containing certain
>> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

> What's your setting for lc_collate?

I think actually it's lc_ctype that determines case-folding. But the
current theory is that Apple's locale support is simply broken for
UTF-8:
http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
which means that even if Scott had all his settings right, it wouldn't
work :-( A quick test on OS X here seems to confirm this.

regards, tom lane
Not sure what the correct settings should be, but output from SHOW
ALL in psql says:

lc_collate      C
lc_ctype        C

On Mon, Oct 6, 2008 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Scott V wrote:
>>> PostgreSQL version: 8.3.1
>>> Operating system: Mac OS X 10.5.4
>
>>> initcap() capitalizes incorrectly when passing strings containing certain
>>> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
>>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
>> What's your setting for lc_collate?
>
> I think actually it's lc_ctype that determines case-folding. But the
> current theory is that Apple's locale support is simply broken for
> UTF-8:
> http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
> which means that even if Scott had all his settings right, it wouldn't
> work :-( A quick test on OS X here seems to confirm this.
>
> regards, tom lane
Scott Vanderbilt wrote:
> Not sure what the correct settings should be, but output from SHOW
> ALL in psql says:
>
> lc_collate C
> lc_ctype C

There's a chapter on locale support in the user manual:

http://www.postgresql.org/docs/8.3/interactive/locale.html

The right setting depends on what language's collation rules you want to
follow. "locale -a" in a shell should list the available options.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
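
For readers following the thread, here is a rough Python sketch of why lc_ctype = C could produce exactly the reported output. It is not PostgreSQL's source code. It assumes that, under the C locale, characters are classified one byte at a time and that every byte >= 0x80 counts as non-alphanumeric, so each two-byte UTF-8 character looks like a word break. The function names initcap_bytewise_c_locale and initcap_unicode are hypothetical, chosen only for this illustration.

```python
def initcap_bytewise_c_locale(text: str, encoding: str = "utf-8") -> str:
    """Illustrative only: initcap applied byte by byte with C-locale rules.

    Assumption: only ASCII letters and digits are alphanumeric, so the
    bytes of a multi-byte UTF-8 character act as word separators.
    """
    out = bytearray()
    prev_alnum = False  # was the previous byte an ASCII letter or digit?
    for b in text.encode(encoding):
        ch = bytes([b])
        is_ascii = b < 0x80
        if is_ascii and ch.isalpha():
            # Upper-case at the start of a "word", lower-case inside it.
            ch = ch.lower() if prev_alnum else ch.upper()
        prev_alnum = is_ascii and ch.isalnum()
        out += ch
    return out.decode(encoding)


def initcap_unicode(text: str) -> str:
    """Illustrative only: the same loop, classifying whole characters."""
    out = []
    prev_alnum = False
    for ch in text:
        out.append(ch.lower() if prev_alnum else ch.upper())
        prev_alnum = ch.isalnum()
    return "".join(out)


word = "m\u0101t\u016br\u0101te"          # 'mātūrāte' from the bug report
print(initcap_bytewise_c_locale(word))    # MāTūRāTe (the reported result)
print(initcap_unicode(word))              # Mātūrāte (the expected result)
```

The byte-wise version reproduces the string from the report, which is consistent with the C settings shown above. Note that, per Tom Lane's message, choosing a UTF-8 locale may still not help on OS X if the platform's locale support is itself broken.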