Thread: BUG #4451: initcap() function capitalizes incorrectly

BUG #4451: initcap() function capitalizes incorrectly

From
"Scott V"
Date:
The following bug has been logged online:

Bug reference:      4451
Logged by:          Scott V
Email address:      datagenic@gmail.com
PostgreSQL version: 8.3.1
Operating system:   Mac OS X 10.5.4
Description:        initcap() function capitalizes incorrectly
Details:

initcap() capitalizes incorrectly when passing strings containing certain
two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

The function appears to be incorrectly interpreting the two-byte chars as
non-alphanumeric characters. They are in fact alphanumerics; they just have
diacritical markings.
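
The reported output can be reproduced outside the database. The following is a minimal Python sketch, not PostgreSQL's source: it assumes the function walks the UTF-8 string one byte at a time and treats only ASCII letters and digits as alphanumeric (as the C locale does). The name `bytewise_initcap` is hypothetical.

```python
def bytewise_initcap(text: str) -> str:
    """Capitalize the first letter of each word, classifying one byte at a time.

    Assumption for illustration: only ASCII letters and digits count as
    alphanumeric, so every byte of a multi-byte UTF-8 character looks
    like a word separator.
    """
    out = bytearray()
    prev_alnum = False  # the start of the string acts as a word boundary
    for b in text.encode("utf-8"):
        is_ascii_alnum = b < 0x80 and chr(b).isalnum()
        if is_ascii_alnum:
            ch = chr(b)
            ch = ch.lower() if prev_alnum else ch.upper()
            out.append(ord(ch))
        else:
            out.append(b)  # non-ASCII bytes pass through unchanged
        prev_alnum = is_ascii_alnum
    return out.decode("utf-8")


print(bytewise_initcap("mātūrāte"))  # MāTūRāTe
```

Under that assumption, each letter following 'ā' or 'ū' is seen as starting a new word, which matches the result in the report.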

Re: BUG #4451: initcap() function capitalizes incorrectly

From
Magnus Hagander
Date:
Scott V wrote:
> The following bug has been logged online:
>
> Bug reference:      4451
> Logged by:          Scott V
> Email address:      datagenic@gmail.com
> PostgreSQL version: 8.3.1
> Operating system:   Mac OS X 10.5.4
> Description:        initcap() function capitalizes incorrectly
> Details:
>
> initcap() capitalizes incorrectly when passing strings containing certain
> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
> The function appears to be incorrectly interpreting the two-byte chars as
> non-alphanumeric characters. They are in fact alphanumerics; they just have
> diacritical markings.

What's your setting for lc_collate?

//Magnus

Re: BUG #4451: initcap() function capitalizes incorrectly

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> Scott V wrote:
>> PostgreSQL version: 8.3.1
>> Operating system:   Mac OS X 10.5.4

>> initcap() capitalizes incorrectly when passing strings containing certain
>> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.

> What's your setting for lc_collate?

I think actually it's lc_ctype that determines case-folding.  But the
current theory is that Apple's locale support is simply broken for
UTF-8:
http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
which means that even if Scott had all his settings right, it wouldn't
work :-(  A quick test on OS X here seems to confirm this.

            regards, tom lane
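
For contrast, here is a minimal Python sketch of what character-aware classification would produce. It relies on Python's built-in Unicode tables rather than the operating system's lc_ctype data, so it only illustrates the expected result and says nothing about how any platform's locale support behaves. The name `charwise_initcap` is hypothetical.

```python
def charwise_initcap(text: str) -> str:
    """Capitalize the first letter of each word, classifying whole characters.

    Uses Python's Unicode-aware str.isalnum(), so letters with
    diacritics count as part of a word.
    """
    out = []
    prev_alnum = False  # the start of the string acts as a word boundary
    for ch in text:
        if ch.isalnum():
            out.append(ch.lower() if prev_alnum else ch.upper())
            prev_alnum = True
        else:
            out.append(ch)
            prev_alnum = False
    return "".join(out)


print(charwise_initcap("mātūrāte"))  # Mātūrāte
```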

Re: BUG #4451: initcap() function capitalizes incorrectly

From
"Scott Vanderbilt"
Date:
Not sure what the correct settings should be, but output from SHOW
ALL in psql says:

lc_collate      C
lc_ctype       C

On Mon, Oct 6, 2008 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Scott V wrote:
>>> PostgreSQL version: 8.3.1
>>> Operating system:   Mac OS X 10.5.4
>
>>> initcap() capitalizes incorrectly when passing strings containing certain
>>> two-byte UTF-8 characters. E.g., when argument = 'mātūrāte', initcap
>>> returns 'MāTūRāTe'. Correct result should be 'Mātūrāte'.
>
>> What's your setting for lc_collate?
>
> I think actually it's lc_ctype that determines case-folding.  But the
> current theory is that Apple's locale support is simply broken for
> UTF-8:
> http://archives.postgresql.org/pgsql-general/2008-02/msg01072.php
> which means that even if Scott had all his settings right, it wouldn't
> work :-(  A quick test on OS X here seems to confirm this.
>
>             regards, tom lane
>

Re: BUG #4451: initcap() function capitalizes incorrectly

From
Heikki Linnakangas
Date:
Scott Vanderbilt wrote:
> Not sure what the correct settings should be, but output from SHOW
> ALL in psql says:
>
> lc_collate      C
> lc_ctype       C

There's a chapter on locale support in the user manual:

http://www.postgresql.org/docs/8.3/interactive/locale.html

The right setting depends on what language's collation rules you want to
follow. "locale -a" in a shell should list the available options.
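
As a rough sketch of that step, the output of "locale -a" can also be collected and narrowed down from a script. The helper names below are hypothetical, and filtering on a "UTF-8"/"utf8" suffix is an assumption about how the platform names its locales.

```python
import subprocess


def utf8_locales(names):
    """Keep only locale names whose encoding suffix looks like UTF-8."""
    wanted = ("utf-8", "utf8")
    return [n for n in names if n.lower().endswith(wanted)]


def list_locales():
    """Return the names printed by `locale -a`, or [] if it cannot be run."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            text=True,
            errors="replace",  # some systems print non-UTF-8 locale names
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return result.stdout.split()
```

Calling `utf8_locales(list_locales())` would then give the candidates to choose from; whether a listed locale actually handles case conversion correctly still depends on the platform's locale support.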

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com