Re: Corruption of multibyte identifiers on UTF-8 locale - Mailing list pgsql-bugs

From	Victor Snezhko
Subject	Re: Corruption of multibyte identifiers on UTF-8 locale
Date	September 23, 2006 14:34:07
Msg-id	uu02ylc78.fsf@indorsoft.ru
In response to	Re: Corruption of multibyte identifiers on UTF-8 locale (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Corruption of multibyte identifiers on UTF-8 locale
List	pgsql-bugs

Tom Lane <tgl@sss.pgh.pa.us> writes:

>> correct utf-8 byte sequence is 0xd18231, so it looks like we call
>> tolower() somewhere on parts of multibyte characters, and it does the
>> same as isspace() - it interprets its argument as a wide character, and
>> converts it.
>
> Indeed, and I am certainly wondering why we should not just say that
> you've got a broken locale definition there.  There is absolutely no
> doubt that the ctype.h functions are defined to work on char, not
> wchar.

Agreed, but such corruption indicates that there is non-multibyte-safe
(octet-wise) case conversion somewhere. At best (with a fully working
locale) it will cause case conversion to do nothing instead of
performing the actual conversion.
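
To illustrate what I mean by octet-wise conversion, here is a minimal
sketch, not the actual backend code. The helper name ascii_downcase is
hypothetical. It assumes we are willing to fold only ASCII A-Z and to
copy every high-bit-set byte unchanged, so a multibyte sequence such as
0xd1 0x82 cannot be mangled regardless of what the locale's tolower()
does with those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PostgreSQL sources): lower-case a
 * byte string one octet at a time, touching only ASCII 'A'..'Z'.
 * Bytes with the high bit set are copied as-is, so parts of a
 * multibyte UTF-8 character are never passed to tolower().
 * dst must have room for len + 1 bytes.
 */
static void
ascii_downcase(const char *src, char *dst, size_t len)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch + ('a' - 'A'));
        dst[i] = (char) ch;
    }
    dst[len] = '\0';
}
```

Note that this only avoids the corruption. Non-ASCII letters are left
in their original case, which is the "do nothing" outcome described
above; real case conversion for them would have to go through wide
characters instead of single bytes.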

> They have no business mangling high-bit-set bytes in a multibyte
> encoding.

--
WBR, Victor V. Snezhko
E-mail: snezhko@indorsoft.ru
