Tom, Andrew,
We have the same issue in our product: supporting UTF-8 on Windows.
As you certainly know, the UTF-8 code page (65001) is not supported by MS
Windows when you set the locale with setlocale(). You therefore cannot rely on
standard libc functions such as isalpha(), mbtowc(), mbstowcs(), wctomb(),
wcstombs(), and strcoll(), which depend on the current locale.
You should start by centralizing all basic character-set-related functions
(upper/lower, comparison, etc.) in a library, to ease the port to Windows.
Then convert UTF-8 data to wide char and call the wide-char functions.
For example, to implement an uppercase() function:
1) Convert UTF-8 to Wide Char (algorithm can be easily found)
2) Use towupper()
3) Convert Wide Char result to UTF-8 (algorithm can be easily found)
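The three steps above can be sketched roughly as follows. This is only an
illustration, not code from our product: the names utf8_decode(),
utf8_encode() and utf8_upper() are made up for the example, the decoder is
deliberately minimal (it does not reject overlong forms), and on Windows one
would more likely do the conversion with MultiByteToWideChar(CP_UTF8, ...).
Note that what towupper() does for non-ASCII characters still depends on the
LC_CTYPE locale in effect.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 on malformed input. Sketch only: overlong forms are not rejected. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    for (i = 1; i < n; i++) {
        /* A continuation byte must look like 10xxxxxx; this test also
         * stops at a premature NUL terminator. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return (*cp > 0x10FFFF) ? 0 : n;
}

/* Hypothetical helper: encode one code point as UTF-8 into out
 * (at least 4 bytes available). Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Return a newly allocated uppercase copy of a UTF-8 string,
 * or NULL on malformed input or allocation failure. Caller frees. */
char *utf8_upper(const char *src)
{
    const unsigned char *p = (const unsigned char *)src;
    /* Case mapping may change the encoded length, so reserve the worst
     * case of 4 output bytes per input byte. */
    unsigned char *out = malloc(4 * strlen(src) + 1);
    size_t o = 0;

    if (out == NULL) return NULL;
    while (*p != '\0') {
        unsigned long cp;
        size_t n = utf8_decode(p, &cp);          /* step 1: UTF-8 -> wide */

        if (n == 0) { free(out); return NULL; }
        p += n;
        /* With a 16-bit wchar_t (as on Windows), code points above U+FFFF
         * do not fit in one wide char; leave them unchanged here. */
        if (sizeof(wchar_t) >= 4 || cp <= 0xFFFF)
            cp = (unsigned long)towupper((wint_t)cp);   /* step 2 */
        o += utf8_encode(cp, out + o);           /* step 3: wide -> UTF-8 */
    }
    out[o] = '\0';
    return (char *)out;
}
```

The conversion is done one character at a time here, so no intermediate
wide-char buffer is needed; building a full wchar_t string first would work
just as well.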
To compare strings:
1) Convert s1 in UTF-8 to Wide Char => wcs1
2) Convert s2 in UTF-8 to Wide Char => wcs2
3) Use wcscoll(wcs1, wcs2)
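A rough sketch of that comparison, under the same caveats as before: the
names utf8_to_wcs() and utf8_coll() are invented for this example, the
decoder is minimal, and with a 16-bit wchar_t it simply rejects characters
above U+FFFF instead of producing surrogate pairs. The ordering returned by
wcscoll() is whatever the current LC_COLLATE locale defines.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
 * allocated wide string. Returns NULL on malformed input or allocation
 * failure. Sketch only: overlong forms are not rejected. */
static wchar_t *utf8_to_wcs(const char *src)
{
    const unsigned char *p = (const unsigned char *)src;
    /* One wide char per input byte is always enough. */
    wchar_t *out = malloc((strlen(src) + 1) * sizeof(wchar_t));
    size_t o = 0;

    if (out == NULL) return NULL;
    while (*p != '\0') {
        unsigned long cp;
        int extra;   /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else { free(out); return NULL; }
        p++;

        while (extra-- > 0) {
            /* Must be 10xxxxxx; this also stops at a premature NUL. */
            if ((*p & 0xC0) != 0x80) { free(out); return NULL; }
            cp = (cp << 6) | (*p++ & 0x3F);
        }
        /* Reject values that do not fit in a single wchar_t. */
        if (cp > 0x10FFFF || (sizeof(wchar_t) < 4 && cp > 0xFFFF)) {
            free(out);
            return NULL;
        }
        out[o++] = (wchar_t)cp;
    }
    out[o] = L'\0';
    return out;
}

/* Compare two UTF-8 strings with the current locale's collation.
 * Returns <0, 0 or >0 like strcoll(). *ok is set to 0 if either string
 * could not be converted, in which case the return value is meaningless. */
int utf8_coll(const char *s1, const char *s2, int *ok)
{
    wchar_t *wcs1 = utf8_to_wcs(s1);   /* step 1 */
    wchar_t *wcs2 = utf8_to_wcs(s2);   /* step 2 */
    int result = 0;

    *ok = (wcs1 != NULL && wcs2 != NULL);
    if (*ok)
        result = wcscoll(wcs1, wcs2);  /* step 3 */
    free(wcs1);
    free(wcs2);
    return result;
}
```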
Regards,
Seb
On 11/21/2012 06:07 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net> writes:
>> On 11/21/2012 11:11 AM, Tom Lane wrote:
>>> I'm not sure that's the only place we're doing this ...
>
>> Oh, Hmm, darn. Where else do you think we might?
>
> Dunno, but grepping for isupper and/or tolower should find any such
> places.
>
> regards, tom lane
>