Re: Database object names and libpq in UTF-8 locale on Windows - Mailing list pgsql-hackers

From Sebastien FLAESCH
Subject Re: Database object names and libpq in UTF-8 locale on Windows
Date
Msg-id 50ADFDEB.4050103@4js.com
In response to Re: Database object names and libpq in UTF-8 locale on Windows  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom, Andrew,

We have the same issue in our product: supporting UTF-8 on Windows.

As you certainly know, the UTF-8 code page (65001) is not supported by MS Windows
when you set the locale with setlocale(). You therefore cannot rely on standard
libc functions such as isalpha(), mbtowc(), mbstowcs(), wctomb(), wcstombs(),
strcoll(), which depend on the current locale.

You should start by centralizing all basic character-set-related functions
(upper/lower, comparison, etc.) in a library, to ease porting to Windows.

Then convert UTF-8 data to wide char and call the wide-char functions.

For example, to implement an uppercase() function:

1) Convert UTF-8 to Wide Char (algorithm can be easily found)
2) Use towupper()
3) Convert Wide Char result to UTF-8 (algorithm can be easily found)
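
The three steps above could be sketched as follows. This is only an
illustration, not code from our product or from PostgreSQL: utf8_decode(),
utf8_encode() and utf8_upper() are hypothetical names, the decoder does not
reject overlong sequences, and code points that do not fit in a 16-bit
wchar_t (as on Windows) are passed through unchanged. Which non-ASCII
characters towupper() actually maps still depends on the C library and on
the LC_CTYPE setting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
 * consumed, or 0 if the bytes are not a well-formed sequence.
 * Sketch only: overlong encodings are not rejected. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x07) << 18) |
              ((unsigned long)(s[1] & 0x3F) << 12) |
              ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Encode cp as UTF-8 into out; returns the number of bytes written (1..4). */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Return a malloc'ed uppercase copy of a UTF-8 string, or NULL if out of
 * memory. The caller frees the result. */
char *utf8_upper(const char *in)
{
    const unsigned char *p;
    char *out, *q;

    assert(in != NULL);
    p = (const unsigned char *)in;
    /* The byte length can change when case changes; every code point uses
     * at least 1 input byte and at most 4 output bytes. */
    out = malloc(4 * strlen(in) + 1);
    if (out == NULL)
        return NULL;
    q = out;
    while (*p) {
        unsigned long cp;
        size_t len = utf8_decode(p, &cp);          /* step 1 */
        if (len == 0) { *q++ = (char)*p++; continue; } /* copy bad byte */
        if (sizeof(wchar_t) > 2 || cp <= 0xFFFF)
            cp = (unsigned long)towupper((wint_t)cp);  /* step 2 */
        q += utf8_encode(cp, q);                   /* step 3 */
        p += len;
    }
    *q = '\0';
    return out;
}
```

A caller would use it as char *u = utf8_upper(name); ... free(u);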

To compare strings:

1) Convert s1 in UTF-8 to Wide Char => wcs1
2) Convert s2 in UTF-8 to Wide Char => wcs2
3) Use wcscoll(wcs1, wcs2)
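
A minimal sketch of that comparison, under the same caveats: utf8_to_wcs()
and utf8_coll() are hypothetical names, malformed input is replaced by
U+FFFD rather than reported, and the ordering wcscoll() produces is whatever
the current LC_COLLATE setting gives. On platforms with a 16-bit wchar_t the
helper emits surrogate pairs for code points above U+FFFF.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a NUL-terminated UTF-8 string to a malloc'ed wide string.
 * Returns NULL if out of memory. Sketch only: overlong forms are accepted
 * and malformed bytes become U+FFFD. */
static wchar_t *utf8_to_wcs(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    /* Never more wide units than input bytes, plus the terminator. */
    wchar_t *w = malloc((strlen(s) + 1) * sizeof *w);
    size_t n = 0;

    if (w == NULL)
        return NULL;
    while (*p) {
        unsigned long cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }
        p++;
        /* Fold in the continuation bytes (10xxxxxx). */
        while (extra > 0 && (*p & 0xC0) == 0x80) {
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
            extra--;
        }
        if (extra > 0)
            cp = 0xFFFD;              /* truncated sequence */

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            /* 16-bit wchar_t: split into a UTF-16 surrogate pair. */
            cp -= 0x10000;
            w[n++] = (wchar_t)(0xD800 | (cp >> 10));
            w[n++] = (wchar_t)(0xDC00 | (cp & 0x3FF));
        } else {
            w[n++] = (wchar_t)cp;
        }
    }
    w[n] = L'\0';
    return w;
}

/* Compare two UTF-8 strings using the wide-char collation function.
 * Returns <0, 0 or >0 like strcoll(). Falls back to a plain byte
 * comparison if memory for the wide copies cannot be allocated. */
int utf8_coll(const char *s1, const char *s2)
{
    wchar_t *wcs1, *wcs2;
    int result;

    assert(s1 != NULL && s2 != NULL);
    wcs1 = utf8_to_wcs(s1);           /* step 1 */
    wcs2 = utf8_to_wcs(s2);           /* step 2 */
    if (wcs1 == NULL || wcs2 == NULL)
        result = strcmp(s1, s2);
    else
        result = wcscoll(wcs1, wcs2); /* step 3 */
    free(wcs1);
    free(wcs2);
    return result;
}
```

Centralizing the conversion in one helper like this means every comparison
in the code base goes through the same path, which is the point of putting
these functions in a single library.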

Regards,
Seb

On 11/21/2012 06:07 PM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> On 11/21/2012 11:11 AM, Tom Lane wrote:
>>> I'm not sure that's the only place we're doing this ...
>
>> Oh, Hmm, darn. Where else do you think we might?
>
> Dunno, but grepping for isupper and/or tolower should find any such
> places.
>
>             regards, tom lane
>



