Re: Database object names and libpq in UTF-8 locale on Windows - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Database object names and libpq in UTF-8 locale on Windows
Date
Msg-id 50858444.7050500@dunslane.net
In response to Re: Database object names and libpq in UTF-8 locale on Windows  (Sebastien FLAESCH <sf@4js.com>)
Responses Re: Database object names and libpq in UTF-8 locale on Windows  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
On 10/22/2012 12:53 PM, Sebastien FLAESCH wrote:

[Issues with unquoted utf8 identifiers in Windows 1252 locale]

>> I suspect this has something to do with the fact that non-quoted
>> identifiers are converted to lowercase, and because my LC_CTYPE is
>> English_United States.1252, the conversion to lowercase fails...


Quite possibly. The code comment says this:
        /*
         * SQL99 specifies Unicode-aware case normalization, which we don't yet
         * have the infrastructure for.  Instead we use tolower() to provide a
         * locale-aware translation.  However, there are some locales where this
         * is not right either (eg, Turkish may do strange things with 'i' and
         * 'I').  Our current compromise is to use tolower() for characters with
         * the high bit set, and use an ASCII-only downcasing for 7-bit
         * characters.
         */
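
A minimal sketch of the compromise that comment describes, for readers who want to see the byte-by-byte behaviour. The function name downcase_ident_sketch is hypothetical and this is not the actual PostgreSQL source; it only assumes the rule stated in the comment (ASCII-only downcasing for 7-bit bytes, tolower() for bytes with the high bit set).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL's real function: downcase an
 * unquoted identifier one byte at a time.
 * dst must have room for strlen(src) + 1 bytes.
 */
void
downcase_ident_sketch(const char *src, char *dst)
{
    size_t len;
    size_t i;

    assert(src != NULL && dst != NULL);
    len = strlen(src);

    for (i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) src[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch + ('a' - 'A'));   /* 7-bit: ASCII-only rule */
        else if ((ch & 0x80) != 0 && isupper(ch))
            ch = (unsigned char) tolower(ch);          /* high bit: locale-aware */

        dst[i] = (char) ch;
    }
    dst[len] = '\0';
}
```

Note that the loop sees individual bytes, not characters, so with a multi-byte encoding the tolower() branch is applied to each byte of a character separately, under whatever LC_CTYPE is active.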
 

For now your best bet is probably either to avoid non-ASCII UTF8
characters in identifiers or to quote the identifiers.

Given we're calling tolower() on a single byte in the code referred to,
should we even be doing that when we have a multi-byte encoding and the
high bit is set?
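
To make the question concrete, here is a small sketch under stated assumptions. cp1252_tolower_sketch is a hypothetical, simplified stand-in for what tolower() does under a 1252 LC_CTYPE (it covers only bytes 0xC0..0xDE, which map to lowercase 0x20 higher, except 0xD7), and max_char_len is an assumed parameter giving the encoding's maximum bytes per character. Neither name comes from the PostgreSQL source, and the guard shown is only one possible reading of the idea above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of tolower() in a Windows-1252 locale. */
unsigned char
cp1252_tolower_sketch(unsigned char ch)
{
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)
        return (unsigned char) (ch + 0x20);
    return ch;
}

/*
 * Downcase len bytes in place.  High-bit bytes are only passed through the
 * locale-style mapping when the encoding is single-byte (max_char_len == 1);
 * for a multi-byte encoding such as UTF-8 they are left untouched.
 */
void
downcase_bytes_sketch(unsigned char *buf, size_t len, int max_char_len)
{
    size_t i;

    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++)
    {
        unsigned char ch = buf[i];

        if (ch >= 'A' && ch <= 'Z')
            buf[i] = (unsigned char) (ch + ('a' - 'A'));
        else if ((ch & 0x80) != 0 && max_char_len == 1)
            buf[i] = cp1252_tolower_sketch(ch);
        /* else: byte of a multi-byte character, leave it alone */
    }
}
```

For example, UTF-8 'é' is the byte pair 0xC3 0xA9. Treated as single-byte 1252 data, 0xC3 is an uppercase letter and becomes 0xE3, which is the lead byte of a three-byte UTF-8 sequence, so the identifier is no longer valid UTF-8. With max_char_len greater than 1 the pair is left unchanged.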

Aside: I'd love to fix up our treatment of identifiers, but there is 
probably a LOT of very tedious work involved.

cheers

andrew




