On 10/22/2012 12:53 PM, Sebastien FLAESCH wrote:
[Issues with unquoted utf8 identifiers in Windows 1252 locale]
>> I suspect this has something to do with the fact that non-quoted
>> identifiers are converted to lowercase, and because my LC_CTYPE is
>> English_United States.1252, the conversion to lowercase fails...
Quite possibly. The code comment says this:
    /*
     * SQL99 specifies Unicode-aware case normalization, which we don't yet
     * have the infrastructure for.  Instead we use tolower() to provide a
     * locale-aware translation.  However, there are some locales where this
     * is not right either (eg, Turkish may do strange things with 'i' and
     * 'I').  Our current compromise is to use tolower() for characters with
     * the high bit set, and use an ASCII-only downcasing for 7-bit
     * characters.
     */
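To make the compromise concrete, here is a rough sketch of the logic that comment describes. It is not the actual PostgreSQL source. The function name downcase_ident_sketch is hypothetical, and it assumes a NUL-terminated input and a caller-supplied output buffer of at least strlen(ident) + 1 bytes. 7-bit bytes go through a fixed A-Z mapping, and only bytes with the high bit set are handed to the locale's tolower(), one byte at a time:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical sketch of the "current compromise": ASCII-only downcasing
 * for 7-bit bytes, locale-aware tolower() for bytes with the high bit set.
 * out must have room for strlen(ident) + 1 bytes.
 */
static void
downcase_ident_sketch(const char *ident, char *out)
{
    assert(ident != NULL && out != NULL);

    size_t len = strlen(ident);

    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';        /* fixed ASCII mapping, so no Turkish 'I' surprises */
        else if (ch >= 0x80 && isupper(ch))
            ch = (unsigned char) tolower(ch);   /* locale decides, byte by byte */
        out[i] = (char) ch;
    }
    out[len] = '\0';
}
```

Under the default "C" locale, no high-bit byte counts as upper case, so only ASCII letters change. What happens to non-ASCII bytes depends entirely on LC_CTYPE. The per-byte isupper()/tolower() calls have no idea whether a byte is part of a multi-byte character.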
For now your best bet is probably either to avoid non-ASCII UTF8 characters
in unquoted identifiers or to quote the identifiers.
Given we're calling tolower() on a single byte in the code referred to,
should we even be doing that when we have a multi-byte encoding and the
high bit is set?
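A small illustration of why the question matters. This sketch does not depend on a Windows-1252 locale being installed. Instead it uses a hypothetical stand-in, cp1252_tolower_sim, that folds 0xC0-0xDE (except 0xD7, the multiplication sign) to 0x20 higher, matching the Latin letters in that code page. The UTF-8 encoding of "É" is 0xC3 0x89. Folding it byte-wise with 1252 rules turns 0xC3 ("Ã" in 1252) into 0xE3 ("ã"), which leaves an invalid UTF-8 sequence. The hypothetical encoding_is_multibyte flag shows the guard being asked about: in a multi-byte encoding, high-bit bytes are copied unchanged and only ASCII is folded.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for tolower() under a Windows-1252 locale.
 * Real results depend on the platform's locale tables.
 */
static unsigned char
cp1252_tolower_sim(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7)   /* 0xD7 is the multiplication sign */
        return (unsigned char) (ch + 0x20);
    return ch;
}

/*
 * Downcase an identifier byte by byte.  When encoding_is_multibyte is true
 * (e.g. the database encoding is UTF8), bytes with the high bit set are
 * left alone so multi-byte sequences survive intact.
 * out must have room for strlen(ident) + 1 bytes.
 */
static void
downcase_bytes(const char *ident, char *out, bool encoding_is_multibyte)
{
    assert(ident != NULL && out != NULL);

    size_t len = strlen(ident);

    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch < 0x80)
            ch = (ch >= 'A' && ch <= 'Z') ? (unsigned char) (ch + ('a' - 'A')) : ch;
        else if (!encoding_is_multibyte)
            ch = cp1252_tolower_sim(ch);   /* single-byte encoding: locale-style folding */
        out[i] = (char) ch;
    }
    out[len] = '\0';
}
```

Without the guard, "\xC3\x89" "COLE" becomes 0xE3 0x89 "cole". A UTF-8 decoder reads 0xE3 as the lead byte of a three-byte sequence and then finds 'c' where a continuation byte should be. With the guard, the result is the valid "\xC3\x89" "cole", although "É" itself is then not downcased.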
Aside: I'd love to fix up our treatment of identifiers, but there is
probably a LOT of very tedious work involved.
cheers
andrew