On 09/21/2014 09:04 PM, Noah Misch wrote:
> On Sun, Sep 21, 2014 at 12:13:25PM -0400, Tom Lane wrote:
>> Noah Misch <noah@leadboat.com> writes:
>>> On Fri, Sep 19, 2014 at 03:15:53PM -0700, Alon wrote:
>>>> The pg_dump file contains this command:
>>>> CREATE DATABASE workgroup WITH TEMPLATE = template0 ENCODING = 'UTF8'
>>>> LC_COLLATE = 'Norwegian (Bokmål)_Norway.1252' LC_CTYPE = 'Norwegian
>>>> (Bokmål)_Norway.1252';
>>
>>> In WIN1252, "e5 6c 29" is "ål)". We're likely failing to set client_encoding
>>> at some essential point in the process.
>>
>> The level of stupidity needed to use non-ASCII characters in a locale name
>> is breathtaking. What were Microsoft thinking?
>
> Windows Vista did deprecate that locale name style in favor of "nb-NO".
> setlocale(LC_x, "") still returns the old style, though. You need PostgreSQL
> built with VS2012 or later to use "nb-NO" style; see IsoLocaleName().
Older Visual Studio versions also accept "norwegian-bokmal" as an alias,
so we can use that. Unfortunately, users can't use the alias as a
workaround themselves, because we always store the canonical string
returned by setlocale(), regardless of which alias the user specified.
But we could apply it as a workaround in our own code.
We have a similar mapping for a few country names that contain dots:
"Hong Kong S.A.R.", "U.A.E.", and "Macau S.A.R.". This case is slightly
different, though. With the dots, the problem is that setlocale()
doesn't accept the country name as an argument. With Bokmål, the
problem is that we don't store the locale name in the catalogs
correctly, and hence don't pass it correctly to setlocale(). (Even if we
stored it correctly when a single encoding is used throughout the
cluster, things would go wrong as soon as you create a database that
uses a non-default encoding.)
I think we should map "Norwegian (Bokmål)" to "norwegian-bokmal", and
apply that mapping to the return value of setlocale() rather than to its
argument. Then, when initdb calls setlocale(LC_x, "") to get the default
locale, our wrapper will return "norwegian-bokmal", and that is what
gets stored in the catalogs and postgresql.conf.
Patch for that attached. pg_upgrade canonicalizes locale names by
passing them through setlocale() before comparing them, so it should
still work. I'm a bit wary of back-patching, though. I think this would
work with existing clusters (to the extent they work today, with the
non-ASCII characters stored in pg_database), but it would need more
testing before I'm confident.
- Heikki