On 09/22/2014 06:15 PM, Heikki Linnakangas wrote:
> Patch for that attached. pg_upgrade canonicalizes locale names by
> passing them through setlocale(), before comparing them, so it should
> still work. I'm a bit wary of back-patching, though. I think this would
> work with existing clusters (as far as they work currently, with the
> non-ASCII characters stored in pg_database), but would need some more
> testing to be confident.
This seems the best way to fix this in master, but there's a problem if
we backpatch this. If existing databases in the cluster already have
"Norwegian (Bokmål)" as the locale, and you update the binaries and try
to create a new database:
postgres=# create database foodb;
ERROR: new collation (norwegian-bokmal_Norway.1252) is incompatible
with the collation of the template database (Norwegian
(Bokmål)_Norway.1252)
HINT: Use the same collation as in the template database, or use
template0 as template.
That's straightforward to fix; instead of doing a straight strcmp() to
check if the locales are the same, canonicalize them by calling
check_locale first. Attached patch does that.
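To illustrate the idea outside the backend, here is a minimal standalone
sketch. It is not the attached patch: canonical_locale_name() and
locales_equivalent() are hypothetical names made up for this example, and
the sketch calls setlocale() directly instead of going through
check_locale. It assumes that setlocale() returns the canonical spelling
of a locale name it accepts, and it treats a name the platform rejects as
not equivalent to anything else.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s, or NULL on out-of-memory. */
static char *
copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/*
 * Hypothetical helper: return a malloc'd copy of the name setlocale()
 * reports for "name", or NULL if the platform does not accept it.
 * The process locale for "category" is restored before returning.
 */
static char *
canonical_locale_name(int category, const char *name)
{
    const char *current;
    char       *saved;
    char       *result = NULL;

    assert(name != NULL);

    /* Remember the current setting so it can be restored afterwards. */
    current = setlocale(category, NULL);
    if (current == NULL)
        return NULL;
    saved = copy_string(current);
    if (saved == NULL)
        return NULL;

    /* setlocale() returns the name it actually installed, or NULL. */
    current = setlocale(category, name);
    if (current != NULL)
        result = copy_string(current);

    setlocale(category, saved);
    free(saved);
    return result;
}

/*
 * Hypothetical helper: compare two locale names after canonicalizing
 * both, instead of doing a straight strcmp() on the raw names.
 */
static bool
locales_equivalent(int category, const char *a, const char *b)
{
    char *canon_a;
    char *canon_b;
    bool  same;

    /* Identical spellings need no canonicalization. */
    if (strcmp(a, b) == 0)
        return true;

    canon_a = canonical_locale_name(category, a);
    canon_b = canonical_locale_name(category, b);
    same = (canon_a != NULL && canon_b != NULL &&
            strcmp(canon_a, canon_b) == 0);

    free(canon_a);
    free(canon_b);
    return same;
}
```

With something like this, the template-compatibility check would call
locales_equivalent(LC_COLLATE, new_name, template_name) where it does a
plain strcmp() today. Whether two particular spellings end up with the
same canonical name depends on the platform's setlocale().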
After this, it's a bit strange that newly created databases use
"norwegian-bokmal" as the locale, while old ones use "Norwegian (Bokmål)":
foodb=# select datname, encoding, datcollate from pg_database;
  datname  | encoding |           datcollate
-----------+----------+--------------------------------
 template1 |       24 | Norwegian (Bokmål)_Norway.1252
 template0 |       24 | Norwegian (Bokmål)_Norway.1252
 postgres  |       24 | Norwegian (Bokmål)_Norway.1252
 foodb     |       24 | norwegian-bokmal_Norway.1252
 utf8db    |        6 | norwegian-bokmal_Norway.1252
(5 rows)
But we know those non-ASCII characters are problematic, so I think this
is an improvement even in old clusters. At least you won't get any more
of them. You could also UPDATE pg_database manually to fix that in an
existing cluster.
One more problem: pg_upgrade doesn't canonicalize locale names either,
so you get:
> lc_collate cluster values do not match: old "Norwegian (BokmÕl)_Norway.1252", new "norwegian-bokmal_Norway.1252"
>
> Failure, exiting
Bruce: do you think it would be OK to canonicalize the locale names
before comparing? pg_upgrade already has a function to canonicalize, but
it's only used when upgrading from a pre-9.2 server; locale names on
newer versions are assumed to be already in canonical form.
Alternatively, we could not bother with changing pg_upgrade or CREATE
DATABASE, and instead instruct Bokmål users to do the manual UPDATE of
pg_database in the release notes. That might be the most robust
solution, if there are more cases where we compare locales that I've missed.
- Heikki