Thread: pg_upgrade, locale and encoding

pg_upgrade, locale and encoding

From: Heikki Linnakangas
While looking at bug #11431, I noticed that pg_upgrade still seems to
think that encoding and locale are cluster-wide properties. We got
per-database locale support in 8.4, and encoding has been per-database
much longer than that.

pg_upgrade checks the encoding and locale of template0 in both clusters,
and throws an error if they don't match. But it doesn't check the locale
or encoding of the postgres or template1 databases. That leads to
problems if e.g. the postgres database was dropped and recreated with a
different encoding or locale in the old cluster. We will merrily upgrade
it, but strings in the database will be incorrectly encoded.
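
To illustrate the kind of per-database check I mean, here is a minimal
sketch. It is not the attached patch: the DbLocaleInfo struct and both
function names are made up for illustration, and it assumes the
encoding, datcollate and datctype values of each database have already
been read from pg_database in both clusters. Locale names are compared
with a plain strcmp() here to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-database settings, as they could be read from pg_database. */
typedef struct
{
    const char *datname;     /* pg_database.datname */
    int         encoding;    /* pg_database.encoding */
    const char *lc_collate;  /* pg_database.datcollate */
    const char *lc_ctype;    /* pg_database.datctype */
} DbLocaleInfo;

/* Look up a database by name; returns NULL if it is not in the array. */
static const DbLocaleInfo *
find_db(const DbLocaleInfo *dbs, size_t ndbs, const char *datname)
{
    assert(datname != NULL);
    for (size_t i = 0; i < ndbs; i++)
    {
        if (strcmp(dbs[i].datname, datname) == 0)
            return &dbs[i];
    }
    return NULL;
}

/*
 * For every database that already exists in the new cluster (template0,
 * template1, postgres, ...), compare its settings with the database of
 * the same name in the old cluster. Returns the number of mismatches.
 */
static int
count_locale_mismatches(const DbLocaleInfo *old_dbs, size_t nold,
                        const DbLocaleInfo *new_dbs, size_t nnew)
{
    int mismatches = 0;

    for (size_t i = 0; i < nnew; i++)
    {
        const DbLocaleInfo *newdb = &new_dbs[i];
        const DbLocaleInfo *olddb = find_db(old_dbs, nold, newdb->datname);

        /* Not present in the old cluster, so there is nothing to compare. */
        if (olddb == NULL)
            continue;

        if (olddb->encoding != newdb->encoding ||
            strcmp(olddb->lc_collate, newdb->lc_collate) != 0 ||
            strcmp(olddb->lc_ctype, newdb->lc_ctype) != 0)
        {
            fprintf(stderr,
                    "database \"%s\": encoding or locale differs between clusters\n",
                    newdb->datname);
            mismatches++;
        }
    }
    return mismatches;
}
```

The point is only that the comparison runs once per database rather
than once per cluster, so a recreated postgres database with a
different encoding is caught before any data is transferred.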

I propose the attached patch, for git master. It's more complicated in
back-branches, as they still support upgrading from pre-8.4 clusters. We
haven't heard any complaints from the field on this, so I don't think
it's worth trying to back-patch this.

This slightly changes the way the locale comparison works. First, it
ignores the encoding suffix of the locale name. It's of course important
that the databases have a compatible encoding, but pg_database has a
separate field for encoding, and that's now compared directly. Second,
it tries to canonicalize the names by calling setlocale(), which seems
like a good idea in light of bug #11431
(http://www.postgresql.org/message-id/5424090E.9060700@vmware.com).
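
Roughly, the comparison could look like the sketch below. This is an
illustration under my own assumptions, not the code in the attached
patch: all function names are invented, the comparison is
case-sensitive, and as a simplification everything after the first '.'
is dropped, which also drops any "@modifier" part of the name.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of s, or NULL if out of memory. */
static char *
copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/*
 * Ask setlocale() what it calls the given locale. Returns a malloc'd
 * string; if the locale cannot be set, the input name is returned as is.
 * The caller's current locale setting is restored before returning.
 */
static char *
canonical_locale_name(int category, const char *name)
{
    const char *current;
    const char *reported;
    char       *saved;
    char       *result;

    assert(name != NULL);

    /* Save the current setting; the returned string may be overwritten. */
    current = setlocale(category, NULL);
    saved = copy_string(current != NULL ? current : "C");
    if (saved == NULL)
        return NULL;

    reported = setlocale(category, name);
    result = copy_string(reported != NULL ? reported : name);

    setlocale(category, saved);
    free(saved);
    return result;
}

/* Compare two locale names, ignoring anything from the first '.' onwards. */
static bool
same_base_name(const char *a, const char *b)
{
    size_t lena = strcspn(a, ".");
    size_t lenb = strcspn(b, ".");

    return lena == lenb && strncmp(a, b, lena) == 0;
}

/* Canonicalize both names, then compare them without the encoding suffix. */
static bool
locales_equivalent(int category, const char *a, const char *b)
{
    char *canon_a = canonical_locale_name(category, a);
    char *canon_b = canonical_locale_name(category, b);
    bool  equal = canon_a != NULL && canon_b != NULL &&
                  same_base_name(canon_a, canon_b);

    free(canon_a);
    free(canon_b);
    return equal;
}
```

With this, "en_US.UTF-8" and "en_US.utf8" compare as equivalent, and the
encoding itself is checked separately through the pg_database encoding
field.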

- Heikki

Re: pg_upgrade, locale and encoding

From: Bruce Momjian
On Tue, Oct  7, 2014 at 03:52:24PM +0300, Heikki Linnakangas wrote:
> While looking at bug #11431, I noticed that pg_upgrade still seems
> to think that encoding and locale are cluster-wide properties. We
> got per-database locale support in 8.4, and encoding has been
> per-database much longer than that.
> 
> pg_upgrade checks the encoding and locale of template0 in both
> clusters, and throws an error if they don't match. But it doesn't
> check the locale or encoding of postgres or template1 databases.
> That leads to problems if e.g. the postgres database was dropped and
> recreated with a different encoding or locale in the old cluster. We
> will merrily upgrade it, but strings in the database will be
> incorrectly encoded.

Wow, I never thought someone would do that, but they certainly could ---
good catch.

> I propose the attached patch, for git master. It's more complicated
> in back-branches, as they still support upgrading from pre-8.4
> clusters. We haven't heard any complaints from the field on this, so
> I don't think it's worth trying to back-patch this.

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +