Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Msg-id CAH2-WznUjv8F0_0D-DMzYOAX8q2CdvEQKZ9PQEhcmnJ8JvHSxQ@mail.gmail.com
In response to Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/22/17 16:46, Peter Geoghegan wrote:
>> But you are *already* canonicalizing ICU collation names as BCP 47. My
>> point here is: Why not finish the job off, and *also* canonicalize
>> collcollate in the same way? This won't break ucol_open() if we take
>> appropriate precautions when we go to use the Postgres collation/ICU
>> locale.
>
> Reading over this code again, it is admittedly not quite clear why this
> "canonicalization" is in there right now.  I think it had something to
> do with how we built the keyword variants at one point.  It might not
> make sense.  I'd be glad to take that out and use the result straight
> from uloc_getAvailable() for collcollate.  That is, after all, the
> "canonical" version that ICU chooses to report to us.

But then our users categorically have to know about both formats,
without any practical benefit to make up for it. You will also get
people who don't realize that only one format is supported on some
versions if we go this way.
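
To make the "two formats" concrete, here is a rough illustration of how
a legacy ICU locale ID relates to its BCP 47 language tag. This is only
a sketch, not ICU's algorithm: the real conversion is done by ICU's
uloc_toLanguageTag(). The function to_language_tag() and the small
mapping table are hypothetical names invented for this example, and
only the "collation" keyword is handled.

```python
# Illustrative sketch only: a simplified stand-in for the kind of
# conversion ICU's uloc_toLanguageTag() performs. Not ICU's algorithm.

# Assumption: a tiny hand-written subset of the legacy -> BCP 47
# collation type names; ICU itself knows many more.
LEGACY_TO_BCP47_COLLATION = {
    "phonebook": "phonebk",
    "traditional": "trad",
    "standard": "standard",
}


def to_language_tag(locale_id: str) -> str:
    """Convert a legacy ICU locale ID to a BCP 47-style language tag.

    Only the "collation" keyword is supported in this sketch.
    """
    # Split "de_DE@collation=phonebook" into base name and keyword list.
    base, _, keywords = locale_id.partition("@")
    # Legacy IDs separate subtags with "_"; BCP 47 uses "-".
    tag = base.replace("_", "-")
    for keyword in filter(None, keywords.split(";")):
        key, _, value = keyword.partition("=")
        if key != "collation":
            raise ValueError(f"unsupported keyword in this sketch: {key!r}")
        # "co" is the BCP 47 Unicode extension key for collation type.
        tag += "-u-co-" + LEGACY_TO_BCP47_COLLATION.get(value, value)
    return tag


print(to_language_tag("de_DE@collation=phonebook"))  # de-DE-u-co-phonebk
print(to_language_tag("en_US"))                      # en-US
```

A user who only ever sees the second form in collcollate never has to
learn the "@collation=phonebook" spelling at all.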

>> One concern that makes me suggest this is: What happens when
>> the user *downgrades* ICU version, from a version where collcollate is
>> BCP 47 to one where it would not have been at initdb time? That will
>> break the downgrade in an unpleasant way, including in installations
>> that never do a CREATE COLLATION themselves. We want to be able to
>> restore a basebackup on a somewhat different OS, and have that still
>> work following REINDEX. At least, that seems like it should be an
>> important goal for us.
>
> This is an interesting point, and my proposal above would fix that.

I've already written a patch to standardize collcollate. If we do it the
way you describe above instead, then what happens when ICU finally
removes the already deprecated legacy format?

> However, I think that taking a PostgreSQL data directory and moving or
> copying it to an *older* OS installation is always going to have a
> potential for problems.  So I wouldn't spend a huge amount of effort
> just to fix this specific case.

The downgrade thing is just the simplest, most immediate example of
where failing to standardize collcollate now could cause problems.

-- 
Peter Geoghegan

