Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date
Msg-id CAH2-Wzm=HJ6_TXjftfXv+Nk69xBvRd=Pc8N0BPy+oHzjq-Gw=Q@mail.gmail.com
In response to Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-bugs
On Mon, Aug 7, 2017 at 12:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, the fact that they're "redundant" doesn't really help you if
> you can't pg_upgrade because the collation name you chose in v10 is
> not present in initdb's results in v11.  So this is still a serious
> issue to my mind.

I agree.

Even MongoDB has ICU support these days. They specifically document
which collations are supported. It's just the same for DB2, and other
systems that build their collations on ICU. Users do not "use the ICU
collations" on these other systems. They simply use the collations
that are available, choosing from a list in the documentation, or
possibly create their own collations with their own customization.

The ICU collations are based on the CLDR data and an IETF standard's
idea of a locale identifier [1], so in an important sense they're
supposed to be universal; they're not tied to ICU in particular. This
is probably why ICU is ridiculously forgiving of alternate collation
names, and will not throw an error if you specify an ICU collation
name that is total garbage within CREATE COLLATION (there is a
Postgres regression test that proves this for ICU, actually). As far
as ICU is concerned, the name may be coming from an end user's input
over the web, where it makes sense to be so forgiving.

Even stuff like the names for emoji collations, or phonebook
collations, is covered by a standard, though it's not quite an IETF
standard. RFC 6067 says that the CLDR data is the authoritative source
of which variant subtags are allowed, and ICU uses CLDR, from the
Unicode Consortium.

We need to move further away from the idea that there are ICU
collations just like there are libc collations.

[1] https://www.rfc-editor.org/rfc/bcp/bcp47.txt
-- 
Peter Geoghegan

