Re: Pg_upgrade and collation - Mailing list pgsql-docs

From Peter Geoghegan
Subject Re: Pg_upgrade and collation
Msg-id CAH2-WzmaWsucQTFtg7gKS95xu=eTiKtPCWxf9fjJtNK7=+MxkQ@mail.gmail.com
In response to Re: Pg_upgrade and collation  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Pg_upgrade and collation
List pgsql-docs
On Tue, Jun 28, 2016 at 3:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> I have long advocated adopting ICU as our de facto standard "collation
>> provider", primarily so that we can directly control collations and
>> collation versioning. I think that doing this would solve many
>> problems. Besides, even SQLite has optional ICU support. PostgreSQL is
>> the only major database system that I'm aware of that relies on
>> operating system collations exclusively.
>
> I am hopeful ICU has improved enough since we last researched that
> support for it will soon be added.

There is a patch available that is not ready to be submitted, and
doesn't have a real advocate, but is at least enough to convince me
that it's very doable. Performance is certainly no impediment to
adopting ICU, even without considering that it effectively
re-introduces abbreviated keys for text when the C collation is not
used.

The best argument for ICU is the evidently lax attitude that the glibc
people have towards the correctness and consistency of their
collations:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3

Here, Carlos O'Donell, a glibc committer, says "Regarding (b), the
collations in glibc may change from build to build depending on
changes in the algorithms or locales. You cannot rely on the collation
stay the same once the process exits (nor can you rely upon it via a
shared memory mapping to another process sorting strings in memory)".
Frankly, we have no excuse for not heeding his warning.

I'm not annoyed at the glibc people for taking this position. There
is, quite simply, a misalignment of incentives. For the glibc people,
the assumption is that any problem with collations leads only to
slight annoyance from end users, as when the GUI produces subtly wrong
ordering. Whereas, for us, any inconsistency is an extremely serious
problem. Here we have the maintainers of glibc telling us that they
consider it acceptable for collations to change at any time. Surely
that isn't good enough.
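
To make concrete why an inconsistency is so serious for a database, here
is a minimal Python sketch. It is not PostgreSQL code: key_v1 and key_v2
are hypothetical stand-ins for "the collation before a library update"
and "the collation after it", and index_lookup is an illustrative binary
search over a sorted list standing in for a B-Tree index.

```python
from bisect import bisect_left

# Hypothetical "old" collation: case-insensitive, ties broken by code point.
def key_v1(s):
    return (s.casefold(), s)

# Hypothetical "new" collation after a library update: plain code point order.
def key_v2(s):
    return s

def index_lookup(sorted_values, value, key):
    """Binary-search `sorted_values` for `value`, comparing with `key`.

    The search is only correct if `sorted_values` was sorted with the
    same `key` that is used here.
    """
    keys = [key(v) for v in sorted_values]
    i = bisect_left(keys, key(value))
    return i < len(sorted_values) and sorted_values[i] == value

# The "index" is built once, under the old collation, and persisted.
index = sorted(["apple", "Banana", "cherry", "Date"], key=key_v1)

found_before_update = index_lookup(index, "Banana", key_v1)
found_after_update = index_lookup(index, "Banana", key_v2)

print(found_before_update)  # the row is found under the original ordering
print(found_after_update)   # the same row is missed under the new ordering
```

The data never changed, yet the lookup stops finding a value that is
still present, because the search now disagrees with the order in which
the structure was built.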

ICU as a project has every incentive to see things the same way as we
do. The library explicitly decouples collation rule versions from
algorithm versions. All of this is carefully considered, for the
benefit of the numerous major database systems that use ICU.
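
As a rough illustration of what separate version numbers make possible,
the Python sketch below records a collation version alongside an index
and compares it later. Everything in it is hypothetical: the
CollationVersion type, the version strings, and the function names are
assumptions for illustration, not ICU's API and not anything PostgreSQL
does today. (ICU's C API does expose ucol_getVersion() and
ucol_getUCAVersion() for a collator, which is the kind of information
such a record would be filled from.)

```python
from typing import List, NamedTuple

class CollationVersion(NamedTuple):
    # Both fields are illustrative; real values would come from the
    # collation provider.
    rules: str      # version of the collation rules (locale data)
    algorithm: str  # version of the collation algorithm

def changed_parts(recorded: CollationVersion,
                  current: CollationVersion) -> List[str]:
    """Return which parts of the version differ from what was recorded."""
    changed = []
    if recorded.rules != current.rules:
        changed.append("rules")
    if recorded.algorithm != current.algorithm:
        changed.append("algorithm")
    return changed

def index_needs_rebuild(recorded: CollationVersion,
                        current: CollationVersion) -> bool:
    """An index is only safe to use if nothing about the ordering changed."""
    return bool(changed_parts(recorded, current))

# Version stored when the index was built (made-up values).
recorded = CollationVersion(rules="1.0", algorithm="9.0")
# Version reported by the library after an upgrade (made-up values).
current = CollationVersion(rules="1.1", algorithm="9.0")

print(changed_parts(recorded, current))
print(index_needs_rebuild(recorded, current))
```

With a provider that reports versions like this, a mismatch can be
detected and acted on, rather than silently producing wrong answers.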

--
Peter Geoghegan

