Re: Unicode FFFF Special Codepoint should always collate high. - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: Unicode FFFF Special Codepoint should always collate high.
Msg-id CA+hUKGKcTvSMbqTnOcKxyOMAo6fKkc7FW5qNLsoxMyiK6pB=kQ@mail.gmail.com
In response to Re: Unicode FFFF Special Codepoint should always collate high.  (Telford Tendys <psql@lnx-bsp.net>)
Responses Re: Unicode FFFF Special Codepoint should always collate high.  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-bugs
On Wed, Jun 23, 2021 at 9:57 PM Telford Tendys <psql@lnx-bsp.net> wrote:
> I trust those guys, they will figure it out. I strongly predict that
> they will keep the behaviour consistent with RHEL 7.

I doubt that.  It's well known that glibc 2.28 (the version RHEL 8
upgraded to) changed the sort order of common symbols like '-', which
affected everybody.  (Every upgrade potentially contains subtle changes
affecting just a few specific languages, but this one was big.)  I
consider that change an improvement, because glibc now agrees more often
with other operating systems and libraries that use CLDR.  Even if you
are right that FFFF's sort-high rule should be exposed to users (that
needs references), RHEL 7 was also wrong in that case.

> Is there an easy way to make normal Linux glibc utilities (e.g. sort)
> use a locale from the ICU library? There's a package available on CentOS-8
> here:
>
>     libicu-60.3-2.el8_1.x86_64
>
> Trouble is that only a few applications use it, and I can't find any way
> to plug-in / plug-out this functionality. Introducing postgresql details to
> the bugzilla ticket will muddy the water and create an aura of diffuse
> responsibility. What I've found is generally where there's a lot of words,
> people don't read them.

I don't know, but since you know perl, it might be easy to make a
demonstration with https://metacpan.org/pod/Unicode::ICU::Collator.
Looks as simple as $collator->sort(my_list).
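
If Perl plus ICU is not at hand, a rough comparison can also be made with
the C library alone.  The following is only a minimal Python sketch, not
the Perl module mentioned above and not ICU: it sorts a few strings by raw
code point and then by whatever collation the installed libc provides.
The names samples, sort_by_codepoint and sort_by_libc_locale are made up
for illustration, and the locale name en_US.UTF-8 is an assumption that
may not exist on a given machine.

```python
import functools
import locale

# Sample strings; "\uffff" is the special code point discussed in this thread.
samples = ["b", "a-c", "\uffff", "ab", "-", "B"]


def sort_by_codepoint(strings):
    """Sort by raw Unicode code point (Python's default string ordering)."""
    return sorted(strings)


def sort_by_libc_locale(strings, locale_name="en_US.UTF-8"):
    """Sort using the C library's collation for locale_name.

    Returns None if that locale is not installed (the name is an assumption).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(strings, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


if __name__ == "__main__":
    # In code point order U+FFFF always comes last among these samples.
    print("code point:", [ascii(s) for s in sort_by_codepoint(samples)])

    by_locale = sort_by_libc_locale(samples)
    if by_locale is None:
        print("locale not installed; nothing to compare")
    else:
        # Where '-' and U+FFFF land here depends on the installed glibc version.
        print("libc locale:", [ascii(s) for s in by_locale])
```

Running this on a RHEL 7 box and a RHEL 8 box should show whether the
position of '-' and of U+FFFF differs between the two glibc versions; the
code point line is the same everywhere.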
