Re: Character expansion with ICU collations - Mailing list pgsql-hackers

From	Peter Eisentraut
Subject	Re: Character expansion with ICU collations
Date	June 11, 2021 20:29:54
Msg-id	d75158bd-8f59-0ff9-dd90-f0fcb92e2101@enterprisedb.com
In response to	Re: Character expansion with ICU collations ("Finnerty, Jim" <jfinnert@amazon.com>)
List	pgsql-hackers

On 11.06.21 22:05, Finnerty, Jim wrote:
>>>
>      You can have these queries return both rows if you use an
>      accent-ignoring collation, like this example in the documentation:
> 
>      CREATE COLLATION ignore_accents (provider = icu, locale =
>      'und-u-ks-level1-kc-true', deterministic = false);
> <<
> 
> Indeed.  Is the dependency between the character expansion capability
> and accent-insensitive collations documented anywhere?

The above is merely a consequence of what the default collation elements 
for 'ß' are.

Expansion isn't really a relevant concept in collation.  Any character 
can map to 1..N collation elements.  The collation algorithm doesn't 
care how many it is.
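
A minimal sketch of that point, assuming a toy table of collation
elements. The weights and the names TOY_ELEMENTS, collation_elements
and sort_key are made up for illustration; they are not the real
default collation elements and not ICU's API. The key builder never
looks at how many elements a character produced:

```python
# Toy collation element table: each character maps to a list of
# (primary, secondary, tertiary) weights. The numbers are invented for
# illustration and are NOT the real default (DUCET/CLDR) values.
TOY_ELEMENTS = {
    "s": [(10, 1, 1)],
    "S": [(10, 1, 2)],
    "ß": [(10, 1, 1), (10, 1, 3)],  # one character, two collation elements
    "t": [(11, 1, 1)],
}


def collation_elements(text):
    """Flatten a string into its sequence of collation elements."""
    elements = []
    for ch in text:
        elements.extend(TOY_ELEMENTS[ch])
    return elements


def sort_key(text, levels=3):
    """Build a per-level key: all primaries, then all secondaries, etc.

    Comparing at fewer levels (levels=1) ignores the later weights.
    """
    elements = collation_elements(text)
    return tuple(tuple(e[level] for e in elements) for level in range(levels))


# At the first level only, 'ß' and 'ss' are indistinguishable in this toy
# table; at all three levels they differ in the last tertiary weight.
same_at_level1 = sort_key("ß", levels=1) == sort_key("ss", levels=1)
same_at_level3 = sort_key("ß", levels=3) == sort_key("ss", levels=3)
```

In this model the equality of 'ß' and 'ss' at level 1 falls out of the
table entries alone; nothing in sort_key treats expansion specially.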

> Can a CI collation be ordered upper case first, or is this a limitation of ICU?

I don't know the authoritative answer to that, but to me it doesn't make 
sense, since the effect of a case-insensitive collation is to throw away 
the third-level weights, so there is nothing left for "upper case first" 
to operate on.
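
The reasoning can be sketched with a toy model. The weights and the
upper_first flag below are hypothetical and only mirror the argument
above; this is not how ICU implements its case-first setting, and
ICU's actual behavior may differ:

```python
# Toy weights (primary, secondary, tertiary); values are illustrative only.
TOY_ELEMENTS = {
    "a": (1, 1, 1),
    "A": (1, 1, 2),
    "b": (2, 1, 1),
    "B": (2, 1, 2),
}


def sort_key(text, levels, upper_first=False):
    """Per-level key; upper_first only changes how tertiary weights order."""
    elements = [TOY_ELEMENTS[ch] for ch in text]
    key = []
    for level in range(levels):
        weights = [e[level] for e in elements]
        if level == 2 and upper_first:
            # Negate the tertiary weights so upper case sorts before lower.
            weights = [-w for w in weights]
        key.append(tuple(weights))
    return tuple(key)


# With three levels, upper_first changes the order of "a" and "A".
lower_first_order = sorted(["A", "a"], key=lambda w: sort_key(w, 3))
upper_first_order = sorted(["a", "A"], key=lambda w: sort_key(w, 3, upper_first=True))

# With only two levels (the tertiary weights thrown away), the flag has
# nothing to act on: the keys are identical either way.
flag_is_noop = sort_key("A", 2, upper_first=True) == sort_key("a", 2, upper_first=False)
```

In this model, once the comparison stops at level 2, "a" and "A" get
the same key regardless of the flag.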

> More generally, is there any interest in leveraging the full power of
> ICU tailoring rules to get whatever order someone may need, subject to
> the limitations of ICU itself?  What would be required to extend
> CREATE COLLATION to accept an optional sequence of tailoring rules that
> we would store in the pg_collation catalog and apply along with the
> modifiers in the locale string?

yes
