Re: Character expansion with ICU collations - Mailing list pgsql-hackers

From Finnerty, Jim
Subject Re: Character expansion with ICU collations
Date
Msg-id 1C0E0C6B-2CC1-4711-9FBE-7EC6B1F6A4F5@amazon.com
Whole thread Raw
In response to Character expansion with ICU collations  ("Finnerty, Jim" <jfinnert@amazon.com>)
Responses Re: Character expansion with ICU collations
List pgsql-hackers
>>
    You can have these queries return both rows if you use an
    accent-ignoring collation, like this example in the documentation:

    CREATE COLLATION ignore_accents (provider = icu, locale =
    'und-u-ks-level1-kc-true', deterministic = false);
<<

Indeed.  Is the dependency between the character expansion capability and accent-insensitive collations documented
anywhere?

Another unexpected dependency appears to be @colCaseFirst=upper.  If specified in combination with
colStrength=secondary,it appears that the upper/lower case ordering is random within a group of characters that are
secondaryequal, e.g. 'A' < 'a', but 'b' < 'B', 'c' < 'C', ...  , but then 'L' < 'l'.  It is not even consistently
orderedwith respect to case.  If I make it a nondeterministic CS_AI collation, then it sorts upper before lower
consistently. The rule seems to be that you can't sort by case within a group that is case-insensitive. 
 

Can a CI collation be ordered upper case first, or is this a limitation of ICU?

For example, this is part of the sort order that I'd like to achieve with ICU, with the code point in column 1 and
dense_rank()shown in the rightmost column indicating that 'b' = 'B', for example:
 

66    B    B    138    151
98    b    b    138    151   <- so within a group that is CI_AS equal, the sort order needs to be upper case first
67    C    C    139    152
99    c    c    139    152
199    Ç    Ç    140    153
231    ç    ç    140    153
68    D    D    141    154
100    d    d    141    154
208    Ð    Ð    142    199
240    ð    ð    142    199
69    E    E    143    155
101    e    e    143    155

Can this sort order be achieved with ICU?

More generally, is there any interest in leveraging the full power of ICU tailoring rules to get whatever order someone
mayneed, subject to the limitations of ICU itself?  what would be required to extend CREATE COLLATION to accept an
optionalsequence of tailoring rules that we would store in the pg_collation catalog and apply along with the modifiers
inthe locale string?
 

    /Jim




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Question about StartLogicalReplication() error path
Next
From: Alexander Korotkov
Date:
Subject: Re: doc issue missing type name "multirange" in chapter title