Is there a way to get ‘character expansions’ with the ICU collations that are available in PostgreSQL?
Using this example on a database with UTF-8 encoding:
CREATE COLLATION CI_AS (provider = icu, locale=’utf8@colStrength=secondary’, deterministic = false);
CREATE TABLE MyTable3
(
ID INT IDENTITY(1, 1),
Comments VARCHAR(100)
)
INSERT INTO MyTable3 (Comments) VALUES ('strasse')
INSERT INTO MyTable3 (Comments) VALUES ('straße')
SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'strasse'
SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'straße'
We would like to control whether each SELECT statement finds both records (because the sort key of ‘ß’ equals the sort key of ‘ss’), or whether each SELECT statement finds just one record. ICU supports character expansions and other tailorings that support advanced features like changing the collation order for specific characters, and while CREATE COLLATION doesn’t expose tailoring directives that do either character expansion or specific character reorderings (other than @colReorder to reorder entire categories of characters such as Greek vs Roman) , it seems to be the expectation that many <language> <country> pairs such as en_US should already cause ‘ß’ to match ‘ss’, not just to have them sort close together (which they do).
If PostgreSQL supports character expansion with ICU collations, can someone provide an example where 'strasse' = 'straße'?