Home > mailing lists

Character expansion with ICU collations - Mailing list pgsql-hackers

From	Finnerty, Jim
Subject	Character expansion with ICU collations
Date	June 9, 2021 18:31:33
Msg-id	5FF0F55B-1593-4FBC-A81A-7F3F6D1E4388@amazon.com Whole thread Raw
Responses	Re: Character expansion with ICU collations
List	pgsql-hackers

Tree view

Is there a way to get ‘character expansions’ with the ICU collations that are available in PostgreSQL?

Using this example on a database with UTF-8 encoding:

CREATE COLLATION CI_AS (provider = icu, locale=’utf8@colStrength=secondary’, deterministic = false);

CREATE TABLE MyTable3

(

ID INT IDENTITY(1, 1),

Comments VARCHAR(100)

)

INSERT INTO MyTable3 (Comments) VALUES ('strasse')

INSERT INTO MyTable3 (Comments) VALUES ('straße')

SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'strasse'

SELECT * FROM MyTable3 WHERE Comments COLLATE CI_AS = 'straße'

We would like to control whether each SELECT statement finds both records (because the sort key of ‘ß’ equals the sort key of ‘ss’), or whether each SELECT statement finds just one record. ICU supports character expansions and other tailorings that support advanced features like changing the collation order for specific characters, and while CREATE COLLATION doesn’t expose tailoring directives that do either character expansion or specific character reorderings (other than @colReorder to reorder entire categories of characters such as Greek vs Roman) , it seems to be the expectation that many <language> <country> pairs such as en_US should already cause ‘ß’ to match ‘ss’, not just to have them sort close together (which they do).

If PostgreSQL supports character expansion with ICU collations, can someone provide an example where 'strasse' = 'straße'?

pgsql-hackers by date:

From: Alvaro Herrera
Date: 09 June 2021, 18:29:17
Subject: Re: Decoding speculative insert with toast leaks memory

From: Matthias van de Meent
Date: 09 June 2021, 18:42:34
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

Character expansion with ICU collations - Mailing list pgsql-hackers

Previous

Next