Sorting by respecting diacritics/accents - Mailing list pgsql-general

From JānisE
Subject Sorting by respecting diacritics/accents
Date
Msg-id 1814742357.238678.1753437917809@w8
Whole thread Raw
Responses Re: Sorting by respecting diacritics/accents
List pgsql-general
Hello!

I seem to not be able to get PostgreSQL to sort rows by a string column respecting the diacritics.

I read [1] that it's possible to define a custom collation having collation strength "ks" set to "level2", which would mean that it's accent-sensitive.

However, when I try to actually sort using that collation, the order seem to be accent-insensitive.

For example:

CREATE TABLE test (string text);
INSERT INTO test VALUES ('bar'), ('bat'), ('bär');
CREATE COLLATION "und1" (provider = icu, deterministic = false, locale = 'und-u-ks-level1');
CREATE COLLATION "und2" (provider = icu, deterministic = false, locale = 'und-u-ks-level2');
CREATE COLLATION "und3" (provider = icu, deterministic = false, locale = 'und-u-ks-level3');
SELECT * FROM test ORDER BY string collate "und1";
SELECT * FROM test ORDER BY string collate "und2";
SELECT * FROM test ORDER BY string collate "und3";

All three collations give me the same order: bar < bär < bat, although an accent-sensitive order would be bar < bat < bär

This does lose "bär", meaning that those strength levels do have some kind of an effect on "DISTINCT":
SELECT DISTINCT string COLLATE "und1" FROM test;

But it's not working on "ORDER BY".

Do I misunderstand the collation capabilities? Is there a way to actually get an accent-sensitive order?

Also, is there a way to see what options are there for the default built-in collations? I don't see, for example, the used "ks" level in the "pg_collation" table data.

Best regards,
Janis

[1]  https://www.postgresql.org/docs/current/collation.html#ICU-COLLATION-COMPARISON-LEVELS

pgsql-general by date:

Previous
From: "Pierre Barre"
Date:
Subject: Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
Next
From: Mark
Date:
Subject: Upper / lower case keys in JSON objects