Re: Built-in case-insensitive collation pg_unicode_ci - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Built-in case-insensitive collation pg_unicode_ci
Msg-id 76d9a422-2e15-4300-9c6d-47a7c3d00caa@eisentraut.org
In response to Built-in case-insensitive collation pg_unicode_ci  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
> 
>     strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
> 
> and the character semantics are the same as PG_UNICODE_FAST.
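A minimal sketch of the ordering quoted above, written in C because the formula itself uses strcmp(). It rests on stated assumptions: the helper names fold_byte and ci_compare are hypothetical, and folding covers only ASCII A-Z as a stand-in for CASEFOLD. Real Unicode case folding can change a string's length (for example, "ß" folds to "ss"), so it cannot be done byte by byte like this. ASCII folding maps each byte to exactly one byte, so comparing folded bytes in a single pass gives the same sign as strcmp() on the fully folded strings.

```c
/* Hypothetical stand-in for CASEFOLD(): folds ASCII A-Z only.
 * Bytes >= 0x80 (UTF-8 lead/continuation bytes) pass through unchanged,
 * so this does NOT implement Unicode case folding. */
static int fold_byte(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    return c;
}

/* Same sign as strcmp(fold(a), fold(b)) under the ASCII-only assumption:
 * returns <0, 0 or >0, and 0 means "equal ignoring ASCII case".
 * Bytes are compared as unsigned char, as strcmp() does, so for UTF-8
 * input the result follows code point order. */
static int ci_compare(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *) a;
    const unsigned char *pb = (const unsigned char *) b;

    for (;;)
    {
        int ca = fold_byte(*pa);
        int cb = fold_byte(*pb);

        /* Stop at the first difference, or when both strings end. */
        if (ca != cb || ca == '\0')
            return ca - cb;
        pa++;
        pb++;
    }
}
```

With this ordering, "Apple" and "apple" compare equal, and "apple" sorts before "Banana". Plain strcmp() puts "Banana" first because 'B' (0x42) is less than 'a' (0x61).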

If it's a variant of PG_UNICODE_FAST, then it ought to be called 
PG_UNICODE_FAST_CI or similar.  Otherwise, one would expect it to be a 
variant of PG_UNICODE (if that existed; note that there is also UNICODE).

But that name is also dubious since you later write that it's not 
actually fast.

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow.  SIMILAR TO is kind of deprecated, and 
ILIKE is kind of stupid, and regexes have their own way to control 
case-sensitivity.

Nevertheless, I think there would be some value to provide CI (and maybe 
accent-insensitive?) collations that operate separately from the 
"nondeterministic" mechanism.  But then I would like to see a 
comprehensive approach that covers a variety of providers and locales. 
For example, I would expect there to be something like a "sv_SE_CI" 
locale, either available by default or easily created.