On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
> strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.

If it's a variant of PG_UNICODE_FAST, then it ought to be called
PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a
variant of PG_UNICODE (if that existed, but there is also UNICODE).
But that name is also dubious since you later write that it's not
actually fast.
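
To make the quoted ordering semantics concrete, here is a minimal Python
sketch. It assumes that Python's str.casefold() (Unicode full case
folding) is an acceptable stand-in for CASEFOLD, and that comparing Python
strings by code point matches strcmp() on UTF-8 bytes. The function name
unicode_ci_cmp is hypothetical and only illustrates the idea; it is not
taken from the patch.

```python
def unicode_ci_cmp(a: str, b: str) -> int:
    """Hypothetical illustration of strcmp(CASEFOLD(a), CASEFOLD(b)).

    Assumption: str.casefold() approximates the proposed CASEFOLD, and
    code-point comparison approximates strcmp() on UTF-8 encoded strings.
    Returns -1, 0, or 1.
    """
    fa, fb = a.casefold(), b.casefold()
    # Python compares str values code point by code point, which gives the
    # same order as a bytewise comparison of their UTF-8 encodings.
    return (fa > fb) - (fa < fb)


if __name__ == "__main__":
    print(unicode_ci_cmp("Straße", "STRASSE"))  # 0: both fold to "strasse"
    print(unicode_ci_cmp("apple", "Banana"))    # -1: "apple" < "banana"
```

Note that the result is a plain code-point ordering of the folded strings,
not a linguistic ordering such as ICU would produce.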

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and
ILIKE is kind of stupid, and regexes have their own way to control
case-sensitivity.

Nevertheless, I think there would be some value in providing CI (and
maybe accent-insensitive?) collations that operate separately from the
"nondeterministic" mechanism. But then I would like to see a
comprehensive approach that covers a variety of providers and locales.
For example, I would expect there to be something like a "sv_SE_CI"
locale, either available by default or easily created.