Re: Add CASEFOLD() function. - Mailing list pgsql-hackers

From Robert Treat
Subject Re: Add CASEFOLD() function.
Date
Msg-id CABV9wwOQQ8y_+cdaH9awN1_gcoHYbncznQgPoLpbm5k+AtLR3w@mail.gmail.com
In response to Re: Add CASEFOLD() function.  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Jun 19, 2025 at 12:33 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2025-06-19 at 16:36 +0100, Thom Brown wrote:
> > Ease of use, perhaps. It seems easier to use:
> >
> > column_name cftext
> >
> > rather than:
> >
> > CREATE COLLATION case_insensitive_collation (
> >     PROVIDER = icu,
> >     LOCALE = 'und-u-ks-level2',
> >     DETERMINISTIC = FALSE
> > );
>
> We could auto-create such a collation at initdb time for ICU-enabled
> builds.
>

Providing a generic case-insensitive, non-deterministic collation by
default would solve a number of different use cases, so +1 on the idea
from me. And TBH I usually build --without-icu, but this would likely
cause me to change that.
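
As a rough sketch of what that would buy users: assuming a collation
like the one in Thom's example already exists (created by hand today,
or hypothetically at initdb time), a plain text column can just
reference it. The table and column names below are made up for
illustration.

```sql
-- Assumes case_insensitive_collation exists as defined in the quoted
-- CREATE COLLATION statement (requires an ICU-enabled build).
CREATE TABLE users (
    email text COLLATE case_insensitive_collation UNIQUE
);

INSERT INTO users VALUES ('Alice@Example.com');

-- Finds the row above: equality ignores case under this collation.
SELECT * FROM users WHERE email = 'alice@example.com';

-- Rejected with a unique violation, because the two values compare equal.
INSERT INTO users VALUES ('ALICE@EXAMPLE.COM');
```

No extension and no function calls in the query; the behavior travels
with the column definition.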

> > But I see the arguments against it. It creates an unnecessary
> > dependency on an extension, and if someone wants to ignore both case
> > and accents, they may resort to using 2 extensions (citext +
> > unaccent) when none are needed.
>
> There are at least three ways to do case insensitivity (or other kinds
> of equivalence):
>
> * Explicit function calls in queries, as well as index and constraint
> definitions. E.g. expression index on LOWER(), queries that explicitly
> do "LOWER(x) = ..."
>
> * Wrap those function calls up in a separate data type, like citext.
>
> * Non-deterministic collations.
>
> Given that we have collations, which are a way of organizing alternate
> behaviors for existing data types, I'm not sure I see the need for
> creating an entirely separate data type.
>
> > I guess I don't feel strongly about it either way.
>
> Are you a user of citext? I'm genuinely interested in the use cases,
> and whether the separate-data-type approach has merits that are missing
> in the other approaches.
>

Yeah, I'd be interested to hear if there is some missing bit that
existing users have concerns over. As a former user of citext, I found
it a great workaround at the time, but there are "better ways" to
handle those things now (imho).
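
For anyone comparing, here is roughly what the first two approaches
from Jeff's list look like in practice, i.e. the ones a
non-deterministic collation would stand in for. This is only a sketch;
the table, column, and index names are hypothetical.

```sql
-- Approach 1: explicit function calls, in both the index and the query.
CREATE TABLE accounts_lower (login text);
CREATE UNIQUE INDEX accounts_lower_login_idx
    ON accounts_lower (LOWER(login));

-- Every query has to repeat the LOWER() call to use the index.
SELECT * FROM accounts_lower WHERE LOWER(login) = LOWER('Alice');

-- Approach 2: a separate data type from the citext extension.
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE accounts_citext (login citext UNIQUE);

-- Comparisons are case-insensitive without any function call.
SELECT * FROM accounts_citext WHERE login = 'Alice';
```

The first needs discipline in every query, and the second needs an
extension plus a column type change. A collation on a plain text column
needs neither.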


Robert Treat
https://xzilla.net


