Re: Use CASEFOLD() internally rather than LOWER() - Mailing list pgsql-hackers

From	Mark Dilger
Subject	Re: Use CASEFOLD() internally rather than LOWER()
Date	March 26 03:01:26
Msg-id	CAHgHdKuGR7aJxZu7VTPA+kEDkzqJvKmi5799rhW+sKyt-WVihQ@mail.gmail.com
In response to	Re: Use CASEFOLD() internally rather than LOWER() (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis <pgsql@j-davis.com> wrote:

> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".

I don't know. I set up a test to iterate across all character pairs in all locales, and I didn't find any on my system. Other searching suggests that the Turkish and Azerbaijani locales do have this characteristic: I (U+0049) lowercases to ı (U+0131) but case folds to i (U+0069), while ı (U+0131) lowercases to itself and also case folds to itself. I have not confirmed that empirically, though.
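
For illustration only, the kind of check described above could be sketched in Python as follows. This is not the test that was actually run, and it does not exercise PostgreSQL or any system locale: Python's str.lower() and str.casefold() use the locale-independent Unicode default mappings, and tr_lower() is a hypothetical stand-in for Turkish-style lowercasing, written here only to model the I/ı behavior mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def tr_lower(ch: str) -> str:
    """Hypothetical Turkish-style lowercasing (an assumption, not a real
    locale API): I -> dotless i, dotted capital I -> i, else the default."""
    if ch == "I":
        return "\u0131"
    if ch == "\u0130":
        return "i"
    return ch.lower()


def find_collisions(chars, lower, fold):
    """Return pairs that lowercase to the same string but fold differently."""
    # Group characters by their lowercase form so that only characters
    # sharing a lowercase form are compared with each other.
    groups = defaultdict(list)
    for ch in chars:
        groups[lower(ch)].append(ch)

    hits = []
    for members in groups.values():
        for a, b in combinations(members, 2):
            if fold(a) != fold(b):
                hits.append((a, b))
    return hits


if __name__ == "__main__":
    sample = ["I", "\u0131", "i", "\u0130"]
    # Default Unicode lowercasing vs. default case folding.
    print(find_collisions(sample, str.lower, str.casefold))
    # Modeled Turkish lowercasing vs. default case folding.
    print(find_collisions(sample, tr_lower, str.casefold))
```

Under these assumptions, the pair I (U+0049) and ı (U+0131) is reported when tr_lower() is used and is not reported with the default str.lower(). Passing a wider range, for example every code point via chr(), would extend the scan, but the result still says nothing about what a given platform's locale data actually does.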

> I don't have a counterexample, so perhaps using casefold would still be
> fine.

> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?

I don't want to take a strong position either way. I'm still wrapping my head around the various implications of the proposed changes, and don't feel I have a complete picture yet.

Mark Dilger
