Re: Use CASEFOLD() internally rather than LOWER() - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Use CASEFOLD() internally rather than LOWER()
Msg-id CAHgHdKt+_+QhHK8WXQSoMNeUz43Cp2zGNEVX6=0RSaksA9zyJw@mail.gmail.com
In response to Re: Use CASEFOLD() internally rather than LOWER()  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers

On Tue, Mar 3, 2026 at 1:01 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2026-02-28 at 14:27 +0100, Daniel Verite wrote:
> > I tried 0001 with a non-UTF8 database and got quickly stuck:
>
> Attached new versions. I moved the encoding check into the SQL-callable
> casefold() function, and other callers use str_casefold(). That
> slightly simplifies what happens in ILIKE, also.
>
> I removed the citext changes. citext has somewhat of a legacy status, I
> think, so I'm not sure it makes sense to try to modernize or change it.
> Also, some SQL-language functions in citext use LOWER(), so the changes
> aren't enough: we'd need to make the SQL CASEFOLD function callable in
> other encodings, and also run a citext upgrade script to change the
> definitions.
>
> Note that these changes affect the result of some expressions (e.g.
> ILIKE), so could theoretically make an expression index or predicate
> index inconsistent.

Thanks for the patches!

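After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
With builtin collations, the two functions can return different strings
for the same input, so the trigrams pg_trgm extracts may no longer
correspond to what ILIKE matches.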
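
To illustrate the kind of divergence I mean, here is a rough sketch in
Python rather than PostgreSQL code. It assumes Python's str.lower() and
str.casefold() as stand-ins for str_tolower() and str_casefold(), and the
trigrams() helper is hypothetical, only approximating pg_trgm's padding
(two leading blanks, one trailing). Whether a particular builtin collation
folds this character the same way is an assumption, not something the
sketch verifies.

```python
def trigrams(word: str) -> set[str]:
    """Hypothetical helper: pad roughly the way pg_trgm does, then
    collect every 3-character window."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


value = "Straße"      # stored column value
pattern = "STRASSE"   # ILIKE pattern without wildcards

# Matching by case folding: both sides fold to the same string.
folded_match = value.casefold() == pattern.casefold()

# Trigram extraction by lowercasing: 'ß' stays 'ß', so the sets differ.
value_trgm = trigrams(value.lower())
pattern_trgm = trigrams(pattern.lower())

# Trigrams the pattern requires that the stored value never produced.
missing = pattern_trgm - value_trgm

print("folded match:", folded_match)
print("pattern trigrams missing from value:", sorted(missing))
```

In this sketch the fold-based comparison matches, yet the lower-based
trigram sets disagree, which is the shape of mismatch that could make a
trigram index lookup skip a row that ILIKE itself would accept.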

