On Tue, Mar 3, 2026 at 1:01 PM Jeff Davis <pgsql@j-davis.com> wrote:
On Sat, 2026-02-28 at 14:27 +0100, Daniel Verite wrote:
> I tried 0001 with a non-UTF8 database and got quickly stuck:
Attached new versions. I moved the encoding check into the SQL-callable casefold() function, while other callers use str_casefold() directly. That also slightly simplifies what happens in ILIKE.
I removed the citext changes. citext has something of a legacy status, I think, so I'm not sure it makes sense to try to modernize or change it. Also, some SQL-language functions in citext use LOWER(), so those changes alone aren't enough: we'd need to make the SQL CASEFOLD function callable in other encodings, and also run a citext upgrade script to change the definitions.
Note that these changes affect the result of some expressions (e.g. ILIKE), so they could theoretically make an expression index or predicate index inconsistent.
Thanks for the patches!
After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still uses str_tolower() for trigram extraction (trgm_op.c:352 and :948). With builtin collations, the two functions can produce different results for some inputs.