Re: Use CASEFOLD() internally rather than LOWER() - Mailing list pgsql-hackers

From Chao Li
Subject Re: Use CASEFOLD() internally rather than LOWER()
Date
Msg-id 1A46D941-E0A4-4B3E-AAEA-1F7B6CCD24E6@gmail.com
Whole thread Raw
In response to Use CASEFOLD() internally rather than LOWER()  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers

> On Jan 13, 2026, at 02:22, Jeff Davis <pgsql@j-davis.com> wrote:
>
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
>
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
>
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
>
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
>
> Regards,
> Jeff Davis
>
>
>
<v1-0001-ILIKE-use-CASEFOLD-rather-than-LOWER.patch><v1-0002-citext-use-CASEFOLD-rather-than-LOWER.patch><v1-0003-dict_xsyn-use-CASEFOLD-rather-than-LOWER.patch><v1-0004-tsearch-use-CASEFOLD-rather-than-LOWER.patch>

Hi Jeff,

Thanks for the patch. I have reviewed the patch set and got a few comments for tests:

1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```

Do we want to added one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
 ?column?
----------
 t
(1 row)
```
Which tests the different behaviors against collations.

2 - 0002

Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
 ?column?
----------
 f
(1 row)
```

I initially thought to add test cases with different collations, but after debugging, I found that citext intentionally
ignoresspecified collation. 

3 - 0003 LGTM. Seems the existing test coverage is good enough.

4 - 0004

I thought to suggest add a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
 ?column?
----------
 f
(1 row)
```

But I don’t see existing tests under backend/tsearch. So, I’m now not sure whether or not to insist the suggestion.

BWT, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
                                 errmsg("could not determine which collation to use for %s function",
-                                               "lower()"),
+                                               "casefold()”),
```

I have posted a patch to fix. See [1].

[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: str_casefold: fix typo in error message
Next
From: Steven Niu
Date:
Subject: Re: str_casefold: fix typo in error message