Re: Use CASEFOLD() internally rather than LOWER() - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Use CASEFOLD() internally rather than LOWER()
Date
Msg-id 0c21d77497c2316f9f5af143122dd24a81eb40db.camel@j-davis.com
In response to Re: Use CASEFOLD() internally rather than LOWER()  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Wed, 2026-03-25 at 07:40 -0700, Mark Dilger wrote:
> pg_trgm appears to be lossy, with recheck logic.  I would think you
> just need to make it give answers which at least include everything
> that a regex would match, and then allow recheck to prune that down. 
> My concern is having pg_trgm give less than all the answers, so that
> after recheck you get fewer results than a seqscan would have
> returned.  Would switching to casefold be strictly broader than
> regex?

I think the precise question would be: "are there any two characters
that lowercase to the same character but do not casefold to the same
character?".

I don't have a counterexample, so perhaps using casefold would still be
fine.
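
A rough way to search for such a counterexample is sketched below. It is only an approximation of the question above: it uses Python's built-in Unicode tables (`str.lower()` and `str.casefold()`), which are not necessarily the same mappings as PostgreSQL's builtin, ICU, or libc providers, and the function name `lower_groups_split_by_casefold` is hypothetical. It groups code points by their lowercase form and reports any group whose members casefold to more than one distinct string.

```python
import sys
from collections import defaultdict


def lower_groups_split_by_casefold(limit=sys.maxunicode + 1):
    """Return (lowered, members) for every set of characters that share a
    lowercase form but do not all share a casefolded form.

    Assumption: Python's Unicode tables stand in for the collation
    provider's mappings; results may differ per provider and version.
    """
    groups = defaultdict(list)
    for cp in range(limit):
        # Surrogate code points are not characters; skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        groups[ch.lower()].append(ch)

    counterexamples = []
    for lowered, members in groups.items():
        # More than one distinct casefolded form means casefold would
        # separate characters that lower() treats as equal.
        if len({m.casefold() for m in members}) > 1:
            counterexamples.append((lowered, members))
    return counterexamples


if __name__ == "__main__":
    found = lower_groups_split_by_casefold()
    print(f"{len(found)} lowercase group(s) split by casefold")
    for lowered, members in found[:10]:
        print(repr(lowered), [f"U+{ord(m):04X}" for m in members])
```

An empty result from this check would only suggest, not prove, that casefolding is at least as broad as lowercasing for the provider in question; the same scan would need to be repeated against the provider's own tables.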

Thoughts? Should we enhance regexes to consider more than two case
variants first, or should we proceed with some of these patches (and/or
a similar change to pg_trgm)?

> Sorry if this misses something discussed upthread.  I'm clearly
> assuming here that you don't mind that such a change necessitates a
> REINDEX. 

That's a concern. It may depend on how big the impact would be -- for
libc I don't think it would matter because lowercasing and casefolding
are the same thing.

Regards,
    Jeff Davis



