
Re: Add CASEFOLD() function. - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Add CASEFOLD() function.
Date	December 19, 2024 20:51:32
Msg-id	898752524b1ad658c5fdae789f823a0dc52e6171.camel@j-davis.com
In response to	Re: Add CASEFOLD() function. (Peter Eisentraut <peter@eisentraut.org>)
Responses	Re: Add CASEFOLD() function.
List	pgsql-hackers


On Thu, 2024-12-19 at 17:18 +0100, Peter Eisentraut wrote:
> Can you explain this in further detail?  I don't quite follow why
> this would be required.

I am unsure now.

My initial reasoning was based on the idea that users would want to use
CASEFOLD(t) in a unique expression index as an improvement over
LOWER(t). And if you do that, you'd be surprised if two canonically
equivalent strings both ended up in the index. I don't think that's a huge problem,
because in other contexts we leave it up to the user to keep things
normalized consistently, and a CHECK(t IS NFC NORMALIZED) is a good way
to do that.
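A minimal Python sketch of the uniqueness concern, assuming Python's `str.casefold()` (Unicode full case folding) as a stand-in for CASEFOLD(); the eventual PostgreSQL implementation may differ in detail, and `fold_key` is a hypothetical name. Two canonically equivalent spellings of "Å" yield different fold keys, so a unique index on the key alone would accept both unless the input is normalized first:

```python
import unicodedata

def fold_key(s: str) -> str:
    """Hypothetical stand-in for the CASEFOLD(t) index expression."""
    return s.casefold()

composed = "\u00c5"     # "Å" as one precomposed code point (NFC)
decomposed = "A\u030a"  # "A" + COMBINING RING ABOVE (NFD)

# The two strings are canonically equivalent...
assert unicodedata.normalize("NFC", decomposed) == composed

# ...but their fold keys differ, so a unique index on the key accepts both.
print(fold_key(composed) == fold_key(decomposed))     # False

# A check analogous to CHECK(t IS NFC NORMALIZED) rejects the decomposed form.
print(unicodedata.is_normalized("NFC", decomposed))   # False
```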

But there's a problem: full case folding doesn't preserve the normal
form, so even if the input is NFC normalized, the output might not be.
If we solve this problem, then we can just say that CASEFOLD()
preserves the normal form, consistently with how the spec defines
LOWER()/UPPER(), and I think that would be the best outcome.
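One concrete case, sketched in Python under the same assumption that `str.casefold()` reflects Unicode full case folding: U+01F0 LATIN SMALL LETTER J WITH CARON is NFC on its own, but its full case folding is "j" followed by U+030C COMBINING CARON, which NFC would recompose:

```python
import unicodedata

s = "\u01f0"  # LATIN SMALL LETTER J WITH CARON, one precomposed code point
print(unicodedata.is_normalized("NFC", s))        # True

folded = s.casefold()  # full case folding: U+01F0 -> U+006A U+030C
print([hex(ord(c)) for c in folded])              # ['0x6a', '0x30c']
print(unicodedata.is_normalized("NFC", folded))   # False: NFC recomposes it to U+01F0
```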

I'm not sure that problem is solvable, though: if the input string is
in both NFC and NFD, how do we know which normal form to preserve?
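A short Python sketch of the ambiguity, again assuming `str.casefold()` as a stand-in for CASEFOLD() (`forms` is a hypothetical helper). "J" followed by U+030C COMBINING CARON is both NFC and NFD, because Unicode has no precomposed capital J with caron; after folding, the result is NFD but no longer NFC, so "preserve the input's normal form" does not pick a unique answer:

```python
import unicodedata

def forms(s: str) -> list[str]:
    """Return which of NFC and NFD the string already satisfies."""
    return [f for f in ("NFC", "NFD") if unicodedata.is_normalized(f, s)]

s = "J\u030c"  # "J" + COMBINING CARON; no precomposed capital form exists
print(forms(s))             # ['NFC', 'NFD']
print(forms(s.casefold()))  # ['NFD']: "j" + caron composes to U+01F0 under NFC
```

Plain ASCII strings are the common case of input that is in both forms at once.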

We could tell users to use an expression index on
NORMALIZE(CASEFOLD(t)) instead, but that feels like inefficient
boilerplate.
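The NORMALIZE(CASEFOLD(t)) workaround can be sketched in Python as follows, assuming NFC (the default form of NORMALIZE()) and `str.casefold()` as the folding step; `index_key` is a hypothetical name. The extra normalization pass is the cost being described, but it does map the precomposed, decomposed, and uppercase spellings from the examples above to a single key:

```python
import unicodedata

def index_key(s: str) -> str:
    """Hypothetical analogue of an index on NORMALIZE(CASEFOLD(t)) using NFC."""
    return unicodedata.normalize("NFC", s.casefold())

samples = ["\u01f0", "j\u030c", "J\u030c"]  # precomposed, decomposed, uppercase
keys = {index_key(s) for s in samples}
print(keys == {"\u01f0"})  # True: all three collapse to one NFC key
```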

>
> Another might be that it's not entirely clear how this should work in
> encodings other than UTF-8.  For example, the normalized string might
> not be representable in the encoding.

That's a good point.
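A small Python sketch of the representability issue, assuming a LATIN1 server encoding and using Python's `latin-1` codec as a stand-in (`representable` is a hypothetical helper). Both normalization and case folding can produce characters outside the encoding even when the input fits:

```python
import unicodedata

def representable(s: str, encoding: str = "latin-1") -> bool:
    """Return True if s can be stored in the given (hypothetical server) encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(representable("\u00e9"))                                # True: "é"
print(representable(unicodedata.normalize("NFD", "\u00e9")))  # False: needs U+0301
print(representable("\u00b5"))                                # True: MICRO SIGN
print(representable("\u00b5".casefold()))                     # False: folds to U+03BC
```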

Regards,
    Jeff Davis
