Re: Add CASEFOLD() function. - Mailing list pgsql-hackers

From Ian Lawrence Barwick
Subject Re: Add CASEFOLD() function.
Msg-id CAB8KJ=jgnpbDTJS+jOQMzo7b1mPTwM0hCqQBBRgoX_kvyxWVog@mail.gmail.com
List pgsql-hackers
Hi

On Thu, 12 Dec 2024 at 18:00, Jeff Davis <pgsql@j-davis.com> wrote:
>
> Unicode case folding is a way to convert a string to a canonical case
> for the purpose of case-insensitive matching.
>
> Users have long used LOWER() for that purpose, but there are a few edge
> case problems:
>
> * Some characters have more than two cased forms, such as "Σ" (U+03A3),
> which can be lowercased as "σ" (U+03C3) or "ς" (U+03C2). The CASEFOLD()
> function converts all cased forms of the character to "σ".
>
> * The character "İ" (U+0130, capital I with dot) is lowercased to "i",
> which can be a problem in locales that don't expect that.
>
> * If new lower case characters are added to Unicode, the results of
> LOWER() may change.
>
> The CASEFOLD() function solves these problems.
>
> Patch attached.

I took a quick look at this, as it sounds useful for the described issue,
and it seems to work as advertised. However, the function is named
"FOLDCASE()" in the patch, so I'm wondering which name is intended? A quick
search indicates there are no functions of either name in other databases;
Python has a "casefold()" function [1] and PHP a "foldCase()" function [2],
so there doesn't seem to be a de facto standard for this.
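
For illustration only, the sigma case from the quoted description can be
reproduced with Python's str.casefold(). This is a minimal sketch and not
taken from the patch; the helper name equal_ignoring_case is made up here,
and Python's folding rules may differ in detail from what the patch
implements.

```python
# Compare lower() and casefold() on the three cased forms of sigma.
# equal_ignoring_case is a hypothetical helper, not part of any patch.

def equal_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive comparison based on Unicode case folding."""
    return a.casefold() == b.casefold()

# U+03A3 (capital), U+03C3 (small), U+03C2 (small final)
forms = ["\u03a3", "\u03c3", "\u03c2"]

# lower() leaves the final sigma as it is, so two distinct results remain.
lowered = {s.lower() for s in forms}

# casefold() maps all three forms to U+03C3.
folded = {s.casefold() for s in forms}

if __name__ == "__main__":
    print(sorted(lowered))
    print(sorted(folded))
    # An upper-case word and its lower-case spelling with a final sigma.
    print(equal_ignoring_case("ΟΔΟΣ", "οδος"))
```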

[1] https://docs.python.org/3/library/stdtypes.html#str.casefold
[2] https://www.php.net/manual/en/intlchar.foldcase.php

Regards


Ian Barwick


