On Wed, 2024-12-18 at 14:55 -0500, Tom Lane wrote:
> It would not actually be too
> hard I think to build the right regex, if we had the information
> available as to what all the case-variants are. The problem at the
> moment is that the existing code assumes that pg_wc_tolower and
> pg_wc_toupper together give us all the case variants, and that
> API can't cope with multi-glyph expansions.

That's doable. I can do that after refactoring the ctype logic to use a
method table.

I'll have to think about how the API should look, though. Case folding
can expand one codepoint into at most 3, and the maximum number of case
variants is also ~3, so the function could fill in a caller-supplied
3x3 array of pg_wchar. That's somewhat awkward in C, so I welcome
better ideas.
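To make the 3x3 idea concrete, here is a rough sketch of one possible
shape. Everything in it is hypothetical: the names CaseVariants,
add_variant and toy_case_variants, the uint32_t stand-in for pg_wchar,
and the hard-coded toy mapping (ASCII letters plus U+00DF) are
assumptions for illustration only, not existing or proposed PostgreSQL
code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t pg_wchar;		/* stand-in for PostgreSQL's pg_wchar */

#define MAX_CASE_VARIANTS	3	/* assumed max number of case variants */
#define MAX_CASE_EXPANSION	3	/* assumed max codepoints per variant */

/* Hypothetical caller-supplied result: up to 3 variants of up to 3 codepoints */
typedef struct CaseVariants
{
	int			nvariants;
	int			lengths[MAX_CASE_VARIANTS];
	pg_wchar	chars[MAX_CASE_VARIANTS][MAX_CASE_EXPANSION];
} CaseVariants;

/* Append a variant unless an identical one is already present */
static void
add_variant(CaseVariants *cv, const pg_wchar *seq, int len)
{
	assert(len >= 1 && len <= MAX_CASE_EXPANSION);

	for (int i = 0; i < cv->nvariants; i++)
	{
		if (cv->lengths[i] == len &&
			memcmp(cv->chars[i], seq, len * sizeof(pg_wchar)) == 0)
			return;
	}

	assert(cv->nvariants < MAX_CASE_VARIANTS);
	memcpy(cv->chars[cv->nvariants], seq, len * sizeof(pg_wchar));
	cv->lengths[cv->nvariants] = len;
	cv->nvariants++;
}

/*
 * Toy lookup: handles only ASCII letters and U+00DF (sharp s).
 * A real version would consult the provider's case mapping data.
 */
static void
toy_case_variants(pg_wchar c, CaseVariants *cv)
{
	cv->nvariants = 0;
	add_variant(cv, &c, 1);		/* the character itself is always a variant */

	if (c >= 'a' && c <= 'z')
	{
		pg_wchar	upper = c - 'a' + 'A';

		add_variant(cv, &upper, 1);
	}
	else if (c >= 'A' && c <= 'Z')
	{
		pg_wchar	lower = c - 'A' + 'a';

		add_variant(cv, &lower, 1);
	}
	else if (c == 0x00DF)
	{
		/* sharp s: folds to "ss"; U+1E9E is the capital form */
		const pg_wchar folded[2] = {'s', 's'};
		const pg_wchar capital = 0x1E9E;

		add_variant(cv, folded, 2);
		add_variant(cv, &capital, 1);
	}
}
```

With this layout the regex code would loop over cv.nvariants entries
and emit an alternation of codepoint sequences rather than a simple
bracket expression. Wrapping the array in a struct with explicit
lengths avoids needing a sentinel value inside the 3x3 array, but that
is just one option.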

Note: if the string is not normalized consistently with the pattern,
pattern matching in general won't work very well. This has always been
true, but as we make pattern matching smarter we should be clearer
about that point.
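As a small illustration of the normalization point (a sketch only; the
helper codepoints_equal and the uint32_t stand-in for pg_wchar are made
up for this example): "é" can be stored precomposed (NFC, U+00E9) or
decomposed (NFD, U+0065 U+0301), and a codepoint-by-codepoint
comparison treats the two as different strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pg_wchar;		/* stand-in for PostgreSQL's pg_wchar */

/* The same visible character, "é", in two normalization forms */
static const pg_wchar e_acute_nfc[] = {0x00E9};			/* precomposed */
static const pg_wchar e_acute_nfd[] = {0x0065, 0x0301};	/* e + combining acute */

/* Naive codepoint-wise equality, as a matcher working on codepoints would see it */
static bool
codepoints_equal(const pg_wchar *a, size_t alen,
				 const pg_wchar *b, size_t blen)
{
	assert(a != NULL && b != NULL);

	if (alen != blen)
		return false;
	for (size_t i = 0; i < alen; i++)
	{
		if (a[i] != b[i])
			return false;
	}
	return true;
}
```

Here codepoints_equal(e_acute_nfc, 1, e_acute_nfd, 2) is false even
though both render as the same character, so a pattern written in one
form will miss data stored in the other unless both are normalized the
same way first.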

Regards,
	Jeff Davis