Re: Support regular expressions with nondeterministic collations - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Support regular expressions with nondeterministic collations
Msg-id c10ed44c7e5dcbb7b4597889f02d029298f0c919.camel@j-davis.com
In response to Re: Support regular expressions with nondeterministic collations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2024-12-18 at 14:55 -0500, Tom Lane wrote:
> It would not actually be too
> hard I think to build the right regex, if we had the information
> available as to what all the case-variants are. The problem at the
> moment is that the existing code assumes that pg_wc_tolower and
> pg_wc_toupper together give us all the case variants, and that
> API can't cope with multi-glyph expansions.

That's doable. I can do that after refactoring the ctype logic to use a
method table.

I'll have to think about how the API should look, though. Case folding
can expand one codepoint into at most 3 codepoints, and the maximum
number of case variants is also ~3, so the function could fill in a
caller-supplied 3x3 array of pg_wchar. That is somewhat awkward in C,
so I welcome better ideas.

Note: if the string and the pattern are not normalized consistently
(e.g. one is in NFC and the other in NFD), pattern matching in general
won't work reliably. This has always been true, but as we make pattern
matching smarter we should be clearer about that point.

Regards,
    Jeff Davis
