Re: Support regular expressions with nondeterministic collations - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Support regular expressions with nondeterministic collations
Msg-id 2808617.1734551724@sss.pgh.pa.us
In response to Re: Support regular expressions with nondeterministic collations  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Support regular expressions with nondeterministic collations
List pgsql-hackers
Jeff Davis <pgsql@j-davis.com> writes:
> On Mon, 2024-12-16 at 17:16 -0500, Tom Lane wrote:
>> The existing logic in the regex engine for case-insensitive matching
>> is to convert every letter to a bracket expression containing all
>> its case variants.  For example, "a" becomes "[aA]" and "[xY1]"
>> becomes "[xXyY1]".  This fails on "ß", so a better way would be
>> nice...
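A minimal sketch of the bracket-expansion rewrite quoted above follows. It is an illustration, not PostgreSQL's regex code. Python's `str.lower()`/`str.upper()` stand in for the engine's case tables, and the helper names (`char_variants`, `caseless_literal`, `caseless_bracket`) are hypothetical. Only one-character results are kept, to mimic a char-in, char-out API. Under that restriction "ß" gets no "SS" alternative, which is the failure described.

```python
import re

def char_variants(ch: str) -> list[str]:
    """Case variants of ch from one-to-one lower/upper mappings only,
    mimicking a char-in, char-out API (hypothetical stand-in)."""
    variants = []
    for v in (ch.lower(), ch.upper(), ch):
        # Python's str.upper() maps "ß" to "SS"; a single-character API
        # cannot express that, so multi-character results are dropped.
        if len(v) == 1 and v not in variants:
            variants.append(v)
    return variants

def caseless_literal(ch: str) -> str:
    """Rewrite one pattern character, e.g. "a" -> "[aA]"."""
    vs = [re.escape(v) for v in char_variants(ch)]
    return vs[0] if len(vs) == 1 else "[" + "".join(vs) + "]"

def caseless_bracket(members: str) -> str:
    """Rewrite the members of a bracket expression, e.g. "xY1" -> "[xXyY1]"."""
    out = []
    for ch in members:
        for v in char_variants(ch):
            if v not in out:
                out.append(v)
    return "[" + "".join(re.escape(v) for v in out) + "]"

print(caseless_literal("a"))                      # [aA]
print(caseless_bracket("xY1"))                    # [xXyY1]
print(caseless_literal("ß"))                      # ß  (no "SS" alternative)
print(re.fullmatch(caseless_literal("ß"), "SS"))  # None
```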

> We have a couple options:

>  * create more complex regexes like "(ß|[sS][sS])"
>  * case fold the pattern first, and then lazily case fold the string as
> we match against it

> The former sounds faster but the latter sounds simpler.

Yeah, the latter sounds really slow.  It would not actually be too
hard, I think, to build the right regex if we had the information
available as to what all the case variants are.  The problem at the
moment is that the existing code assumes that pg_wc_tolower and
pg_wc_toupper together give us all the case variants, and that
API can't cope with multi-glyph expansions.
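The following rough sketch shows the first option, building a regex such as "(ß|[sS][sS])", assuming richer case-variant information were available. It is not PostgreSQL's implementation. Python's `upper()`/`casefold()` play the role of a hypothetical API that can return multi-character variants, and all function names are illustrative. A non-capturing group `(?:...)` replaces the plain parentheses so the rewrite does not add a capture group in Python's `re`.

```python
import re

def single_char_class(ch: str) -> str:
    """Bracket expression of the one-character case variants of ch."""
    vs = []
    for v in (ch.lower(), ch.upper(), ch):
        if len(v) == 1 and v not in vs:
            vs.append(v)
    body = "".join(re.escape(v) for v in vs)
    return body if len(vs) == 1 else "[" + body + "]"

def multi_char_expansions(ch: str) -> list[str]:
    """Hypothetical richer API: multi-character case variants of ch,
    normalized by case folding, e.g. "ß" -> ["ss"]."""
    out = []
    for v in (ch.upper(), ch.casefold()):
        folded = v.casefold()
        if len(folded) > 1 and folded not in out:
            out.append(folded)
    return out

def caseless_char(ch: str) -> str:
    """Regex fragment matching ch and all of its case variants."""
    alts = [single_char_class(ch)]
    for exp in multi_char_expansions(ch):
        # Each character of an expansion gets its own bracket expression,
        # so "ss" becomes "[sS][sS]".
        alts.append("".join(single_char_class(c) for c in exp))
    return alts[0] if len(alts) == 1 else "(?:" + "|".join(alts) + ")"

def caseless_pattern(literal: str) -> str:
    """Rewrite a literal string into a caseless regex, char by char."""
    return "".join(caseless_char(c) for c in literal)

print(caseless_char("ß"))  # (?:ß|[sS][sS])
p = caseless_pattern("straße")
print(p)                                      # [sS][tT][rR][aA](?:ß|[sS][sS])[eE]
print(re.fullmatch(p, "STRASSE") is not None)  # True
```

This sketch only expands outward from a pattern character. A pattern "ss" still becomes "[sS][sS]" and will not match "ß" in the subject. Covering that direction would require recognizing multi-character sequences in the pattern as well.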

            regards, tom lane