Re: encoding affects ICU regex character classification - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: encoding affects ICU regex character classification
Date
Msg-id 3b4657c1145393d5b18228fad173b3f61f2c1a57.camel@j-davis.com
Whole thread Raw
In response to Re: encoding affects ICU regex character classification  (Jeremy Schneider <schneider@ardentperf.com>)
Responses Re: encoding affects ICU regex character classification
List pgsql-hackers
On Tue, 2023-12-12 at 14:35 -0800, Jeremy Schneider wrote:
> Is someone able to test out upper & lower functions on U+A7BA ...
> U+A7BF
> across a few libs/versions?

Those code points are unassigned in Unicode 11.0 and assigned in
Unicode 12.0.

In ICU 63-2 (based on Unicode 11.0), they just get mapped to
themselves. In ICU 64-2 (based on Unicode 12.1) they get mapped the
same way the builtin CTYPE maps them (based on Unicode 15.1).

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned code
points and accept the risk of future changes, or wait for the provider
to be upgraded.

If the user doesn't have many expression indexes dependent on ctype
behavior, it doesn't matter much. If they do have such indexes, the
best we can offer is a controlled process, and the builtin provider
allows the most visibility and control.

(Aside: case mapping has very strong compatibility guarantees, but not
perfect. For better compatibility guarantees, we should support case
folding.)

> And I have no idea if or when
> glibc might have picked up the new unicode characters.

That's a strong argument in favor of a builtin provider.

Regards,
    Jeff Davis




pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Next
From: Andrew Dunstan
Date:
Subject: Re: Clean up find_typedefs and add support for Mac