Re: encoding affects ICU regex character classification - Mailing list pgsql-hackers

From Jeremy Schneider
Subject Re: encoding affects ICU regex character classification
Msg-id 452c0341-6c6a-4a87-8b90-6320831094ea@ardentperf.com
In response to Re: encoding affects ICU regex character classification  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 12/14/23 7:12 AM, Jeff Davis wrote:
> The concern over unassigned code points is misplaced. The application
> may be aware of newly-assigned code points, and there's no way they
> will be mapped correctly in Postgres if the provider is not aware of
> those code points. The user can either proceed in using unassigned code
> points and accept the risk of future changes, or wait for the provider
> to be upgraded.

This does not seem to me like a good way to view the situation.

Earlier this summer, a day or two after writing a document, I was
completely surprised to open it on my work computer and see "unknown
character" boxes. When I had previously written the document on my home
computer and when I had viewed it from my cell phone, everything was
fine. Apple does a very good job of always keeping iPhones and macOS
versions up-to-date with the latest versions of Unicode and latest
characters. iPhone keyboards make it very easy to access any character.
Emojis are the canonical example here. My work computer was one major
version of macOS behind my home computer.

And I'm probably one of a few people on this hackers email list who even
understand what the words "unassigned code point" mean. Generally, DBAs,
sysadmins, architects and developers who are all part of the tangled web
of building and maintaining systems which use PostgreSQL on their
backend are never going to think about Unicode characters proactively.

This goes back to my other thread (which sadly got very little
discussion): PostgreSQL really needs to be safe by /default/ ... having
GUCs is fine though; we can put explanation in the docs about what users
should consider if they change a setting.

-Jeremy


-- 
http://about.me/jeremy_schneider



