Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Built-in CTYPE provider
Msg-id 3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
In response to Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Built-in CTYPE provider
List pgsql-hackers
On Mon, 2024-02-26 at 19:01 -0800, Jeff Davis wrote:
>  * Right now you can't mix all of the full case mapping behavior with
> INITCAP(), it just does simple titlecase mapping. I'm not sure we want
> to get too fancy here; after all, INITCAP() is not a SQL standard
> function and it's documented in a narrow fashion that doesn't seem to
> leave a lot of room to be very smart. ICU does a few extra things
> beyond what I did:
>   - it accepts a word break iterator to the case conversion function
>   - it provides some built-in word break iterators
>   - it also has some configurable "break adjustment" behavior[1][2]
> which re-aligns the start of the word, and I'm not entirely sure why
> that isn't done in the word break iterator or the titlecasing rules

Attached is v19, which addresses this issue. It does proper Unicode
titlecasing with a word boundary iterator as an argument. For initcap,
it just uses a simple word boundary iterator that breaks whenever
isalnum() changes.
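
For illustration, the "break whenever isalnum() changes" rule can be sketched roughly as below. This is an ASCII-only approximation built on <ctype.h>, not code taken from the attached patch. All names here (WordBoundaryNext, AlnumState, alnum_boundary_next, titlecase_ascii, initcap_ascii) are hypothetical, and a real implementation would operate on Unicode code points and use Unicode titlecase mappings instead of toupper()/tolower().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback type: returns the byte offset of the next word
 * boundary, or the string length once no boundaries remain.
 */
typedef size_t (*WordBoundaryNext) (void *state);

/* Hypothetical iterator state for the isalnum()-based boundary rule. */
typedef struct AlnumState
{
	const char *str;
	size_t		len;
	size_t		offset;			/* next byte to examine */
	int			init;			/* has the first byte been seen? */
	int			prev_alnum;		/* isalnum() result for the previous byte */
} AlnumState;

/* Report a boundary at offset 0 and wherever isalnum() changes value. */
static size_t
alnum_boundary_next(void *arg)
{
	AlnumState *st = (AlnumState *) arg;

	while (st->offset < st->len)
	{
		int			cur = isalnum((unsigned char) st->str[st->offset]) != 0;

		if (!st->init || cur != st->prev_alnum)
		{
			size_t		boundary = st->offset;

			st->init = 1;
			st->prev_alnum = cur;
			st->offset++;
			return boundary;
		}
		st->offset++;
	}
	return st->len;
}

/*
 * Uppercase the byte at each boundary and lowercase everything else.
 * dst must have room for len + 1 bytes.
 */
static void
titlecase_ascii(char *dst, const char *src, size_t len,
				WordBoundaryNext next, void *state)
{
	size_t		boundary = next(state);

	for (size_t i = 0; i < len; i++)
	{
		unsigned char c = (unsigned char) src[i];

		if (i == boundary)
		{
			dst[i] = (char) toupper(c);
			boundary = next(state);
		}
		else
			dst[i] = (char) tolower(c);
	}
	dst[len] = '\0';
}

/* Convenience wrapper that wires the isalnum() iterator into titlecasing. */
static void
initcap_ascii(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);

	size_t		len = strlen(src);
	AlnumState	state = {src, len, 0, 0, 0};

	titlecase_ascii(dst, src, len, alnum_boundary_next, &state);
}
```

With this sketch, initcap_ascii(buf, "hello wORLD-foo") leaves "Hello World-Foo" in buf. Passing the iterator as an argument keeps the casing loop independent of how words are found, so a different boundary rule could be swapped in without touching titlecase_ascii().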

It came out cleaner this way, ultimately, and it feels more complete
even though the behavior isn't much different. It's also easier to
comment the relationship of the functions to Unicode. I removed
CaseKind from the public API but still use it internally to avoid code
duplication.

I made one other change, which is that (for now) I undid the UCS_BASIC
change until we are sure we want to change it. Instead, I have builtin
collations PG_C_UTF8 and PG_UNICODE_FAST. I used the name "FAST" to
indicate that the collation uses fast memcmp() rather than a real
collation, but the Unicode character support is all there (including
full case mapping). I'm open to suggestions on naming.
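
To make the "fast memcmp()" point concrete, a bytewise comparison of the kind such a collation relies on might look like the sketch below. It is illustrative only and not taken from the patch; the function name bytewise_compare is hypothetical. For valid UTF-8 input, byte order matches code point order, so the result is deterministic but not linguistically aware (for example, "Z" sorts before "a").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: order two byte strings with memcmp(), breaking
 * ties on a shared prefix by length (the shorter string sorts first).
 */
static int
bytewise_compare(const char *a, size_t alen, const char *b, size_t blen)
{
	assert(a != NULL && b != NULL);

	size_t		minlen = (alen < blen) ? alen : blen;
	int			cmp = memcmp(a, b, minlen);

	if (cmp != 0)
		return cmp;
	if (alen == blen)
		return 0;
	return (alen < blen) ? -1 : 1;
}
```

No locale data or collation keys are consulted here, which is where the speed comes from; character classification and case mapping are a separate concern from this ordering.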

Regards,
    Jeff Davis


