Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Built-in CTYPE provider
Date	March 1, 2024 05:05:34
Msg-id	3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
In response to	Re: Built-in CTYPE provider (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Built-in CTYPE provider
List	pgsql-hackers

On Mon, 2024-02-26 at 19:01 -0800, Jeff Davis wrote:
>  * Right now you can't mix all of the full case mapping behavior with
> INITCAP(), it just does simple titlecase mapping. I'm not sure we want
> to get too fancy here; after all, INITCAP() is not a SQL standard
> function and it's documented in a narrow fashion that doesn't seem to
> leave a lot of room to be very smart. ICU does a few extra things
> beyond what I did:
>   - it accepts a word break iterator to the case conversion function
>   - it provides some built-in word break iterators
>   - it also has some configurable "break adjustment" behavior[1][2]
> which re-aligns the start of the word, and I'm not entirely sure why
> that isn't done in the word break iterator or the titlecasing rules

Attached is v19, which addresses this issue. It does proper Unicode
titlecasing, taking a word boundary iterator as an argument. For
initcap(), it uses a simple word boundary iterator that breaks whenever
isalnum() changes.
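To illustrate the shape of this approach (a case conversion routine that takes the word boundary iterator as an argument, plus an initcap-style iterator that breaks whenever isalnum() changes), here is a minimal, self-contained C sketch. It is not the code from v19: the names next_boundary_fn, initcap_next_boundary() and titlecase_ascii() are hypothetical, it handles ASCII only, and it uses toupper()/tolower() as a stand-in for real Unicode titlecase and lowercase mappings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator type: given the string, its length and the start of
 * the current segment, return the offset where the next segment starts
 * (or len when the string is exhausted). */
typedef size_t (*next_boundary_fn)(const char *s, size_t len, size_t pos);

/* Hypothetical initcap-style iterator: a new segment starts wherever
 * isalnum() changes between adjacent characters. */
static size_t
initcap_next_boundary(const char *s, size_t len, size_t pos)
{
    if (pos >= len)
        return len;

    int in_word = isalnum((unsigned char) s[pos]) != 0;
    size_t i = pos + 1;

    while (i < len && (isalnum((unsigned char) s[i]) != 0) == in_word)
        i++;
    return i;
}

/* Hypothetical titlecasing loop: dst needs room for len + 1 bytes. The
 * character at each boundary is uppercased and the rest of its segment is
 * lowercased (ASCII-only stand-in for Unicode titlecase/lowercase). */
static void
titlecase_ascii(char *dst, const char *src, size_t len, next_boundary_fn next)
{
    size_t start = 0;

    assert(next != NULL);
    while (start < len)
    {
        size_t end = next(src, len, start);

        for (size_t i = start; i < end; i++)
        {
            unsigned char c = (unsigned char) src[i];

            dst[i] = (char) (i == start ? toupper(c) : tolower(c));
        }
        start = end;
    }
    dst[len] = '\0';
}
```

With this iterator, "hi THOMAS 3rd-place" becomes "Hi Thomas 3rd-Place". Passing the iterator in as a function pointer keeps the titlecasing loop independent of how word boundaries are defined, so a different boundary rule could be swapped in without touching the loop.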

It came out cleaner this way, and it feels more complete even though
the behavior isn't much different. It's also easier to explain in
comments how the functions relate to Unicode. I removed CaseKind from
the public API but still use it internally to avoid code duplication.
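A sketch of the general pattern of keeping a case-kind enum private while several public entry points share one conversion loop. Everything here is hypothetical and ASCII-only: CaseKind borrows the name from the message, but the enum values, convert_case(), ascii_strlower() and ascii_strupper() are illustrations, not the patch's definitions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal enum: in a real module this would stay out of the
 * public header, so callers cannot depend on it. */
typedef enum
{
    CaseLower,
    CaseUpper
} CaseKind;

/* Single shared loop; dst needs room for strlen(src) + 1 bytes. */
static void
convert_case(char *dst, const char *src, CaseKind kind)
{
    size_t len = strlen(src);

    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) src[i];

        dst[i] = (char) (kind == CaseUpper ? toupper(c) : tolower(c));
    }
    dst[len] = '\0';
}

/* Hypothetical public entry points: callers never see CaseKind. */
void
ascii_strlower(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    convert_case(dst, src, CaseLower);
}

void
ascii_strupper(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    convert_case(dst, src, CaseUpper);
}
```

Adding another kind then means extending the enum and the shared loop in one place, rather than duplicating the loop in each public function.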

I made one other change: for now, I reverted the UCS_BASIC change until
we are sure we want it. Instead, there are two builtin collations,
PG_C_UTF8 and PG_UNICODE_FAST. The name "FAST" indicates that the
collation compares strings with a fast memcmp() rather than a real
collation algorithm, but the Unicode character support is all there
(including full case mapping). I'm open to suggestions on naming.
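The two properties described for PG_UNICODE_FAST can be illustrated separately. The sketch below is hypothetical and not the patch's code: bytewise_cmp() compares strings the way a memcmp()-based collation would, and toy_full_upper() shows what "full" case mapping means using the single one-to-many mapping U+00DF (ß) to "SS" from Unicode's SpecialCasing.txt. All other non-ASCII bytes are copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* memcmp()-based comparison, as a "fast" collation would do it: compare raw
 * UTF-8 bytes; on a common prefix, the shorter string sorts first. */
static int
bytewise_cmp(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    int r = memcmp(a, b, la < lb ? la : lb);

    if (r != 0)
        return r;
    return (la > lb) - (la < lb);
}

/* Toy full uppercase mapping: ASCII via toupper(), plus the one-to-many
 * mapping U+00DF (UTF-8 C3 9F) -> "SS"; other bytes are copied unchanged.
 * Output is never longer than the input here, so dst needs
 * strlen(src) + 1 bytes. */
static void
toy_full_upper(char *dst, const char *src)
{
    size_t j = 0;

    assert(dst != NULL && src != NULL);
    for (size_t i = 0; src[i] != '\0';)
    {
        unsigned char c = (unsigned char) src[i];

        if (c == 0xC3 && (unsigned char) src[i + 1] == 0x9F)
        {
            dst[j++] = 'S';
            dst[j++] = 'S';
            i += 2;
        }
        else
        {
            dst[j++] = (c < 0x80) ? (char) toupper(c) : (char) c;
            i++;
        }
    }
    dst[j] = '\0';
}
```

For valid UTF-8, byte order equals code point order, so such a comparison is well defined but not linguistic: "Z" sorts before "a", and "é" (U+00E9) sorts after "z". With full case mapping, uppercasing "straße" yields "STRASSE", one character becoming two, which a simple one-to-one mapping cannot express.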

Regards,
    Jeff Davis

Attachment
