Re: Does UCS_BASIC have the right CTYPE? - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Does UCS_BASIC have the right CTYPE?
Date	October 26, 2023 21:42:27
Msg-id	70b79878856d4f2cabe67fb8e3420a92ea641214.camel@j-davis.com
In response to	Re: Does UCS_BASIC have the right CTYPE? (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On Thu, 2023-10-26 at 09:21 -0700, Jeff Davis wrote:
> Our initcap() is not defined in the standard, and we document that it
> only differentiates between alphanumeric and non-alphanumeric
> characters, so we could get that behavior pretty easily as well. If we
> wanted to do it the Unicode way instead, we can follow the
> toTitlecase() part of the Default Case Algorithm, which is based on
> word breaks and would require another lookup table for that.

Correction: the rules for word breaks are fairly complex, so it would
not be worth trying to replicate them just to support initcap(). We
could just use the simple, existing, and documented rules for initcap(),
which only differentiate between alphanumeric and non-alphanumeric
characters. Anyone who wants the more sophisticated rules can just use
an ICU collation with initcap().
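
To make the simple rule concrete, here is a rough sketch in Python (not
PostgreSQL code; the name initcap_simple is made up for illustration).
It treats each run of alphanumeric characters as a word, upper-cases the
first character of the run, and lower-cases the rest. It assumes Python's
str.isalnum(), str.upper(), and str.lower() as stand-ins for whatever
character classification and case mapping the collation would provide,
and those may not agree with PostgreSQL in every case.

```python
def initcap_simple(text: str) -> str:
    """Capitalize the first character of each alphanumeric run.

    Illustrative only: classification and case mapping come from
    Python's built-in str methods, not from any PostgreSQL provider.
    """
    out = []
    at_word_start = True  # True until we see an alphanumeric character
    for ch in text:
        if ch.isalnum():
            # First character of a word is upper-cased, the rest lower-cased.
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Any non-alphanumeric character ends the current word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


print(initcap_simple("hello wORLD"))
print(initcap_simple("o'neil-smith 3rd"))
```

Note that no word-break table is involved: the apostrophe and the hyphen
each start a new word, which is exactly the kind of case where the
Unicode word-break rules would behave differently.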

The point stands that it would be pretty simple to have a collation
that handles upper() and lower() in a standards-compliant way without
relying on libc or ICU. Unfortunately it's too late to call that
collation UCS_BASIC, but it would still be useful.
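
As a rough illustration of why this would not need libc or ICU, one
possible shape is a sorted table of code point pairs searched per
character. The sketch below is in Python and is only an assumption about
how such a thing might look; the table is a tiny hand-written excerpt
(a real one would presumably be generated from the Unicode data files),
and the names LOWER_TO_UPPER and table_upper are hypothetical.

```python
from bisect import bisect_left

# Hypothetical excerpt of a one-to-one case mapping table, sorted by the
# source code point. Only these four characters are covered here.
LOWER_TO_UPPER = [
    (0x0061, 0x0041),  # a -> A
    (0x00E9, 0x00C9),  # e with acute -> E with acute
    (0x03C3, 0x03A3),  # Greek small sigma -> Greek capital sigma
    (0x0436, 0x0416),  # Cyrillic small zhe -> Cyrillic capital zhe
]
_KEYS = [src for src, _ in LOWER_TO_UPPER]


def table_upper(text: str) -> str:
    """Map each code point through the table; leave unmapped ones as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        i = bisect_left(_KEYS, cp)  # binary search in the sorted keys
        if i < len(_KEYS) and _KEYS[i] == cp:
            cp = LOWER_TO_UPPER[i][1]
        out.append(chr(cp))
    return "".join(out)


print(table_upper("\u00e9a"))
```

Because the mapping is one code point to one code point, a character
with no entry (for example U+00DF) simply passes through unchanged, and
the result does not depend on the platform or on any external library.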

Regards,
    Jeff Davis
