Re: Does UCS_BASIC have the right CTYPE? - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Does UCS_BASIC have the right CTYPE?
Msg-id 70b79878856d4f2cabe67fb8e3420a92ea641214.camel@j-davis.com
In response to Re: Does UCS_BASIC have the right CTYPE?  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, 2023-10-26 at 09:21 -0700, Jeff Davis wrote:
> Our initcap() is not defined in the standard, and we document that it
> only differentiates between alphanumeric and non-alphanumeric
> characters, so we could get that behavior pretty easily as well. If we
> wanted to do it the Unicode way instead, we can follow the
> toTitlecase() part of the Default Case Algorithm, which is based on
> word breaks and would require another lookup table for that.

Correction: the rules for word breaks are fairly complex, so it would
not be worth it to try to replicate that just to support initcap(). We
could just use the simple, existing, and documented rules for initcap()
which only differentiate between alphanumeric and not. Anyone who wants
the more sophisticated rules can just use an ICU collation with
initcap().
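For illustration, here is a minimal sketch of the simple documented rule. A "word" is a run of alphanumeric characters. Its first character is uppercased and the rest are lowercased, and anything non-alphanumeric ends the word. It is written in Python rather than PostgreSQL's C, and it assumes Python's str.isalnum(), str.upper() and str.lower() as stand-ins for whatever classification and case-mapping tables the collation would actually provide. The name initcap_simple is hypothetical. Python's upper() applies full case mappings, so a character such as 'ß' can expand to more than one character here.

```python
def initcap_simple(s: str) -> str:
    """Hypothetical sketch of initcap()'s simple rule: words are runs of
    alphanumeric characters separated by non-alphanumeric characters."""
    out = []
    in_word = False  # True while we are inside a run of alphanumerics
    for ch in s:
        if ch.isalnum():
            # First alphanumeric after a separator is uppercased; the rest lowercased.
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            # Any non-alphanumeric character (space, '-', '_', '\'') ends the word.
            out.append(ch)
            in_word = False
    return "".join(out)


print(initcap_simple("hello WORLD-foo bar_baz"))
```

No word-break tables are involved. The only per-character questions are "is it alphanumeric?" and "what are its upper/lower mappings?", which is why this rule is cheap to support compared with the Unicode word-break rules.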

The point stands that it would be pretty simple to have a collation
that handles upper() and lower() in a standards-compliant way without
relying on libc or ICU. Unfortunately it's too late to call that
collation UCS_BASIC, but it would still be useful.
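To show roughly what "without relying on libc or ICU" could mean, here is a sketch of upper() and lower() driven by a built-in, sorted table of simple (one-to-one, context-free) case mappings, searched with binary search. The table contents, its layout, and the function names are all hypothetical. A real table would presumably be generated from the simple mappings in the Unicode Character Database rather than written by hand.

```python
from bisect import bisect_left

# Hypothetical excerpt of a generated table, sorted by code point.
# Each row: (code point, simple uppercase mapping, simple lowercase mapping).
# Only a few entries are shown; every character NOT listed (including cased
# letters omitted from this excerpt) maps to itself.
CASE_TABLE = [
    (0x0041, 0x0041, 0x0061),  # A
    (0x0042, 0x0042, 0x0062),  # B
    (0x0061, 0x0041, 0x0061),  # a
    (0x0062, 0x0042, 0x0062),  # b
    (0x00C9, 0x00C9, 0x00E9),  # É
    (0x00E9, 0x00C9, 0x00E9),  # é
    (0x03A3, 0x03A3, 0x03C3),  # Σ (simple mapping: always σ, no final-sigma context)
    (0x03C3, 0x03A3, 0x03C3),  # σ
]
_KEYS = [row[0] for row in CASE_TABLE]


def _map(s: str, column: int) -> str:
    """Map each code point through the table column (1 = upper, 2 = lower)."""
    out = []
    for ch in s:
        cp = ord(ch)
        i = bisect_left(_KEYS, cp)
        if i < len(_KEYS) and _KEYS[i] == cp:
            out.append(chr(CASE_TABLE[i][column]))
        else:
            out.append(ch)  # not in the table: unchanged
    return "".join(out)


def simple_upper(s: str) -> str:
    return _map(s, 1)


def simple_lower(s: str) -> str:
    return _map(s, 2)


print(simple_upper("abé"), simple_lower("ABÉΣ"))
```

Because simple mappings are one-to-one, the output always has the same number of characters as the input, which keeps the implementation small compared with the full (SpecialCasing) mappings.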

Regards,
    Jeff Davis
