On 21.03.24 01:13, Jeff Davis wrote:
> The v26 patch was not quite complete, so I didn't commit it yet.
> Attached v27-0001 and 0002.
>
> 0002 is necessary because otherwise lc_collate_is_c() short-circuits
> the version check in pg_newlocale_from_collation(). With 0002, the code
> is simpler and all paths go through pg_newlocale_from_collation(), and
> the version check happens even when lc_collate_is_c().
>
> But perhaps there was a reason the code was the way it was, so
> submitting for review in case I missed something.
>
>> 0005 and 0006 don't contain any test cases. So I guess they are really
>> only usable via 0007. Is that understanding correct?
>
> 0005 is not a functional change, it's just a refactoring to use a
> callback, which is preparation for 0007.
>
>> Are there any test cases that illustrate the word boundary changes in
>> patch 0005? It might be useful to test those against Oracle as well.
> The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8
> collation vs '123Abc' in PG_UNICODE_FAST.
>
> The reason for the latter behavior is that the Unicode Default Case
> Conversion algorithm for toTitlecase() advances to the next Cased
> character before mapping to titlecase, and digits are not Cased. ICU
> has a configurable adjustment, and defaults in a way that produces
> '123abc'.
>
> New rebased series attached.
The patch set v27 is ok with me, modulo (a) discussion about initcap
semantics, and (b) what collation to assign to ucs_basic, which can be
revisited later.
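
For the initcap discussion in (a), here is a rough Python sketch of the two
behaviors described above, for illustration only. It is not the code from the
patch set. The function names, the `adjust_to_cased` flag, the use of
alphanumeric runs as "words", and the approximation of the Unicode Cased
property via Python's str methods are all my own assumptions.

```python
def is_cased(ch: str) -> bool:
    # Rough stand-in for the Unicode "Cased" property (an approximation).
    return ch.isupper() or ch.islower() or ch.istitle()


def titlecase(text: str, adjust_to_cased: bool) -> str:
    """Titlecase each word; words are approximated as alphanumeric runs.

    adjust_to_cased=True:  skip ahead to the first Cased character of each
                           word before titlecasing (the behavior attributed
                           above to the Unicode default algorithm).
    adjust_to_cased=False: only the first character of the word is a
                           candidate; if it is a digit, nothing is uppercased.
    """
    out = []
    in_word = False   # currently inside an alphanumeric run
    pending = False   # no character of this word has been titlecased yet
    for ch in text:
        if not ch.isalnum():
            # Word separator: copy it and reset the per-word state.
            in_word = False
            pending = False
            out.append(ch)
            continue
        if not in_word:
            in_word = True
            pending = True
        if pending and (is_cased(ch) or not adjust_to_cased):
            # Titlecase position reached (possibly an uncased character).
            out.append(ch.title() if is_cased(ch) else ch)
            pending = False
        elif pending:
            # Uncased character (e.g. a digit); keep looking for a Cased one.
            out.append(ch)
        else:
            out.append(ch.lower())
    return "".join(out)


if __name__ == "__main__":
    print(titlecase("123abc", adjust_to_cased=True))
    print(titlecase("123abc", adjust_to_cased=False))
```

With the flag set, '123abc' comes out as '123Abc'; without it, the string is
left as '123abc', which matches the two results quoted above.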