Re: BUG #18771: ICU custom collations with rules ignore collator strength option. - Mailing list pgsql-bugs

From Ruben Ruiz
Subject Re: BUG #18771: ICU custom collations with rules ignore collator strength option.
Msg-id CABKKXvhxC0xUeL=2ETXh5yR2gUmBSSJTBmKtXWUMMC2tgOj0dw@mail.gmail.com
In response to Re: BUG #18771: ICU custom collations with rules ignore collator strength option.  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-bugs
I think in this case it's not really related, as I'm not trying to copy options from the base locale.

It all seems to come from some missing information in the official icu4c docs. The description of the parameters of ucol_openRules() says:

"strength: The default collation strength; one of UCOL_PRIMARY, UCOL_SECONDARY, UCOL_TERTIARY, UCOL_IDENTICAL,UCOL_DEFAULT_STRENGTH - can be also set in the rules"

One could easily assume that, since the strength "can be also set in the rules", passing UCOL_DEFAULT_STRENGTH would let the rules take precedence. Nowhere do the docs mention that UCOL_DEFAULT is a valid value for that parameter, although they do mention it for normalizationMode. But if you look at the icu4c sources (https://github.com/unicode-org/icu/blob/f8aa68b0c1c9584633e7a61157185f1a2c275f58/icu4c/source/i18n/collationbuilder.cpp#L182), you find this:

RuleBasedCollator::internalBuildTailoring(const UnicodeString &rules,
                                          int32_t strength,
                                          UColAttributeValue decompositionMode,
                                          UParseError *outParseError, UnicodeString *outReason,
                                          UErrorCode &errorCode) {

...
    // Set attributes after building the collator,
    // to keep the default settings consistent with the rule string.
    if(strength != UCOL_DEFAULT) {
        setAttribute(UCOL_STRENGTH, static_cast<UColAttributeValue>(strength), errorCode);
    }
...
}

This implies not only that UCOL_DEFAULT is a valid argument, but also that passing any other value overrides whatever strength the rules set. So it seems that the 'make_icu_collator' function inside postgres should pass UCOL_DEFAULT instead of the current UCOL_DEFAULT_STRENGTH argument, so that the rules can set the desired strength level.
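
To make the effect of that check concrete, here is a minimal toy model of it in C. It does not call ICU: the enum and effective_strength() are hypothetical names made up for illustration, and the numeric values only mirror what I understand ucol.h to define (UCOL_DEFAULT as -1, UCOL_DEFAULT_STRENGTH as an alias for tertiary strength).

```c
/* Toy stand-ins for the ICU constants. Nothing here links against ICU;
 * the values are assumed to mirror ucol.h. */
enum toy_strength {
    TOY_DEFAULT          = -1, /* plays the role of UCOL_DEFAULT */
    TOY_PRIMARY          = 0,
    TOY_SECONDARY        = 1,
    TOY_TERTIARY         = 2,
    TOY_DEFAULT_STRENGTH = TOY_TERTIARY /* plays the role of UCOL_DEFAULT_STRENGTH */
};

/*
 * Hypothetical helper modelling the quoted check:
 *   rules_strength - strength the rule string asked for (e.g. primary)
 *   strength_arg   - value passed as the 'strength' parameter
 * The argument replaces the rule-defined strength unless it is the
 * "default" sentinel, in which case the rules win.
 */
int effective_strength(int rules_strength, int strength_arg)
{
    if (strength_arg != TOY_DEFAULT)
        return strength_arg;   /* explicit argument overrides the rules */
    return rules_strength;     /* sentinel: keep what the rules set */
}
```

With rules that request primary strength, effective_strength(TOY_PRIMARY, TOY_DEFAULT_STRENGTH) gives TOY_TERTIARY (the rules are overridden), while effective_strength(TOY_PRIMARY, TOY_DEFAULT) gives TOY_PRIMARY, which is the behaviour the report expects.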


On Mon, 13 Jan 2025 at 17:42, Peter Eisentraut <peter@eisentraut.org> wrote:
> On 11.01.25 18:27, PG Bug reporting form wrote:
>> When using the 'rules' option of CREATE COLLATION to create a custom icu
>> collation it seems that, if you include inside the rules a change to the
>> comparison strength, it is ignored.
>
> I think this is the same as this ICU bug:
>
> https://unicode-org.atlassian.net/browse/ICU-22456
