On Fri, 2023-05-26 at 10:43 -0700, Jeff Davis wrote:
> We still need to consider backwards compatibility. If someone has a
> collation with locale name C.UTF-8 in an earlier version, any change
> to the interpretation of that locale name after an upgrade carries a
> corruption risk. The risks are different in ICU vs libc:
...
> For libc: this change may affect any user who happened to have
> LANG=C.UTF-8 in their environment at initdb time, which is probably a
> lot of users, and some buildfarm members. However, the average risk
> seems to be much lower, because we've gone a long time with the
> assumption that C.UTF-8 has the same behavior as C, and this only
> recently came up. Also, I'm not sure how obscure the cases are even
> if there is a difference; perhaps they don't often occur in practice?
> It's not clear to me how we mitigate this risk further, though.

We can avoid this risk by converting C.anything or POSIX.anything to
plain "C" or "POSIX", respectively, for new collations before storing
the string in the catalog. For upgraded collations, we can preserve the
existing locale name. When opening the locale, we would still only
recognize plain "C" and "POSIX" as the C locale.

That would be more consistent behavior for new users, without creating
a backwards compatibility problem for existing users who happened to
create a collation with C.UTF-8.
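As a rough illustration of the rule described above, here is a minimal
sketch in C. It is not PostgreSQL code: the function names
(has_base_name, canonical_name_for_new_collation, is_c_locale) are
hypothetical, and the sketch assumes the match is case-sensitive and
that "anything" means any suffix after a dot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if name is exactly base, or base followed
 * by '.' and any suffix (e.g. "C.UTF-8" for base "C").
 */
static bool
has_base_name(const char *name, const char *base)
{
    size_t      len = strlen(base);

    if (strncmp(name, base, len) != 0)
        return false;
    return name[len] == '\0' || name[len] == '.';
}

/*
 * Hypothetical: the locale name to store in the catalog when a NEW
 * collation is created. Upgraded collations would skip this step and
 * keep whatever name they already have.
 */
static const char *
canonical_name_for_new_collation(const char *name)
{
    assert(name != NULL);

    if (has_base_name(name, "C"))
        return "C";
    if (has_base_name(name, "POSIX"))
        return "POSIX";
    return name;                /* any other locale is stored unchanged */
}

/*
 * Hypothetical: the check applied when opening a locale. Only the plain
 * names are treated as the C locale, so a preserved "C.UTF-8" from an
 * older version is not reinterpreted.
 */
static bool
is_c_locale(const char *name)
{
    assert(name != NULL);

    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}
```

With this split, a new collation created as C.UTF-8 would be stored as
"C" and recognized by is_c_locale(), while an upgraded collation that
still carries the name C.UTF-8 would not be, so its existing behavior
is left alone.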
For ICU users, we'd still need the upgrade check, because even the "C"
locale was not implemented with memcmp in prior versions. But I think
that's fine and should be done anyway, as the behavior in that case was
incorrect and was almost certainly a mistake by the user.

Regards,
Jeff Davis