FWIW...
At Fri, 23 Apr 2021 00:17:35 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> > At Thu, 22 Apr 2021 23:17:19 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> >> Doesn't seem like a good idea, because that locks us into an assumption
> >> that the downcasing conversion doesn't change the string's physical
> >> length. There are a lot of counterexamples to that :-(. I'm not sure
>
> > Mmm. I didn't know of that.
>
> The two examples I know of offhand are in German (eszett "ß" downcases to
> "ss") and Turkish (dotted "İ" downcases to "i", likewise dotless "I"
According to Wikipedia, "ss" is equivalent to "ß", and their uppercase
forms are "SS" and "ẞ" respectively. (I didn't even know that "ẞ"
existed. AFAIK no word begins with eszett, but it seems that "ẞ" is
used when a word is written entirely in capital letters.)
> downcases to "ı"; one of each of those pairs is an ASCII letter, the
> other is not). Depending on which encoding is in use, these
Uppercase dotless "I" and lowercase dotted "i" are the ones in ASCII
(or the English alphabet?). That's interesting.
> transformations *could* be the same number of bytes, but they could
> equally well not be. There are probably other examples.
Yeah. Agreed.
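
Just to see the numbers, here is a rough sketch in Python 3 (not
PostgreSQL code) that counts bytes before and after a case conversion.
The helper byte_lengths() and the two tables are names I made up for
illustration. The eszett conversions use Python's built-in,
locale-independent str.upper()/str.lower(); the Turkish pairs are
written out by hand as an assumption, since Python does not apply
Turkish locale rules.

```python
def byte_lengths(before, after, encoding):
    """Return (bytes before, bytes after) for one case conversion."""
    return len(before.encode(encoding)), len(after.encode(encoding))


# Conversions done by Python itself (Unicode default case mappings).
# Each entry: (input, converted result, encodings that can hold both).
ESZETT_CASES = [
    ("\u00df", "\u00df".upper(), ["utf-8", "latin-1"]),  # "ß" upcased
    ("\u1e9e", "\u1e9e".lower(), ["utf-8"]),             # "ẞ" downcased
]

# Turkish pairs from the thread, hand-written (assumption: this is what
# a Turkish-locale lower() would produce).
TURKISH_CASES = [
    ("\u0130", "i", ["utf-8", "iso8859_9"]),   # dotted "İ" -> "i"
    ("I", "\u0131", ["utf-8", "iso8859_9"]),   # "I" -> dotless "ı"
]


def survey(cases):
    """Map (input, encoding) to (bytes before, bytes after)."""
    return {
        (before, enc): byte_lengths(before, after, enc)
        for before, after, encodings in cases
        for enc in encodings
    }


if __name__ == "__main__":
    for (text, enc), (n_before, n_after) in survey(
        ESZETT_CASES + TURKISH_CASES
    ).items():
        print(f"{text!r:10} {enc:10} {n_before} -> {n_after} bytes")
```

With these inputs the Turkish pairs keep their length in ISO-8859-9 but
not in UTF-8, while "ß" keeps its length in UTF-8 but not in Latin-1,
which seems to match the "could be the same, could equally well not be"
point above.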
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center