- - <crossroads0000@googlemail.com> writes:
>>> The original post seemed to be a contrived attempt to say "you should
>>> use ICU".
>>
>> Indeed. The OP should go read all the previous arguments about ICU
>> in our archives.
>
> Not at all. I was just making a suggestion. You may use any other
> library or implement it yourself (I even said that in my original
> post). www.unicode.org, the official website of the Unicode
> Consortium, has a complete database of all Unicode characters which
> can be used as a basis.
>
> But if you want to ignore the normalization/multiple code point issue,
> point 2, the collation problem, still remains. And given that even a
> crappy database such as MySQL supports Unicode collation, this isn't
> something to be ignored, IMHO.
Sure, supporting multiple collations in a database is definitely a known
missing feature. It requires a lot of work: a patch for it came too late to
make it into 8.4 and still needed more work, so hopefully the remaining
issues will be worked out for 8.5.
I suggest you read the old threads and contribute whatever you can toward
solving the problems that came up there.
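To make the collation point concrete, here is a rough Python sketch (Python and its stdlib unicodedata module are just my choice for illustration; this has nothing to do with how PostgreSQL or MySQL implement collation). It shows that plain code point order is not a linguistic order, and uses a deliberately crude accent-folding key, which is NOT the Unicode Collation Algorithm, only a stand-in to show the kind of ordering users expect:

```python
import unicodedata


def fold_accents(s: str) -> str:
    """Crude accent folding: decompose to NFD and drop combining marks.

    This is NOT the Unicode Collation Algorithm; it only illustrates
    why plain code point order is not a linguistic collation.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# "r\u00e9sum\u00e9" is "résumé" written with precomposed U+00E9 characters.
words = ["zebra", "r\u00e9sum\u00e9", "rose", "resume"]

# Code point order: U+00E9 sorts after every ASCII letter,
# so "résumé" lands after "rose".
codepoint_order = sorted(words)

# Crude "collation": compare accent-folded forms first, then break ties
# on the original string so the result is deterministic.
folded_order = sorted(words, key=lambda s: (fold_accents(s), s))

print(codepoint_order)  # ['resume', 'rose', 'résumé', 'zebra']
print(folded_order)     # ['resume', 'résumé', 'rose', 'zebra']
```

Real collation (per-locale tailoring, multiple comparison levels, contractions and so on) is far more involved than this, which is a big part of why the patch needed more work.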
>> I don't believe that the standard forbids the use of combining chars at all.
>> RFC 3629 says:
>>
>> ... This issue is amenable to solutions based on Unicode Normalization
>> Forms, see [UAX15].
This is the relevant part. Tom was claiming that the UTF8 encoding required
normalizing the string of Unicode code points before encoding. I'm not sure
that's true, though. Is it?
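For what it's worth, a quick Python sketch (again, Python's stdlib is only my choice for illustration) suggests that at least a standard UTF-8 encoder doesn't care: both the precomposed and the decomposed forms of "é" encode and round-trip fine, they just produce different bytes, and only an explicit normalization step makes them compare equal:

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# Both are valid input to the UTF-8 encoder; they yield different bytes.
precomposed_bytes = precomposed.encode("utf-8")  # b'\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")    # b'e\xcc\x81'

# Decoding gives back exactly the form that went in: no normalization
# happens as part of encoding or decoding.
print(precomposed_bytes.decode("utf-8") == precomposed)  # True
print(decomposed_bytes.decode("utf-8") == decomposed)    # True

# Naive comparison says the strings differ; after NFC they match.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

So normalization looks like something a database would have to choose to do (on input or at comparison time), not something the encoding itself imposes.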
-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!