On Sun, Feb 19, 2006 at 04:35:56PM -0500, Andrew Dunstan wrote:
> Have you looked at the code of citext? Unless I'm misreading, it creates
> a lowercase copy of each string for each comparison. And it doesn't look
> to me like it's encoding/locale aware.
Its cilower function isn't terribly great and could probably do with
some work. toupper()/tolower() are encoding/locale sensitive, but the
code used doesn't really handle multibyte encodings. But it's an
excellent starting point for creating new types because almost all the
hard work is done.
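To illustrate the multibyte problem, here is a rough Python analogy (not citext's code). bytes.lower() only touches ASCII A-Z, much like calling tolower() on each byte in the C locale. It therefore leaves both UTF-8 bytes of "É" alone, while decoding to characters first and then folding gets the match. The helper names are made up for the example.

```python
def bytewise_fold(s: str) -> bytes:
    # Lowercase the UTF-8 encoding one byte at a time; only ASCII A-Z change,
    # similar to a per-byte tolower() loop in the C locale.
    return s.encode("utf-8").lower()


def charwise_fold(s: str) -> str:
    # Work on decoded characters and apply Unicode case folding.
    return s.casefold()


def ci_equal_bytewise(a: str, b: str) -> bool:
    return bytewise_fold(a) == bytewise_fold(b)


def ci_equal_charwise(a: str, b: str) -> bool:
    return charwise_fold(a) == charwise_fold(b)


print(ci_equal_bytewise("CAFÉ", "café"))  # False: bytes 0xC3 0x89 vs 0xC3 0xA9
print(ci_equal_charwise("CAFÉ", "café"))  # True
```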
> I'm not sure how hard a text type with efficient, encoding and locale
> aware, case-insensitive comparison would be to create, but it would be
> a Good Thing (tm) to have available.
Hmm, "case-insensitive match" is a badly defined concept.
There's a reason why there's a strcasecmp() but no strcasecoll(). The
code currently uses tolower, but if you changed it to do toupper it
would be equally valid yet produce different results.
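A small sketch of why the choice matters, in Python for brevity. In ASCII, '_' (0x5F) sits between the upper-case (0x41-0x5A) and lower-case (0x61-0x7A) letters. So ordering by the lowercased key puts it first, and ordering by the uppercased key puts it last. Python's str.upper() also applies Unicode's full mapping of "ß" to "SS", which a one-character toupper() cannot do, so even equality can change.

```python
words = ["a", "_", "B"]

# Order by the lowercased form: '_' (0x5F) sorts before 'a' (0x61).
by_lower = sorted(words, key=str.lower)
# Order by the uppercased form: '_' (0x5F) sorts after 'B' (0x42).
by_upper = sorted(words, key=str.upper)

print(by_lower)  # ['_', 'a', 'B']
print(by_upper)  # ['a', 'B', '_']

# Equality can differ too, because str.upper() maps "ß" to "SS".
print("straße".lower() == "STRASSE".lower())  # False
print("straße".upper() == "STRASSE".upper())  # True
```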
If/when we ever get to use a real internationalisation library like
ICU, we can do things like convert strings to Normal Form D so we can
compare characters separately from their accents, i.e. accent-insensitive
comparison. In any case ICU contains mappings for things like
title-case and all the different kinds of space and hyphens so people
can specify their own mapping to get whatever they're happy with.
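The Normal Form D idea can be sketched with Python's unicodedata module, standing in here for ICU. Decompose, drop the combining marks (category Mn), then case-fold. This is one possible mapping, not a definitive one. Letters such as "ø" have no canonical decomposition and pass through unchanged, which is exactly where a user-specified mapping would come in.

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits precomposed letters: "é" -> "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", s)
    # Keep base characters, drop nonspacing combining marks.
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def ai_ci_equal(a: str, b: str) -> bool:
    # Accent- and case-insensitive equality: strip accents, then case-fold.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(strip_accents("Crème Brûlée"))    # Creme Brulee
print(ai_ci_equal("résumé", "RESUME"))  # True
print(strip_accents("ø"))               # ø (no decomposition under NFD)
```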
Until then, people will just have to rely on their system's support for
tolower().
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/