On Jun 1, 2008, at 21:08, Tom Lane wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> I really need case-insensitive string comparison in my database.
>
> Okay ... according to whose locale?
I'm using C. Of course you're correct that it depends on the locale, I
always forget that. But does not the Unicode standard offer up some
sort locale-independent case-insensitivity, so that it gets it right
some large percentage of the time?
>> Ideally there'd be a nice ITEXT data type (and friends, ichar,
>> ivarchar, etc.). But of course there isn't, and for years I've just
>> used LOWER() on indexes and queries to get the same result.
>
>> Only it turns out that I'm of course not getting the same result.
>
> I think that means you're not using the right locale.
What locale is right? If I have a Web app, there could be data in many
different languages in a single table/column.
>> 1. Does the use of the tolower() C function in the citext data type
>> on
>> pgfoundry basically give me the same results as using lower() in my
>> SQL has for all these years?
>
> [ broken record... ] Kinda depends on your locale. However,
> tolower()
> is 100% guaranteed not to work for multibyte encodings, so citext is
> quite useless if you're using UTF8. This is fixable, no doubt, but
> it's not fixed in the project as it stands.
Right, okay; thanks. I'm thinking about using it for email addresses
and domain names, however, so it might be adequate for those
applications.
>> 2. Isn't the ICU library distributed with PostgreSQL?
>
> Nope, it is not, and we have already pretty much determined that we
> do not want to make Postgres depend on ICU. See the archives.
Damn. Okay, thanks.
David