Re: Case-Insensitve Text Comparison - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: Case-Insensitve Text Comparison
Date
Msg-id BC402551-7E1B-4561-8F64-E098C6D58241@kineticode.com
Whole thread Raw
In response to Re: Case-Insensitve Text Comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Case-Insensitve Text Comparison  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Case-Insensitve Text Comparison  (Andrew Sullivan <ajs@commandprompt.com>)
Re: Case-Insensitve Text Comparison  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Jun 1, 2008, at 21:08, Tom Lane wrote:

> "David E. Wheeler" <david@kineticode.com> writes:
>> I really need case-insensitive string comparison in my database.
>
> Okay ... according to whose locale?

I'm using C. Of course you're correct that it depends on the locale, I  
always forget that. But does not the Unicode standard offer up some  
sort locale-independent case-insensitivity, so that it gets it right  
some large percentage of the time?

>> Ideally there'd be a nice ITEXT data type (and friends, ichar,
>> ivarchar, etc.). But of course there isn't, and for years I've just
>> used LOWER() on indexes and queries to get the same result.
>
>> Only it turns out that I'm of course not getting the same result.
>
> I think that means you're not using the right locale.

What locale is right? If I have a Web app, there could be data in many  
different languages in a single table/column.

>> 1. Does the use of the tolower() C function in the citext data type  
>> on
>> pgfoundry basically give me the same results as using lower() in my
>> SQL has for all these years?
>
> [ broken record... ]  Kinda depends on your locale.  However,  
> tolower()
> is 100% guaranteed not to work for multibyte encodings, so citext is
> quite useless if you're using UTF8.  This is fixable, no doubt, but
> it's not fixed in the project as it stands.

Right, okay; thanks. I'm thinking about using it for email addresses  
and domain names, however, so it might be adequate for those  
applications.

>> 2. Isn't the ICU library distributed with PostgreSQL?
>
> Nope, it is not, and we have already pretty much determined that we
> do not want to make Postgres depend on ICU.  See the archives.

Damn. Okay, thanks.

David


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Overhauling GUCS
Next
From: Tom Lane
Date:
Subject: Re: Case-Insensitve Text Comparison