Re: Case Sensitivity - Mailing list pgsql-general

From Andrew Sullivan
Subject Re: Case Sensitivity
Date
Msg-id 20110113161801.GZ26731@shinkuro.com
Whole thread Raw
In response to Re: Case Sensitivity  ("Phillip Smith" <phillip@softworks.com>)
List pgsql-general
On Thu, Jan 13, 2011 at 03:29:03PM -0000, Phillip Smith wrote:

> modifying user inputted data is not an option for me. I need to
> maintain the original data. I had read about indexing on a lower
> function. This all seems a bit of a bodgy workaround to me. A clean
> Case Insensitive collation setting is cleaner. Is this supported in
> postgreSQL 9?

There isn't such a collation setting as far as I know, and it's
actually not as clean as you think.  It turns out to work reasonably
well in ASCII-land, but not very well in other circumstances.  In
general, the case-insensitive but case-preserving technique that
English-speaking computer users have come to find normal is a bad fit
for a wide variety of languages.  (Even "Latin" characterset-using
languages have trouble, because of historic ways of handling accents.
Is the capital version of é E or É?  Well, both, it turns out,
depending on whom you believe.  I could bore you about the effects of
this in the DNS all day long, but I'll resist the temptation.)

Putting an index on lower(column) and then doing all your comparisons
with lower(datum) works fairly well, and I don't see how it's any more
bodgy than a database-wide case insensitive collation.  For instance,
I can assure you that customers named Leblanc and LeBlanc care about
whether those two compare equally.  In your customer name field, if
you have a database-wide collation setting, you can't make the
distinction.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

pgsql-general by date:

Previous
From: hernan gonzalez
Date:
Subject: small suggested change for examples in array docs
Next
From: Jon Smark
Date:
Subject: Record with a field consisting of table rows