Re: Accent insensitive search - Mailing list pgsql-general

From Gregory Stark
Subject Re: Accent insensitive search
Date
Msg-id 877ipx1sbd.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: Accent insensitive search  (PFC <lists@peufeu.com>)
List pgsql-general
"PFC" <lists@peufeu.com> writes:

>> Hi. I have a few databases created with UNICODE encoding, and I would like to
>> be able to search with accent insensitivity. There's something  in Oracle
>> (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  this, but I found
>> nothing in PostgreSQL, just the 'to_ascii' function,  which AFAIK, doesn't
>> work with UNICODE.

Postgres supports localized collation orders but what it doesn't currently
support is having multiple collation orders within the same server.

So when you initialize the server with initdb it takes the setting of
LC_COLLATE (usually from LC_ALL) and stores that permanently. If you initdb
with LC_COLLATE set to a locale like en_GB.UTF-8 or something like that you
may find one that has the behaviour you want. I think they won't be entirely
accent insensitive but they'll consider accents only if the rest of the string
is identical.

You can test the sort order of a locale by writing a file with sample words
and sorting with something like:

LC_ALL=en_GB.UTF-8 sort /tmp/data

>     The easiest way is to create an extra column which will hold a copy of
> your text, with all accents removed. You can also convert it to lowercase  and
> remove apostrophes, punctuation etc. Said column is kept up to date  with a
> trigger.

That's another alternative which is useful if you need multiple collations in
your database. This gives you control over which collation is used when and
exactly what the rules are. The downside is that you have to reinvent the
collation rules which the localized collations already provide.

You don't necessarily have to keep a column in your table with the normalized
strings. You can normalize "on-the-fly" using an expression index as long as
your function always returns the same data given the same inputs (and is
therefore marked "immutable").

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


pgsql-general by date:

Previous
From: Naz Gassiep
Date:
Subject: Aggregates
Next
From: Gregory Stark
Date:
Subject: Re: 8.2.3 PANIC with "corrupted item pointer"