On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> * Celia McInnis <celia@drmath.ca> [300305, 12:29]:
> > I have French text with accents in it and would rather (for search purposes)
> > be able to search for things with the accents removed. Is there any builtin
> > postgres function which will do this?
> >
> > Here is the format of some accented (French) words in my database:
> >
> > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> >
> > which I want to be able to search for as:
> >
> > francais cafe tot brule rever anes begues
>
> I'm not an expert, but I'm afraid it's not possible to do that (and I'd
> be glad if somebody came out saying I'm wrong as I've the same problem).
You could write an unaccent() function that calls translate().
http://www.postgresql.org/docs/8.0/interactive/functions-string.html
Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:
CREATE FUNCTION unaccent(text) RETURNS text AS $$
BEGIN
RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou');
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
SELECT unaccent('fran\347ais');
unaccent
----------
francais
(1 row)
You could use unaccent() in a functional index:
CREATE TABLE words (
id serial PRIMARY KEY,
word text NOT NULL
);
CREATE INDEX words_word_idx ON words (unaccent(word));
INSERT INTO words (word) VALUES ('fran\347ais');
INSERT INTO words (word) VALUES ('caf\351');
INSERT INTO words (word) VALUES ('t\364\t');
SELECT * FROM words WHERE unaccent(word) = 'francais';
id | word
----+----------
1 | français
(1 row)
EXPLAIN shows that the index is being used:
EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using words_word_idx on words (cost=0.00..11.31 rows=6 width=36)
Index Cond: (unaccent(word) = 'francais'::text)
(2 rows)
Hope this helps.
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/