Re: how to ignore accents? - Mailing list pgsql-novice

From Michael Fuhr
Subject Re: how to ignore accents?
Date
Msg-id 20050331004236.GA37961@winnie.fuhr.org
Whole thread Raw
In response to Re: how to ignore accents?  (Ennio-Sr <nasr.laili@tin.it>)
Responses Re: how to ignore accents?
Re: how to ignore accents?
List pgsql-novice
On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> * Celia McInnis <celia@drmath.ca> [300305, 12:29]:
> > I have French text with accents in it and would rather (for search purposes)
> > be able to search for things with the accents removed. Is there any builtin
> > postgres function which will do this?
> >
> > Here is the format of some accented (French) words in my database:
> >
> > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> >
> > which I want to be able to search for as:
> >
> > francais cafe tot brule rever anes begues
>
> I'm not an expert, but I'm afraid it's not possible to do that (and I'd
> be glad if somebody came out saying I'm wrong as I've the same problem).

You could write an unaccent() function that calls translate().

  http://www.postgresql.org/docs/8.0/interactive/functions-string.html

Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:

  CREATE FUNCTION unaccent(text) RETURNS text AS $$
  BEGIN
      RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou');
  END;
  $$ LANGUAGE plpgsql IMMUTABLE STRICT;

  SELECT unaccent('fran\347ais');
   unaccent
  ----------
   francais
  (1 row)

You could use unaccent() in a functional index:

  CREATE TABLE words (
      id    serial PRIMARY KEY,
      word  text NOT NULL
  );

  CREATE INDEX words_word_idx ON words (unaccent(word));

  INSERT INTO words (word) VALUES ('fran\347ais');
  INSERT INTO words (word) VALUES ('caf\351');
  INSERT INTO words (word) VALUES ('t\364\t');

  SELECT * FROM words WHERE unaccent(word) = 'francais';
   id |   word
  ----+----------
    1 | français
  (1 row)

EXPLAIN shows that the index is being used:

  EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
                                    QUERY PLAN
  ------------------------------------------------------------------------------
   Index Scan using words_word_idx on words  (cost=0.00..11.31 rows=6 width=36)
     Index Cond: (unaccent(word) = 'francais'::text)
  (2 rows)

Hope this helps.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

pgsql-novice by date:

Previous
From: Ennio-Sr
Date:
Subject: Re: how to ignore accents?
Next
From: Noel Faux
Date:
Subject: Test