Home > mailing lists

Re: how to ignore accents? - Mailing list pgsql-novice

From	Michael Fuhr
Subject	Re: how to ignore accents?
Date	March 31, 2005 01:42:43
Msg-id	20050331004236.GA37961@winnie.fuhr.org Whole thread Raw
In response to	Re: how to ignore accents? (Ennio-Sr <nasr.laili@tin.it>)
Responses	Re: how to ignore accents? Re: how to ignore accents?
List	pgsql-novice

Tree view

On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> * Celia McInnis <celia@drmath.ca> [300305, 12:29]:
> > I have French text with accents in it and would rather (for search purposes)
> > be able to search for things with the accents removed. Is there any builtin
> > postgres function which will do this?
> >
> > Here is the format of some accented (French) words in my database:
> >
> > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> >
> > which I want to be able to search for as:
> >
> > francais cafe tot brule rever anes begues
>
> I'm not an expert, but I'm afraid it's not possible to do that (and I'd
> be glad if somebody came out saying I'm wrong as I've the same problem).

You could write an unaccent() function that calls translate().

  http://www.postgresql.org/docs/8.0/interactive/functions-string.html

Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:

  CREATE FUNCTION unaccent(text) RETURNS text AS $$
  BEGIN
      RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou');
  END;
  $$ LANGUAGE plpgsql IMMUTABLE STRICT;

  SELECT unaccent('fran\347ais');
   unaccent
  ----------
   francais
  (1 row)

You could use unaccent() in a functional index:

  CREATE TABLE words (
      id    serial PRIMARY KEY,
      word  text NOT NULL
  );

  CREATE INDEX words_word_idx ON words (unaccent(word));

  INSERT INTO words (word) VALUES ('fran\347ais');
  INSERT INTO words (word) VALUES ('caf\351');
  INSERT INTO words (word) VALUES ('t\364\t');

  SELECT * FROM words WHERE unaccent(word) = 'francais';
   id |   word
  ----+----------
    1 | français
  (1 row)

EXPLAIN shows that the index is being used:

  EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
                                    QUERY PLAN
  ------------------------------------------------------------------------------
   Index Scan using words_word_idx on words  (cost=0.00..11.31 rows=6 width=36)
     Index Cond: (unaccent(word) = 'francais'::text)
  (2 rows)

Hope this helps.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

pgsql-novice by date:

From: Ennio-Sr
Date: 31 March 2005, 00:56:53
Subject: Re: how to ignore accents?

From: Noel Faux
Date: 31 March 2005, 02:02:32
Subject: Test

Re: how to ignore accents? - Mailing list pgsql-novice

Previous

Next