Re: FTS Configuration option - Mailing list pgsql-hackers

From Emre Hasegeli
Subject Re: FTS Configuration option
Date
Msg-id CAE2gYzzdqjeCpPk-BU1AWkFsn1yZocBWpAYd1WLA-EFy13Ozgg@mail.gmail.com
In response to FTS Configuration option  (Artur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: FTS Configuration option  (Artur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-hackers
> => ALTER TEXT SEARCH CONFIGURATION multi_conf
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>     word, hword, hword_part
>     WITH german_ispell (JOIN), english_ispell, simple;

I have had something like this in mind ever since I dealt with FTS for a
Turkish real estate listing application.  Being able to pipe the output
of some dictionaries into the next one is a nice feature we have had
since 9.0, but it is not always sufficient.  I think it is wrong to
decide this on a per-dictionary basis.  Something slightly more
complicated, which lets dictionaries be connected to each other in
parallel or in series, might be more useful.

My problem was related to the special characters in Turkish (ç, ğ, ı,
ö, ü).  It is very common to just type the similar-looking 7-bit ASCII
characters (c, g, i, o, u) instead.  The unaccent extension changes
them as desired and passes the altered words to the subsequent
dictionary when the configuration is changed like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR word, hword, hword_part
> WITH unaccent, turkish_stem;

However, the stemmer then does not do a good job on those words,
because the changed characters are significant in the language.  What
I really needed was something like this:

> ALTER TEXT SEARCH CONFIGURATION turkish
> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
> WITH (fix_mistyped_characters AND (turkish_hunspell OR turkish_stem) AND unaccent);
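
For anyone who wants to reproduce the serial unaccent-then-stem
behaviour described above, here is a minimal sketch using only syntax
that exists today.  It assumes the unaccent contrib module is
installed, and turkish_unaccent is just a name I made up so that the
built-in turkish configuration stays untouched.  The sample word is
arbitrary, and the proposed parenthesised AND/OR syntax is not part of
this sketch.

```sql
-- Assumption: the unaccent contrib module is available on the server.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- turkish_unaccent is a hypothetical name; copying avoids altering
-- the built-in turkish configuration.
CREATE TEXT SEARCH CONFIGURATION turkish_unaccent (COPY = turkish);

-- unaccent is a filtering dictionary, so its output is passed on to
-- turkish_stem instead of ending the lookup.
ALTER TEXT SEARCH CONFIGURATION turkish_unaccent
    ALTER MAPPING FOR word, hword, hword_part
    WITH unaccent, turkish_stem;

-- Compare the lexemes produced with and without the unaccent step.
SELECT to_tsvector('turkish', 'çağrılar')          AS stem_only,
       to_tsvector('turkish_unaccent', 'çağrılar') AS unaccent_then_stem;

-- ts_debug shows which dictionaries were consulted for each token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('turkish_unaccent', 'çağrılar');
```

Comparing the two columns of the first query should make it visible
where the stemmer sees the unaccented form rather than the original
word.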


